Feasibility for Self-Hosting Scientific Datasets?

I’m interested in using IPFS as a means of distributing and archiving scientific datasets. Towards this end, for the past ~3 years (starting in late 2020) I’ve been running a Raspberry Pi 4 that hosts a ~30GB image dataset. I want to share my experience as a real-world use case for IPFS and document what has been easy / hard. My hope is that this use case can help identify and motivate improvements for kubo, or perhaps help me improve my setup.

Use Case

I’m working on a project called shitspotter where I’m training a neural network to detect dog poop in images. This requires having access to a labeled image dataset which I have been building. Details about the project are here: GitHub - Erotemic/shitspotter: An open source algorithm and dataset for finding poop in pictures. A work in progress.

In terms of interacting with IPFS: I have a root folder that contains an assets directory. Every month I make a new folder named for that date and copy all of the new images I’ve taken into it. I then run

ipfs add --pin -r <root> --progress --cid-version=1 --raw-leaves=false

on the root path, which crawls the directory, identifies content that has already been added, and tracks the new content. This gives me a new CID for the updated dataset, which is what I publish in the GitHub readme. I’ve found this to be nice because old folders keep their old CIDs, so if people pin the dataset at any point they help host at least some of the content, even as new data is added. It is important to note that I originally added the dataset without --cid-version=1, which is why I’m setting --raw-leaves=false: anyone who pinned the data when I first released it is still helping to host that data as I continue to add new data.
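Concretely, a month’s update looks roughly like this (the ~/shitspotter_dvc path and the folder names are just illustrative, not the real layout; the add flags are the ones above):

# copy the month's new photos into a dated folder under assets/
mkdir -p ~/shitspotter_dvc/assets/poop-2023-12-19
cp /media/phone/DCIM/*.jpg ~/shitspotter_dvc/assets/poop-2023-12-19/

# re-add the whole root; unchanged folders hash back to their existing CIDs,
# so only the new month's blocks are actually new
ipfs add --pin -r ~/shitspotter_dvc --progress --cid-version=1 --raw-leaves=false

# sanity check: hash an older folder without adding it (-n / --only-hash)
# and confirm its CID has not changed
ipfs add -rn --cid-version=1 --raw-leaves=false ~/shitspotter_dvc/assets/poop-2021-04-25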

What’s been easy

Working with IPFS on a LAN has been great. I pin the data on my main machine (which is not exposed to the WAN), then run a pin command on my Raspberry Pi (which is connected to the WAN), and that very quickly transfers all the data to the public-facing IPFS server.
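For reference, the Pi side is roughly just a recursive pin of the new root CID (using the current root from the readme):

ipfs pin add --progress bafybeie275n5f4f64vodekmodnktbnigsvbxktffvy2xxkcfsqxlie4hrm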

My employer also lets me use their IPFS server to re-pin the data. This tends to be a bit slower than working on the LAN, but it’s reasonable, and it guarantees that at least 2 nodes pin the entire dataset.

At one point I was using web3.storage as a 3rd host of the data, but they no longer offer a free tier, so I stopped doing that. Still, the --service option is very nice.
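For anyone curious, the remote-pinning flow was roughly this (the service name is arbitrary, and the endpoint and token are placeholders from memory, so double-check them against the provider’s docs):

ipfs pin remote service add web3storage https://api.web3.storage/ <API_TOKEN>
ipfs pin remote add --service=web3storage --name=shitspotter bafybeie275n5f4f64vodekmodnktbnigsvbxktffvy2xxkcfsqxlie4hrm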

What’s been hard

While I’ve been able to access the data very quickly, that hasn’t been the case for other people. I recently got an email from a person interested in using the data, and they attempted to run:

ipfs ls bafybeie275n5f4f64vodekmodnktbnigsvbxktffvy2xxkcfsqxlie4hrm

The top-level content is a few folders and several smaller files (roughly 100KB–3MB):

bafybeifqbkqxif73ewelbnr4cqfhpljl2yz2rfksb2y7dvyhokiddsd5qy -       _cache/
bafybeic5a4kjrb37tdmc6pzlpcxe2x6hc4kggemnqm2mcdu4tmrzvir6vm -       analysis/
bafybeicirkqvz6pedd3mvpokyo7cwy2x3isxxnjiplzdgsi22qxs2wv6ie -       assets/
bafybeiesjhwbueg7nuyy4ga2dfxo7bjvkk3hhiesukqygogcmbfqgqg2ee 3119681 data.kwcoco.json
bafybeig63rot73r22hwnzw2dofqzvmz5ubwdnb5dn5xr5ylmwv4uh2ca3y 79      train.kwcoco.zip
bafybeifagh5dvowtjlepejnppbjcyfxipt55c36s4i6s7ljbcqkw5euuqm 66229   train_imgs278_27bcbd3e.kwcoco.zip
bafybeifcch6zn4y6a73ougeltsfetlhmzwjxkuhh2h2zk4iqkuo4ihswre 84417   train_imgs346_11d67089.kwcoco.zip
bafybeicwt6iy5crccdpybk3io7teb2innbmt5u3rqqtkkg6347bdc3ck2y 84528   train_imgs346_3e3fc072.kwcoco.zip
bafybeibvesr2rxfjaj6y6mfwyslq5wfifyp6rfrerybtplljwseccf5p5u 91111   train_imgs386_4653bb8f.kwcoco.zip
bafybeifaidhinenbykuei6ptxeedn4b3ypxwp57wyjb6cv4iu5ssdiujeu 105749  train_imgs454_ff3d0b9d.kwcoco.zip
bafybeibubxgytkp67yo5jczgnmbmxvy3sbtajulqqofdpkhoqckeuygd6q 146694  train_imgs647_576c8a63.kwcoco.zip
bafybeihftuyiorcbfhb5bjvcniu4dw5q4gckoavsrhtm4nbzegml4rzmim 148486  train_imgs647_65ea74f6.kwcoco.zip
bafybeihk7t4jt6pjvlsqd5glb3l5xb5wtrmbvi4nof4u6kw2z2jitteliq 178546  train_imgs760_19315e7e.kwcoco.zip
bafybeifgvlqkp4npr445n5l7yvw2wflmaaiwpmpv7toappku3gl5xoa5me 78      vali.kwcoco.zip
bafybeig7eu5zb54d7cbacd4ydjn2omezkwnczt2gub2rsyxowlaze4g5ui 45578   vali_imgs159_248a33db.kwcoco.zip
bafybeigajvmrj4cpbuumtdrzbzj27hfyqqbnren7sjmrumjcttkisby4oi 45068   vali_imgs159_ed881576.kwcoco.zip
bafybeib7l4mrlqgg6xlkz3n6u4pj37xlkftgqrdu267ifedei24yuz7jgm 28237   vali_imgs84_078e0ebf.kwcoco.zip
bafybeih2onl4ql73cmvqkvpkw6or4bw562hlrgzmztkvnqahf5ioo3xwtm 28432   vali_imgs84_8b1bbddd.kwcoco.zip
bafybeidx2xae2hgdg5djzz4kox2vizytqtwgqogj7r37omwasxdlkvblv4 28277   vali_imgs84_f4c3d117.kwcoco.zip

However, when the external user tried to run the ls command, it ran for over half an hour with no response before they killed it. That is a big UX problem.

I have verified that they were able to access a single image from the original dataset (which I believe is pinned by more than just me) via:

ipfs get bafybeigueauk5udaoeq3cjedqz4usm4xxcpk4acv5z5rml54sksaiqnd7i -o IMG_20201112_112429442.jpg

But I have not verified whether they can access any newer data yet. I’m going to have them try this command to test that:

ipfs get bafybeifdxozsmyvks3pshj2qujhpbwwkaeu7cd6vvuwphyd5zunjdtezbi -o PXL_20231116_135031922.jpg

I’m likely the only person pinning that data at the moment.

I’m not sure what could cause the ls to hang for 30 minutes like that. Is it just trying to find my node and failing? Is there anything I could do in order to make external access to the data easier?


Hey, I’m curious what version of Kubo you are running?
After looking, I can find a provider for your CID in the DHT:

> ipfs dht findprovs bafybeie275n5f4f64vodekmodnktbnigsvbxktffvy2xxkcfsqxlie4hrm
12D3KooWMJxwdSsxYwyb6KCqHNpBcE2oM9HWz6yNkRHiavgQLsbr

However, on https://ipfs-check.on.fleek.co I can’t connect to your node:

I see you only have 1 public IPv4; trying to connect to it manually confirms it is closed:

> nc 172.100.113.212 4001

/quic is an older experimental version of QUIC that we don’t support anymore. (/quic-v1, which is based on RFC 9000, is what is current.)

I don’t know if 12D3KooWMJxwdSsxYwyb6KCqHNpBcE2oM9HWz6yNkRHiavgQLsbr is your node (it might be someone else), but if it isn’t, then your node is not properly advertising its content in the DHT, since it does not show up under findprovs.

Running ipfs --version on my personal node I get: ipfs version 0.20.0. I can certainly upgrade if that is needed. I just ran ipfs-update install latest, and it is currently upgrading to v0.25.0.

For my personal node, my PeerID is: 12D3KooWMJxwdSsxYwyb6KCqHNpBcE2oM9HWz6yNkRHiavgQLsbr

The node my employer runs is: ipfs version 0.22.0, and that should also be hosting the data. That has a PeerID of 12D3KooWPyQK2JEXnqK1QxiV9Y7bG3UsUQC5iQvDxn8bV1uqvsbi.

If my node isn’t visible, it’s strange that I’m able to pin content on my employer’s machine. I would also think that if the content is there (and I verified it is via ipfs pin ls --type=recursive | grep bafybeie275n5f4f64vodekmodnktbnigsvbxktffvy2xxkcfsqxlie4hrm), then both machines would need to be misconfigured for external users to be unable to see their content.

This one is unreachable; you should try upgrading to the latest version. It should detect that it is not reachable and set up hole punching.

> ipfs id 12D3KooWMJxwdSsxYwyb6KCqHNpBcE2oM9HWz6yNkRHiavgQLsbr
Error: failed to dial: failed to dial 12D3KooWMJxwdSsxYwyb6KCqHNpBcE2oM9HWz6yNkRHiavgQLsbr: all dials failed
  * [/ip4/127.0.0.1/udp/4001/quic] QUIC draft-29 has been removed, QUIC (RFC 9000) is accessible with /quic-v1
  * [/ip4/172.100.113.212/udp/4001/quic] QUIC draft-29 has been removed, QUIC (RFC 9000) is accessible with /quic-v1
  * [/ip4/192.168.222.29/udp/4001/quic] QUIC draft-29 has been removed, QUIC (RFC 9000) is accessible with /quic-v1
  * [/ip6/::1/udp/4001/quic] QUIC draft-29 has been removed, QUIC (RFC 9000) is accessible with /quic-v1
  * [/ip4/127.0.0.1/tcp/4001] dial to self attempted
  * [/ip4/127.0.0.1/udp/4001/quic-v1] dial to self attempted
  * [/ip6/::1/tcp/4001] dial to self attempted
  * [/ip6/::1/udp/4001/quic-v1] dial to self attempted
  * [/ip4/172.100.113.212/udp/4001/quic-v1] timeout: no recent network activity
  * [/ip4/192.168.222.29/udp/4001/quic-v1] context deadline exceeded
  * [/ip4/192.168.222.29/tcp/4001] dial tcp4 0.0.0.0:4001->192.168.222.29:4001: i/o timeout
  * [/ip4/172.100.113.212/tcp/4001] dial tcp4 0.0.0.0:4001->172.100.113.212:4001: i/o timeout
> ipfs id 12D3KooWPyQK2JEXnqK1QxiV9Y7bG3UsUQC5iQvDxn8bV1uqvsbi
{
	"ID": "12D3KooWPyQK2JEXnqK1QxiV9Y7bG3UsUQC5iQvDxn8bV1uqvsbi",
	"PublicKey": "CAESINJURF1we7J9Cz+ixIsi7rw9cNbuOY+umnLzYvxuZ6St",
	"Addresses": [
		"/ip4/127.0.0.1/tcp/4001/p2p/12D3KooWPyQK2JEXnqK1QxiV9Y7bG3UsUQC5iQvDxn8bV1uqvsbi",
		"/ip4/127.0.0.1/udp/4001/quic-v1/p2p/12D3KooWPyQK2JEXnqK1QxiV9Y7bG3UsUQC5iQvDxn8bV1uqvsbi",
		"/ip4/127.0.0.1/udp/4001/quic-v1/webtransport/certhash/uEiAayPbXIiBDmt5jDAYc4E4irZ7PxvchMrx_iDhyZw7Wng/certhash/uEiAwJluXv36zWGzSaTQnUDyN3HE6H707_IXsZuc87b-9Fg/p2p/12D3KooWPyQK2JEXnqK1QxiV9Y7bG3UsUQC5iQvDxn8bV1uqvsbi",
		"/ip4/127.0.0.1/udp/4001/quic/p2p/12D3KooWPyQK2JEXnqK1QxiV9Y7bG3UsUQC5iQvDxn8bV1uqvsbi",
		"/ip4/192.168.115.40/tcp/4001/p2p/12D3KooWPyQK2JEXnqK1QxiV9Y7bG3UsUQC5iQvDxn8bV1uqvsbi",
		"/ip4/192.168.115.40/udp/4001/quic-v1/p2p/12D3KooWPyQK2JEXnqK1QxiV9Y7bG3UsUQC5iQvDxn8bV1uqvsbi",
		"/ip4/192.168.115.40/udp/4001/quic-v1/webtransport/certhash/uEiAayPbXIiBDmt5jDAYc4E4irZ7PxvchMrx_iDhyZw7Wng/certhash/uEiAwJluXv36zWGzSaTQnUDyN3HE6H707_IXsZuc87b-9Fg/p2p/12D3KooWPyQK2JEXnqK1QxiV9Y7bG3UsUQC5iQvDxn8bV1uqvsbi",
		"/ip4/192.168.115.40/udp/4001/quic/p2p/12D3KooWPyQK2JEXnqK1QxiV9Y7bG3UsUQC5iQvDxn8bV1uqvsbi",
		"/ip4/193.223.242.103/udp/4001/quic-v1/p2p/12D3KooWRvV6gMam48mCypcy1Z7ssXnNjb7dx8nx9oLAVsBcda2w/p2p-circuit/p2p/12D3KooWPyQK2JEXnqK1QxiV9Y7bG3UsUQC5iQvDxn8bV1uqvsbi",
		"/ip4/193.223.242.103/udp/4001/quic-v1/webtransport/certhash/uEiCyBKtjxVNqU1nEnzL2IqooJYVyBhX3PAdntub4abaBcQ/certhash/uEiBGJbK0f27LzKM_AJQp0hn9ODvSQnYpMkwSzHsD9y3eWg/p2p/12D3KooWRvV6gMam48mCypcy1Z7ssXnNjb7dx8nx9oLAVsBcda2w/p2p-circuit/p2p/12D3KooWPyQK2JEXnqK1QxiV9Y7bG3UsUQC5iQvDxn8bV1uqvsbi",
		"/ip4/193.223.242.103/udp/4001/quic/p2p/12D3KooWRvV6gMam48mCypcy1Z7ssXnNjb7dx8nx9oLAVsBcda2w/p2p-circuit/p2p/12D3KooWPyQK2JEXnqK1QxiV9Y7bG3UsUQC5iQvDxn8bV1uqvsbi",
		"/ip4/3.129.70.253/udp/4001/quic-v1/p2p/12D3KooWKADHgXr2vAeZ1ZrGePAUQXaCsAk2xEK2dQAd2TdnrVzS/p2p-circuit/p2p/12D3KooWPyQK2JEXnqK1QxiV9Y7bG3UsUQC5iQvDxn8bV1uqvsbi",
		"/ip4/3.129.70.253/udp/4001/quic-v1/webtransport/certhash/uEiBXKTVtTMUT9R06LoNHrb4_uvEQPNutv0mHN7sKSptuTA/certhash/uEiA4LG_xPhA3AsxyZ96Lw9owIdarvgo6kOFKXV6L6__pnA/p2p/12D3KooWKADHgXr2vAeZ1ZrGePAUQXaCsAk2xEK2dQAd2TdnrVzS/p2p-circuit/p2p/12D3KooWPyQK2JEXnqK1QxiV9Y7bG3UsUQC5iQvDxn8bV1uqvsbi",
		"/ip6/::1/tcp/4001/p2p/12D3KooWPyQK2JEXnqK1QxiV9Y7bG3UsUQC5iQvDxn8bV1uqvsbi",
		"/ip6/::1/udp/4001/quic-v1/p2p/12D3KooWPyQK2JEXnqK1QxiV9Y7bG3UsUQC5iQvDxn8bV1uqvsbi",
		"/ip6/::1/udp/4001/quic-v1/webtransport/certhash/uEiAayPbXIiBDmt5jDAYc4E4irZ7PxvchMrx_iDhyZw7Wng/certhash/uEiAwJluXv36zWGzSaTQnUDyN3HE6H707_IXsZuc87b-9Fg/p2p/12D3KooWPyQK2JEXnqK1QxiV9Y7bG3UsUQC5iQvDxn8bV1uqvsbi",
		"/ip6/::1/udp/4001/quic/p2p/12D3KooWPyQK2JEXnqK1QxiV9Y7bG3UsUQC5iQvDxn8bV1uqvsbi"
	],
	"AgentVersion": "",
	"Protocols": null
}

This one is reachable; however, I can’t find it in the DHT with findprovs. You should check the logs for:

🔔🔔🔔 YOU MAY BE FALLING BEHIND DHT REPROVIDES! 🔔🔔🔔

messages.
If they show up, they should tell you to enable AcceleratedDHTClient.
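Enabling it is a one-line config change followed by a daemon restart:

ipfs config --json Routing.AcceleratedDHTClient true
ipfs shutdown   # then start the daemon again however you normally run it (systemd, tmux, etc.)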

I want to point out that you might be interested in having a look at https://desci.com/, just in case you didn’t know about it and it applies to what you are doing.

So, I think I figured out why my personal node is unreachable… I had forgotten to forward port 4001 to it when I reset my router (whoops 🤦). I also enabled AcceleratedDHTClient here as well.

As for the other node, I do see the message in the logs. I’ve edited the value in the config from false to true and restarted the daemon (I think that’s all I need to do?).

It does look like ipfs-check now works for CID bafybeie275n5f4f64vodekmodnktbnigsvbxktffvy2xxkcfsqxlie4hrm and Multiaddr: /p2p/12D3KooWMJxwdSsxYwyb6KCqHNpBcE2oM9HWz6yNkRHiavgQLsbr, but it doesn’t seem to be working for /p2p/12D3KooWPyQK2JEXnqK1QxiV9Y7bG3UsUQC5iQvDxn8bV1uqvsbi. Perhaps I just need to wait an hour for an AcceleratedDHTClient scan?
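In the meantime, a couple of checks I can run on the node itself (I believe these subcommands are current, but the routing/dht command names have moved around between kubo versions):

# see how far the reprovider has gotten
ipfs stats provide

# manually re-announce the dataset root to the DHT
ipfs routing provide bafybeie275n5f4f64vodekmodnktbnigsvbxktffvy2xxkcfsqxlie4hrm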

@hector, I did not know about desci.com, but it looks like it is very relevant. Thanks for the pointer!

yes

idk, try showing ipfs stats dhtserver on your unreachable node, to see if it can find its neighbors in the DHT.

I think I have something messed up on my local node. At one point I must have had this working, because I’ve been able to access data externally before, but something is certainly wrong now.

As a test I created a small file with random data. Its CID is: QmaRssZfmkya5LX53hoyxHgk4RzTvo9grUCcR412xCva4B.
I pinned this on my local node.

I’ve verified that port 4001 is open by navigating to: Canyouseeme - Check My Port and checking my WAN IP with port 4001.

I double checked sudo ufw status to verify port 4001 is allowed:

4001/tcp                   ALLOW       Anywhere                   # Public IPFS libp2p swarm port
4001/tcp (v6)              ALLOW       Anywhere (v6)              # Public IPFS libp2p swarm port

Do I need to do anything with UDP / QUIC-v1 in my firewall? It seems that I only have TCP configured here.

My router should be forwarding the port to the correct location on the LAN:

(I do have an alternate LAN IP address because the Pi has both a wifi and an ethernet port, but I don’t think that would cause an issue.)

Running the test on IPFS Check indicates:
❌ Could not find the multihash in the dht

Relevant sections (let me know if I missed something) of ipfs config show are:

  "Addresses": {
    "API": "/ip4/127.0.0.1/tcp/5001",
    "Announce": null,
    "AppendAnnounce": null,
    "Gateway": "/ip4/127.0.0.1/tcp/8080",
    "NoAnnounce": null,
    "Swarm": [
      "/ip4/0.0.0.0/tcp/4001",
      "/ip6/::/tcp/4001",
      "/ip4/0.0.0.0/udp/4001/quic-v1",
      "/ip4/0.0.0.0/udp/4001/quic-v1/webtransport",
      "/ip6/::/udp/4001/quic-v1",
      "/ip6/::/udp/4001/quic-v1/webtransport"
    ]
  },
  "AutoNAT": {
    "ServiceMode": "disabled"
  },
  "Discovery": {
    "MDNS": {
      "Enabled": true,
      "Interval": 10
    }
  },
  "Experimental": {
    "FilestoreEnabled": false,
    "GraphsyncEnabled": false,
    "Libp2pStreamMounting": false,
    "P2pHttpProxy": false,
    "StrategicProviding": false,
    "UrlstoreEnabled": false
  },
  "Identity": {
    "PeerID": "12D3KooWMJxwdSsxYwyb6KCqHNpBcE2oM9HWz6yNkRHiavgQLsbr"
  },
  "Routing": {
    "AcceleratedDHTClient": true,
    "Type": "dhtclient"
  },
  "Swarm": {
    "AddrFilters": null,
    "ConnMgr": {
      "GracePeriod": "1m0s",
      "HighWater": 40,
      "LowWater": 20,
      "Type": "basic"
    },
    "DisableBandwidthMetrics": false,
    "DisableNatPortMap": false,
    "RelayClient": {},
    "RelayService": {},
    "Transports": {
      "Multiplexers": {},
      "Network": {},
      "Security": {}
    }
  }

Is there anything about that config that looks odd / needs to be changed?

Something annoying is that if I ssh into a different box that is not on my LAN, I seem to be able to access the CID. I would think that would be a good test of whether it is publicly available, but perhaps not. Specifically, I have 3 machines: A, B, and C.

  • Machine A: This is my workstation. It is on my LAN, but it is also connected to a company VPN.
  • Machine B: This is my IPFS node on my LAN.
  • Machine C: This is the company IPFS node, which is not on my LAN.

On machine A, I connect to the VPN and ssh into Machine C. Then on Machine C I tried ipfs cat QmaRssZfmkya5LX53hoyxHgk4RzTvo9grUCcR412xCva4B, and it worked. I would think that it would be forced to access that data via the WAN address, but maybe because Machine A can see Machine B, and Machine A is connected to Machine C, Machine C can see Machine B? I don’t quite understand how that would work, but it seems to. It would be nice to have a way to force it to use the WAN so I can test that it is working. If anyone has insight into what is going on here, I’m very curious.
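The closest thing I can think of for a WAN-only test (assuming a public gateway I don’t operate counts as an independent peer, and that it hasn’t already cached the content) is to request the test CID through a gateway:

curl -sI "https://ipfs.io/ipfs/QmaRssZfmkya5LX53hoyxHgk4RzTvo9grUCcR412xCva4B"

If that comes back with a 200 in a reasonable amount of time, something other than my LAN machines was able to fetch the data from me.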

Yes, your AppendAnnounce is empty; see Configure NAT and port forwarding | IPFS Docs.
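Something like this should do it (using the 172.100.113.212 public IPv4 from earlier; restart the daemon afterwards):

ipfs config --json Addresses.AppendAnnounce '["/ip4/172.100.113.212/tcp/4001","/ip4/172.100.113.212/udp/4001/quic-v1"]'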

You can forward UDP too; it helps, but just TCP should also work (although connections will take longer due to the extra round trips).
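With ufw that would be something like the rule below (mirroring the TCP rule you already have), plus forwarding UDP 4001 on the router as well:

sudo ufw allow 4001/udp comment 'Public IPFS libp2p swarm port (QUIC)'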

I’m really struggling to debug this. The config now has:

  "Addresses": {
    "API": "/ip4/127.0.0.1/tcp/5001",
    "Announce": null,
    "AppendAnnounce": [
      "/ip4/172.100.113.212/tcp/4001",
      "/ip4/172.100.113.212/udp/4001/quic",
      "/ip4/172.100.113.212/udp/4001/quic-v1",
      "/ip4/172.100.113.212/udp/4001/quic-v1/webtransport"
    ],
    "Gateway": "/ip4/127.0.0.1/tcp/8080",
    "NoAnnounce": null,
    "Swarm": [
      "/ip4/0.0.0.0/tcp/4001",
      "/ip6/::/tcp/4001",
      "/ip4/0.0.0.0/udp/4001/quic-v1",
      "/ip4/0.0.0.0/udp/4001/quic-v1/webtransport",
      "/ip6/::/udp/4001/quic-v1",
      "/ip6/::/udp/4001/quic-v1/webtransport"
    ]
  },

But I’m still unable to access content on a public gateway.

I’m going through the diagnostics on PL Diagnose, but I’m finding that it’s discovering IPFS nodes on my LAN that should not be connected to the WAN at all!

I have a new version of my dataset (bafybeifkufkmmx3qxbvxe5hbskxr4gijkevcryxwp3mys2pqf4yjv2tobu) which I was testing (it exists on all 3 nodes on my LAN, but only one of them should actually be exposed to the WAN), and pl-diagnose returned the IDs of my internal nodes!

For test 1, it said “Our backend found 2 providers for this CID.”

Providers:
Peer: 12D3KooWBwURxUQaBe9s8G4bVMG6hNddptp5jmK1s2ri4r5QTSb4

/p2p/12D3KooWBwURxUQaBe9s8G4bVMG6hNddptp5jmK1s2ri4r5QTSb4
Peer: 12D3KooWJdfXwFfjyobWYWuDe2Tyy88FLRcW9YrTQC18VwBv7znm

/p2p/12D3KooWJdfXwFfjyobWYWuDe2Tyy88FLRcW9YrTQC18VwBv7znm

And neither of those is the public node (which ends with QLsbr).

For test 2, it found the address /p2p/12D3KooWMJxwdSsxYwyb6KCqHNpBcE2oM9HWz6yNkRHiavgQLsbr correctly:

And the right WAN addresses are there:

/ip4/172.100.113.212/udp/4001/quic
/ip4/192.168.222.29/tcp/4001
/ip6/::1/tcp/4001
/ip4/127.0.0.1/tcp/4001
/ip4/172.100.113.212/tcp/4001
/p2p/12D3KooWMJxwdSsxYwyb6KCqHNpBcE2oM9HWz6yNkRHiavgQLsbr

Oddly enough, test 3 failed when I entered my /p2p address, but if I follow the instructions in the error message and use an /ip4/... address from ipfs id (e.g. /ip4/172.100.113.212/tcp/4001/p2p/12D3KooWMJxwdSsxYwyb6KCqHNpBcE2oM9HWz6yNkRHiavgQLsbr), it does work.

Despite this, test 4 failed with:

The request failed

context deadline exceeded

But of course if I run:

ipfs ls bafybeifkufkmmx3qxbvxe5hbskxr4gijkevcryxwp3mys2pqf4yjv2tobu

on my node, it works just fine.

There is an unrelated bug with public gateways: bitswap/server: wantlist overflows fails in a toxic maner preventing any data transfer · Issue #527 · ipfs/boxo · GitHub

It now finds your public node.

Here it is working on my node:

> ipfs ls bafybeifkufkmmx3qxbvxe5hbskxr4gijkevcryxwp3mys2pqf4yjv2tobu
bafybeiee5yfznoljl6m5omsm556dkr3b7nxha3lv7gdjbwhqphzs6hm74a -       _cache/
bafybeiewgi3mgwqty32ehljniz6m2hqhjijpquhli2yd4zrek3odlgyuui -       analysis/
bafybeihsbj6jn5visfv4bz2qtg7e6n2nxhpihongvc3kft3ubsagf2dqca -       assets/
bafybeiga6ekq3db7nxf5us5hn3bznod6ykvhebbv7l77hdmrmycoluo7qa 3727541 data.kwcoco.json
bafybeidu3uhiccq4xy2ubpyrolnx7bsbngbs6edswv7z7jlinbaxgtlzle 80      train.kwcoco.zip
bafybeihwwxjhwrdghhl6lo4xvsvfitstk65fqlnm2pinwfiir4yzteaapq 255422  train_imgs1117_58688878.kwcoco.zip
bafybeibxaomuvzwkofoknnc6ad634e3q5fr27ni4our3dx2c5tqt35etny 295279  train_imgs1281_0d45ee36.kwcoco.zip
bafybeif3iczght2i6l4a5kvxqvg5tbchxdhl5ogqgppm6dvve2b3yyqw2m 295280  train_imgs1281_474dc9ee.kwcoco.zip
bafybeifagh5dvowtjlepejnppbjcyfxipt55c36s4i6s7ljbcqkw5euuqm 66229   train_imgs278_27bcbd3e.kwcoco.zip
bafybeifcch6zn4y6a73ougeltsfetlhmzwjxkuhh2h2zk4iqkuo4ihswre 84417   train_imgs346_11d67089.kwcoco.zip
bafybeicwt6iy5crccdpybk3io7teb2innbmt5u3rqqtkkg6347bdc3ck2y 84528   train_imgs346_3e3fc072.kwcoco.zip
bafybeibvesr2rxfjaj6y6mfwyslq5wfifyp6rfrerybtplljwseccf5p5u 91111   train_imgs386_4653bb8f.kwcoco.zip
bafybeifaidhinenbykuei6ptxeedn4b3ypxwp57wyjb6cv4iu5ssdiujeu 105749  train_imgs454_ff3d0b9d.kwcoco.zip
bafybeibubxgytkp67yo5jczgnmbmxvy3sbtajulqqofdpkhoqckeuygd6q 146694  train_imgs647_576c8a63.kwcoco.zip
bafybeihftuyiorcbfhb5bjvcniu4dw5q4gckoavsrhtm4nbzegml4rzmim 148486  train_imgs647_65ea74f6.kwcoco.zip
bafybeihk7t4jt6pjvlsqd5glb3l5xb5wtrmbvi4nof4u6kw2z2jitteliq 178546  train_imgs760_19315e7e.kwcoco.zip
bafybeicichqvckf7ekoauxwm2z7233w66cufxsba2u5sxktluodj7fpgpu 203903  train_imgs890_ee0b0dd8.kwcoco.zip
bafybeiak222vvaudvscrc7o37oijlxnzjpc6ej4n27a35xk2exdbbmebbq 210712  train_imgs912_9f4a3016.kwcoco.zip
bafybeifgvlqkp4npr445n5l7yvw2wflmaaiwpmpv7toappku3gl5xoa5me 78      vali.kwcoco.zip
bafybeibbsja7wq3qmvwg6gqchwmm4tleg5l5gdqttinqh64jc25q6sxc6u 45578   vali_imgs159_248a33db.kwcoco.zip
bafybeigajvmrj4cpbuumtdrzbzj27hfyqqbnren7sjmrumjcttkisby4oi 45068   vali_imgs159_ed881576.kwcoco.zip
bafybeib7l4mrlqgg6xlkz3n6u4pj37xlkftgqrdu267ifedei24yuz7jgm 28237   vali_imgs84_078e0ebf.kwcoco.zip
bafybeih2onl4ql73cmvqkvpkw6or4bw562hlrgzmztkvnqahf5ioo3xwtm 28432   vali_imgs84_8b1bbddd.kwcoco.zip
bafybeidx2xae2hgdg5djzz4kox2vizytqtwgqogj7r37omwasxdlkvblv4 28277   vali_imgs84_f4c3d117.kwcoco.zip

(It can take some time to provide all the CIDs; you can check ipfs stats provide if you want some indication of progress.)