Serving large(-ish) documents and directories via public web gateways

I have uploaded documents from UCSF’s AIDS History Project into IPFS. While I want to preserve/share the documents via IPFS, I also want people who don’t use/have IPFS to be able to access them.

The root directory CID is bafybeiclo7qonqfxpivkax64oohm7zp6qexe7wa3ggyjxkm7hjyioiu6w4 and people can browse it just fine at (for example) https://bafybeiclo7qonqfxpivkax64oohm7zp6qexe7wa3ggyjxkm7hjyioiu6w4.ipfs.dweb.link/.

However, the directory “NoMoreSilence_Documents” has over 11K files in it, so browsing to it through the gateway is hit or miss: sometimes it doesn’t work well, and sometimes it works just fine.

Then, within that directory, there are large PDFs, and access to them is super-spotty. For example, https://ipfs.io/ipfs/bafybeiclo7qonqfxpivkax64oohm7zp6qexe7wa3ggyjxkm7hjyioiu6w4/NoMoreSilence_Documents/GLC45_001_002.pdf is a 300 MB PDF that seems to take a while before returning a 504 Gateway Timeout. (And if that one works for you, pick one or two others and they will likely fail.)
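The two links above use the two common gateway addressing styles: subdomain (CID.ipfs.dweb.link/) and path (ipfs.io/ipfs/CID/...). When a file times out on one gateway, it can help to build the same file's URL for another gateway and try again. Below is a minimal sketch of that, assuming nothing beyond string formatting (it does not contact any gateway); the function names are hypothetical, made up for illustration.

```python
from urllib.parse import quote

def path_gateway_url(gateway: str, cid: str, path: str = "") -> str:
    """Path-style URL: https://<gateway>/ipfs/<cid>/<path>."""
    url = f"https://{gateway}/ipfs/{cid}"
    if path:
        # quote() percent-encodes unsafe characters but leaves "/" alone
        url += "/" + quote(path.strip("/"))
    return url

def subdomain_gateway_url(gateway: str, cid: str, path: str = "") -> str:
    """Subdomain-style URL: https://<cid>.ipfs.<gateway>/<path>.

    Assumes a CIDv1 in base32 (the "bafy..." form), because DNS names
    are case-insensitive and cannot carry a case-sensitive CID.
    """
    url = f"https://{cid}.ipfs.{gateway}/"
    if path:
        url += quote(path.strip("/"))
    return url

ROOT = "bafybeiclo7qonqfxpivkax64oohm7zp6qexe7wa3ggyjxkm7hjyioiu6w4"
print(subdomain_gateway_url("dweb.link", ROOT))
print(path_gateway_url("ipfs.io", ROOT, "NoMoreSilence_Documents/GLC45_001_002.pdf"))
```

The root CID here is already CIDv1 base32, which is why the dweb.link subdomain link works for it.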

It’s not clear to me how to address this, or if I’m just Doing It Wrong, or what. Thoughts I’ve had:

  • Try to get a cluster set up and ask people to join it
  • Mess around with the trickle vs. balanced DAG layouts when adding
  • Use ipfs tar add to upload a single tar file that people can download instead of browsing the collection (will that even work with the public web gateway?)
  • Let people know about the collection and hope that one of them figures it out, or that more people using/sharing it magically fixes it
  • Don’t use IPFS for this and/or try again in a year
  • Put my IPFS node on a fast server/network somewhere instead of on my laptop on my home WiFi
  • Ask in this forum to see if anyone has suggestions for things to try or ways to better troubleshoot what the problem is
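On the tar idea: as I understand it, ipfs tar add parses the tar and stores its structure as an IPFS DAG rather than as one file, so a plain ipfs add of an ordinary .tar is probably the simpler route if the goal is a single download through the gateway. Below is a minimal sketch of building that tar with Python's standard tarfile module; the function name and the output file name are hypothetical.

```python
import tarfile
from pathlib import Path

def bundle_directory(src_dir: str, tar_path: str) -> int:
    """Pack src_dir into one uncompressed tar file; return its member count.

    No compression: PDFs are mostly compressed already, so gzip would
    cost time without saving much space.
    """
    src = Path(src_dir)
    with tarfile.open(tar_path, "w") as tar:
        # arcname stores paths relative to the folder's own name,
        # e.g. "NoMoreSilence_Documents/GLC45_001_002.pdf"
        tar.add(src, arcname=src.name)
    with tarfile.open(tar_path, "r") as tar:
        return len(tar.getmembers())

# Usage (hypothetical paths):
# bundle_directory("NoMoreSilence_Documents", "NoMoreSilence_Documents.tar")
```

One trade-off to keep in mind: the result would be a single file many times larger than the 300 MB PDF, so it may run into the same gateway timeouts.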

The main thing is that big archives require big resources and can’t always rely on the public gateways.


I don’t know why I didn’t think of this before, but I don’t have to preserve the directory structure as it is, so one thing that might help is breaking that big directory up into 8 or 10 smaller directories. (Unless that doesn’t actually fix anything, because a directory’s contents are not just its immediate contents but all the subdirectories and their contents as well?)
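As I understand UnixFS, the root CID does cover everything beneath it, but each directory node only lists its immediate children, so a gateway rendering one smaller subdirectory should only need that smaller node. Below is a minimal sketch of the split, assuming a flat folder of files; the part_01, part_02, ... names and the function name are hypothetical.

```python
import shutil
from pathlib import Path

def split_into_subdirs(src_dir: str, dest_dir: str, n_groups: int = 10) -> dict[str, int]:
    """Copy the files in src_dir into up to n_groups subfolders of dest_dir.

    Files are sorted by name and cut into contiguous runs, so each part
    keeps a stretch of the original listing order. Returns files per part.
    """
    files = sorted(p for p in Path(src_dir).iterdir() if p.is_file())
    dest = Path(dest_dir)
    per_group = -(-len(files) // n_groups)  # ceiling division
    counts: dict[str, int] = {}
    for i, f in enumerate(files):
        group = dest / f"part_{i // per_group + 1:02d}"
        group.mkdir(parents=True, exist_ok=True)
        shutil.copy2(f, group / f.name)  # copy, so the original stays intact
        counts[group.name] = counts.get(group.name, 0) + 1
    return counts

# Usage (hypothetical paths): ~11,000 files into 10 parts of ~1,100 each
# split_into_subdirs("NoMoreSilence_Documents", "NoMoreSilence_split", 10)
```

Copying rather than moving leaves the original folder untouched, at the cost of temporarily doubling disk use.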

Or you can just enable sharding and let that happen transparently.
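For the sharding route: as I understand it, sharded UnixFS directories are a HAMT (hash array mapped trie). Each entry name is hashed, and successive 8-bit slices of the hash pick one of 256 slots at each level, so one huge directory block becomes many small ones. The sketch below imitates only that bucketing step. It swaps in SHA-256 from Python's standard library for the murmur3 hash Kubo actually uses, so its slot numbers will not match a real shard; the helper names and the file-name pattern are hypothetical.

```python
import hashlib
from collections import Counter

FANOUT = 256                              # slots per trie node (Kubo's default width)
BITS_PER_LEVEL = FANOUT.bit_length() - 1  # 8 bits select one of 256 slots
HASH_BITS = 256                           # SHA-256 digest size in bits

def bucket_path(name: str, depth: int = 2) -> list[int]:
    """Slot index at each trie level for a directory entry name.

    Stand-in hash: SHA-256. Real UnixFS shards hash names with murmur3,
    so these indices will NOT match an actual Kubo shard.
    """
    if depth * BITS_PER_LEVEL > HASH_BITS:
        raise ValueError("not enough hash bits for that depth")
    digest = int.from_bytes(hashlib.sha256(name.encode("utf-8")).digest(), "big")
    path = []
    for level in range(depth):
        # take the next BITS_PER_LEVEL bits, most significant first
        shift = HASH_BITS - BITS_PER_LEVEL * (level + 1)
        path.append((digest >> shift) & (FANOUT - 1))
    return path

def top_level_load(names: list[str]) -> Counter:
    """How many entries land in each top-level slot."""
    return Counter(bucket_path(n, depth=1)[0] for n in names)

# Hypothetical names shaped like the collection's files (11,000 total)
names = [f"GLC45_{i:03d}_{j:03d}.pdf" for i in range(110) for j in range(100)]
load = top_level_load(names)
print(len(load), max(load.values()))
```

With roughly 11,000 names over 256 slots, each top-level slot averages about 43 entries; as I understand it, the real implementation handles that by adding a child shard wherever names collide in a slot, rather than keeping one giant listing.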
