Does IPFS work with extremely large static html sites?

I am a cofounder of Qeeebo, a large question-and-answer website: an “AI-curated question-and-answer platform built to become the world’s largest.”

We have deployed a beta 0.04 site with 27k questions, but are looking to deploy our 180+ million question site soon. We have been exploring different options to mitigate server load and help with caching at this scale, which is how we ran across IPFS.

Would love for the community to chime in here and share how IPFS could help a large (and, we hope, soon the largest) question-and-answer website on the internet. Are there ways we can integrate? Are there other large static sites using IPFS? What is the largest?

  • ChatGPT suggested that IPFS could serve as a mirror on a subdomain, e.g.:
    ipfs.domain
  • Would we need to host our own IPFS gateway (Kubo) at this scale?

Is there a way to preserve or create normal SEO URLs for Google, or will that not be possible?

User / Googlebot
  ↓
BunnyCDN (Pull Zone)
  ↓
Your self-hosted IPFS gateway (HTTP)
  ↓
IPFS datastore (blocks, CIDs)

No IPFS URLs are ever exposed publicly.

Can that work in practice?
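To make the chain above concrete, here is a minimal sketch of the URL rewrite the CDN origin would perform. The names are assumptions for illustration: `qeeebo.com` as the DNSLink domain and `ipfs-gateway.internal` as the private gateway origin.

```python
# Sketch: map a public SEO URL to the internal gateway request.
# Hypothetical names: qeeebo.com (DNSLink domain), ipfs-gateway.internal
# (private origin). The visitor only ever sees the public URL.
from urllib.parse import urlparse

GATEWAY_ORIGIN = "http://ipfs-gateway.internal"
DNSLINK_DOMAIN = "qeeebo.com"  # _dnslink TXT record points at the site's CID

def to_gateway_url(public_url: str) -> str:
    """Rewrite a public URL to the private gateway's /ipns/ path."""
    path = urlparse(public_url).path or "/"
    return f"{GATEWAY_ORIGIN}/ipns/{DNSLINK_DOMAIN}{path}"

print(to_gateway_url("https://qeeebo.com/q/aa/slug.html"))
```

The CDN pulls from the rewritten origin URL; no `/ipfs/` or `/ipns/` path ever appears in public links, so SEO URLs stay clean.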

How your self-hosted gateway should behave

Your gateway must behave like a plain HTTP origin.

Example

Gateway origin URL (private, never public):

http://ipfs-gateway.internal/

Gateway resolves:

/q/aa/slug.html
/t/a/topic/
/

Internally:

  • Gateway resolves DNSLink → CID

  • Fetches content from IPFS

  • Serves it as normal HTTP

Externally:

  • Looks identical to Nginx serving static files
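As a sketch of that behavior, a reverse proxy in front of a local Kubo node could rewrite clean paths into DNSLink gateway requests. Assumptions here: DNSLink is set up for `qeeebo.com`, and Kubo's HTTP gateway listens on its default `127.0.0.1:8080`.

```nginx
# Sketch: Nginx as the plain HTTP origin, proxying to a local Kubo gateway.
# Hypothetical names: ipfs-gateway.internal, qeeebo.com (DNSLink domain).
server {
    listen 80;
    server_name ipfs-gateway.internal;

    location / {
        # /q/aa/slug.html  →  /ipns/qeeebo.com/q/aa/slug.html
        proxy_pass http://127.0.0.1:8080/ipns/qeeebo.com$request_uri;
    }
}
```

From BunnyCDN's point of view this origin is indistinguishable from Nginx serving static files from disk.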

I’d say IPFS and UnixFS are well-suited for hosting large static websites, since HTTP Gateways are very browser-friendly.

Kubo’s UnixFS implementation automatically applies HAMT-sharding to large directories, so in theory you can make them as big as you need. There’s no hard limit on the number of files.

Real-world example: An old static snapshot of English Wikipedia is hosted on IPFS at: https://dweb.link/ipfs/bafybeiaysi4s6lnjev27ln5icwm6tueaw2vdykrtjkwiphwekaywqhcjze

This is approximately 350 GiB with over 20 million HTML files, similar in scale to what you’re planning.

For your specific questions:


This is really helpful; I will read through that documentation. Keeping our pages canonical would be the key in the end: even if we used IPFS only as backup redundancy, we would still need to ensure that.

Our site will be 4–5 TB or larger, but it is great to see the Wikipedia instance, which is quite large!