I am a cofounder of Qeeebo, an AI-curated question-and-answer platform built to become the world's largest.
We have deployed a beta 0.04 site with 27k questions, but we are preparing to launch our full site with 180+ million questions soon. We have been exploring options to mitigate server load and improve caching at that scale, which is how we came across IPFS.
Would love for the community to chime in and share how IPFS could help a large (and soon-to-be-largest) question-and-answer website on the internet. Are there ways we can integrate it? Are there other large static sites using IPFS? What is the largest?
ChatGPT suggested that IPFS could serve as a mirror on a subdomain, e.g.:
ipfs.domain
Would we need to host our own IPFS gateway (Kubo) at this scale?
Is there a way to preserve or create normal SEO-friendly URLs for Google, or will that not be possible?
User / Googlebot
    ↓
BunnyCDN (Pull Zone)
    ↓
Your self-hosted IPFS gateway (HTTP)
    ↓
IPFS datastore (blocks, CIDs)
No IPFS URLs are ever exposed publicly.
Can that work in practice?
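One way this pattern is often realized is with a thin reverse proxy in front of the gateway, so the CDN pulls from a plain HTTP origin and visitors never see `/ipfs/...` paths. A minimal nginx sketch, assuming Kubo's gateway listens on `127.0.0.1:8080`; the hostname and `<SITE_CID>` are placeholders, not real values:

```nginx
# Sketch only: BunnyCDN's pull zone points at this origin, which
# fronts a local Kubo gateway. <SITE_CID> is a placeholder.
server {
    listen 80;
    server_name origin.qeeebo.example;  # hypothetical origin hostname

    location / {
        # Map clean SEO paths onto the site's immutable UnixFS root,
        # so no IPFS URLs are ever exposed publicly.
        proxy_pass http://127.0.0.1:8080/ipfs/<SITE_CID>$request_uri;
        proxy_set_header Host 127.0.0.1;
    }
}
```

Note that the root CID changes on every site update, so in practice you would likely proxy to an `/ipns/` name (DNSLink or IPNS) instead of hard-coding a CID, keeping the origin config stable across publishes.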
How your self-hosted gateway should behave
Your gateway must behave like a plain HTTP origin.
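Concretely, a couple of Kubo gateway settings push it toward plain-origin behavior. A hedged sketch using Kubo's `Gateway` config keys (the hostname is a placeholder):

```shell
# Serve only content this node already hosts; never fetch from the
# public network on demand when a request comes in.
ipfs config --json Gateway.NoFetch true

# Treat your origin hostname as a dedicated gateway with plain paths
# (no subdomain-style CID hosts). Hostname is hypothetical.
ipfs config --json Gateway.PublicGateways '{
  "origin.qeeebo.example": {
    "Paths": ["/ipfs", "/ipns"],
    "UseSubdomains": false
  }
}'
```

With `NoFetch` enabled, a request for a CID you do not pin returns an error instead of triggering a network lookup, which keeps origin latency predictable behind a CDN.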
I’d say IPFS and UnixFS are well-suited for hosting large static websites, since HTTP gateways are very browser-friendly.
Kubo’s UnixFS implementation automatically applies HAMT-sharding to large directories, so in theory you can make them as big as you need. There’s no hard limit on the number of files.
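For example, importing a large static tree is just a recursive add; Kubo applies HAMT sharding automatically once a directory grows past its internal size threshold, so nothing special is needed on your side (the path below is a placeholder):

```shell
# CIDv1 is the safer choice for new content (case-insensitive,
# works with subdomain gateways).
ipfs add --recursive --cid-version 1 ./site-export

# The last line of output is the root CID for the whole tree;
# that single CID addresses every file underneath it.
```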
For reference, the Wikipedia mirror on IPFS is approximately 350 GiB with over 20 million HTML files, similar in scale to what you’re planning.
For your specific questions:
Infrastructure: To get started, your own Kubo node(s) should do the trick for reliable pinning and serving. As you mature beyond the prototype stage, explore scaling by putting them behind IPFS Cluster: https://ipfscluster.io
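Once you run several Kubo nodes, IPFS Cluster coordinates which peers pin what. A sketch with `ipfs-cluster-ctl`; the CID is a placeholder for your site's root:

```shell
# Pin the site root on at least 2 and at most 3 cluster peers,
# so a single node failure never makes content unavailable.
ipfs-cluster-ctl pin add --replication-min 2 --replication-max 3 <SITE_ROOT_CID>

# Check pin status for that CID across all cluster peers.
ipfs-cluster-ctl status <SITE_ROOT_CID>
```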
SEO: The key is preventing duplicate-content issues across multiple gateways. Add a canonical link element in the <head> of your static HTML pages pointing to your preferred URL.
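For example (the domain and path here are placeholders for your real canonical URL):

```html
<head>
  <!-- Tells crawlers which URL is authoritative, even when the same
       page is also reachable via IPFS gateways or mirrors. -->
  <link rel="canonical" href="https://qeeebo.example/questions/example-question" />
</head>
```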
This is really helpful; I will read through that documentation. Canonical links would be the key in the end: even if we used IPFS only as backup redundancy, we would still need to ensure our primary URLs remain canonical.
Our site will be 4-5 TB or larger, but it is great to see that the Wikipedia instance is quite large!