This question comes up a lot, and I’m not sure how best to answer it.
Using the Badger datastore, I know that a single IPFS daemon can store at least a terabyte of data without trouble, because I’m doing it myself. My questions are more along these lines:
- Is there anyone storing, say, more than 10 TB per node?
- What are the current issues that one can encounter with Badger DB? I have personally not seen many.
- What constraints do the go-ipfs peers impose, not just on storing the data but on handling it? I suppose a repo-format upgrade on 10 TB of data will take a while, at the very least. Advertising millions of keys to the DHT should also become problematic at some point (memory footprint? bandwidth constraints?).
Back-of-the-envelope calculation: 1M keys at 256 KiB per block (the default) gives me around 250 GiB of data. Badger is supposed to handle millions of keys, so 10 TiB would mean around 40 million keys.
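To make that arithmetic concrete, here is a small Go sketch. It assumes every block is a full 256 KiB chunk, so it ignores intermediate DAG nodes and smaller trailing blocks (both of which add keys). It also estimates how many DHT provide records per second a node would have to sustain to re-announce every key once per reprovide interval. The 12-hour interval is an assumption for illustration, and `blockCount` and `providesPerSecond` are hypothetical helpers, not go-ipfs APIs.

```go
package main

import "fmt"

// Sizes in bytes, using binary (IEC) units.
const (
	KiB = 1 << 10
	GiB = 1 << 30
	TiB = 1 << 40
)

// blockCount returns how many blocks of blockSize bytes are needed to hold
// repoSize bytes, rounding up. Assumption: every block is full-sized, so
// this is a lower bound on the real number of keys in the datastore.
func blockCount(repoSize, blockSize uint64) uint64 {
	return (repoSize + blockSize - 1) / blockSize
}

// providesPerSecond returns the sustained rate of DHT provide records needed
// to re-announce every key once per interval (given in seconds).
func providesPerSecond(keys uint64, intervalSeconds float64) float64 {
	return float64(keys) / intervalSeconds
}

func main() {
	const blockSize = 256 * KiB // default chunk size

	// How much data one million full blocks represent.
	fmt.Printf("1M blocks = %.1f GiB\n", float64(1_000_000*blockSize)/GiB)

	// Number of keys for a 10 TiB repo.
	keys := blockCount(10*TiB, blockSize)
	fmt.Printf("10 TiB = %d keys\n", keys)

	// Assumption: a 12-hour reprovide interval.
	const interval = 12 * 60 * 60
	fmt.Printf("provides/s needed: %.0f\n", providesPerSecond(keys, interval))
}
```

Under these assumptions, one million blocks come to about 244 GiB, and 10 TiB works out to 41,943,040 keys, which would need roughly 971 provides per second to re-announce within 12 hours. That gives a feel for why DHT advertising could become a bottleneck well before Badger itself does.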
FWIW, I have experienced multiple corrupted Badger repositories on multiple nodes (some more than once): a Windows 10 desktop (NTFS), a Linux laptop (EXT4), and a Linux server (ZFS). When this happened, the tools available at the time for repairing a Badger DB (I don’t know if that has changed) didn’t seem to handle all failure scenarios, or didn’t always work for whatever reason; at least, I wasn’t able to get them to work.
Another issue I’ve experienced is that GC doesn’t seem to reliably reclaim disk space when using badgerdb.
Can you describe your setup in more detail? I started noticing some undesirable memory usage and performance issues at around 500 GB. However, I also use this server for a bunch of other things, so IPFS didn’t have free rein over all of its resources.
Memory usage. At least given how we’re currently using it, Badger allocates a lot of memory up front.
There’s also https://github.com/ipfs/go-ds-badger/issues/18 but I haven’t tested this in a while.
The IPFS Cluster for the community runs IPFS peers with 40 TB of storage, of which 1.2 TB is used. But they also run with 64 GB of RAM, and yes, IPFS seems to allocate a lot of that memory. Performance-wise, though, we noticed a big improvement when we migrated from the default datastore (even though the migration took days).
Does this happen after a crash or just randomly?
It’s possible the ipfs daemon crashed without my noticing, or that it didn’t shut down gracefully during a restart. But the machines themselves didn’t crash. I eventually just could no longer start the daemon because of Badger-related errors. Each instance seemed random to me because, as far as I know, nothing unusual had happened beforehand in most cases.