What are the limits to repository size in go-ipfs?

hector · January 3, 2019, 12:00pm

This question comes up a lot and I don’t know very well what to say.

Using the Badger datastore, I know that a single IPFS daemon can store at least a terabyte of data without effort, because I’m doing it myself. My questions are more:

Is there anyone storing, say, more than 10 TB per node?
What are the current issues that one can encounter with Badger DB? I have personally not seen many.
What are the constraints imposed by the go-ipfs peers? Not just storing the data but handling it: I suppose doing a repo-format upgrade on 10TB of data will take a while at least. Advertising millions of keys to the DHT should also become problematic at some point (memory footprint? bandwidth constraints?).

Dirty calculation: 1M keys at 256KiB per block (default) gives me around 250GiB of data. Badger is supposed to handle millions of keys. So 10TiB would mean around 40 Million keys.
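
To make the rough numbers above easy to re-run for other sizes, here is a minimal sketch of the same arithmetic. It assumes every key maps to a full 256KiB block and ignores intermediate DAG nodes and smaller trailing chunks, so real key counts will be somewhat higher. The function names are hypothetical and are not part of go-ipfs or Badger.

```python
KIB = 1024
GIB = 1024 ** 3
TIB = 1024 ** 4

# Default block size quoted in the post (256KiB per block).
DEFAULT_BLOCK_SIZE = 256 * KIB


def data_size_for_keys(num_keys, block_size=DEFAULT_BLOCK_SIZE):
    """Bytes stored if every key holds one full block (an assumption)."""
    return num_keys * block_size


def keys_for_data_size(total_bytes, block_size=DEFAULT_BLOCK_SIZE):
    """Number of full-size blocks needed to hold total_bytes (rounded up)."""
    return -(-total_bytes // block_size)  # ceiling division on integers


if __name__ == "__main__":
    print(f"1M keys  -> {data_size_for_keys(1_000_000) / GIB:.1f} GiB")
    print(f"10 TiB   -> {keys_for_data_size(10 * TIB):,} keys")
```

With these assumptions, 1M keys come out at roughly 244GiB and 10TiB at roughly 42 million keys, in line with the estimate above.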

leerspace · January 9, 2019, 1:09pm

FWIW, I have experienced multiple corrupted Badger repositories on multiple nodes (some more than once): a Windows 10 desktop (NTFS), a Linux laptop (ext4), and a Linux server (ZFS). When this happened, the tools available at the time for repairing the Badger DB (I don’t know whether that has changed) didn’t seem to handle all failure scenarios, or didn’t always work for whatever reason; at least, I wasn’t able to get them to work.

Another issue I’ve experienced is that GC doesn’t seem to reliably reclaim disk space when using badgerdb.

Can you describe more about your setup? I started noticing some undesirable memory utilization and performance issues around 500 GB. However, I also use this server for a bunch of other stuff, so it didn’t have free rein over all resources.

stebalien · January 9, 2019, 3:21pm

Memory usage. At least given how we’re currently using it, badger allocates a bunch of memory up-front.

There’s also “Something is locking the memory mappings” (Issue #18 in ipfs/go-ds-badger on GitHub), but I haven’t tested this in a while.

hector · January 9, 2019, 3:37pm

The IPFS Cluster for the community runs ipfs peers with 40TB, of which 1.2TB is used. But they also run with 64GB of RAM. And yes, it seems IPFS allocates a lot of that memory. But performance-wise we noticed a big improvement when we migrated from the default datastore (even though the migration took days).

Does this happen after a crash or just randomly?

leerspace · January 9, 2019, 5:00pm

It’s possible the ipfs daemon crashed and I didn’t notice it, or it didn’t close down gracefully during a restart. But the machines themselves didn’t crash. I just eventually could no longer start the daemon due to badger-related errors; it seemed random to me in each instance because nothing special happened to my knowledge in most cases.
