Maximum number of pins supported by kubo

ligustah · July 15, 2024, 9:18am

Hello,

I’ve been experimenting with pinning a large number of objects. I was wondering if there are any limits on how many objects can feasibly be pinned to a single kubo node?

In my tests I found that the pin/ls API returns about 2.5 million pins before the connection breaks, so that would be an issue. Let’s say I wanted to support 10 million pins, what other issues might I be running into?
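One way to avoid depending on a single huge response is to request the listing in streaming form and process it line by line. The sketch below is a rough illustration, not a tested fix for the disconnect described above. It assumes the Kubo RPC API is reachable on the default `127.0.0.1:5001`, that `pin/ls` accepts `stream=true`, and that each streamed line is a JSON object with `Cid` and `Type` fields; check these against your Kubo version. The helper names `count_pins` and `count_pins_on_node` are made up for the example.

```python
import json
import urllib.request
from collections import Counter
from typing import Iterable

# Assumption: default local Kubo RPC address; adjust for your node.
API_URL = "http://127.0.0.1:5001/api/v0/pin/ls?stream=true"


def count_pins(lines: Iterable[bytes]) -> Counter:
    """Tally pins by type from newline-delimited JSON, one line at a time."""
    counts: Counter = Counter()
    for raw in lines:
        raw = raw.strip()
        if not raw:
            continue  # skip blank lines
        entry = json.loads(raw)
        # Assumed shape of a streamed entry: {"Cid": "...", "Type": "recursive"}
        counts[entry.get("Type", "unknown")] += 1
    return counts


def count_pins_on_node(url: str = API_URL) -> Counter:
    """Stream pin/ls from a running node without buffering the whole body."""
    request = urllib.request.Request(url, method="POST")  # the RPC API expects POST
    with urllib.request.urlopen(request) as response:
        # The HTTP response object can be iterated line by line.
        return count_pins(response)
```

Calling `count_pins_on_node()` against a running node returns a `Counter` keyed by pin type, and memory use stays flat because only one entry is held at a time.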

For the sake of discussion, assume that I have unlimited storage, so I’m only concerned about the active management. In particular, I’m trying to understand if there are any internal processes that scale linearly with the number of pins. One such candidate I already found is the garbage collection, but are there other (important?) processes I need to be mindful of?

I’m guessing providing would also be impacted by this. How many pins can a node on a fast network with accelerated DHT (re-)provide per second?

Also, I know that ipfs-cluster is a thing, but even then I’d be curious to understand how many nodes I would need to support X million pins.

Thanks a lot for your insights/experiences.

bumblefudge · July 16, 2024, 7:27am

A couple of questions:

1. When you say “accelerated DHT”, do you mean a private DHT?
2. Have you looked at rainbow versus kubo? Or do you need the node routing requests to be [near] the node hosting/pinning the pins?
3. Would IPNI be overkill? If the nodes are relatively stable and long-lived, it might make more sense to use an indexer model for 2.5 million/day…

ligustah · July 16, 2024, 1:08pm

I mean the accelerated DHT client in kubo, which keeps a full routing table: kubo/docs/config.md at master · ipfs/kubo · GitHub

My (limited) understanding is that rainbow is acting purely as an HTTP gateway. I need the files to be advertised on the DHT to be accessible through bitswap.

IPNI is very likely another avenue for us to explore, but we’d like the files to be discoverable through the DHT as well.

hector · July 16, 2024, 9:44pm

We had a 100M-pin deployment with 24 ipfs-cluster peers. In general, it all depends on how fast your disk is. I’m not sure why the connection to Kubo would break at 2.5M pins, but yes, you need to test, as 1) your hardware, particularly disks, 2) your configuration, 3) the amount of retrievals/traffic, and 4) the amount of writes all affect the final number of what becomes “too much” for a single Kubo box to handle.

ligustah · July 17, 2024, 9:21am

Thanks, that’s some very helpful data. I think my main concern was around how many (re)provides a single kubo can feasibly handle. In your example, at 4M per node, that would be roughly 50 per second per node for 22h (assuming you only announce direct/recursive pins), which sounds good!
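The back-of-the-envelope figure above can be reproduced with a few lines of arithmetic. This is only a sketch: it assumes announcements are spread evenly over the 22-hour window mentioned in the post and that only one record per pin is announced. The function name is illustrative.

```python
def provides_per_second(total_pins: int, nodes: int, window_hours: float) -> float:
    """Average announcements per second each node must sustain."""
    pins_per_node = total_pins / nodes
    return pins_per_node / (window_hours * 3600)


# Rounded figure from the post: 4M pins on one node over 22 hours.
print(f"{provides_per_second(4_000_000, 1, 22):.1f}")     # 50.5
# Unrounded: 100M pins spread over 24 peers.
print(f"{provides_per_second(100_000_000, 24, 22):.1f}")  # 52.6
# The 10M-pins-on-one-node case from the original question.
print(f"{provides_per_second(10_000_000, 1, 22):.1f}")    # 126.3
```

These are averages only; they say nothing about whether a given node can actually sustain that rate.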

hector · July 18, 2024, 4:28am

Right, you can also skip providing to the DHT and use IPNI (cid.contact) instead. You are correct that reproviding to the DHT alone is a source of high CPU, bandwidth and disk usage at that level. I am not sure if our nodes were managing that correctly.

ligustah · July 24, 2024, 6:55am

I see, makes sense. Thank you! While I have you here, I have a somewhat related question: is it enough to just announce the pins/roots? Will other nodes in the network know to ask for the remaining blocks over bitswap directly, or will I always need to advertise all individual blocks?

hector · July 24, 2024, 7:04am

In most cases doing just roots is fine.
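For reference, announcing only roots is a configuration choice in Kubo. The sketch below patches a config dictionary in Python; it assumes the key names `Reprovider.Strategy` (with the value `roots`) and `Routing.AcceleratedDHTClient` as described in Kubo's config documentation around the time of this thread, and these may differ in other releases. The function name is hypothetical.

```python
import copy
import json


def with_root_only_reproviding(config: dict) -> dict:
    """Return a copy of a Kubo-style config that announces only pin roots."""
    updated = copy.deepcopy(config)  # leave the caller's dict untouched
    # Assumed key names; check config.md for your Kubo version.
    updated.setdefault("Reprovider", {})["Strategy"] = "roots"
    updated.setdefault("Routing", {})["AcceleratedDHTClient"] = True
    return updated


example = {"Reprovider": {"Interval": "22h", "Strategy": "all"}}
print(json.dumps(with_root_only_reproviding(example), indent=2))
```

Existing keys such as `Interval` are preserved; only the two settings discussed in this thread are changed.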
