Understanding reprovider behavior with go-ds-crdt

pcs · April 29, 2025, 11:51am

Hi,

This question is about IPFS, but specifically in the context of go-ds-crdt.

I’m testing go-ds-crdt with a relatively large number of key-value pairs (about 100k), running it with a badger datastore and ipfs-lite, very similarly to the globaldb example shown in the go-ds-crdt repository.

I’m using 2 peers for this test.

However, I’ve noticed that, periodically, I get a flood of warnings like this:

go-libp2p-kad-dht@v0.30.2/lookup.go:42  network size estimator track peers: expected bucket size number of peers

I found that this warning comes from a GetClosestPeers function: go-libp2p-kad-dht/lookup.go at 7cc3df8c667b04b2c15e57b49ba4937a05c747f4 · libp2p/go-libp2p-kad-dht · GitHub

After some investigation, my current understanding is that the repeated calls to that function originate from a reprovider mechanism in IPFS. So:

It looks like the DHT keeps provider records mapping CIDs to the peers that store them, but those provider records have an expiration time (such as 24 hours), and peers periodically re-announce all CIDs that they store (e.g. every 12 hours).
The reprovider mechanism calls the GetClosestPeers function, which generates that warning.

So, my questions are:

Am I correct in understanding that all CIDs are reannounced periodically? Couldn’t this get expensive in terms of traffic, in a scenario with millions of keys and peers hosted in different geographical regions, for example?
Why do I get a flood of these warnings? Does this mean that I should have at least “bucket size” peers for go-ds-crdt to be effective?

Thank you.

guissou · April 29, 2025, 2:00pm

Thanks @pcs for the question.

The warning is probably triggered because there are only 2 peers in your network IIUC. The warning should be silenced in this case, here is the fix.

For the time being you can simply ignore these warnings, they are not important, especially if you don’t use optimistic provide.

Yes, CIDs are readvertised every 22h. They have to be readvertised periodically, since records expire after 48h, and peers come and go in the DHT. Periodic readvertisement helps ensure that at least one of the closest DHT servers still has the provider record (the pointer to who has the CID).

You can find more information on why it is important to reprovide content periodically, and how the numbers were defined in this report.

You can ignore this one

pcs · May 2, 2025, 10:58am

I would like to add a follow-up question, please: is there any alternative that allows for a more efficient reannouncement method? I’ve been looking at IPNI, but it looks like this wouldn’t replace the DHT reannouncement mechanism. Is that correct?

hector · May 8, 2025, 9:02am

ipfs-lite uses the default reprovider which announces things one by one.

You may want to customize that part (ipfs-lite/ipfs.go at master · hsanjuan/ipfs-lite · GitHub).

The question is how many peers the cluster expects to have, because if it’s small enough that everyone is connected to everyone, then it probably doesn’t even need DHT-providing. Blocks will be found over bitswap if the peer that needs them is connected to a peer that has them.

If there is too much to provide and you depend on DHT discovery, you can use a custom reprovider that only provides, for example, the last 100 CRDT-DAG entries etc.
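As a minimal sketch of the "only provide the last N entries" idea, the Go code below keeps a bounded list of the most recently added keys and announces just those. It uses only the standard library: keys are plain strings standing in for CIDs, and `provide` is a hypothetical callback where a real setup would call into its DHT or provider system. None of these names come from ipfs-lite or go-ds-crdt, and the type is not safe for concurrent use.

```go
package main

import "fmt"

// recentKeys remembers the most recently added keys, up to a fixed limit.
// Keys are plain strings here; a real reprovider would track CIDs.
type recentKeys struct {
	limit int
	keys  []string
}

func newRecentKeys(limit int) *recentKeys {
	if limit < 0 {
		limit = 0
	}
	return &recentKeys{limit: limit}
}

// Add records a key and drops the oldest ones once the limit is exceeded.
func (r *recentKeys) Add(key string) {
	r.keys = append(r.keys, key)
	if len(r.keys) > r.limit {
		r.keys = r.keys[len(r.keys)-r.limit:]
	}
}

// Snapshot returns a copy of the tracked keys, oldest first.
func (r *recentKeys) Snapshot() []string {
	out := make([]string, len(r.keys))
	copy(out, r.keys)
	return out
}

// reprovideRecent calls provide for each tracked key and stops at the
// first error. provide is a placeholder for the actual announcement.
func reprovideRecent(r *recentKeys, provide func(key string) error) error {
	for _, k := range r.Snapshot() {
		if err := provide(k); err != nil {
			return fmt.Errorf("provide %s: %w", k, err)
		}
	}
	return nil
}

func main() {
	r := newRecentKeys(100)
	for i := 0; i < 250; i++ {
		r.Add(fmt.Sprintf("entry-%d", i))
	}

	announced := 0
	err := reprovideRecent(r, func(string) error {
		announced++
		return nil
	})
	fmt.Println(announced, r.Snapshot()[0], err)
}
```

Here only the newest 100 of the 250 added entries are announced on each run. Older blocks would then have to be fetched from directly connected peers rather than discovered through the DHT.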
