This question is about IPFS, but specifically in the context of go-ds-crdt.
I’m testing go-ds-crdt with a relatively large number of key-value pairs (about 100k), running it with a badger datastore and ipfs-lite, very similarly to the globaldb example shown in the go-ds-crdt repository.
I’m using 2 peers for this test.
However, I’ve noticed that, periodically, I get a flood of warnings like this:
go-libp2p-kad-dht@v0.30.2/lookup.go:42 network size estimator track peers: expected bucket size number of peers
After some investigation, my current understanding is that the repeated calls to that function originate from a reprovider mechanism in IPFS. So:
It looks like the DHT keeps provider records mapping CIDs to the peers that store them, but those provider records have an expiration time (such as 24 hours), and peers periodically re-announce all CIDs that they store (e.g. every 12 hours).
The reprovider mechanism calls the GetClosestPeers function, which generates that warning.
So, my questions are:
Am I correct in understanding that all CIDs are reannounced periodically? Couldn’t this get expensive in terms of traffic, in a scenario with millions of keys and peers hosted in different geographical regions, for example?
Why do I get a flood of these warnings? Does this mean that I should have at least “bucket size” peers for go-ds-crdt to be effective?
The warning is probably triggered because there are only 2 peers in your network, IIUC. It should be silenced in this case; here is the fix.
For the time being, you can simply ignore these warnings. They are not important, especially if you don’t use optimistic provide.
Yes, CIDs are readvertised every 22h. They have to be readvertised periodically, since records expire after 48h, and peers come and go in the DHT. Periodic readvertisement helps ensure that at least one of the closest DHT servers still has the provider record (a pointer to who has the CID).
You can find more information on why it is important to reprovide content periodically, and how the numbers were defined in this report.
I would like to add a follow-up question, please: is there any alternative that allows for a more efficient reannouncement method? I’ve been looking at IPNI, but it looks like this wouldn’t replace the DHT reannouncement mechanism. Is that correct?
The question is how many peers the cluster expects to have, because if it’s small enough that everyone is connected to everyone, then it probably doesn’t even need DHT providing. Blocks will be found over bitswap if the peer that needs them is connected to a peer that has them.
If there is too much to provide and you depend on DHT discovery, you can use a custom reprovider that only provides, for example, the last 100 CRDT-DAG entries.