Limits on adding large amounts of data to IPFS: is running multiple daemons a possible solution?

Hi there, I’ve been having issues with adding large amounts of data to the kubo datastore. So far kubo has been a beast, processing millions of CIDs a day, but lately I’ve noticed that things can sometimes start lagging and slowing down to the point that the data processors time out on their network requests to my kubo API. This happens mostly when I run multiple data-processing pipelines; with only a single one, the issue doesn’t crop up as much.

I was wondering what the limits on kubo’s processing speed are, and how I can debug it when it starts to slow down?
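In case it matters, here’s roughly what I’ve been capturing during the slowdowns so far (assuming I’m using these commands correctly; `ipfs diag profile` and `ipfs stats provide` may only exist in more recent kubo versions):

```bash
# Collect CPU, heap, and goroutine profiles from the running daemon
# into a zip file for later inspection with `go tool pprof`
ipfs diag profile

# Watch repo size and object counts while the pipelines run
ipfs repo stat

# Check bitswap and provider activity for backlog
ipfs stats bitswap
ipfs stats provide
```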

I was also thinking a potential solution would be to run multiple kubo daemons on a single machine (it has plenty of RAM and CPU to handle this), with one dedicated to each data-processing job. How would I go about doing this?
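To make the question concrete, this is roughly what I had in mind: one repo per job, selected via `IPFS_PATH`, with each daemon on its own ports (the port numbers below are arbitrary picks on my part):

```bash
# Each daemon needs its own repo; IPFS_PATH selects which one the CLI targets.
export IPFS_PATH=$HOME/.ipfs-job1
ipfs init

# Give every instance non-conflicting API, gateway, and swarm ports.
ipfs config Addresses.API /ip4/127.0.0.1/tcp/5011
ipfs config Addresses.Gateway /ip4/127.0.0.1/tcp/8081
ipfs config --json Addresses.Swarm '["/ip4/0.0.0.0/tcp/4011", "/ip4/0.0.0.0/udp/4011/quic-v1"]'

# Launch the daemon; repeat the steps above with ~/.ipfs-job2 and
# different ports for each additional instance.
ipfs daemon &

# Point each data-processing job at its own daemon's API endpoint:
ipfs --api /ip4/127.0.0.1/tcp/5011 add somefile
```

Would that kind of setup be sensible, or does it just move the bottleneck around?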


Do you mean advertising them to the DHT or simply adding them to the kubo data store?
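To make sure we’re talking about the same thing, here’s the distinction I mean (a rough sketch; exact commands vary a bit across kubo versions):

```bash
# "Adding" only writes blocks to the local datastore:
cid=$(ipfs add -Q somefile)

# "Advertising" (providing) announces the CID to the DHT so other peers
# can find it. The daemon does this in the background for added content,
# but it can also be triggered manually:
ipfs routing provide "$cid"
```

These hit very different bottlenecks, so it matters which one is slow for you.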

So you are saying that processing millions of CIDs a day wasn’t a problem until recently? Do you know at which kubo version the problem started?

A major improvement to data onboarding was recently shipped in boxo but hasn’t made it into a kubo release yet. Hopefully this improvement, along with the provider regression fix, will solve your issue in the next release.

Since this is probably a regression that should be fixed soon, I would downgrade kubo to a version that didn’t have the problem and wait until the fix is released.
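Something like this should work for pinning to an older release (the version number below is just a placeholder; use the last one that behaved well for you):

```bash
# List available kubo releases
ipfs-update versions

# Install a specific older release (placeholder version)
ipfs-update install v0.28.0
```

One caveat: if the newer kubo ran a repo migration on first start, the repo version may need to be reverted with the fs-repo-migrations tool before the older daemon will start, so check its docs first.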