Hi there, I’ve been having issues with adding large amounts of data to the kubo datastore. So far kubo has been a beast, processing millions of CIDs a day, but during periods of heavy short-term usage I’ve noticed things can start lagging and slowing down to the point that the data processors start timing out on their network requests to my kubo API. This happens mostly when I run multiple data processing pipelines; with only a single one the issue doesn’t crop up as much.
I was wondering what the limits are on kubo processing speed, or how I can debug when it starts to slow down?
I was also thinking a potential solution would be to run multiple kubo daemons on a single machine (it has plenty of RAM and CPU to handle this), with one dedicated to each data processing job. How would I go about doing this? Something along the lines of the sketch below is what I had in mind.
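Just to make the idea concrete, here’s a rough sketch in Python (since that’s what drives my pipelines). The repo paths, ports, and job names are placeholders, it assumes the `ipfs` binary is on PATH, and the `--profile=pebbleds` flag is just the datastore profile I happen to use:

```python
# Rough sketch only: repo paths, ports, and job names are placeholders, and it
# assumes the `ipfs` (kubo) binary is on PATH. Each daemon gets its own repo
# via IPFS_PATH plus its own API/Gateway/Swarm ports so they don't collide.
import os
import subprocess

JOBS = {
    "job1": {"repo": "/data/ipfs-job1", "api": 5001, "gateway": 8080, "swarm": 4001},
    "job2": {"repo": "/data/ipfs-job2", "api": 5002, "gateway": 8081, "swarm": 4002},
}

def ipfs(repo, *args):
    """Run an ipfs CLI command against a specific repo by setting IPFS_PATH."""
    env = {**os.environ, "IPFS_PATH": repo}
    subprocess.run(["ipfs", *args], env=env, check=True)

daemons = []
for name, cfg in JOBS.items():
    repo, api, gateway, swarm = cfg["repo"], cfg["api"], cfg["gateway"], cfg["swarm"]
    if not os.path.exists(os.path.join(repo, "config")):
        os.makedirs(repo, exist_ok=True)
        # --profile=pebbleds selects the Pebble datastore; drop it for the default.
        ipfs(repo, "init", "--profile=pebbleds")
        ipfs(repo, "config", "Addresses.API", f"/ip4/127.0.0.1/tcp/{api}")
        ipfs(repo, "config", "Addresses.Gateway", f"/ip4/127.0.0.1/tcp/{gateway}")
        ipfs(repo, "config", "--json", "Addresses.Swarm",
             f'["/ip4/0.0.0.0/tcp/{swarm}", "/ip4/0.0.0.0/udp/{swarm}/quic-v1"]')
    env = {**os.environ, "IPFS_PATH": repo}
    # One long-running daemon per data processing job.
    daemons.append(subprocess.Popen(["ipfs", "daemon"], env=env))

# job1 would talk to http://127.0.0.1:5001, job2 to http://127.0.0.1:5002, etc.
for d in daemons:
    d.wait()
```

Each processing job would then point its requests at its own API port instead of all of them sharing one daemon.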
Do you mean advertising them to the DHT or simply adding them to the kubo data store?
So you are saying that processing millions of CIDs a day wasn’t a problem until recently? Do you know with which kubo version it became a problem?
Since it is probably a regression that should be fixed soon, I would downgrade kubo to a version that didn’t have this problem until the fix is released.
Thanks, that’s great news about the data onboarding improvement! I don’t think it’s necessarily a regression in a newer kubo version itself; it’s just that the RPC API is getting hit with more requests at once because of better concurrency in the calling Python script.
I was just hoping to find out whether the team knows of any concurrent request limits in kubo, with the pebbleds profile, that significantly slow down responses.
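For reference, the calling script does roughly the following (a simplified sketch; the endpoint, worker count, and file list are placeholders for the real pipeline), so the only throttle on concurrent requests to the RPC API is the size of the thread pool:

```python
# Simplified sketch of what the calling script does, not the exact code.
# The endpoint, worker count, and file list are placeholders; the real
# pipeline feeds many more files through the same pattern.
import concurrent.futures
import requests

KUBO_API = "http://127.0.0.1:5001"   # kubo RPC endpoint
MAX_WORKERS = 32                     # effective cap on concurrent /add requests

def add_file(path):
    """POST one file to kubo's /api/v0/add and return the resulting CID."""
    with open(path, "rb") as f:
        resp = requests.post(
            f"{KUBO_API}/api/v0/add",
            params={"pin": "true"},
            files={"file": f},
            timeout=60,
        )
    resp.raise_for_status()
    return resp.json()["Hash"]

def add_many(paths):
    # The thread pool size is the only limit on how hard the RPC API gets hit.
    with concurrent.futures.ThreadPoolExecutor(max_workers=MAX_WORKERS) as pool:
        return list(pool.map(add_file, paths))

if __name__ == "__main__":
    print(add_many(["./sample-1.bin", "./sample-2.bin"]))
```

Raising MAX_WORKERS is roughly what the “better concurrency” amounts to, and it’s also when the timeouts started showing up, which is why I’m asking about concurrency limits on kubo’s side.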