We have several IPFS repos holding roughly 90 GB, 170 GB, and 220 GB of content respectively, the majority of it explicitly pinned to support our application.
Recently, pin operations have slowed down and now never complete at all. Disk space is not the issue: the repos live on a 2 TB EBS volume, and each node has StorageMax configured to at least 500 GB.
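For reference, here is how we're checking usage against the configured limits on each node (the mount point is just an example):

```sh
# confirm repo size vs. the configured cap, and free space on the volume
ipfs repo stat --human
ipfs config Datastore.StorageMax
df -h /mnt/ipfs-ebs   # example mount point for the 2TB EBS volume
```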
Is there a way to avoid this sort of performance degradation with large pinsets?
Is there a recommended maximum storage limit for a single IPFS repo, beyond which this is expected behavior?
It may be worth noting that these nodes also actively serve content from their pinsets.
I don’t know the latest on pin performance (I believe it’s still an open issue), but the number of pins is likely more relevant than the size of the repository. How many pins do you have?
You can count them with something like this; it might take a while if you have a lot of pins:
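```sh
# count explicit (recursive) pins; enumerating a large pinset is slow
ipfs pin ls --type=recursive | wc -l
```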
So on one of the nodes we have 266,600 pins and a repo size of 225 GB; another has 273,993 pins and 91 GB.
Thanks for the link! We can definitely add a parallel cache of the pinset on our side, but it would be great to see a native solution in the future.
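Roughly what we have in mind, as a minimal sketch (the cache file and wrapper names are purely illustrative):

```sh
# wrap pin operations so membership checks hit a local index
# instead of enumerating the node's full pinset each time
PIN_CACHE=~/pin-cache.txt

pin_add() {
  ipfs pin add "$1" && echo "$1" >> "$PIN_CACHE"
}

pin_rm() {
  ipfs pin rm "$1" || return
  grep -vx "$1" "$PIN_CACHE" > "$PIN_CACHE.tmp"
  mv "$PIN_CACHE.tmp" "$PIN_CACHE"
}

is_pinned() {
  grep -qx "$1" "$PIN_CACHE"   # exact-match lookup, no call to the daemon
}
```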
Are there any other configuration options we can tune for such large pinsets?
Is it recommended to run more nodes (maybe IPFS Cluster) to support this type of workload?
If the requirement is that content be pinned, then I don’t see any way around having large numbers of pins.
I would expect using MFS to keep track of CIDs to be more performant, though I’m not 100% sure how that would play with GC and your project requirements.
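The idea would be something like the following (the MFS paths are illustrative; as far as I know, GC keeps everything reachable from the MFS root, so no per-CID pin is needed):

```sh
# reference content from MFS instead of pinning each CID explicitly
ipfs files mkdir -p /myapp
ipfs files cp /ipfs/<cid> /myapp/<label>   # protects <cid> from GC via the MFS root
ipfs files ls /myapp                       # enumerate tracked content
```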
Cluster might also perform differently, since I think it manages its pinset separately from the nodes; but I’m not familiar enough with it to say whether it scales better or worse with large pinsets.
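For reference, the Cluster workflow looks roughly like this (I haven’t benchmarked it at this scale):

```sh
# with ipfs-cluster, pins go through the cluster API and are
# replicated to the configured peers
ipfs-cluster-ctl pin add <cid>
ipfs-cluster-ctl pin ls        # the cluster pinset, separate from `ipfs pin ls`
ipfs-cluster-ctl status <cid>  # per-peer pin status
```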
I’ll have to defer to someone else with more experience for practical workarounds.