I’m building a dApp that needs to scrape NFT metadata across potentially large collections (let’s set a baseline at ~10k items) as fast as possible (aiming for sub-30 s). The assumption is that this metadata is stored on IPFS. While the number of files is large, the total size is only a handful of MBs.
As a first step, I set up a single IPFS node on AWS, but found that `ipfs get` calls are prohibitively slow (>3-5 min for a 10k collection).
As a fallback, I plan to scale horizontally (say 5-10 nodes) and place the IPFS nodes behind a load balancer. However, even with this approach, retrieval times degrade worse than linearly once I get into higher file counts (5k+).
Questions:
1. Is there anything glaringly obvious that I’m missing? I’m new to the dev side of IPFS, so I wouldn’t be surprised if I’m just doing something entirely wrong.
2. Is there something I can do to make `ipfs get` faster, e.g. some configuration?
3. Which hardware specs influence an IPFS gateway’s performance the most: CPU, memory, network I/O? I’ve played around with a few different AWS instance types and found similar results.
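To make question 2 concrete, these are the kinds of settings I mean, e.g. the connection manager and routing sections of the Kubo config file. The values below are illustrative guesses, not tuned recommendations:

```json
{
  "Swarm": {
    "ConnMgr": {
      "Type": "basic",
      "LowWater": 600,
      "HighWater": 900,
      "GracePeriod": "20s"
    }
  },
  "Routing": {
    "Type": "dhtclient"
  }
}
```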
Any help would be greatly appreciated
Thanks in advance
That’s 18 milliseconds per retrieval. That’s pretty fast, isn’t it, considering the network latency and the fact that each call is at minimum one round trip?
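Put another way, 18 ms per file only hurts because the retrievals are serialized; the same per-request latency amortizes once requests overlap. Illustrative arithmetic only (the concurrency level is an assumption):

```python
# Back-of-envelope: serial vs. concurrent wall-clock time for 10,000
# small files at an assumed 18 ms per-request latency.
n_files = 10_000
latency_s = 0.018                 # ~18 ms per retrieval

serial_s = n_files * latency_s    # one request at a time: 180 s
concurrent_s = serial_s / 64      # 64 requests in flight: ~2.8 s
print(round(serial_s), round(concurrent_s, 2))  # → 180 2.81
```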
For context: with the same infrastructure, I can retrieve 10k metadata files hosted on a normal HTTP server in <7 s, and those are also individual file retrievals, not a glob.