Performance: Batch requests and gc control

Hi,
thank you for working on IPFS. It is fun to use it.

Currently, I work on a project which puts many (small) objects on IPFS in order to create a distributed knowledge representation system. The layer above IPFS is written in Java. Therefore I use the IPFS Java API to interact with the IPFS node.

When I put and pin data, I run into performance problems (in terms of time) caused by the large number of HTTP API requests. So I would like to reduce the number of API calls.

Do you think it might help to reactivate batch put calls, meaning that many JSON objects can be put on IPFS via a single API call?
I know that this was once removed because it had not been used frequently, but I think it could be useful in my case.

Was a batch put call faster than putting each IPFS object individually? I am thinking of the overhead of many separate API round trips compared to a single one.

(Btw, the Java IPFS API still implements a batch put method, even though it is no longer supported. This is misleading, since it only puts the first object of the list on IPFS.)

If you have any other idea how I can increase the performance of putting objects on IPFS, then please let me know.

My second question is about garbage collection:
Is it still the case that the gc has to be called manually? Or is there already a mechanism which triggers the gc? Knowing that could reduce the number of pinning calls I currently make (after a new object has been added, I pin it immediately so that an automatic gc does not delete it right away).

Thank you!

I’m not opposed to having batch put calls; that could be a pretty useful feature (especially if we wire it up to datastore batches internally). If you want to move forward on getting that implemented, please go ahead and file a feature request issue on the go-ipfs repo.

That said, I wonder if you can get improved performance by reusing the HTTP connection between requests with keep-alive. I’m not sure whether the Java API client does this or not. I pinged another community member who is working with the Java API; hopefully he can comment.
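
A minimal sketch of what connection reuse could look like, independent of the Java IPFS client: it sends pin requests through one shared java.net.http.HttpClient (Java 11+), which pools and reuses HTTP/1.1 connections by default. The class and method names are hypothetical, and the API address 127.0.0.1:5001 and the /api/v0/pin/add endpoint are assumptions about the local daemon setup.

```java
import java.io.IOException;
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;
import java.util.List;

// Hypothetical helper, not part of the Java IPFS API.
class KeepAlivePinner {
    // One client for the whole application: creating a new client per
    // request would throw away the pooled (kept-alive) connections.
    private static final HttpClient CLIENT = HttpClient.newBuilder()
            .version(HttpClient.Version.HTTP_1_1)
            .build();

    // Builds a POST request against the assumed pin endpoint of the daemon.
    static HttpRequest pinRequest(String apiBase, String cid) {
        String arg = URLEncoder.encode(cid, StandardCharsets.UTF_8);
        return HttpRequest.newBuilder(URI.create(apiBase + "/api/v0/pin/add?arg=" + arg))
                .POST(HttpRequest.BodyPublishers.noBody())
                .build();
    }

    // Pins every given identifier, sending all requests through the shared client.
    static void pinAll(String apiBase, List<String> cids)
            throws IOException, InterruptedException {
        for (String cid : cids) {
            HttpResponse<String> resp =
                    CLIENT.send(pinRequest(apiBase, cid), HttpResponse.BodyHandlers.ofString());
            if (resp.statusCode() != 200) {
                throw new IOException("pin failed for " + cid + ": HTTP " + resp.statusCode());
            }
        }
    }

    public static void main(String[] args) throws Exception {
        // Usage: pass the identifiers to pin as command-line arguments.
        pinAll("http://127.0.0.1:5001", List.of(args));
    }
}
```

Timing a few hundred calls with this shared client against the same calls through the existing client would show whether connection setup is a meaningful part of the cost.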

As for GC, it is currently only run manually by default. You can run the daemon with --enable-gc to enable automatic GC, but the default behaviour requires the user to run ipfs repo gc to clear out unpinned objects.
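
For a Java layer that already talks to the node over HTTP, a manual GC run could also be triggered from code rather than from the command line. The sketch below assumes that the daemon exposes the repo gc command at /api/v0/repo/gc on 127.0.0.1:5001; the class and method names are hypothetical.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

// Hypothetical helper, not part of the Java IPFS API.
class RepoGc {
    // Builds a POST request against the assumed GC endpoint of the daemon.
    static HttpRequest gcRequest(String apiBase) {
        return HttpRequest.newBuilder(URI.create(apiBase + "/api/v0/repo/gc"))
                .POST(HttpRequest.BodyPublishers.noBody())
                .build();
    }

    public static void main(String[] args) throws Exception {
        HttpClient client = HttpClient.newHttpClient();
        HttpResponse<String> resp = client.send(
                gcRequest("http://127.0.0.1:5001"),
                HttpResponse.BodyHandlers.ofString());
        // Print the status and the raw response body as returned by the daemon.
        System.out.println("HTTP " + resp.statusCode());
        System.out.println(resp.body());
    }
}
```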

Hi dohues,

I’m the author of the Java IPFS API. You’re right, it is very misleading that the batch put only adds the first element. When I implemented that I thought that IPFS already had batch puts, but it turned out it didn’t. I think IPFS will be implementing it soon though.

I use the Java API in Peergos and we do 60 concurrent puts for 128 KiB fragments when uploading a file, which has no issues saturating my internet connection. This seems to work fine. How many concurrent puts are you doing? Does the slowness affect both the put and the pin, or just the put?
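
A rough sketch of how bounded concurrent puts can be arranged with a fixed thread pool, so that at most a chosen number of calls (60 in the case above) are in flight at once. This is not the Peergos code; the class name, the putAll method and the putOne function are hypothetical stand-ins for whatever call actually writes one object to the node.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

// Hypothetical helper, not part of the Java IPFS API.
class ConcurrentPutter {
    // Applies putOne to every block with at most `parallelism` calls running
    // at the same time, and returns the results in input order.
    static <T, R> List<R> putAll(List<T> blocks, Function<T, R> putOne, int parallelism) {
        ExecutorService pool = Executors.newFixedThreadPool(parallelism);
        try {
            List<Future<R>> pending = new ArrayList<>();
            for (T block : blocks) {
                Callable<R> task = () -> putOne.apply(block);
                pending.add(pool.submit(task));
            }
            List<R> results = new ArrayList<>();
            for (Future<R> future : pending) {
                results.add(future.get()); // waits for this put to finish
            }
            return results;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for puts", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("a put failed", e.getCause());
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        // Stand-in for a real put: returns a fake identifier instead of
        // contacting an IPFS node.
        Function<byte[], String> fakePut = data -> "fake-hash-" + data.length;
        List<byte[]> fragments = List.of(new byte[10], new byte[20], new byte[30]);
        System.out.println(putAll(fragments, fakePut, 60));
    }
}
```

Replacing fakePut with the real put (or put followed by pin) and timing the run at different parallelism values would show whether the slowness comes from per-request latency or from the node itself.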