Hey there, I just ran a small benchmark: I uploaded a small JSON object (78 bytes) 1000 times using js-ipfs-http-client, measuring the size of the .ipfs folder after every 20 uploads.
My results show that the additional space consumed per item increases linearly, starting at about 0.5MB and ending up at more than 2MB per additional uploaded item. This means the overall disk usage increases nonlinearly (roughly a second-degree polynomial). For example, at 100 uploads the disk usage was 30MB, but at 1000 uploads it was already over 290MB (!). Here is a small graph for visualization: https://i.imgur.com/5bvAb2b.png. I’m using UnixFS for adding the files, and I update the node’s IPNS record after every upload with the hash of the folder containing the new item.
Can someone explain the reason for this? Thanks in advance!
I think I figured out the issue. Pinning the updated folder after every update is causing lots of duplicate files to be retained. Will need to run this test again.
Badger datastore? It has a large overhead.
No, I’m using the default flatfs data store.
The issue was that I was pinning the updated root folder of the MFS after every upload, which kept every historical version of the folder alive in the repo. That explains the quadratic scaling.
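The quadratic growth above can be sketched with a toy model. The block sizes here are assumptions for illustration (one deduplicated leaf per upload, plus a per-link overhead in each directory block), not measured values; the point is that pinning every intermediate root retains one directory block per version, and version i's directory block has i links, so the retained roots alone sum to O(n²):

```javascript
// Toy model of disk usage when pinning every folder version vs. keeping
// only the latest. LEAF_SIZE and LINK_SIZE are illustrative assumptions,
// not real measurements from the benchmark.
const LEAF_SIZE = 78;  // each uploaded JSON object (deduplicated by CID)
const LINK_SIZE = 45;  // assumed per-entry overhead in a directory block

// Every intermediate root pinned: n leaves + one root block per version,
// where version i's root holds i links. Sum of i*LINK_SIZE is quadratic.
function pinnedUsage(n) {
  let total = n * LEAF_SIZE;
  for (let i = 1; i <= n; i++) total += i * LINK_SIZE;
  return total;
}

// Only the latest root survives garbage collection: n leaves + one root.
function gcUsage(n) {
  return n * LEAF_SIZE + n * LINK_SIZE;
}

console.log(pinnedUsage(1000)); // ~22.6 MB in this model, grows ~n^2
console.log(gcUsage(1000));     // ~0.12 MB in this model, grows ~n
```

With these assumed constants, going from 100 to 1000 uploads multiplies the pinned total by roughly 96x rather than 10x, which matches the shape of the measurements above even though the absolute numbers differ.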
To obtain linear scaling and minimal storage use, I removed all pinning (the garbage collector doesn’t delete files in MFS anyway), use rawLeaves=true on writes, and run garbage collection before every disk-size measurement. That results in linear storage growth.
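For anyone trying to reproduce this, a minimal sketch of that fixed flow might look like the following. It assumes a local IPFS daemon on the default API port and uses ipfs-http-client; the path names are hypothetical:

```javascript
import { create } from 'ipfs-http-client'

const ipfs = create({ url: 'http://127.0.0.1:5001' })

// Write each item into MFS with raw leaves; no explicit pin is taken,
// since MFS contents are already protected from garbage collection.
async function addItem(i, data) {
  await ipfs.files.write(`/uploads/item-${i}.json`, data, {
    create: true,
    parents: true,
    rawLeaves: true,
  })

  // Publish the current MFS root to IPNS after every upload.
  const { cid } = await ipfs.files.stat('/')
  await ipfs.name.publish(cid)
}

// Run GC before measuring so only reachable blocks are counted.
async function collectGarbage() {
  for await (const res of ipfs.repo.gc()) {
    if (res.err) console.error(res.err)
  }
}
```

Since nothing outside MFS is pinned, each GC pass drops the old root directory blocks and only the current folder version plus the deduplicated leaves remain on disk.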
I’m interested in what that overhead is; can you add some more details to characterize it?
Badger is optimized for fast ingestion and queries. Depending on usage and settings, it can have significant disk overhead. By default Badger does not delete old values, and its garbage collection needs to be run explicitly. Its indexes for fast queries also need to be persisted to disk, I think.
Oh wow. Thanks, that’s really good information. I was interested in taking a look at Badger, and I’m glad I know this going into it.