IPFS Storage Consumption increases nonlinearly

sigma67 · July 21, 2020, 6:07pm

Hey there, I just ran a small benchmark in which I uploaded a small JSON object (78 bytes) 1000 times using the js-ipfs-http-client. I then measured the size of the .ipfs folder every 20 uploads.
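A measurement loop of this kind could be sketched roughly as follows. This is not the script used for the benchmark; it is a minimal Python illustration in which `upload` is a hypothetical callback standing in for the actual add call, and `repo_path` is whatever directory the node keeps its data in.

```python
import os
from typing import Callable, List, Tuple


def dir_size_bytes(path: str) -> int:
    """Sum the apparent sizes of all regular files below `path`."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            full = os.path.join(root, name)
            try:
                # Skip symlinks so their targets are not counted.
                if not os.path.islink(full):
                    total += os.path.getsize(full)
            except OSError:
                # A file may vanish while a live repo is being scanned.
                pass
    return total


def run_benchmark(
    upload: Callable[[int], None],  # hypothetical: performs upload number i
    repo_path: str,
    uploads: int = 1000,
    every: int = 20,
) -> List[Tuple[int, int]]:
    """Call `upload` repeatedly and sample the repo size every `every` uploads."""
    samples = []
    for i in range(1, uploads + 1):
        upload(i)
        if i % every == 0:
            samples.append((i, dir_size_bytes(repo_path)))
    return samples
```

Note that this sums apparent file sizes, which can differ from the block usage reported by a tool such as `du`, especially when the repo contains many small files.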

My results show that the additional space consumed per item increases linearly, starting with about 0.5MB and ending up at more than 2MB per additional upload item. This means that the overall disk usage increases nonlinearly (second-degree polynomial). For example, at 100 uploads the disk usage was 30MB, but at 1000 uploads it was already over 290MB (!). Here is a small graph for visualization: https://i.imgur.com/5bvAb2b.png. I’m using UnixFS for adding the files, and after every upload I update the node’s IPNS with the hash of the folder containing the new upload.

Can someone explain the reason for this? Thanks in advance!

sigma67 · July 21, 2020, 9:08pm

I think I figured out the issue. Pinning the updated folder after every update is causing lots of duplicate files to be retained. Will need to run this test again.

hector · July 22, 2020, 7:28am

Badger datastore? It has a large overhead.

sigma67 · July 22, 2020, 10:10am

No, I’m using the default flatfs data store.

The issue was that I was pinning the updated root folder of the MFS after every upload, which retained a copy of every earlier version of the folder. That explains the quadratic scaling.

To obtain linear scaling and minimal storage use, I removed all pinning, since the garbage collector doesn’t delete files in MFS anyway. I use rawLeaves=true on writes and run garbage collection before every disk size measurement. That results in linear storage growth.
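The difference between the two setups can be illustrated with a toy model. The sketch below is not IPFS code: it assumes each upload adds one item block plus a new root folder block whose size is proportional to its number of entries, and `LINK_BYTES` is a made-up illustrative value. Real repos add per-block overhead, so the absolute numbers mean nothing; only the shape of the growth is of interest.

```python
ITEM_BYTES = 78   # size of one uploaded item (from the benchmark)
LINK_BYTES = 50   # assumed size of one folder entry; illustrative only


def retained_bytes(uploads: int, pin_every_root: bool) -> int:
    """Bytes kept after `uploads` uploads in the toy model."""
    items = uploads * ITEM_BYTES
    if pin_every_root:
        # The root after upload i has i entries, and every root stays pinned:
        # 1 + 2 + ... + n entries in total, which is quadratic in n.
        roots = LINK_BYTES * uploads * (uploads + 1) // 2
    else:
        # Only the current root survives garbage collection.
        roots = LINK_BYTES * uploads
    return items + roots


for n in (100, 500, 1000):
    print(n, retained_bytes(n, True), retained_bytes(n, False))
```

With every root pinned, the folder blocks dominate and the total grows with the square of the upload count; with only the latest root retained, the total grows in proportion to the number of uploads.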

zacharywhitley · July 30, 2020, 1:57pm

I’m interested in what that overhead is, can you add some more details to characterize it?

hector · July 30, 2020, 8:55pm

Badger is optimized for fast ingestion and queries. Depending on usage and settings, it can have significant disk overhead. By default, Badger does not delete data from disk, and garbage collection needs to be run explicitly. Indexes for fast queries also need to be persisted to disk, I think.

zacharywhitley · July 31, 2020, 2:25pm

Oh wow. Thanks, that’s really good information. I was interested in taking a look at Badger, and I’m glad I know this going into it.
