Hi all,
I spent the last few weeks assessing IPFS for exchanging large data sets (> 200 GB) on a private swarm. The main goal is to make these large files available to a small cluster of machines. I managed to set up the swarm and share the data over IPNS. However, the access times fell short of my expectations. I used kubo 0.19.0.
The setup consisted of two machines (A and B) connected by a 1 Gbit/s network. I added the file/directory (300 GB) to IPFS on machine A, published it through IPNS, and accessed the respective key on machine B.
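For reference, the workflow described above can be sketched roughly as follows. This is only an illustration of the steps, not the exact commands I ran: the helper names, the paths, and the IPNS name are placeholders, and the chunk size shown is the 1 MB variant.

```python
import subprocess

def add_cmd(path, chunk_bytes=1048576):
    # -r adds a directory recursively, -Q prints only the root CID,
    # --chunker=size-N sets a fixed chunk size in bytes.
    return ["ipfs", "add", "-r", "-Q", f"--chunker=size-{chunk_bytes}", path]

def publish_cmd(cid):
    # Publishes the CID under the node's IPNS key (machine A).
    return ["ipfs", "name", "publish", f"/ipfs/{cid}"]

def get_cmd(ipns_name, out_dir):
    # Resolves the IPNS name and writes the content to out_dir (machine B).
    return ["ipfs", "get", "-o", out_dir, f"/ipns/{ipns_name}"]

def run(cmd):
    # Runs a command and returns its trimmed stdout; raises on a non-zero exit.
    result = subprocess.run(cmd, check=True, capture_output=True, text=True)
    return result.stdout.strip()
```

On machine A this would be `cid = run(add_cmd("/data/dataset"))` followed by `run(publish_cmd(cid))`, and on machine B `run(get_cmd(name, "/data/out"))`, where `/data/dataset`, `/data/out`, and `name` are hypothetical.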
As a baseline for my measurements, I used a direct SCP transfer of the same file/directory. I repeated each transfer several times to average out differences, but the variance in the timings was very small. I also tried different chunk sizes for IPFS’s block storage.
My goal is to access the contents of the file/directory from other software that is not aware of the IPFS API, so I’m a bit stuck with “ipfs get”, as I never managed to mount IPFS reliably enough to access it like any other filesystem. Maybe I did not understand all of the options IPFS gives me here. If you have advice on how to do this better, please tell me.
Here are my findings in comparison to the scp baseline:
- Accessing the file on machine B with “ipfs get” took 2.79 times as long as scp (6h 51m vs 2h 27m) with 256 kB blocks and 1.55 times as long (3h 48m vs 2h 27m) with 1 MB blocks.
- I assumed most of the time was spent transferring the hashed blocks, so I called the “ipfs get” command a second time. With 256 kB blocks it still took 0.75 times the scp time (1h 51m vs 2h 27m), and with 1 MB blocks 0.72 times the scp time (1h 47m vs 2h 27m).
- I found out that one can use “ipfs pin add” to transfer the MFS from machine A to machine B, but this took almost the same time as scp (2h 26m vs 2h 27m), and I would still need to call “ipfs get” afterwards to access the contents of the directory, which leaves me with an additional 1h 50m.
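To make the comparison easier to reproduce, here is a small sketch that recomputes the ratios from the timings listed above and adds the effective throughput. It assumes the 300 GB figure from the setup and decimal units (1 GB = 1000 MB); the labels and function names are only for illustration. The ratios are rounded here, so the last digit may differ slightly from the figures above.

```python
def minutes(hours, mins):
    # Convert an "Xh Ym" timing to minutes.
    return hours * 60 + mins

SCP = minutes(2, 27)   # scp baseline
SIZE_GB = 300          # assumed data size, taken from the setup description

runs = {
    "ipfs get, 256 kB blocks, first run": minutes(6, 51),
    "ipfs get, 1 MB blocks, first run": minutes(3, 48),
    "ipfs get, 256 kB blocks, second run": minutes(1, 51),
    "ipfs get, 1 MB blocks, second run": minutes(1, 47),
    "ipfs pin add": minutes(2, 26),
}

def ratio_to_scp(t_min):
    # How many times the scp duration a run took.
    return t_min / SCP

def throughput_mb_s(t_min, size_gb=SIZE_GB):
    # Effective throughput in MB/s, assuming 1 GB = 1000 MB.
    return size_gb * 1000 / (t_min * 60)

print(f"scp baseline: {throughput_mb_s(SCP):.1f} MB/s")
for label, t in runs.items():
    print(f"{label}: {ratio_to_scp(t):.2f}x scp, {throughput_mb_s(t):.1f} MB/s")
```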
Is there something I could do more effectively here? How can I make IPFS work efficiently with large files?
Assuming that what I did is the expected way to interact with IPFS in a private swarm, I would expect accessing the data after it has been cached (the second time) to be much faster. I understand that maintaining the Merkle DAG doesn’t come for free, but I would expect that recreating data that has already been downloaded would take minutes rather than hours. Is there something I forgot to configure?
Best and thank you.