Comparing throughput across multiple devices running go-ipfs

Greetings, all!

My team is working on a project that involves the storage, retrieval and indexing of large volumes of data. I have been advised to check out IPFS and determine how it could help with our use case. Please bear with me, as I am new to IPFS. I would appreciate any constructive feedback you may have on this.

What we want to do

Our team needs to empirically measure how data transmission speeds vary across different devices. We have large test files of various sizes (1 TB, 10 TB, 100 TB, 1 PB+, etc.).

To begin, we want to test uploading, downloading and indexing of these files on different machines in order to discover which devices are more or less efficient than others.

For example, we would run go-ipfs on an individual machine (whether it is an AWS EC2 instance, local machine, Raspberry Pi, data center server, managed switch, etc.) and test the IPFS data throughput on each device.

Obviously, each of the machines will have different bandwidth and network configurations (e.g., both gigabit Ethernet and Wi-Fi connections), but we want to understand how the type of machine and its specifications may affect data transmission and indexing.
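
The per-device test described above could be sketched roughly as follows. This is a minimal sketch, not an established IPFS benchmarking tool: it assumes the `ipfs` binary from go-ipfs is on the PATH with an initialized repo, and the helper names (`throughput_mib_per_s`, `time_ipfs_add`, `benchmark_add`) are hypothetical. It times `ipfs add -Q`, so it measures local import (chunking, hashing and writing to the datastore), not network transfer.

```python
import subprocess
import time
from pathlib import Path
from typing import Dict, Tuple


def throughput_mib_per_s(num_bytes: int, seconds: float) -> float:
    """Convert a byte count and an elapsed time into MiB per second."""
    if seconds <= 0:
        raise ValueError("elapsed time must be positive")
    return num_bytes / (1024 * 1024) / seconds


def time_ipfs_add(path: str, ipfs_bin: str = "ipfs") -> Tuple[str, float]:
    """Run `ipfs add -Q <path>` and return (root CID, elapsed seconds).

    Assumes the ipfs binary is installed and its repo is initialized.
    """
    start = time.perf_counter()
    result = subprocess.run(
        [ipfs_bin, "add", "-Q", path],
        check=True,
        capture_output=True,
        text=True,
    )
    elapsed = time.perf_counter() - start
    return result.stdout.strip(), elapsed


def benchmark_add(path: str, ipfs_bin: str = "ipfs") -> Dict[str, object]:
    """Time one `ipfs add` of the file and report its throughput."""
    size = Path(path).stat().st_size
    cid, elapsed = time_ipfs_add(path, ipfs_bin)
    return {
        "file": path,
        "bytes": size,
        "cid": cid,
        "seconds": elapsed,
        "mib_per_s": throughput_mib_per_s(size, elapsed),
    }
```

Calling `benchmark_add("testfile.bin")` on each device with the same test file would give one comparable number per machine. Measuring retrieval over the network would need a second step, for example timing `ipfs cat <cid>` on a different node.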

Some key questions

Are there already existing reports on these comparisons?

Do automated scripts already exist to perform these types of throughput tests?

Are there specific tools or methods that the IPFS community recommends to perform these tests?

Is bandwidth the main factor that affects IPFS data management, or do machine specs like CPU/GPU/RAM play a significant role as well?

What are the key factors that affect IPFS data transmission efficiency?

Is go-ipfs the most effective, or would it be worth testing something like js-ipfs as well?

Machine specs definitely play a role (but GPU doesn’t matter). Storage performance, CPU, and RAM are probably the most important.
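
To get a rough idea of whether storage or CPU is the limiting factor on a given machine, one option is to time raw file reads and SHA-256 hashing separately before involving IPFS at all. The sketch below assumes Python is available on the device; `measure_read_and_hash` is a hypothetical helper, not part of any IPFS tooling. SHA-256 is used because it is the default hash function in go-ipfs.

```python
import hashlib
import time


def measure_read_and_hash(path, block_size=1024 * 1024):
    """Read a file block by block, timing reads and SHA-256 updates separately.

    Returns a dict with the byte count, seconds spent reading,
    seconds spent hashing, and the hex digest of the file.
    """
    total_bytes = 0
    read_seconds = 0.0
    hash_seconds = 0.0
    digest = hashlib.sha256()

    with open(path, "rb") as f:
        while True:
            t0 = time.perf_counter()
            block = f.read(block_size)
            t1 = time.perf_counter()
            read_seconds += t1 - t0
            if not block:
                break
            digest.update(block)
            hash_seconds += time.perf_counter() - t1
            total_bytes += len(block)

    return {
        "bytes": total_bytes,
        "read_seconds": read_seconds,
        "hash_seconds": hash_seconds,
        "sha256": digest.hexdigest(),
    }
```

If `hash_seconds` dominates, the CPU is the more likely bottleneck; if `read_seconds` dominates, storage is. Keep in mind that the operating system's page cache can make repeated reads of the same file look much faster than the disk really is, so use a file larger than RAM or a fresh file for each run.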

I’m not sure what all the key factors are, but there are likely still some inefficiencies that haven’t been fixed yet. See ipfs/kubo issue #4588, “Duplicate Data increases with the number of nodes serving the file.”

I’m biased since I’ve only used go-ipfs, but I’d expect go-ipfs to be a better choice for this.
