How to Optimize IPFS Node Performance for Large Data Sets?

Hey guys… :wave:

Recently, I have started working with larger datasets (files ranging from several gigabytes up to terabytes), and I’ve noticed that the performance of my IPFS node has slowed down significantly. I’m aware that IPFS isn’t necessarily designed for high-speed data transfer like some other protocols, but I’m hoping to find some ways to optimize my setup and get better performance when dealing with these larger files.

Here are some of the issues I have faced:

  1. Even on a fast network, it seems to take much longer than expected to add files to IPFS and to retrieve them. Is there a specific configuration or hardware setup that can help speed this up?

  2. My node sometimes uses a lot of CPU and memory when processing large datasets. Are there any best practices for managing resource usage, or certain settings that can be adjusted to reduce the load?

  3. I’ve also noticed that my node occasionally drops connections or struggles to maintain a consistent number of peers. Are there ways to improve the reliability of these connections? Would changing the network settings help?

I’m running my IPFS node on a relatively standard server setup (16GB RAM, 8-core CPU) with plenty of storage space. The server runs Ubuntu, and I’ve tried a few tweaks, like increasing the cache size and adjusting the swarm settings, but I haven’t seen a significant improvement.

I also checked this thread: https://discuss.ipfs.tech/t/ipfs-files-rm-not-accepting-options-objectmendix but I did not find a solution there. Could anyone guide me on this?

I would be grateful for any help!

Thanks in advance

Respected community member! :smiling_face_with_three_hearts:


Hey Nisha,

What you’re describing is a relatively common problem and could be due to a number of reasons:

  • Providing to the DHT is exhausting resources, because the node tries to announce every CID it holds to the DHT.
  • Data access patterns are causing disk I/O saturation (you can check for this as shown below).
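
If it helps, a quick way to check the second point on Linux (assuming the sysstat package is installed) is to watch device utilisation while the node is under load:

```
# Extended per-device stats, refreshed every second; %util sitting near
# 100 while adding/fetching data suggests disk I/O is the bottleneck
iostat -x 1
```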

More input so we can help you:

  • Which implementation and version of IPFS are you using?
  • Can you share your config (if you are using Kubo/IPFS Desktop)?
  • Are you getting any errors in the logs?
  • What disk are you using? SSD or spinning HD?
  • If you’re running Kubo/IPFS Desktop, can you run the `ipfs stats provide` command and share the output (see below)?
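
For reference, that’s just:

```
# Prints statistics from the daemon's reprovide system, e.g. how long
# provide operations take and whether the reprovide queue is keeping up
ipfs stats provide
```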

A couple of suggestions to try out:

  • Use the Accelerated DHT Client; this comes at the cost of extra CPU/memory resources to maintain the routing table.
  • Change the Reprovider Strategy from `all` to `roots`, which will reduce the number of CIDs to advertise (commands for both are sketched below).
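
If you’re on Kubo, the changes would look roughly like this (the accelerated client’s config key has moved between versions, so double-check against the docs for your release):

```
# Enable the Accelerated DHT Client (recent Kubo; older releases use
# Experimental.AcceleratedDHTClient instead)
ipfs config --json Routing.AcceleratedDHTClient true

# Announce only root CIDs rather than every block
ipfs config Reprovider.Strategy roots
```

Restart the daemon afterwards so the new settings take effect.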

Generally, SSDs tend to perform better with random access, which is pretty common with IPFS.

What exactly do you mean by “processing large datasets”? Do you mean adding those datasets to the IPFS Node?

It’s hard to generalise, but it’s common in a peer-to-peer network for peers to regularly come and go.
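
That said, if the churn is hurting you, one thing to experiment with is Kubo’s connection manager watermarks; a minimal sketch with purely illustrative values:

```
# Keep more peers before the connection manager starts pruning
# (LowWater/HighWater are integers, hence --json)
ipfs config --json Swarm.ConnMgr.LowWater 100
ipfs config --json Swarm.ConnMgr.HighWater 400

# Give new connections longer before they become eligible for pruning
ipfs config Swarm.ConnMgr.GracePeriod 30s
```

Higher watermarks trade memory and file descriptors for a more stable peer set, so keep an eye on resource usage after raising them.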