Content routing, multiple destinations


When an IPFS client requests a file that is stored remotely on multiple peers, how does the local IPFS instance decide which remote peer to fetch the file from?

I am currently running a private network of 15 nodes, each running one IPFS instance with multiple IPFS-Cluster instances on top. When adding a file at a node, I use IPFS-Cluster clients to add it to each cluster instance the node is a member of. When reading (Cat) a file, I would like to be able to read it from all IPFS-Clusters the node is a member of, but I am not sure how to do it. I currently read with the following function, but I am not sure how Cat() selects the remote peer.

func readFile(c client.Client, filename string) time.Duration {
	ctx := context.Background()

	sh := c.IPFS(ctx) // go-ipfs-api shell pointing at the Cluster proxy
	start := time.Now()
	rc, err := sh.Cat(filename)
	if err != nil {
		log.Fatal(err)
	}
	defer rc.Close()
	// drain the stream so the duration covers the full transfer
	if _, err := io.Copy(io.Discard, rc); err != nil {
		log.Fatal(err)
	}
	return time.Since(start)
}

In a 15 node network, you’ll likely be connected to all nodes all the time. When requesting a file, you’ll:

  1. Ask all connected nodes for the first piece (root node) of the file.
  2. After that, you’ll create what we call a “session” and put all peers that responded in step one into the session.
  3. Finally, you’ll ask peers in the session for subsequent blocks.
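
The three steps above can be sketched as a toy simulation (the peer names and the `holders` map are made up for illustration; real bitswap is far more involved):

```go
package main

import (
	"fmt"
)

// Toy model of which connected peers actually hold the file (made-up names).
var holders = map[string]bool{"peerA": true, "peerB": true, "peerC": true}

// fetchRoot models steps 1 and 2: broadcast the request for the root
// block to every connected peer; everyone who responds joins the session.
func fetchRoot(connected []string) (session []string) {
	for _, p := range connected {
		if holders[p] {
			session = append(session, p)
		}
	}
	return session
}

// fetchBlock models step 3: subsequent blocks are requested only from
// peers already in the session, not broadcast to everyone.
func fetchBlock(session []string, block string) string {
	return session[0] // simplification: always ask the first session peer
}

func main() {
	session := fetchRoot([]string{"peerA", "peerB", "peerD"})
	fmt.Println("session:", session)
	fmt.Println("block-2 served by:", fetchBlock(session, "block-2"))
}
```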

You shouldn’t need to explicitly ask your cluster. If you want to be extra safe, you can explicitly connect to all peers in your cluster before starting to download the file.

Thank you for your answer.

How can you say that it is likely that nodes will be fully connected? Do you have a link to the documentation?

Would the operation take ~2 × RTT to the furthest node to complete (assuming a small file)?

So in a case where multiple nodes pin the same single-block file, there would be multiple candidates at step 3. Does the requesting IPFS node broadcast the request to all candidates, or does it pick one, and how?

How can I do so ?

We use a Kademlia DHT that will keep you connected to (at least) your ~20 closest peers. This is more of a side-effect of the DHT than a documented feature.
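
The "closest" here is Kademlia's XOR metric over peer IDs. A minimal sketch of the idea (single-byte IDs instead of real 256-bit hashes, purely illustrative):

```go
package main

import (
	"fmt"
	"sort"
)

// xorDistance is the Kademlia metric: the smaller a XOR b, the
// "closer" the two IDs are (i.e., the longer their shared prefix).
func xorDistance(a, b byte) int { return int(a ^ b) }

// closest returns the k peers nearest to self under XOR distance;
// the DHT keeps you connected to roughly the 20 closest ones.
func closest(self byte, peers []byte, k int) []byte {
	sorted := append([]byte(nil), peers...)
	sort.Slice(sorted, func(i, j int) bool {
		return xorDistance(self, sorted[i]) < xorDistance(self, sorted[j])
	})
	if k > len(sorted) {
		k = len(sorted)
	}
	return sorted[:k]
}

func main() {
	peers := []byte{0x10, 0x12, 0x80, 0xF0, 0x11}
	fmt.Printf("%x\n", closest(0x13, peers, 3)) // the 3 closest IDs to 0x13
}
```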

Small files (<256 KiB) fit into a single block, so there will only be one request. However, round trips are an issue for path traversal: if you try to look up /ipfs/Qm.../foo/bar/baz/my_file.txt, IPFS currently needs at least 5 round trips.
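
The "at least 5 round trips" follows from needing one lookup per path component, since each directory block has to be fetched before the next link can be resolved. A quick sketch of the arithmetic:

```go
package main

import (
	"fmt"
	"strings"
)

// lookups estimates the minimum round trips needed to resolve an
// /ipfs/ path: one per component, because each intermediate block
// must be fetched before the next link can be read.
func lookups(p string) int {
	trimmed := strings.TrimPrefix(p, "/ipfs/")
	return len(strings.Split(trimmed, "/"))
}

func main() {
	// root CID + foo + bar + baz + my_file.txt = 5 lookups
	fmt.Println(lookups("/ipfs/Qm.../foo/bar/baz/my_file.txt"))
}
```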

We’re working on a new protocol (graphsync) so we can request a known sub-graph. However, we have a ways to go before it can completely replace bitswap (the protocol we currently use). Currently, you have to tell graphsync to fetch a specific sub-graph from a specific peer (and it won’t, e.g., automatically modify queries to stop downloading sub-sub-graphs that you already have).

So in a case where multiple nodes pin the same single-block file, there would be multiple candidates at step 3. Does the requesting IPFS node broadcast the request to all candidates, or does it pick one, and how?

go-ipfs currently asks a subset (usually one or two) of the peers in the session for each block (IIRC, it picks the subset based on previous request latencies). The size of the subset depends on how many duplicate blocks we’re getting.

At the moment, the worst case is a 50% block overhead (i.e., you’ll get a duplicate block every other block). We have a large refactor in progress to improve this (among other issues).

Assuming you’re using go-ipfs-api, you can connect to a set of peers with the shell package (SwarmConnect).
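
A sketch of what that could look like. The IPs and peer ID are placeholders for your cluster peers’ IPFS daemons, and the actual connect call (shown in the comment) needs a running daemon, so only the multiaddr construction runs here:

```go
package main

import (
	"fmt"
)

// clusterPeerAddrs builds the multiaddrs to dial. The map keys/values
// are placeholder IPs and peer IDs for your cluster's ipfs daemons.
func clusterPeerAddrs(peers map[string]string) []string {
	addrs := make([]string, 0, len(peers))
	for ip, id := range peers {
		addrs = append(addrs, fmt.Sprintf("/ip4/%s/tcp/4001/p2p/%s", ip, id))
	}
	return addrs
}

func main() {
	addrs := clusterPeerAddrs(map[string]string{"10.0.0.2": "QmPeerID..."})
	fmt.Println(addrs)
	// With go-ipfs-api, the connect step would then look like:
	//   sh := shell.NewShell("localhost:5001")
	//   err := sh.SwarmConnect(context.Background(), addrs...)
}
```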

Actually, IPFS Cluster runs swarm connect for you every time you start a peer: it will connect the IPFS daemon to the IPFS daemons of all the other peers after a few seconds (connect_swarms_delay).
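
For reference, that delay is a setting in Cluster’s service.json, in the ipfshttp connector section (the value shown here is just an example, not necessarily your default):

```json
{
  "ipfs_connector": {
    "ipfshttp": {
      "connect_swarms_delay": "30s"
    }
  }
}
```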

Going back here, I am not sure I understand why you want to “read from all nodes” (since you add content to all nodes, it will already be available locally). Is it because some node does not have the content and you want to download it quickly from multiple sources, or for some other reason? In any case, I hope your doubts were resolved by the explanations above!

Thank you @stebalien and @hector for your clear answers, I now understand better!

I would like to perform a read from only a single cluster (instead of from the IPFS instance, because I have multiple ipfs-cluster instances on top of each IPFS instance) in order to measure the performance of reading/writing in an ipfs-cluster. I guess that running one IPFS instance per cluster instance would be the easiest way to do so.

But @guissou, when you do client.IPFS() you get an IPFS client (sh) pointing to the proxy API provided by Cluster, which just forwards the .Cat request to IPFS. So Cluster is doing nothing there.

Cat-ing from Cluster is the same as reading directly from IPFS (plus the delay introduced by the proxy). You can also create the shell object pointing directly to the IPFS port, localhost:5001 (as in the example here), or change the cluster client config (ProxyAddress).

In the case of writing, the difference between /add on ipfs and /add on cluster is that cluster does all the file splitting and can send the resulting blocks to several IPFS daemons from other cluster peers as the add happens (the recently introduced local flag allows sending only to the local daemon where the content is being added). This is usually much slower than adding directly to IPFS because of all the redirections and forwardings needed to do the above, but it may be more convenient, particularly for small pieces of content that you just want to push directly to the destinations where they will be pinned.

Yes, that is how I understood the Cat. I just want to limit the read to peers in a single cluster (which is a subset of all connected peers), and not wait on all hosts connected to the IPFS instance.