IPFS Propagation

Hello,

I'm trying to understand the way a file propagates over IPFS.

I've spent a lot of time googling and can't seem to get a straight answer on this.

If a file is uploaded to IPFS, does it replicate to some standard? Do a certain number of nodes try to ensure 2x or 3x replication? Or does a file only exist on the machine that 'uploads' it until another machine requests it, and only THEN is it replicated to that machine?

I understand that files are chunked, and that if a chunk with the same hash already exists then it is not duplicated... how does that work?

Hi John,

To answer your first question: files only propagate through IPFS when they are requested by another node. If you're running a node and you add a file to it, no other node will pick up that file by default. If you were to then request that file (by hash) through the ipfs.io gateway, a copy would be cached on the gateway's node for a certain amount of time. If someone then requested it from their own IPFS node, they would receive a copy from the two nodes that already have it. Unless the file is "pinned", nodes will delete it when their stores start getting full, to make space for other files.
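As a quick illustration (assuming a local daemon is running, the `ipfs` CLI is on your PATH, and `example.txt` is just a stand-in file), adding a file only produces a CID and announces it; no copy is made anywhere until someone fetches that CID:

```go
package main

import (
	"fmt"
	"log"
	"os/exec"
	"strings"
)

func main() {
	// Adding does NOT push the file anywhere; it stores it locally
	// and announces to the DHT that this node can provide it.
	out, err := exec.Command("ipfs", "add", "-Q", "example.txt").Output()
	if err != nil {
		log.Fatal(err)
	}
	cid := strings.TrimSpace(string(out))
	fmt.Println("CID:", cid)

	// A second copy only comes into existence when another node (or a
	// gateway such as ipfs.io) requests this CID.
	fmt.Printf("fetch elsewhere with: ipfs cat %s\n", cid)
}
```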

Your second question might vary depending on the implementation, but currently files are stored in 256 KiB chunks. Each of these chunks is referenced by a hash of its contents. Those chunk hashes are then combined into a Merkle DAG whose root hash represents the full file. Because files (and chunks) are referenced by these hashes, and the hashes are deterministic (always the same for the same data), identical content only needs to be stored once.
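Here is a deliberately simplified sketch of that idea in Go. Real IPFS builds a UnixFS Merkle DAG using multihashes rather than a raw SHA-256 over concatenated digests, and `example.txt` is a hypothetical input, but it shows why two files sharing an identical chunk only need that chunk stored once:

```go
package main

import (
	"crypto/sha256"
	"fmt"
	"os"
)

const chunkSize = 256 * 1024 // 256 KiB, the default chunk size

func main() {
	data, err := os.ReadFile("example.txt") // stand-in input file
	if err != nil {
		panic(err)
	}

	// Hash each fixed-size chunk. Identical chunks produce identical
	// hashes, so a content-addressed store keeps only one copy.
	var concatenated []byte
	for start := 0; start < len(data); start += chunkSize {
		end := start + chunkSize
		if end > len(data) {
			end = len(data)
		}
		h := sha256.Sum256(data[start:end])
		fmt.Printf("chunk %x\n", h)
		concatenated = append(concatenated, h[:]...)
	}

	// A hash over the list of chunk hashes stands in for the root of
	// the Merkle DAG that real IPFS builds; same data, same root.
	root := sha256.Sum256(concatenated)
	fmt.Printf("root  %x\n", root)
}
```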

Hope that helps,
Matthew

Thanks Matthew!

So just to confirm: if my node downloads a copy of a file, it will delete it after some time?

I thought the idea was to have many 'seeders'?

I understand there is a garbage collection process; I thought files were kept 'forever' by their owning node...

Are there any docs on this?

  • Files are not 'uploaded' anywhere when you first add them; adding merely makes them available for other nodes to request.
  • Pinned files are exempt from the garbage collector.
  • When you add a file to IPFS, it is pinned by default (see the sketch just after this list).
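A minimal sketch of that pin/GC lifecycle, driving the `ipfs` CLI from Go. The CID is a placeholder (substitute one printed by `ipfs add`), and the `run` helper is just for brevity:

```go
package main

import (
	"fmt"
	"log"
	"os/exec"
)

// run executes an ipfs CLI command and returns its combined output.
func run(args ...string) string {
	out, err := exec.Command("ipfs", args...).CombinedOutput()
	if err != nil {
		log.Fatalf("ipfs %v: %v\n%s", args, err, out)
	}
	return string(out)
}

func main() {
	cid := "QmExampleCid" // placeholder: use a CID printed by `ipfs add`

	// Files added with `ipfs add` are pinned recursively by default:
	fmt.Print(run("pin", "ls", cid))

	// Once unpinned, the blocks survive only until the next GC pass:
	fmt.Print(run("pin", "rm", cid))
	fmt.Print(run("repo", "gc")) // removes everything that is not pinned
}
```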

Having the primary replication method be driven by user interest allows the system to scale capacity and demand very closely together.

The primary replication method does nothing to assure data durability, though; that is where something like ipfs-cluster comes in. It orchestrates multiple nodes that you control to ensure 2x, 3x, 4x, etc. replication.
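For example, a sketch of pinning a CID to a cluster with bounded replication. The `--replication-min`/`--replication-max` flag names are from the ipfs-cluster docs as I recall them, so double-check with `ipfs-cluster-ctl pin add --help`, and the CID is again a placeholder:

```go
package main

import (
	"fmt"
	"log"
	"os/exec"
)

func main() {
	cid := "QmExampleCid" // placeholder: a CID produced by `ipfs add`

	// Ask the cluster to keep between 2 and 3 copies of this CID
	// across the nodes it orchestrates.
	out, err := exec.Command("ipfs-cluster-ctl", "pin", "add",
		"--replication-min", "2", "--replication-max", "3", cid).CombinedOutput()
	if err != nil {
		log.Fatalf("%v\n%s", err, out)
	}
	fmt.Print(string(out))
}
```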