How IPFS backs up data

Problem: When uploading a new file, will it be backed up to other nodes, or does it stay only on the uploading node?

I set up a private network for testing: when a new file is uploaded, it is stored as blocks on the current node only and is not distributed to other nodes as backup blocks.
I also analyzed the go-ipfs source code and found no code that distributes data blocks to other nodes for backup.
Is my analysis correct? And is there a detailed description of the process IPFS follows when a file is uploaded?

Welcome wwwgel,

Your analysis is correct. Adding a file to your client only makes it accessible to the network: another client can retrieve it only if it knows the file's checksum (the CID produced when it is added), either because it added the same content itself or because you shared that CID with it.
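For illustration, a minimal sketch on the CLI (the file name is made up and `<cid>` stands for whatever hash `ipfs add` prints for you):

```
# On node A: add a file. Only node A stores the blocks at this point;
# the network merely learns, via the DHT, that node A can provide the CID.
ipfs add report.pdf
# -> added <cid> report.pdf

# On node B: fetching by that CID is what actually copies the blocks over.
ipfs cat <cid> > report.pdf
```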

If you're searching for a way to distribute data reliably within the IPFS network, the cluster implementation (ipfs-cluster) might fit your needs.

I have no idea what this is supposed to mean. You're free to publish any file you want, just as you're free to publish anything you want to the internet if you run a webserver.

The difference is that, with IPFS, the "webserver" (your node) and the person downloading/viewing exchange enough information that the downloader can help spread the file to the network as well, and can even fully replace your webserver. :slight_smile:

May I understand it as a single node working like a webserver?
When I publish a new file to the IPFS network with my node, I suppose my node is the only node which has a copy of that file.
As soon as other users try to reach that file, certain nodes of those users will cache it.
So the more users try to reach that file, the more nodes will cache it?

That’s basically the concept.

In the case of files that are also available elsewhere, like an ISO of a Linux distribution, another user can add the exact same file this way, and the Content IDs would match.

This means you don't have to transfer files from one node to another; you can also add the same file in different locations to get the same effect.
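A minimal sketch, assuming both nodes add the byte-identical file with the same default settings (chunker, CID version), since those influence the resulting hash; the file name is just an example:

```
# On node A
ipfs add ubuntu-20.04-desktop-amd64.iso
# -> added <cid> ubuntu-20.04-desktop-amd64.iso

# On node B: adding the identical file yields the same <cid>,
# so both nodes become providers of the same content without
# ever transferring it between them.
ipfs add ubuntu-20.04-desktop-amd64.iso
# -> added <cid> ubuntu-20.04-desktop-amd64.iso
```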

If you want to 'hold' specific content, the command is `ipfs pin add` on the CLI. This fetches any data it still needs from the network, according to the Content ID, and prevents the data from being deleted by the garbage collector.
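For example:

```
# Fetch (if needed) and pin the content behind a CID so the
# garbage collector will never drop it.
ipfs pin add <cid>
# -> pinned <cid> recursively

# List what is currently pinned.
ipfs pin ls --type=recursive
```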

Additionally, all data you add in the Files tab of the GUI will be spared from being garbage collected when you run low on disk space.
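The rough CLI counterpart of the Files tab is the Mutable File System (`ipfs files`); content referenced from it is likewise protected from garbage collection:

```
# Reference already-added content from MFS so the GC keeps it around.
ipfs files cp /ipfs/<cid> /my-backup.iso
ipfs files ls -l /
```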

The default storage limit is set to 10 GB; once about 9 GB (90% of the limit) is filled, the garbage collector will try to make room by dropping content you have not pinned.
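These values come from the node's config and can be adjusted:

```
# Default storage limit and GC watermark (90% of the limit, i.e. ~9 GB).
ipfs config Datastore.StorageMax
# -> 10GB
ipfs config Datastore.StorageGCWatermark
# -> 90

# Raise the limit, e.g. for a node that should hold more data.
ipfs config Datastore.StorageMax 100GB

# Trigger garbage collection manually; only unpinned blocks are removed.
ipfs repo gc
```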

So if someone accesses your file, it ends up in their cache, but it might not be pinned, or might be only partly cached. If that user is running low on disk space, your data might be dropped from their cache again, since content that was merely accessed is not pinned.

Running a cluster, on the other hand, guarantees that a given number of copies is held in the network, allowing for redundancy and parallel downloads of the data.
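As a sketch of how that looks with ipfs-cluster (command and flag names as I remember them from ipfs-cluster-ctl, so double-check against its help output):

```
# Pin through the cluster; it keeps the requested number of replicas
# alive across the cluster peers.
ipfs-cluster-ctl pin add <cid> --replication-min 2 --replication-max 3

# See which peers currently hold the pin.
ipfs-cluster-ctl status <cid>
```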

I understand, thank you very much

You’re welcome!

Explain a bit what you're trying to achieve with IPFS, and we might figure out if there's an application built upon IPFS which can provide the functionality. :slight_smile:

Best regards

Ruben

I want to use IPFS to make a CDN system

Cloudflare already offers a gateway to IPFS; you can use it if you're talking about serving content from IPFS to the regular web in a CDN-like way.
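For instance, any HTTP client can pull content through such a gateway (the Cloudflare gateway hostname below is the one I recall; verify it before relying on it):

```
# Fetch IPFS content over plain HTTPS via Cloudflare's gateway,
# which also puts their CDN/caching layer in front of it.
curl https://cloudflare-ipfs.com/ipfs/<cid> -o file.bin
```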

If you're just looking for a pinning service, there are Pinata and Infura, for example.

Excuse me, I have a new question.
When I publish a new file to the IPFS network with my node, only my node owns a copy of that file. How do other nodes find this file on my node?
I read the IPFS white paper. IPFS uses a Kademlia DHT, so according to my understanding a block should be stored on the node corresponding to its hash value, so that other nodes can quickly find the node that stores the data.
But from my analysis so far, the blocks are not actively backed up to those corresponding nodes. This seems incompatible with the Kademlia design.
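For reference, what the DHT stores near a content hash are provider records (pointers to peers that have the data), not the blocks themselves, and this can be inspected from the CLI; `<cid>` is again a placeholder:

```
# Ask the DHT which peers provide a given CID. These provider records,
# not the data blocks, are what lives "close to the hash" in the DHT.
ipfs dht findprovs <cid>

# A node (re-)announces itself as a provider for a CID it already holds.
ipfs dht provide <cid>
```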