IPFS and sneakernet

In many countries with internet censorship, sneakernet is a very common form of data transfer. Even in countries without drastic internet censorship, a multi-terabyte external hard drive is a pretty efficient way to move large amounts of data.

Currently, as far as I can see, there is no support for mirroring the contents of the block store to an external hard drive. Or am I missing something?

Would it be a good idea to add this? It would seem to be quite simple conceptually.

In the most primitive form, you would have a command to export data from the block store to an external location, e.g.

ipfs export location (hashes)

and a command to import from a location, such as

ipfs import location

Data in the exported location would still be in the native block-store format, with hashes to ensure content integrity and allow for quick import and export.

A more sophisticated version would permanently monitor a number of file system locations and sync to/from them.

I know it’s not intuitive, but you don’t need special commands to do this because IPFS is content-addressed. If you get or cat the files out of IPFS and then add them to another IPFS node elsewhere you will end up with the same hashes because you have the same content.

In other words, your suggested ipfs export and ipfs import commands are just a different name for ipfs get and ipfs add.

Instead of needing the proposed commands ipfs export location (hashes) and ipfs import location, you can just run these existing commands:

ipfs get <hash> --output path-to-external-drive

and then

ipfs add path-to-contents-on-external-drive
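
The content-addressed round trip behind this equivalence can be sketched in a few lines of Python. This uses SHA-256 as a crude stand-in for IPFS's multihash/CID machinery and ignores chunking, so the digests below are illustrative, not real CIDs:

```python
import hashlib

def block_hash(data: bytes) -> str:
    # Stand-in for a CID: real IPFS wraps the digest in a multihash
    # and chunks large files into a DAG.
    return hashlib.sha256(data).hexdigest()

# Node A "exports" a file with `ipfs get`; the bytes ride the drive;
# Node B "imports" them with `ipfs add`.
content_on_node_a = b"some file contents"
content_on_node_b = bytes(content_on_node_a)  # identical copy off the drive

# Same content, same address -- so get-then-add round-trips to
# exactly the hash you started with.
assert block_hash(content_on_node_a) == block_hash(content_on_node_b)
```

The address is recomputed from the bytes themselves, which is why plain `ipfs add` on the receiving machine is all the "import" that's needed.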

There’s a tutorial describing how to use IPFS for sneakernets in the DWeb Primer at https://dweb-primer.ipfs.io/avenues-for-access/lessons/sneakernets.html

Now, it will be especially interesting when it's easier to run multiple IPFS nodes on the same machine: you could then run one node whose data store is on the external drive and another whose data store is in the default location, and use strategies like ipfs-cluster to sync content across the nodes. That will be really useful, but it's not necessary to satisfy the basic sneakernet use case you're pointing to.

Thanks for the quick answer!

I know it’s not intuitive, but you don’t need special commands to do this because IPFS is content-addressed. If you get or cat the files out of IPFS and then add them to another IPFS node elsewhere you will end up with the same hashes because you have the same content.

That is perfectly clear. I was thinking that not converting from/to a file system hierarchy would be more efficient and more reliable, since the hashes would prevent unnoticed data corruption.
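
The integrity point can be illustrated with a small sketch (again with SHA-256 standing in for IPFS's real multihash): if blocks travel on the drive keyed by their hash, a silently flipped bit is caught the moment hashes are recomputed on import.

```python
import hashlib

def digest(data: bytes) -> str:
    # Stand-in for the block's content hash (real IPFS uses multihashes/CIDs).
    return hashlib.sha256(data).hexdigest()

# Blocks in the exported location are stored keyed by their hash.
block = b"block payload"
stored = {digest(block): block}

# Simulate silent corruption in transit: flip one bit of the payload.
corrupted = bytes([block[0] ^ 0x01]) + block[1:]
stored_corrupted = {digest(block): corrupted}  # key no longer matches content

def verify(store):
    # On import, recompute each block's hash; any mismatch is detected.
    return all(digest(data) == key for key, data in store.items())

assert verify(stored)
assert not verify(stored_corrupted)
```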

But just running a second IPFS node that uses the external drive as its data store might be a clean solution. This would benefit from an efficient transport between two IPFS nodes on the same machine (Unix sockets or shared memory?), but that is just an optimisation detail. I guess you could write a polished sneakernet GUI based on this approach.

One thing: how would you make sure that everything from the main ipfs is synced to the external drive ipfs, not just specific hashes or pinned hashes? I know the ipfs philosophy is to share data only on explicit demand, but in this case the option to share everything might be useful.

how would you make sure that everything from the main ipfs is synced to the external drive ipfs, not just specific hashes or pinned hashes?

I can’t remember off the top of my head but I think there is a command to list all the hashes in your ipfs repo. There is definitely a command to list all of your pinned hashes. You can pipe that into whatever sync routine you run. In the long run, you will need to make explicit pin sets for the things you want to sync, and accumulate metadata about those pin sets, so you know what data you’re copying to which machines. That metadata layer is not explicitly accommodated by the ipfs protocol – you need to build it yourself.
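
As a sketch of that piping step: assuming you've captured the output of `ipfs refs local` (every block in the local repo) and `ipfs pin ls` (the pinned hashes), computing the set to hand to your sync routine is a few lines of Python. The hash values here are placeholders, not real CIDs:

```python
# Captured output of `ipfs refs local`: one hash per line.
refs_local = """QmAAA
QmBBB
QmCCC
"""

# Captured output of `ipfs pin ls`: "<hash> <pin-type>" per line.
pin_ls = """QmBBB recursive
"""

all_blocks = set(refs_local.split())
pinned = {line.split()[0] for line in pin_ls.splitlines() if line.strip()}

# Hand either set to your sync routine: everything, just the pins,
# or (say) the blocks not already pinned on the target:
to_sync = sorted(all_blocks - pinned)
print(to_sync)  # ['QmAAA', 'QmCCC']
```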

That’s not entirely true. We may end up importing the files with a different chunking algorithm.

This would benefit from an efficient transport for two ipfs nodes on the same machine (unix sockets or shared memory?).

We’d like this, not only for communicating between nodes on a single machine but for communicating between a running daemon and the CLI tool, but we don’t have it yet.

One thing: how would you make sure that everything from the main ipfs is synced to the external drive ipfs, not just specific hashes or pinned hashes? I know the ipfs philosophy is to share data only on explicit demand, but in this case the option to share everything might be useful.

Not that I know of and, due to GC, it’s probably best not to rely on this. Personally, I recommend either pinning or adding files you care about to your local mfs (ipfs files ...).


Note: an alternative to all of this is to shutdown your local daemon and copy the repo. However, that format changes over time so it’s less likely to be stable.


We’ve discussed having a file format called CAR for exporting/importing merkledags. We have some notes here: https://github.com/ipfs/archive-format. However, they’re pretty out of date.

Basically, we want several properties from CARs:

  1. Certified/Hashed.
  2. Seekable/Traversable: It should be possible to traverse a DAG through a CAR in one pass (without necessarily reading the entire CAR).
  3. Simple/Stable: Importing/Exporting should be easy. That repo I linked to mentions things like signatures, metadata, etc. However, that’s really a separate concern.
  4. Compact.

I’m currently writing a proposal but a CAR will likely be a concatenation of a topological sort of the IPLD DAG. Specifically:

magic-number root-cid [object [child-offset]*]*
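
To make that layout concrete, here's a toy Python serializer for it. The magic number, length-prefix encodings, and fixed-width offsets are all placeholder choices of mine, not the actual proposal (which would presumably use real CIDs and varints):

```python
import struct

MAGIC = b"CAR0"  # placeholder magic number; the real value is undecided

def write_car(root_cid, blocks, children):
    """Toy serializer for: magic-number root-cid [object [child-offset]*]*

    blocks:   list of (cid, data) pairs in topological order (root first)
    children: dict mapping cid -> list of child cids
    """
    # Pass 1: compute each object's byte offset, so parents can point at
    # children and a reader can seek instead of scanning the whole file.
    offsets = {}
    pos = len(MAGIC) + 2 + len(root_cid)  # header size
    for cid, data in blocks:
        offsets[cid] = pos
        nkids = len(children.get(cid, []))
        pos += 2 + len(cid) + 4 + len(data) + 2 + 8 * nkids
    # Pass 2: emit bytes.
    out = bytearray()
    out += MAGIC
    out += struct.pack(">H", len(root_cid)) + root_cid
    for cid, data in blocks:
        out += struct.pack(">H", len(cid)) + cid
        out += struct.pack(">I", len(data)) + data
        kids = children.get(cid, [])
        out += struct.pack(">H", len(kids))
        for kid in kids:
            out += struct.pack(">Q", offsets[kid])
    return bytes(out)

# A two-block DAG: root -> leaf.
car = write_car(b"root",
                [(b"root", b"hello "), (b"leaf", b"world")],
                {b"root": [b"leaf"]})
assert car.startswith(MAGIC)
```

Hashing the whole container (property 1) is omitted; the point is just the topologically sorted concatenation plus child offsets, which is what makes one-pass traversal and seeking possible.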

I’ve uploaded a simple WIP proposal for a CAR format:

It’s not completely fleshed out but does this look like it will cover your use-case?

That was fast!

Looks pretty good. This is exactly why I thought a compact on-disk format might be useful.

Further comments/questions in the PR.

Hi all,

I know I’m picking up an old conversation here, but I feel it is relatively close to what I’m asking about. I’m not a developer, but I have a question.

Does anyone know of a way (and forgive me if this sounds like a dumb question) to format a hard drive as a kind of ‘local’ IPFS? That is, instead of having a drive formatted as Mac OS Extended (Journaled), APFS, exFAT, etc., it would be in a kind of ‘native’ IPFS format.

The use case I’m wondering about has to do with the inherent hashing of files, the versioning ability, and the ability to effectively make your own ‘private network’ of drives whose access you control with something like a swarm key. So the distributed public-network aspects, at least for this part, are not of use.