Tarball of all pinned objects

Is it possible to generate a tarball of all pinned objects from a repo… and then import to a new repo?

Something like postgresql’s pg_dump and pg_restore

Possible reasons for doing this:

  • Move a repo from flatfs to badgerds
  • Maintain some kind of backup of pinned objects at a particular time, which could then be restored upon repo corruption.

I understand that ~/.ipfs/ can easily be copied and/or rolled into a tarball, but that’s kind of a kludge as well as not datastore agnostic.

Yes, with a bit of manual work.

First, you don’t want a tarball but a .car file.

Instead of storing Unix files, a .car file stores raw IPLD blocks. That’s important because details such as chunking or layout can change for identical files. In other words, even if you have the original file, it is not certain you can add it back with the same CID later (in most cases you can get it to work, it’s just lots of extra work, so to be safe, store a .car).
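To see why the original file alone is not enough, here is a quick illustration: adding the same file with two different chunker settings yields two different CIDs (somefile.bin is a placeholder; --only-hash computes the CID without storing anything):

# Default-sized chunks (256 KiB):
ipfs add --only-hash --chunker=size-262144 somefile.bin
# Larger 1 MiB chunks; for any file bigger than one chunk this prints a different CID:
ipfs add --only-hash --chunker=size-1048576 somefile.bin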

Here is an example of a script I would write (WARNING UNTESTED CODE, USE AT YOUR OWN RISK):

# Note the space after the first "$(": without it, bash would try to parse "$((" as arithmetic expansion.
for CID in $( (ipfs pin ls --type=recursive; ipfs pin ls --type=direct) | cut -f1 -d' ' ); do
  ipfs files cp /ipfs/${CID} /${CID}
done

ipfs dag export $(ipfs files stat --hash /) > out.car

This script attempts to write all of your pins into the out.car file. You can then import it on another machine with ipfs dag import out.car. You can also pipe it on the fly, saving disk space, with something like ipfs dag export $(ipfs files stat --hash /) | ssh remote.machine.example.com ipfs dag import.
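Spelled out, the two variants look like this (remote.machine.example.com is a placeholder host, and this is as untested as the rest):

# Variant 1: export to a file, copy it over, then import on the other machine.
ipfs dag export $(ipfs files stat --hash /) > out.car
scp out.car remote.machine.example.com:
ssh remote.machine.example.com ipfs dag import out.car

# Variant 2: stream the export straight into the remote import, no file on disk.
ipfs dag export $(ipfs files stat --hash /) | ssh remote.machine.example.com ipfs dag import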

Let’s break it down:

(ipfs pin ls --type=recursive; ipfs pin ls --type=direct)

This part just lists your pins. We use the types recursive and direct because we want to omit indirect pins: those blocks are descendants of recursive pins, so they will be included later anyway, and we can skip them here to save time.
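For illustration, the combined listing prints one line per pin (the CIDs here are placeholders):

(ipfs pin ls --type=recursive; ipfs pin ls --type=direct)
# might print something like:
# QmRecursivePinExample recursive
# QmDirectPinExample direct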

cut -f1 -d' '

When IPFS lists a pin, it prints it like this:
<CID> <TYPE>
We only care about the CID, so we use cut to isolate everything before the first space character.
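As a standalone illustration (QmExample is a placeholder):

# cut keeps only the first space-separated field:
echo 'QmExample recursive' | cut -f1 -d' '
# prints: QmExample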

for CID in ... ; do
  ...
done

This is a loop: we execute the commands between do and done once for each line produced by the generator (the thing after in), with the variable CID set to the content of that line each time.

ipfs files cp /ipfs/${CID} /${CID}

Right here we use the MFS (Mutable File System), a virtual file system that IPFS provides.
For each CID in your list of CIDs, we “copy” it into the root folder under a name equal to the CID itself.
So for example, if you pinned the CID Qmexample, we copy Qmexample into the root of the MFS.

This is not really copying; it just creates a reference from the root folder to the pinned folder or file.

We do all of this in order to have a single folder we can export (otherwise we would need multiple .car files, one per pin).
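Concretely, assuming a pinned placeholder CID Qmexample:

# Link the pinned content into the MFS root, named after its own CID.
ipfs files cp /ipfs/Qmexample /Qmexample
# Verify the link is there:
ipfs files ls /
# prints: Qmexample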

ipfs files stat --hash /

This just reads the CID of the root folder we created.
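Without --hash, ipfs files stat prints several fields; --hash narrows the output to just the CID (the value shown is illustrative):

ipfs files stat --hash /
# prints something like: QmRootFolderExample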

ipfs dag export ... > out.car

This exports the root folder into a file named out.car.


Everything looks good. Thanks!

I like the use of MFS as a “middle-man”, so to speak. That way all MFS links should be maintained as well. A restore script could include removing the MFS CID links sitting in the MFS root, as sketched below. I’ll give it a try and see what happens.
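A rough, untested sketch of that restore step might look like this (ROOT_CID is a placeholder for the CID printed by ipfs files stat --hash / on the source node; recent versions of ipfs dag import pin the imported root by default):

# Import all blocks from the archive; the root folder gets pinned recursively.
ipfs dag import out.car

# Re-pin each entry that was linked under the MFS root on the source node.
# ipfs ls prints one "<CID> <size> <name>" line per link, so cut isolates the CID.
for CID in $(ipfs ls ${ROOT_CID} | cut -f1 -d' '); do
  ipfs pin add /ipfs/${CID}
done

# Drop the pin on the temporary root folder now that its children are pinned directly.
ipfs pin rm ${ROOT_CID}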

Together with exporting keys, a full backup of a repo can be made.
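For the keys side, something along these lines should work (mykey is a hypothetical key name; the built-in self key cannot be exported this way):

# List the keys known to this node.
ipfs key list
# Export a named key to a file.
ipfs key export -o mykey.key mykey
# On the new node, import it back under the same name.
ipfs key import mykey mykey.key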

You are right, but that was just an example I threw together in 10 minutes. :slight_smile:
I should have put you on the track to make something better. If you ever make something more complete, please post it here so we can all profit from it. :slight_smile:


The “export” and “import” of the data works quite well.

However, all the direct pins from the original node show up as indirect pins in the “restored” datastore, and the MFS in the “restored” datastore contains zero references.

But it’s a really good start to work with. I’ll see if I can clean everything up a bit and post a revamped script in a day or two.