Well, not GPG. We sign the packages when we upload them to the archive. Then the main archive generates hashes, which is what you will see here and in some other dirs:
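For reference, this is roughly how those signed hashes can be checked by hand; the release name, keyring path, and index file below are placeholders, not something taken from the archive setup itself:

# Sketch: verify the signed Release file, then compare an index's SHA256 against it.
wget -q http://archive.ubuntu.com/ubuntu/dists/bionic/Release
wget -q http://archive.ubuntu.com/ubuntu/dists/bionic/Release.gpg
gpgv --keyring /usr/share/keyrings/ubuntu-archive-keyring.gpg Release.gpg Release
wget -q http://archive.ubuntu.com/ubuntu/dists/bionic/main/binary-amd64/Packages.gz
grep "main/binary-amd64/Packages.gz" Release    # hashes listed in the signed Release
sha256sum Packages.gz                           # hash of the downloaded index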
After leaving my machine synchronizing the mirror for the whole of my holidays, I finally have all the files locally.
I'm adding them to IPFS, which brought up two unexpected problems. First, the recursive add will take a little more than 12 hours. The Ubuntu mirrors are supposed to sync every 6 hours, which will not be possible with IPFS; we will have to sync it only once a day.
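Just as an illustration (the script name and schedule are invented), the once-a-day sync could be driven by a cron entry like:

# hypothetical crontab entry: rsync from upstream and re-run the IPFS add daily at 03:00
0 3 * * * /usr/local/bin/update-ubuntu-ipfs-mirror.sh >> /var/log/ipfs-mirror.log 2>&1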
Second, ipfs stores its own copy of the blocks for each file, so that doubles the amount of space required for the mirror. The main mirror will require ~4TB instead of ~2TB. As suggested before, the other mirrors can just sync the IPFS blocks, so they will still require ~2TB.
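One possible way around the duplication is the experimental filestore (the --nocopy flag that comes up later in this thread), which makes the blocks reference the files already on disk instead of copying them. Roughly, with the mirror path as a placeholder:

# Sketch: enable the experimental filestore and add without copying blocks.
ipfs config --json Experimental.FilestoreEnabled true
ipfs add -r --nocopy /srv/mirror/ubuntu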
I'm now going to package the transport into a PPA so it will be easy to install, do more tests with my local mirror, continue trying to find a server with better bandwidth, and experiment with bitswap to sync multiple mirrors.
panic: interface conversion: interface {} is cmdkit.Error, not *coreunix.AddedObject
goroutine 37 [running]:
github.com/ipfs/go-ipfs/core/commands.glob..func7.1(0xc4202ec060)
/cwd/parts/ipfs/go/src/github.com/ipfs/go-ipfs/core/commands/add.go:390 +0x9fc
created by github.com/ipfs/go-ipfs/core/commands.glob..func7.2
/cwd/parts/ipfs/go/src/github.com/ipfs/go-ipfs/core/commands/add.go:449 +0xc7
I will report the bug and try to dig into it later. For now, I will run the add in the default mode (with copy, not --nocopy).
I don't think it stores the progress anywhere to be able to pick up where it left off. Subsequent runs of ipfs add should be significantly faster (probably not much of a difference with --nocopy, though), since the blocks up to the point where it stopped already exist in the datastore and don't need to be written.
Our current datastore is slow. If you want something faster, try badgerdb. Unfortunately, that datastore backend is experimental for a reason.
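For a fresh repo, switching to badger is roughly just the badgerds profile (an existing repo's datastore would have to be converted separately):

# Sketch: initialize a new repo backed by the experimental badger datastore.
ipfs init --profile=badgerds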
One of our huge bottlenecks is telling everyone on the network that you have a file. We're working on making this better, but it's a bit of a fundamental problem.
One partial solution is to:
not announce to the network that you have the files (by adding them with ipfs add --local).
have the APT backend connect to known IPFS mirror peers (a rough sketch of both steps follows below).
Unfortunately, that's not very decentralized… (although the Ubuntu installations that use this backend will still announce the files they have to the network).
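A minimal sketch of those two steps, where both the path and the multiaddr are placeholders:

# Sketch: add without announcing to the DHT, then dial a known mirror peer directly.
ipfs add -r --local /srv/mirror/ubuntu
ipfs swarm connect /ip4/203.0.113.10/tcp/4001/ipfs/QmExampleMirrorPeerID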
Ideally, we'd announce the root nodes of all pinned files to the network, but we won't be able to do that for a while.
We were left in a weird situation. This morning it was reporting less than 30 minutes to complete the add. But for some reason, the server got stuck, our byobu session was killed and all the IPFS processes stopped.
So, now how do we know if the add was completed? It would be sad to have to run the full recursive add again, because the files in the directory have not changed. But we have no clue if that's the only way.
I don't know how many things you have pinned to your node, but if it's not too many you could look through the results of ipfs pin ls --type=recursive to see if any of the pins are for the content you were adding. By default, ipfs add will recursively pin the content you added.
If you know what you're looking for, you can also search through the folders/files in the top-level hash using something like
ipfs pin ls --type=recursive -q | xargs -L 1 ipfs ls | grep "fubar"
Unfortunately, I believe that's the only way. We have to re-read and re-hash the files to verify that they exist in the repo (we assume that they may have changed, although we could probably relax this constraint for the filestore). Note, we won't (or shouldn't, at least) actually write them to the repo again.
One way to avoid this would be to use MFS and add the files one-by-one. That is,
#!/bin/bash
set -e

FROM="$1" # local directory
TO="$2"   # directory in MFS

# Walk the source tree; -print0 plus read -d '' keeps unusual file names intact.
find "$FROM" \( -type f -readable -o -type d -readable -executable \) -print0 | while IFS= read -r -d '' fname; do
    if [[ -d "$fname" ]]; then
        ipfs files mkdir -p -- "$TO/$fname"
    elif [[ -f "$fname" ]]; then
        # skip files that are already present in MFS
        if ! ipfs files ls "$TO/$fname" >/dev/null 2>&1; then
            # will be pinned in the next command (you should probably disable GC)
            cid="$(ipfs add --pin=false --local -q "$fname")"
            ipfs files cp -- "/ipfs/$cid" "$TO/$fname"
        fi
    else
        echo "not a file or directory: $fname" >&2
        exit 1
    fi
done
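Assuming the script is saved as, say, import-to-mfs.sh (the name and paths are just examples), running it from the mirror's parent directory with a relative source keeps the MFS layout sensible:

# hypothetical usage: creates /mirror/ubuntu/... in MFS
cd /srv/mirror
bash ~/import-to-mfs.sh ubuntu /mirror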
Note, that script is rather slow… a better one would list the directory you want to import and the target MFS directory, find the diff, and then add the files in batches. However, writing that script is a bit of an endeavor.
I've opened an issue to discuss adding a command to do this to ipfs:
FYI, the next release should make this a bit better. We figured out why adding large datasets causes problems with go-ipfs (we had a leak that has been fixed).
Hey @elopio, can I get access to the archive? (Or instructions on how to make it.) I'd like to run some tests with rabin fingerprinting + badger ds, and see if it would make sense to write a custom importer to import the archive smartly, deduping all the internal files that are the same (look into ipfs tar for a preview of what I mean).
Hello @jbenet,
We haven't been able to ipfs add the full archive; it always fails at 99%. We have reported most of the errors we get. We were thinking of trying to add a subsection, to see if that works and lets us test further.
If I may throw in my two cents: if part of the issue you're having is uploading a very large tarball to IPFS, perhaps try fragmenting the archive and storing the various pieces? You can then use client-side logic to reconstruct the fragmented archive pieces into a single archive.
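A minimal sketch of that idea, assuming a single big tarball (the file name and piece size are made up):

# Sketch: split a large tarball into 1GB pieces and add each piece separately.
split -b 1G ubuntu-archive.tar ubuntu-archive.part.
for piece in ubuntu-archive.part.*; do
    ipfs add -q "$piece"
done
# client side: fetch the pieces by their hashes, then reassemble:
# cat ubuntu-archive.part.* > ubuntu-archive.tar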
There is a way to indicate that a file is available via HTTP or some other protocol, like Tor or IPFS: Alt-Svc.
Alt-Svc is an Internet Standard (RFC 7838) which allows an origin's resources to be authoritatively available at a separate network location, possibly accessed with a different protocol configuration.
TL;DR
The idea of Alt-Svc is for a website to be able to tell a client "For technical reasons you don't need to care about, please talk to me using [this other web address]."
The client optionally does so. (They don't have to.) If they do, they do not change the address bar or give any sort of visual indication to the user.
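For illustration, the mechanism is a single HTTP response header; a server could send something like this (the alternative host is a placeholder, and pointing at IPFS would need its own registered protocol identifier):

# hypothetical Alt-Svc response header, valid for one day (ma is the max age in seconds)
Alt-Svc: h2="mirror.example.org:443"; ma=86400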
If you contacted Ubuntu (and Debian GNU/Linux!) and asked them to support this protocol in the repos, everybody would benefit. IPFS users would have faster downloads. HTTP users would experience less load.