If there is a bunch of stuff in some centralized cloud provider's object storage service and I have the SHA-1, file size, and path of everything I want to add, do I still need to download everything in order to get it into IPFS? I know files bigger than 1MiB need to be downloaded/chunked/hashed since they can't be used with --raw-leaves. Is there a way to --raw-leaves --nocopy URLs without downloading the files if I already have the hash?
For all of the stuff that I do have to download/chunk/hash, could I use a VM at the cloud provider to do that work and then transplant the result to an IPFS node outside the cloud provider, without downloading the files from the VM or from the object storage service? I want to do this because I don't think the cloud provider charges for bandwidth between object storage and VMs in the same region.
I want to add files to IPFS, but I don't want to store a copy of everything, and I only want to pay for bandwidth when I actually retrieve a file (possibly with some caching).
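IPFS doesn't use SHA-1 by default (the default is SHA-256), though it may technically support it. Either way, the SHA-1 isn't going to help you: a CID is a hash of the UnixFS DAG that ipfs add builds from the file's chunks, not a hash of the whole file, so the content has to be read and hashed at least once.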
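To make the point concrete: the only case where a CID is a plain hash of the file's bytes is a file that fits in a single chunk (256 KiB with Kubo's default chunker) added with --cid-version=1, which turns on raw leaves. Even then the hash is SHA-256, wrapped in a few multiformats prefix bytes. This Python sketch rebuilds that CID from the bytes; the function name raw_leaf_cid is hypothetical, and it assumes the default sha2-256 hash and the lowercase base32 multibase that Kubo prints for CIDv1.

```python
import base64
import hashlib

# Prefix bytes from the multiformats tables (all < 0x80, so each varint is one byte)
CIDV1 = 0x01       # CID version 1
RAW_CODEC = 0x55   # "raw" codec: the block is the file's bytes, no UnixFS wrapper
SHA2_256 = 0x12    # multihash code for sha2-256
DIGEST_LEN = 0x20  # sha2-256 digests are 32 bytes


def raw_leaf_cid(data: bytes) -> str:
    """CIDv1 of a single raw block, i.e. a file that fits in one chunk."""
    digest = hashlib.sha256(data).digest()
    cid_bytes = bytes([CIDV1, RAW_CODEC, SHA2_256, DIGEST_LEN]) + digest
    # Multibase prefix "b" means lowercase base32 with no padding
    return "b" + base64.b32encode(cid_bytes).decode("ascii").lower().rstrip("=")


if __name__ == "__main__":
    print(raw_leaf_cid(b"hello\n"))
```

A larger file is split into chunks whose CIDs are linked from a UnixFS node, and the root CID is the hash of that node, so no file-level digest (SHA-1 or otherwise) appears in it. Since a SHA-1 can't be converted into a SHA-256, the bytes have to be read once either way.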
If the bucket is public and its objects are accessible over HTTP, you could probably add each object by its URL using:
ipfs add --nocopy --cid-version=1 URL
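For a whole bucket, the same command can be scripted. A minimal Python sketch, assuming the ipfs CLI is on PATH, that your object URLs are listed one per line on standard input (a hypothetical input format), and that your node accepts URL adds with --nocopy. Depending on your Kubo/go-ipfs version, that may first require enabling the experimental urlstore (ipfs config --json Experimental.UrlstoreEnabled true); treat that as an assumption to check against your version's docs. The helpers add_url_cmd and add_urls are hypothetical; the script passes -Q so ipfs add prints only the final CID, and it records which CID belongs to which URL.

```python
import shutil
import subprocess
import sys


def add_url_cmd(url: str) -> list[str]:
    """Build the ipfs command that registers a URL without copying its bytes."""
    # -Q (--quieter) makes ipfs add print only the final CID
    return ["ipfs", "add", "--nocopy", "--cid-version=1", "-Q", url]


def add_urls(urls: list[str]) -> dict[str, str]:
    """Run ipfs add for each URL and return a URL -> CID mapping."""
    cids = {}
    for url in urls:
        result = subprocess.run(add_url_cmd(url), capture_output=True, text=True, check=True)
        cids[url] = result.stdout.strip()
    return cids


def main() -> None:
    if shutil.which("ipfs") is None:
        print("ipfs CLI not found on PATH; nothing added", file=sys.stderr)
        return
    # Hypothetical input format: one public object URL per line
    urls = [line.strip() for line in sys.stdin if line.strip()]
    for url, cid in add_urls(urls).items():
        print(cid, url)


if __name__ == "__main__":
    main()
```

For example, python add_urls.py < urls.txt would print one CID and URL per line (the script and file names are placeholders). Each add still downloads the object once so it can be chunked and hashed.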
This would incur a one-time cost of downloading the content, but afterwards the content should be available via the CIDs returned by ipfs add. As long as the IPFS node you added the content to is online, it should be able to act as a kind of bridge between your object storage and other IPFS peers who request content using those CIDs.
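Note that I don't think there's a caching layer on your IPFS node between IPFS requests and the object storage, so each request your node serves is read from object storage again. But if another peer fetches a CID from your node, that peer keeps a copy (until its garbage collector removes it, unless it pins the content) and can serve future requests for it.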
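Why the missing cache matters for bandwidth: as I understand the urlstore, your node keeps only each chunk's URL, offset, and length, and answers a block request with an HTTP Range request against the object. A simplified Python model of that, assuming fixed-size 256 KiB chunks (Kubo's default chunker); chunk_ranges, range_header, and fetch_chunk are hypothetical helpers, not Kubo APIs, and fetch_chunk is only defined, never called here.

```python
import urllib.request

DEFAULT_CHUNK = 262144  # assumed: Kubo's default fixed-size chunker (256 KiB)


def chunk_ranges(file_size: int, chunk_size: int = DEFAULT_CHUNK) -> list[tuple[int, int]]:
    """(offset, length) of each fixed-size chunk the object was split into."""
    return [(off, min(chunk_size, file_size - off)) for off in range(0, file_size, chunk_size)]


def range_header(offset: int, length: int) -> str:
    """HTTP Range value covering exactly one chunk (byte ranges are inclusive)."""
    return f"bytes={offset}-{offset + length - 1}"


def fetch_chunk(url: str, offset: int, length: int) -> bytes:
    """Re-read one chunk from object storage; nothing is cached locally."""
    req = urllib.request.Request(url, headers={"Range": range_header(offset, length)})
    with urllib.request.urlopen(req) as resp:
        return resp.read()


if __name__ == "__main__":
    size = 600_000  # hypothetical object size in bytes
    ranges = chunk_ranges(size)
    egress = sum(length for _, length in ranges)
    print(f"{len(ranges)} ranged GETs, {egress} bytes of egress per full retrieval")
```

Under these assumptions, serving a 600,000-byte object in full from your node costs three ranged GETs and the full 600,000 bytes of object storage egress every time, because nothing is kept locally.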