Thank you for your reply. No, I was comparing the hashes of both files: the file before I add it, and the file I get back after downloading it from here: https://ipfs.infura.io/ipfs/QmTCP7Ln1PL…
The hash of the downloaded file, computed with a sha256 tool, was different from the hash of the initial file. I figured out that the file was incomplete after the add; for example, it was missing the background compared to the initial PDF file… no idea why.
Then I changed the library from ipfs-mini to ipfs-http-client and it's working OK now: both files, in and out, have the same hash.
OK, thanks for the info. I thought all these gateways were public and could be used to add files, like with ipfs.infura.io. So there is only one public gateway/point of entry that allows add? Is the best practice to set up my own gateway?
I'm trying to check whether a file exists on IPFS, and I'm having the same issue where the hash of the original file does not match a downloaded copy. I'm using a very similar setup to @crashbdx. Environment: React/Node.js, with the ipfs-http-client and crypto npm packages.
Here are the operations I'm running, in order…
1. A PDF is created on the fly using the html-pdf npm plugin
2. Save file to buffer
3. Upload file to ipfs.infura.io using ipfs.add() from buffer
4. Save the returned CID
5. Get file from ipfs with ipfs.object.get(path)
6. Save file.data to buffer
7. Convert with JSON.stringify(buffer)
8. Hash JSON.stringify(buffer) with crypto(sha256)
9. Save the digest.
Now, when I open the file's URL, save the PDF to my local machine, and run it through the same crypto(sha256), I get a completely different hash. However, when I run a test to ipfs.add(pdf) from my machine, it returns a matching CID.
Does anyone know why this is happening and how to fix it? Ideally I'd like to check if a file exists in IPFS without having to add it; that's what I'm ultimately trying to do with the hash comparison.
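A side note on the JSON.stringify and hashing steps above, in case it helps: calling JSON.stringify on a Node.js Buffer produces a JSON description of the buffer, not the file's bytes, so its sha256 will not match what a sha256 tool reports for the saved PDF. This may be one contributor to the mismatch, although it does not explain the CID difference. A minimal sketch using only Node's built-in crypto module, with a short stand-in buffer instead of the real PDF (sha256Hex is a hypothetical helper name, not part of any library):

```javascript
const crypto = require('crypto');

// Hypothetical helper: hex-encoded SHA-256 of a string or Buffer.
function sha256Hex(data) {
  return crypto.createHash('sha256').update(data).digest('hex');
}

// Stand-in for the PDF bytes (assumption); any Buffer behaves the same way.
const fileBytes = Buffer.from('hello');

// What a sha256 tool computes for a saved file: the hash of the raw bytes.
const rawDigest = sha256Hex(fileBytes);

// What the step list computes: the hash of a JSON string describing the Buffer.
const jsonText = JSON.stringify(fileBytes);
const jsonDigest = sha256Hex(jsonText);

console.log(jsonText);                 // {"type":"Buffer","data":[104,101,108,108,111]}
console.log(rawDigest === jsonDigest); // false
```

Passing the Buffer itself to update() keeps the digest comparable with a hash of the downloaded file.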
If you take the buffer in step 2 and run it through the crypto(sha256), do you get the CID returned in step 4? If so, then the retrieval is changing the content (unlikely). If you get a different hash than the returned CID, then CIDs aren't just the sha256 hash of the content. (And I don't think they are, actually.)
I might be wrong on this, but I believe the returned CID from ipfs.add is the CID of a protobuf wrapping meta-data about the file (like its name). That block points to the actual file contents block which would have a completely different CID, which may or may not match the result of your hash, but has a better chance of a match IMHO.
Remember that IPFS does "chunking" as well, so a straight hash of a buffer might only work for smaller than chunk-sized files. You'd really need to do your own content chunking and hashing of each chunk to get a CID that you can check for on IPFS. And even if all of the chunked CIDs exist, it doesn't necessarily mean that the file you're starting with is on IPFS, only that the chunks that make up the file are there, possibly as chunks of other files.
At least, that's my understanding of the internals of IPFS from lots and lots of reading.
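To make the "CID is not the sha256 of the file" point concrete, here is a rough sketch of how a "Qm…" CID could be derived for a file small enough to fit in a single chunk. It assumes the defaults of a plain ipfs add (CIDv0, dag-pb with a UnixFS file node, no raw leaves, 262144-byte chunks) and non-empty content; the function names are hypothetical, and larger files would need a full DAG builder, so treat this as an illustration rather than a replacement for the real tooling.

```javascript
const crypto = require('crypto');

const BASE58 = '123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz';

// Base58btc-encode a non-empty Buffer (Bitcoin alphabet, as used by CIDv0).
function base58btc(bytes) {
  let n = BigInt('0x' + bytes.toString('hex'));
  let out = '';
  while (n > 0n) {
    out = BASE58[Number(n % 58n)] + out;
    n /= 58n;
  }
  // Each leading zero byte is written as a leading '1'.
  for (const b of bytes) {
    if (b !== 0) break;
    out = '1' + out;
  }
  return out;
}

// Protobuf varint encoding of a small non-negative integer.
function varint(n) {
  const out = [];
  while (n >= 0x80) {
    out.push((n & 0x7f) | 0x80);
    n >>>= 7;
  }
  out.push(n);
  return Buffer.from(out);
}

// Hypothetical helper: CIDv0 for non-empty content that fits in one chunk.
function cidV0SingleChunk(content) {
  if (content.length === 0 || content.length > 262144) {
    throw new Error('this sketch only handles non-empty, single-chunk content');
  }
  const size = varint(content.length);
  // UnixFS Data message: Type = File (2), Data = content, filesize = length.
  const unixfs = Buffer.concat([
    Buffer.from([0x08, 0x02]),
    Buffer.from([0x12]), size, content,
    Buffer.from([0x18]), size,
  ]);
  // dag-pb node with no links: field 1 (Data) carries the UnixFS message.
  const node = Buffer.concat([Buffer.from([0x0a]), varint(unixfs.length), unixfs]);
  // The hash covers the wrapped node, not the bare file bytes.
  const digest = crypto.createHash('sha256').update(node).digest();
  // Multihash prefix: 0x12 = sha2-256, 0x20 = 32-byte digest.
  return base58btc(Buffer.concat([Buffer.from([0x12, 0x20]), digest]));
}

console.log(cidV0SingleChunk(Buffer.from('hello world\n')));
```

The hash is taken over the protobuf-wrapped node, which is why a plain sha256 of the file can never be turned into the same CID, even for tiny files.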
Try doing an ipfs object get on the CID you get back from the add. I suspect you'll find that it is also a protobuf of links and data where the data is the file contents. I also suspect that only raw blocks added and retrieved with the block API would be usable given your approach.
I'm not sure if I'm understanding this correctly, but taking step 2 and running it through crypto(sha256) will give me a different result than the CID in step 4. From my understanding, an IPFS CID is not simply the sha256 digest of the content.
However, I didn't know about the "protobuf wrapping meta-data", so I'll have a look at that in combination with the https://cid.ipfs.io/ tool.
You would think that there would be a simple solution to check the existence of a file in IPFS; I can't believe it's this difficult.
Look at the arguments for ipfs add, in particular -n --only-hash: "Only chunk and hash - do not write to disk". Might that give you a CID without actually adding the file to the swarm?
I added a file called "peers.txt" with a directory wrapper. That gave me CID QmVpAYxVUvBDSakUjNcF1Dv7dGVUtKpMv9SFgApxmraGhx.
An ipfs object get of that CID gives a content hash of Qma2kSdx4uh8VKzb8p8dqqBzDhoSFYVQBsYgHQTeCPbHWD.
Doing an ipfs add --only-hash peers.txt gives me that same content hash.
So, if you do an ipfs add without the --wrap-with-directory, the CID you get from that add should match the CID you get from an add --only-hash of the same file. So, to see if a file is in IPFS, you'd do an add --only-hash and then try to retrieve that CID from IPFS as a file, as an object, or even just as a block. If any of those give you something for the CID, then I believe you can assume the file is in IPFS.
You might even try an ipfs dht findprovs, but that particular query seems to take longer.
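For the "try to retrieve that CID" half of this, one possible sketch from Node.js is to send a HEAD request to an HTTP gateway and give up after a timeout. Everything here is an assumption for illustration: the helper name cidResolves, the default gateway URL, the timeout, and the injectable fetchImpl parameter (there only so the function can be exercised without a network). It assumes Node 18+ for the global fetch and AbortSignal.timeout. A false result only means the gateway did not find the content in time, not that the content is absent from IPFS.

```javascript
// Hypothetical helper: ask an HTTP gateway whether it can resolve a CID.
// gateway, timeoutMs and fetchImpl are illustrative assumptions.
async function cidResolves(cid, options = {}) {
  const {
    gateway = 'https://ipfs.io',
    timeoutMs = 15000,
    fetchImpl = fetch,
  } = options;
  try {
    // HEAD avoids downloading the body; we only care about the status.
    const res = await fetchImpl(`${gateway}/ipfs/${cid}`, {
      method: 'HEAD',
      signal: AbortSignal.timeout(timeoutMs),
    });
    return res.ok;
  } catch (err) {
    // Timeout or network failure: report "not found for now".
    return false;
  }
}

// Usage (performs a real network request, so it is left commented out):
// cidResolves('Qma2kSdx4uh8VKzb8p8dqqBzDhoSFYVQBsYgHQTeCPbHWD')
//   .then((found) => console.log(found));
```

The CID passed in would come from an add --only-hash of the local file, so nothing is uploaded just to perform the check.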
Thank you @ldeffenb for pointing me in the right direction.
I found this ipfs-only-hash plugin which calculates the IPFS hash for some data without having to install or run an IPFS node. Perfect for comparing files!