File hash is different from original

crashbdx · March 10, 2020, 2:17pm

Thank you for your reply. No, I was comparing the hashes of both files: the file before I add it, and the file I get back after downloading it from here: https://ipfs.infura.io/ipfs/QmTCP7Ln1PL…

The hash of the downloaded file, computed with a sha256 tool, was different from the hash of the initial file. I figured out that the file was incomplete after the add; for example, it was missing the background compared to the initial PDF file… no idea why.

Then I changed libraries, from ipfs-mini to ipfs-http-client, and it's working OK now: both files, in and out, have the same hash.

OK, thanks for the info. I thought all these gateways were public and could be used to add files, like with ipfs.infura.io. So there is only one public gateway/point of entry allowing add? Is the best practice to set up my own gateway?

darinnj · September 14, 2020, 5:27pm

I’m trying to check if a file exists on IPFS and I’m having the same issue, where the hash of the original file does not match a downloaded copy. I’m using a very similar setup to @crashbdx. Environment: React/Node.js, with the ipfs-http-client and crypto npm packages.

Here are the operations I’m running, in order:

1. A PDF is created on the fly using the html-pdf npm plugin
2. Save the file to a buffer
3. Upload the file to ipfs.infura.io using ipfs.add() from the buffer
4. Save the returned CID
5. Get the file from IPFS with ipfs.object.get(path)
6. Save file.data to a buffer
7. Convert it with JSON.stringify(buffer)
8. Hash JSON.stringify(buffer) with crypto(sha256)
9. Save the digest.

Now, when I visit the file in the URL and save the PDF to my local machine and run it through the same crypto(sha256), I get a completely different hash. However, when I run a test to ipfs.add(pdf) from my machine, it returns a matching CID.

Does anyone know why this is happening and how to fix it? Ideally I’d like to check if a file exists in IPFS without having to add it, that’s what I’m ultimately trying to do with the hash comparison.

Any help is greatly appreciated.

Thank you,
~Dan
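One likely contributor to the mismatch described above is that the digest is taken over JSON.stringify(buffer) rather than over the raw bytes. For a Node.js Buffer, JSON.stringify produces a text description of the buffer, not the bytes themselves, so its sha256 digest cannot match the digest of the saved PDF. The following is a minimal sketch using only Node's built-in crypto module; `sha256Hex` is a hypothetical helper name and `fileBytes` is stand-in data, not a real PDF.

```javascript
const crypto = require('crypto');

// Hypothetical helper: hex-encoded sha256 digest of a Buffer or string.
function sha256Hex(data) {
  return crypto.createHash('sha256').update(data).digest('hex');
}

// Stand-in for the PDF bytes held in memory (illustrative data only).
const fileBytes = Buffer.from('%PDF-1.4 example bytes');

// JSON.stringify on a Buffer yields text such as
// {"type":"Buffer","data":[37,80,...]}, not the original bytes.
const asJson = JSON.stringify(fileBytes);

const rawDigest = sha256Hex(fileBytes); // what a sha256 tool computes for the file
const jsonDigest = sha256Hex(asJson);   // what hashing the JSON text computes

console.log('digests match:', rawDigest === jsonDigest);

// Rebuilding the Buffer from the JSON form recovers the original bytes,
// and with them the original digest.
const restored = Buffer.from(JSON.parse(asJson).data);
console.log('restored matches raw:', sha256Hex(restored) === rawDigest);
```

Hashing the buffer directly, without the JSON.stringify step, gives a digest that can be compared with the one computed for the downloaded file. Note that this only addresses the sha256 comparison; the data returned by ipfs.object.get may also carry extra framing bytes around the file content.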

ldeffenb · September 14, 2020, 6:25pm

If you take the buffer in step 2 and run it through the crypto(sha256), do you get the CID returned in 4? If so, then the retrieval is changing the content (unlikely). If you get a different hash than the returned CID, then CIDs aren’t just the sha256 hash of the content. (And I don’t think they are, actually).

I might be wrong on this, but I believe the returned CID from ipfs.add is the CID of a protobuf wrapping meta-data about the file (like its name). That block points to the actual file contents block which would have a completely different CID, which may or may not match the result of your hash, but has a better chance of a match IMHO.

Remember that IPFS does “chunking” as well, so a straight hash of a buffer might only work for smaller than chunk-sized files. You’d really need to do your own content chunking and hashing of each chunk to get a CID that you can check for on IPFS. And even if all of the chunked CIDs exist, it doesn’t necessarily mean that the file you’re starting with is on IPFS, only that the chunks that make up the file are there, possibly as chunks of other files.

At least, that’s my understanding of the internals of IPFS from lots and lots of reading.
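To make the wrapping concrete, here is a rough sketch of how a CIDv0 could be derived by hand for a file small enough to fit in a single chunk. It assumes the defaults as I understand them: a UnixFS "file" message wrapped in a dag-pb node, hashed with sha2-256, prefixed as a multihash, and base58btc encoded, with a default chunk size of 262144 bytes. The function names (`varint`, `base58btc`, `cidV0ForSmallFile`) are made up for illustration, and this does not cover larger files, raw leaves, or CIDv1.

```javascript
const crypto = require('crypto');

const BASE58 = '123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz';
const DEFAULT_CHUNK_SIZE = 262144; // assumed default fixed-size chunker (256 KiB)

// Encode a non-negative integer as a protobuf varint.
function varint(n) {
  const bytes = [];
  while (n >= 0x80) {
    bytes.push((n % 0x80) | 0x80);
    n = Math.floor(n / 0x80);
  }
  bytes.push(n);
  return Buffer.from(bytes);
}

// Base58 encoding with the Bitcoin alphabet.
function base58btc(buf) {
  let n = BigInt('0x' + (buf.toString('hex') || '0'));
  let out = '';
  while (n > 0n) {
    out = BASE58[Number(n % 58n)] + out;
    n /= 58n;
  }
  // Each leading zero byte is represented by a leading '1'.
  for (const b of buf) {
    if (b !== 0) break;
    out = '1' + out;
  }
  return out;
}

// Hypothetical helper: CIDv0 of a file that fits in one chunk.
function cidV0ForSmallFile(content) {
  if (content.length > DEFAULT_CHUNK_SIZE) {
    throw new Error('multi-chunk files need a root node with links (not sketched here)');
  }
  // UnixFS message: Type = File (2), Data = content, filesize = content length.
  const dataField = content.length > 0
    ? Buffer.concat([Buffer.from([0x12]), varint(content.length), content])
    : Buffer.alloc(0);
  const unixfs = Buffer.concat([
    Buffer.from([0x08, 0x02]),
    dataField,
    Buffer.from([0x18]),
    varint(content.length),
  ]);
  // dag-pb node without links: field 1 (Data) carries the UnixFS message.
  const block = Buffer.concat([Buffer.from([0x0a]), varint(unixfs.length), unixfs]);
  // The hash covers the whole wrapped block, not the bare file content.
  const digest = crypto.createHash('sha256').update(block).digest();
  // Multihash: 0x12 = sha2-256, 0x20 = 32-byte digest length.
  return base58btc(Buffer.concat([Buffer.from([0x12, 0x20]), digest]));
}

const content = Buffer.from('hello world\n');
console.log('plain sha256:', crypto.createHash('sha256').update(content).digest('hex'));
console.log('CIDv0       :', cidV0ForSmallFile(content));
```

The two printed values differ both in what is hashed (bare content versus the wrapped block) and in how the result is encoded (hex versus base58btc multihash), which is why a sha256 tool never reproduces the CID directly.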

ldeffenb · September 14, 2020, 6:34pm

Try doing an ipfs object get on the CID you get back from the add. I suspect you’ll find that it is also a protobuf of links and data where the data is the file contents. I also suspect that only raw blocks added and retrieved with the block API would be usable given your approach.

ldeffenb · September 14, 2020, 6:37pm

You might want to read the original answer to the thread that you’ve picked up on. @hector says the same thing I did, but better.

darinnj · September 14, 2020, 9:11pm

Hi thank you for your response.

I’m not sure if I’m understanding this correctly, but taking step 2 and running it through crypto(sha256) gives me a different result than the CID in step 4. From my understanding, an IPFS CID is not simply the sha256 digest of the file.

However, I didn’t know about the “protobuf wrapping meta-data” so I’ll have a look at that in combination with the https://cid.ipfs.io/ tool.

You would think that there would be a simple solution to check the existence of a file in IPFS, can’t believe it’s this difficult.

ldeffenb · September 14, 2020, 10:08pm

Look at the arguments for ipfs add, in particular -n --only-hash: “Only chunk and hash - do not write to disk”. Might that give you a CID without actually adding the file to the swarm?

ldeffenb · September 14, 2020, 10:20pm

I added a file called “peers.txt” with a directory wrapper. That gave me CID QmVpAYxVUvBDSakUjNcF1Dv7dGVUtKpMv9SFgApxmraGhx.

An ipfs object get of that CID gives a content hash of Qma2kSdx4uh8VKzb8p8dqqBzDhoSFYVQBsYgHQTeCPbHWD.

Doing an ipfs add --only-hash peers.txt gives me that same content hash.

So, if you do an ipfs add without the --wrap-with-directory, the CID you get from that add should match the CID you get from an add --only-hash of the same file. So, to see if a file is in IPFS, you’d do an add --only-hash and then try to retrieve that CID from IPFS as a file, as an object, or even just as a block. If any of those give you something for the CID, then I believe you can assume the file is in IPFS.

You might even try an ipfs dht findprovs, but that particular query seems to take longer.
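In JavaScript, the "hash only, then look it up" approach might be sketched roughly as below. This assumes `ipfs` is an ipfs-http-client instance from a version where ipfs.add resolves to a single result with a `cid` field, accepts an `onlyHash` option, and where ipfs.block.stat accepts a `timeout` option; check these against the version you have installed. `isOnIpfs` and `timeoutMs` are hypothetical names.

```javascript
// Hypothetical helper: compute the CID without storing the data,
// then ask the node whether it can find a block with that CID.
// `ipfs` is assumed to be an ipfs-http-client instance (see lead-in).
async function isOnIpfs(ipfs, data, timeoutMs = 10000) {
  // onlyHash: chunk and hash the data, but do not write it to the repo.
  const { cid } = await ipfs.add(data, { onlyHash: true });
  try {
    // If the root block can be found within the timeout, some peer has it.
    await ipfs.block.stat(cid, { timeout: timeoutMs });
    return { cid: cid.toString(), found: true };
  } catch (err) {
    // A timeout or lookup error means "not found in time", which is
    // not proof that the content is absent from the network.
    return { cid: cid.toString(), found: false };
  }
}

// Example usage (requires the ipfs-http-client package and a reachable API):
//   const ipfs = require('ipfs-http-client')('http://localhost:5001');
//   isOnIpfs(ipfs, pdfBuffer).then(console.log);
```

The add options used for the lookup have to match the ones used for the original upload (chunker, CID version, directory wrapping), otherwise the computed CID will differ even for identical bytes. Finding the root block also does not guarantee that every chunk of a large file is still retrievable.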

darinnj · September 16, 2020, 12:38am

Thank you @ldeffenb for pointing me in the right direction.

I found this ipfs-only-hash plugin which calculates the IPFS hash for some data without having to install or run an IPFS node. Perfect for comparing files!
