Do a scanned document and a photo (of the same document) result in the same content identifier?

Given 2 users have the same document

When user1 uploads a scanned copy and user2 uploads a photographed copy of the same document

Will the resulting content identifier be the same?

For all practical purposes, no. In IPFS, identical means identical: the same down to the last one and zero. Not kind of or sort of alike, but completely indistinguishable. A single flipped bit in a multi-terabyte file would produce a different content identifier.
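To make the "single bit" point concrete, here is a minimal sketch in Python. It assumes a plain SHA-256 digest is a fair stand-in for a content identifier. A real IPFS CID also involves chunking and multihash/multibase encoding, so the values printed here are not actual CIDs, and the `digest` helper is only an illustration.

```python
import hashlib


def digest(data: bytes) -> str:
    """Return the SHA-256 hex digest of data (a stand-in for a real CID)."""
    return hashlib.sha256(data).hexdigest()


original = bytes(1024)        # 1 KiB of zero bytes standing in for a file
flipped = bytearray(original)
flipped[-1] ^= 0x01           # flip a single bit in the last byte

# Byte-for-byte identical input gives the same digest.
print(digest(original) == digest(bytes(original)))

# One flipped bit gives a completely different digest.
print(digest(original) == digest(bytes(flipped)))
print(digest(original))
print(digest(bytes(flipped)))
```

The same holds for a scan and a photo: the two files differ in far more than one bit, so their identifiers will not match.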

You’re using the idea of sameness too casually. First, you’re talking about physical documents in the real world, and it’s hard to map real-world things onto the abstract world of digital files. But even physically, the two users don’t have the same document. They have two copies of a document that contain the same information. If you looked closely, the copies would show slight variations from the printing process. Those variations are magnified when one copy is photographed and the other scanned, resulting in very different files.

IPFS aside, comparing two images can be an extremely difficult problem. You could perhaps run optical character recognition (OCR) on both documents and compare the text. There would always be slight differences caused by errors in the OCR process, so even there you start getting a fuzzy idea of sameness. Perhaps one document had a company logo and the other didn’t. Are they the same? Maybe one logo is just obscured because of a poor scan.
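As a rough sketch of what that fuzzy comparison might look like, the Python snippet below compares two strings with the standard library's `difflib`. The OCR outputs are made-up sample strings, not results from a real OCR engine, and the `similarity` helper and the 0.9 threshold are arbitrary assumptions chosen for illustration.

```python
from difflib import SequenceMatcher


def normalize(text: str) -> str:
    """Lowercase the text and collapse all whitespace runs to single spaces."""
    return " ".join(text.lower().split())


def similarity(a: str, b: str) -> float:
    """Return a similarity ratio between 0.0 and 1.0 for two OCR outputs."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


# Hypothetical OCR results for the scanned and the photographed copy.
scan_text = "Invoice No. 10482\nTotal due: 1,250.00 EUR"
photo_text = "Invoice No. 1O482\nTotal due: 1,25O.00 EUR"  # letter O misread for zero

THRESHOLD = 0.9  # arbitrary cut-off; "same enough" is a judgment call

score = similarity(scan_text, photo_text)
print(f"similarity: {score:.3f}")
print("treated as the same document:", score >= THRESHOLD)
```

Note that the answer depends entirely on where you put the threshold, which is exactly the fuzziness described above. Content addressing has no such knob.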
