How to identify the correct IPFS hash of a domain's content trustlessly

Hi!

I’m building an application that needs access to the true IPFS hash of certain sites. Because of this, it cannot rely on information provided by the site owner (such as the x-ipfs-path response header).

Is there a way to identify the true IPFS hash of a domain and its content without relying on a trusted intermediary?

Hi, welcome to our community! I’m a bit unclear on the intentions and goals behind your question, but I’ll try my best…

I’m building an application that needs access to the true IPFS hash of certain sites. Because of this, it cannot rely on information provided by the site owner (such as the x-ipfs-path response header).

My confusion with this section is about the headers: those are HTTP headers, which you could verify with HTTPS. I suppose that isn’t enough for your use case?
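To make the HTTPS point concrete, here is a minimal Python sketch (standard library only) that reads the x-ipfs-path header from a HEAD request and splits its value into namespace, root, and remaining path. The function names are hypothetical, and the sketch assumes the header holds a path of the form /ipfs/<cid>/... or /ipns/<name>/.... TLS authenticates who sent the header; it says nothing about whether the value is honest.

```python
import urllib.request
from typing import Optional, Tuple


def fetch_x_ipfs_path(url: str, timeout: float = 10.0) -> Optional[str]:
    """Send a HEAD request (TLS-verified for https URLs) and return x-ipfs-path, if present.

    HTTPS proves which server sent the header, not that its value is truthful.
    """
    req = urllib.request.Request(url, method="HEAD")
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        # Header lookup on the response's HTTPMessage is case-insensitive.
        return resp.headers.get("x-ipfs-path")


def parse_ipfs_path(value: str) -> Tuple[str, str, str]:
    """Split '/ipfs/<cid>/rest' or '/ipns/<name>/rest' into (namespace, root, rest)."""
    parts = value.strip().split("/", 3)
    # A leading '/' produces an empty first element: ['', 'ipfs', '<cid>', 'rest...']
    if len(parts) < 3 or parts[0] != "" or parts[1] not in ("ipfs", "ipns") or not parts[2]:
        raise ValueError(f"not an /ipfs/ or /ipns/ path: {value!r}")
    rest = "/" + parts[3] if len(parts) == 4 else "/"
    return parts[1], parts[2], rest
```

For example, parse_ipfs_path("/ipfs/bafyabc/index.html") returns ("ipfs", "bafyabc", "/index.html"); bafyabc is a placeholder, not a real CID.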

Is there a way to identify the true IPFS hash of a domain and its content without relying on a trusted intermediary?

If you don’t want to rely on information from the site owner OR a third party, then this is an impossible task, no? You can verify that the content you received matches its CID, but you cannot know it is the correct content without trusting the original source of the content OR a third party.
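The part that can be checked without trust, namely that some bytes match a given CID, is sketched below in Python. It only handles the simplest case: a base32 CIDv1 using the raw codec (0x55) and a sha2-256 multihash, which is what a small single-block file typically gets from ipfs add --cid-version=1. Larger files and directories become UnixFS (dag-pb) DAGs whose root CID is not a hash of the file bytes, so checking those means walking the whole DAG, which this sketch does not do. Function names are hypothetical.

```python
import base64
import hashlib
from typing import Tuple

CID_V1 = 0x01
RAW_CODEC = 0x55      # multicodec code for 'raw'
SHA2_256 = 0x12       # multihash code for sha2-256
SHA2_256_LEN = 32     # digest length in bytes


def _read_varint(buf: bytes, pos: int) -> Tuple[int, int]:
    """Decode an unsigned LEB128 varint starting at pos; return (value, next_pos)."""
    value, shift = 0, 0
    while True:
        if pos >= len(buf):
            raise ValueError("truncated varint")
        byte = buf[pos]
        pos += 1
        value |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return value, pos
        shift += 7


def raw_cid_for(data: bytes) -> str:
    """Build a CIDv1 (raw, sha2-256) as a lowercase, unpadded base32 'b...' string."""
    digest = hashlib.sha256(data).digest()
    cid_bytes = bytes([CID_V1, RAW_CODEC, SHA2_256, SHA2_256_LEN]) + digest
    return "b" + base64.b32encode(cid_bytes).decode("ascii").lower().rstrip("=")


def verify_raw_block(data: bytes, cid: str) -> bool:
    """Return True if data hashes to the digest inside a raw/sha2-256 CIDv1."""
    if not cid.startswith("b"):
        raise ValueError("only base32 ('b'-prefixed) CIDv1 strings are handled here")
    body = cid[1:].upper()
    body += "=" * (-len(body) % 8)  # restore the padding b32decode requires
    buf = base64.b32decode(body)
    version, pos = _read_varint(buf, 0)
    codec, pos = _read_varint(buf, pos)
    mh_code, pos = _read_varint(buf, pos)
    mh_len, pos = _read_varint(buf, pos)
    if (version, codec, mh_code, mh_len) != (CID_V1, RAW_CODEC, SHA2_256, SHA2_256_LEN):
        raise ValueError("unsupported CID: expected CIDv1 / raw / sha2-256")
    return buf[pos:] == hashlib.sha256(data).digest()
```

Note that this proves only integrity (these bytes belong to this CID); whether the CID itself is the right one still depends on where you got it.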

For example, you could verify by checking what their IPNS key resolves to (assuming they have one), but then you’d have to trust their IPNS key.

Or you could use DNS and TLS certificates, but then you have to rely on certificate authorities and DNS.
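For the DNS route, a domain can publish a DNSLink record: a TXT record on _dnslink.<domain> whose value looks like dnslink=/ipfs/<cid>. The Python sketch below only builds the record name and parses TXT values; the lookup itself is left to a DNS library (for example dnspython’s dns.resolver.resolve(name, "TXT")), and the helper names are hypothetical. It inherits exactly the trust in DNS described above, and it simply takes the first matching record rather than applying any tie-breaking rules the DNSLink specification may define.

```python
from typing import Iterable, Optional

DNSLINK_PREFIX = "dnslink="


def dnslink_name(domain: str) -> str:
    """DNSLink records are TXT records on the _dnslink subdomain."""
    return "_dnslink." + domain.rstrip(".")


def parse_dnslink(txt_records: Iterable[str]) -> Optional[str]:
    """Return the /ipfs/ or /ipns/ path from the first 'dnslink=' TXT value, or None."""
    for record in txt_records:
        # TXT values are often rendered with surrounding double quotes.
        value = record.strip().strip('"')
        if value.startswith(DNSLINK_PREFIX):
            path = value[len(DNSLINK_PREFIX):]
            if path.startswith(("/ipfs/", "/ipns/")):
                return path
    return None
```

Unrelated TXT records (SPF, site verification tokens, and so on) are skipped, so the full TXT answer for the _dnslink name can be passed in as-is.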

Not to mention that the “true IPFS hash” of a domain is fluid: are you looking for the most current hash, or …?

For example, by your criteria: if I were using HTTPS, I would know with 100% certainty that I was talking to Google, but if I didn’t trust Google itself, I wouldn’t trust their homepage to be their homepage… but then why am I talking to Google at all?

Hi! Sorry for the late response and thank you for your answer.

The idea is to find the webpage content’s CID the same way Brave does. When you open a website that uses IPFS to serve its HTML, Brave identifies the content’s CID and shows a small “Open with IPFS” button in the URL bar. I’m interested in how Brave achieves that.

Not to mention that the “true IPFS hash” of a domain is fluid: are you looking for the most current hash, or …?

Yes, the most current hash.

For example, by your criteria: if I were using HTTPS, I would know with 100% certainty that I was talking to Google, but if I didn’t trust Google itself, I wouldn’t trust their homepage to be their homepage… but then why am I talking to Google at all?

An example that might help is the recent front-end hacks in the Ethereum ecosystem. In that case, you should not trust the domain at all, but rather a verified IPFS hash that you already know corresponds to the latest correct build. Our app deploys websites and publishes content on IPFS, so it also keeps a registry of the latest build information for each app. Now we want a browser extension that checks the CID of a site’s content when it is opened and compares it against the registry’s latest build info.
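The registry check described here could look roughly like the following Python sketch. It assumes the observed CID comes from the site’s x-ipfs-path header and that the registry is a simple mapping from domain to expected root CID, stored in the same CID version and base as the header (a CIDv0 Qm... string and its CIDv1 bafy... equivalent will not compare equal as strings). All names are hypothetical. It only catches a site that advertises the wrong CID; a compromised site can still advertise the registry’s CID while serving different bytes.

```python
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class CheckResult:
    domain: str
    expected: Optional[str]  # CID from the registry, None if the domain is unknown
    observed: Optional[str]  # root CID the site advertised, None if absent

    @property
    def ok(self) -> bool:
        return self.expected is not None and self.observed == self.expected


def root_cid(ipfs_path: Optional[str]) -> Optional[str]:
    """Extract <cid> from '/ipfs/<cid>/...'; /ipns/ paths and missing values give None."""
    if not ipfs_path:
        return None
    parts = ipfs_path.strip().split("/")
    if len(parts) >= 3 and parts[0] == "" and parts[1] == "ipfs" and parts[2]:
        return parts[2]
    return None


def check_site(domain: str, ipfs_path: Optional[str],
               registry: Mapping[str, str]) -> CheckResult:
    """Compare the root CID a site claims against the registry's latest build."""
    return CheckResult(domain, registry.get(domain), root_cid(ipfs_path))
```

Returning a small result object rather than a bare bool lets the extension tell the user which CID it expected and which one it saw.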

Please let me know if the description is still unclear. Thank you in advance 🙂


Hey @EmperorOrokuSaki, sorry for my delayed response! I believe the answer is in your initial question: to know whether a website has an equivalent copy on IPFS, you’d rely on the x-ipfs-path header. As you’ve noticed, this means trusting the location to give us the correct hash for the page it’s serving, and a compromised page would simply lie. This is unfortunately a downside of location-based addressing in general. If you want to ensure you’re receiving the page you intended, you’d have to retrieve a known legitimate copy using a CID from your registry.

Another thing you could do is check whether there’s an IPNS name associated with the page (ipfs name resolve <domain>), but if the key is compromised, you could get a malicious result as well.
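That lookup can be scripted against a local Kubo node. This minimal Python sketch assumes the ipfs CLI is installed and on PATH; the wrapper name is hypothetical. Its output (a /ipfs/<cid> path) could then be compared with the registry, keeping in mind that a compromised key or DNS record yields a malicious but perfectly well-formed answer.

```python
import subprocess


def ipfs_name_resolve(name: str, timeout: float = 60.0) -> str:
    """Resolve an IPNS name or domain via the local `ipfs name resolve` command.

    Returns the printed path (e.g. '/ipfs/<cid>'). Raises
    subprocess.CalledProcessError if the command fails, and
    subprocess.TimeoutExpired if it runs longer than `timeout` seconds.
    """
    result = subprocess.run(
        ["ipfs", "name", "resolve", name],
        capture_output=True,
        text=True,
        check=True,
        timeout=timeout,
    )
    return result.stdout.strip()
```

Calling ipfs_name_resolve("example.com") from a script would print whatever path that domain currently points to, if it has one.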

I hope this helps break down the problem a bit at least.