Mirroring standard websites to IPFS as you browse them

I've had an interesting question for a while. Note that if this is indeed possible, I advise attempting it with caution, as it may lead to what some consider copyright infringement. Here is my question:

We know how IPFS works when browsing websites in the network. You type the URL http://127.0.0.1:8080/ipfs/my_hash_here into your browser; the js-ipfs or go-ipfs node on your computer loads the website, and in the process it temporarily stores every file on it and seeds it to other nodes.
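To make that concrete, here is a minimal Python sketch of fetching content through the local gateway, assuming a go-ipfs daemon is running on the default gateway port (the CID is the getting-started directory shipped with go-ipfs, but any CID works):

```python
import requests

# Ask the local gateway for a file. The daemon resolves the CID, fetches the
# blocks from the network if it doesn't already have them, caches them
# locally, and from that point on serves them to other peers.
CID = "QmYwAPJzv5CZsnA625s3Xf2nemtYgPpHdWEz79ojWnPbdG"  # go-ipfs docs folder

resp = requests.get(f"http://127.0.0.1:8080/ipfs/{CID}/readme", timeout=30)
resp.raise_for_status()
print(resp.text[:200])
```

The key point for this topic is the side effect: merely requesting content through the gateway turns your node into another seeder for it.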

What I want to know: suppose you have js-ipfs or go-ipfs installed. You are browsing a normal website on the internet, like deviantart.com or youtube.com or twitter.com or whatever. Is it possible to teach your web browser to also add and seed copies of those websites in the IPFS network as you visit them? Conversely, could the browser also learn to look inside IPFS for every file embedded on a website (image, video, etc.)?
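For the "add and seed" half, here is a rough sketch of what a browser extension's native helper could do, using the ipfshttpclient Python package against a local daemon. The mirror() function, its example URL, and the idea of a URL-to-CID index are all mine, purely illustrative:

```python
import requests
import ipfshttpclient  # pip install ipfshttpclient; talks to the daemon's API port

def mirror(url: str) -> str:
    """Fetch a resource over plain HTTP and add it to the local IPFS node.

    Returns the CID under which the bytes are now seeded. Note that the
    URL -> CID mapping still has to be published somewhere (a shared index,
    a DNSLink record, ...) for other visitors to find it; that is the part
    no single node can solve on its own.
    """
    data = requests.get(url, timeout=30).content
    client = ipfshttpclient.connect()  # default: /ip4/127.0.0.1/tcp/5001
    return client.add_bytes(data)

# Hypothetical usage:
# cid = mirror("https://example.com/image.png")
# print(f"also available at http://127.0.0.1:8080/ipfs/{cid}")
```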

My idea is that if enough people had such a browser capability, they could essentially mirror parts of the existing internet onto IPFS as they visit them. This would obviously only work to a limited extent: you can't mirror the PHP scripts or MySQL databases behind a given site, so you couldn't replicate the functionality its server offers. However, you could replicate static resources, which would be especially useful when browsing sites with a lot of images / videos / audio.

An example: imagine your node automatically running "ipfs add" for every YouTube video you watch, transferring the source from Google's servers into the network. When you then want to view that video, your IPFS node will additionally know to look for it inside the network and override the source, so you can watch the video on youtube.com without even loading it from Google's server!
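Once the video file has been downloaded, the adding step itself is just a CLI call. A small helper, assuming go-ipfs is on the PATH (the function name and file path are placeholders):

```python
import subprocess

def ipfs_add(path: str) -> str:
    # Run the go-ipfs CLI; -Q ("quieter") prints only the final CID.
    result = subprocess.run(
        ["ipfs", "add", "-Q", path],
        capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()

# Hypothetical usage:
# print(ipfs_add("/tmp/some_video.mp4"))
```

The hard part is not adding the bytes but intercepting them as the browser streams them, and redirecting playback to the IPFS copy afterwards.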


I think this could work using a SOCKS5 proxy that would simply hash and store every response (headers included) and return the stored responses based on the cache headers or some other identifier.
This would basically be like creating a tiny Wayback Machine, but I don't know if the "look inside IPFS" part would be so easy to implement (it might be more trouble than it's worth).
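A rough sketch of the storing half, using an HTTP-intercepting proxy (mitmproxy here instead of SOCKS5, purely for brevity) together with ipfshttpclient; the IPFSCache addon and its in-memory URL index are hypothetical:

```python
# Run with: mitmproxy -s ipfs_cache.py
# Every response body that passes through the proxy is added to the local
# IPFS node, and the resulting CID is remembered per URL -- roughly the
# "tiny wayback machine" described above. Serving cached copies back on
# later requests (the "look inside IPFS" half) is deliberately left out.
import ipfshttpclient
from mitmproxy import http

class IPFSCache:
    def __init__(self) -> None:
        self.client = ipfshttpclient.connect()
        self.index: dict[str, str] = {}  # URL -> CID

    def response(self, flow: http.HTTPFlow) -> None:
        if flow.response and flow.response.content:
            cid = self.client.add_bytes(flow.response.content)
            self.index[flow.request.pretty_url] = cid

addons = [IPFSCache()]
```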


Parts of such a mechanism do exist to some extent; however, I don't know of a project that would wrap them in a single, easy-to-use package/tool.

Specifically, see: https://github.com/oduwsdl/ipwb (work in progress). This project currently more or less supports "parts 2 and 3" of your idea ("seed copies of websites [extracted from WebArchive .warc format] in the IPFS network" + "look inside IPFS for every file embedded on a website").
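If I remember the CLI correctly, ipwb splits this into two steps: something like `ipwb index mycrawl.warc > index.cdxj` to push the WARC records into IPFS and emit a CDXJ index, then `ipwb replay index.cdxj` to serve the archived pages back out of the network (the file names here are placeholders).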

For "part 1", that is, scraping a browsed website into a .warc archive file, see e.g.:
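Not a replacement for those tools, but as a minimal Python illustration of "part 1", the warcio library can capture live HTTP traffic into a WARC file (the file name and URL are placeholders):

```python
from warcio.capture_http import capture_http
import requests  # must be imported *after* capture_http so it gets patched

# Every request made inside this block is recorded, with its response,
# as WARC records in example.warc.gz -- ready to be indexed into IPFS.
with capture_http("example.warc.gz"):
    requests.get("https://example.com/")
```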


I've been tracking related discussions for some time; see notes and other threads linked at:

2021 update: a high-fidelity solution exists, and snapshots can be shared over IPFS!
(posting in this topic for discoverability)
