Web browser with integrated IPFS node/support for browser cache?

jamaalsawyerd · January 11, 2018, 2:29am

I thought of the following:

You open your browser and you are spun up as a node. When your browser caches a file, it is shared on the network. So whenever the browser needs cacheable assets, it checks whether they exist on IPFS on some node and downloads them from there; if not, they are pulled from the site and cached in the traditional manner. Your node shuts down when you close the browser.

The idea is that you would share, and benefit from, the cache of everyone else browsing at the same time as you. Is this theoretically possible?

kehao95 · January 11, 2018, 5:43am

Yes, I like your idea. If any browser could work as an IPFS node while surfing the internet, fetching and contributing content to the IPFS network, that would be fantastic.
The fundamental problem is that in HTTP, the content behind a URL is mutable, whereas in IPFS the content itself is the only identifier (the address is derived from a hash of the content). This means you cannot know anything about a file's IPFS address unless you download the file: given a file linked over HTTP, you cannot convert it into an ipfs:// link with confidence. That is fundamental.
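
To make "content is the only identifier" concrete, here is a minimal sketch that computes a CIDv1 from raw bytes, assuming the `raw` codec (0x55), a sha2-256 multihash (0x12), and base32 multibase (the leading "b"). This is not necessarily the CID `ipfs add` reports for the same file, since by default it chunks and wraps content in UnixFS blocks; the point is only that the identifier is a pure function of the bytes, so it cannot be known before the bytes are.

```python
import base64
import hashlib

# Multicodec / multihash constants (all < 0x80, so each is a one-byte varint).
CID_VERSION_1 = 0x01
RAW_CODEC = 0x55      # "raw" binary codec
SHA2_256 = 0x12       # multihash function code for sha2-256
SHA2_256_LEN = 0x20   # digest length in bytes (32)


def raw_cid_v1(data: bytes) -> str:
    """Return a base32 CIDv1 (raw codec, sha2-256 multihash) for the given bytes."""
    digest = hashlib.sha256(data).digest()
    cid_bytes = bytes([CID_VERSION_1, RAW_CODEC, SHA2_256, SHA2_256_LEN]) + digest
    # Multibase "b" = RFC 4648 base32, lowercase, without padding.
    return "b" + base64.b32encode(cid_bytes).decode("ascii").lower().rstrip("=")


v1 = b"console.log('v1');\n"
v2 = b"console.log('v2');\n"
print(raw_cid_v1(v1))
print(raw_cid_v1(v2))  # a one-character change yields an unrelated CID
```

Because the same URL can serve `v1` today and `v2` tomorrow, a browser holding only the URL has no way to derive the CID without fetching the bytes first.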

But we can't ignore that there are many duplicated files around the internet, e.g. widely used JS libraries, common stylesheets, analytics JS … In short, any file that is cached somewhere (on a CDN) should be suitable for IPFS. That is where IPFS can take the place of HTTP, as planned.

I've actually come up with some ideas that may work in limited situations.

[1]. HTTP 300:
There is potential compatibility in the HTTP standard's 3xx (Redirection) codes. A redirect is not limited to HTTP URLs but can point to any URI (ipfs:// included), especially with HTTP 300:

The HTTP 300 Multiple Choices redirect status response code indicates that the request has more than one possible responses. The user-agent or the user should choose one of them. As there is no standardized way of choosing one of the responses, this response code is very rarely used.

When receiving an HTTP GET request for a file, instead of returning the data directly, the server can respond with an HTTP 300 listing two URIs: a direct HTTP GET link and an IPFS link, so that the client can choose to fetch the file from the IPFS network.
Risk: since there is no standardized way of choosing one of the responses, I'm not sure browsers without IPFS support can handle this properly.
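
Idea [1] can be sketched as two pure functions: one that builds the 300 response and one hypothetical client policy that picks a choice. Assumptions, not from the post: the HTTP copy is marked as the server's preferred choice via `Location` (RFC 7231 lets a 300 carry one), both URIs are listed as `Link: ...; rel="alternate"` values and in a plain-text body, and the `ipfs://<cid>` form plus the function names are illustrative only.

```python
from typing import Dict, List, Tuple


def build_multiple_choices(http_url: str, cid: str) -> Tuple[int, Dict[str, str], str]:
    """Build a 300 Multiple Choices response offering an HTTP copy and an IPFS copy."""
    ipfs_uri = f"ipfs://{cid}"
    headers = {
        # A Location header on a 300 may mark the server's preferred choice; the plain
        # HTTP copy is preferred here so IPFS-unaware clients still have somewhere to go.
        "Location": http_url,
        # Both choices listed as alternates (one comma-separated Link header).
        "Link": f'<{http_url}>; rel="alternate", <{ipfs_uri}>; rel="alternate"',
        "Content-Type": "text/plain; charset=utf-8",
    }
    # Human-readable list of choices, one URI per line.
    body = http_url + "\n" + ipfs_uri + "\n"
    return 300, headers, body


def pick_choice(headers: Dict[str, str], body: str, supports_ipfs: bool) -> str:
    """Hypothetical client policy: take the ipfs:// choice if supported, else Location."""
    choices: List[str] = [line.strip() for line in body.splitlines() if line.strip()]
    if supports_ipfs:
        for uri in choices:
            if uri.startswith("ipfs://"):
                return uri
    return headers["Location"]


status, headers, body = build_multiple_choices(
    "https://example.com/js/lib.js",
    "bafkreihdwdcefgh4dqkjv67uzcmw7ojee6xedzdetojuzjevtenxquvyku",  # placeholder CID
)
print(status, pick_choice(headers, body, supports_ipfs=True))
```

How an unmodified browser treats a 300 is exactly the open risk above; the sketch only shows what an IPFS-aware client would need from the response.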

[2]. HTTP HEAD before GET:
Before GETting a resource, the browser can send a HEAD request to get metadata about the file. The response headers usually include the file's Content-Length, Content-Type, etc., and in rare cases the file's checksum as well. For IPFS support, the server could also return the IPFS hash, so that the browser can use it to fetch the file from IPFS instead of performing an HTTP GET.
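
The client side of idea [2] might look like the following minimal sketch. It assumes the server advertises the CID in a response header named `X-Ipfs-Cid`; that name is hypothetical, not a standard header. The HEAD request uses only the standard library, and the decision logic is kept in a separate pure function.

```python
import urllib.request
from typing import Mapping

# Hypothetical response header carrying the CID; not a standard HTTP header.
CID_HEADER = "X-Ipfs-Cid"


def head_headers(url: str, timeout: float = 5.0) -> Mapping[str, str]:
    """Send an HTTP HEAD request and return the response headers."""
    req = urllib.request.Request(url, method="HEAD")
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return dict(resp.headers.items())


def resolve_source(url: str, headers: Mapping[str, str], ipfs_available: bool) -> str:
    """Return an ipfs:// URI if the server advertised a CID and IPFS is usable, else the URL."""
    # Header names are case-insensitive, so normalise them before the lookup.
    lowered = {k.lower(): v for k, v in headers.items()}
    cid = lowered.get(CID_HEADER.lower(), "").strip()
    if ipfs_available and cid:
        return f"ipfs://{cid}"
    return url  # fall back to a normal HTTP GET
```

Usage would be `resolve_source(url, head_headers(url), ipfs_available=True)`. Note the extra round trip: the HEAD costs a request to the origin even when the file then comes from IPFS.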

jamaalsawyerd · January 11, 2018, 10:41pm

Okay, so let me see if I understand this correctly:

The main hurdle to my idea is that there needs to be some known piece of info (like a hash or known address) about the desired content provided to the client in order to check the IPFS network for that specific file. I really like your second idea. I guess the question is, how would the server find out the IPFS addresses?

I guess one solution is for the server to run IPFS and add certain folders to it. I feel like that's potentially too tedious though, and you wouldn't want people to shoot themselves in the foot by accidentally hosting the wrong files from their server; they just want the cacheable ones included.

Jared4DataRoads · January 12, 2018, 1:52am

I really like this idea, and it has some similarity and overlap with my transparent proxy cache (e.g. Squid on IPNS) and IPNS-CDN ideas. The main difference is that this would be implemented in the browser directly, rather than on a LAN edge gateway, but otherwise you should be able to map URLs to IPNS paths the same way in both.
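
The post doesn't specify how URLs map to IPNS paths, so the following is only a sketch under a hypothetical layout: `/ipns/<publisher-key>/<scheme>/<host>/<path>`, where the publisher key, the `index.html` convention for directory URLs, and the rule of skipping query-string URLs (treated as dynamic) are all assumptions.

```python
from typing import Optional
from urllib.parse import quote, unquote, urlsplit


def url_to_ipns_path(url: str, publisher_key: str) -> Optional[str]:
    """Map an HTTP(S) URL to a path in a hypothetical IPNS-published cache tree."""
    parts = urlsplit(url)
    if parts.scheme not in ("http", "https") or not parts.hostname:
        return None
    if parts.query:
        return None  # assumption: URLs with query strings are dynamic, so don't map them
    path = parts.path or "/"
    if path.endswith("/"):
        path += "index.html"  # hypothetical convention for directory URLs
    # Decode then re-encode so equivalent spellings map to the same path.
    safe_path = quote(unquote(path), safe="/")
    host = parts.hostname  # urlsplit already lowercases the hostname
    if parts.port:
        host = f"{host}:{parts.port}"
    return f"/ipns/{publisher_key}/{parts.scheme}/{host}{safe_path}"
```

The same function would work on a LAN edge gateway or inside the browser, which is the point of the comparison above.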

One security caveat here is that you would want to exclude from the shared cache any data that is personal to the browser user, for example bank statement images sent via HTTPS.

Otherwise, this would not require the HTTP 300 response feature to map URLs to IPFS multihashes (IPNS does all that mapping work), but the HTTP 300 feature would still be nice to implement on IPFS-backed IPNS-CDN gateways.
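
One way to honor this caveat is a default-deny filter. The sketch below is a conservative heuristic and an assumption, not something proposed in the thread: share a response only if the origin marked it `Cache-Control: public` (without `private` or `no-store`), it sets no cookies, and the request carried no `Cookie` or `Authorization` header.

```python
from typing import Mapping, Set


def _directives(cache_control: str) -> Set[str]:
    """Lowercase Cache-Control directive names (simplified: ignores quoted commas)."""
    names = set()
    for item in cache_control.split(","):
        name = item.split("=", 1)[0].strip().lower()
        if name:
            names.add(name)
    return names


def is_shareable(request_headers: Mapping[str, str], response_headers: Mapping[str, str]) -> bool:
    """Share only explicitly public responses that were fetched without credentials."""
    req = {k.lower() for k in request_headers}
    resp = {k.lower(): v for k, v in response_headers.items()}
    if "authorization" in req or "cookie" in req:
        return False  # anything fetched with credentials may be personal
    if "set-cookie" in resp:
        return False  # response is tied to a session
    directives = _directives(resp.get("cache-control", ""))
    if directives & {"private", "no-store"}:
        return False
    return "public" in directives  # default deny unless the origin opted in
```

Defaulting to "not shareable" means some cacheable assets are missed, but a bank statement image fetched with a session cookie can never leak into the shared cache.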

kehao95 · January 13, 2018, 12:26am

how would the server find out the ipfs addresses?

Yes, as I mentioned before, the IPFS multihash should be, and can only be, provided by the server (since the content may be dynamic). So the server is responsible for selecting static content. This can be implemented in web frameworks: when responding to HEAD requests, the server can calculate the IPFS multihash just in time.
Moreover, the server does not actually need to run an IPFS node, nor serve the content over IPFS itself. If the content is popular, like the jQuery library, the client can always fetch it elsewhere. And if the content is original, earlier visitors can GET it over HTTP and provide it to later visitors until they go offline.
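
A minimal server-side sketch of "calculate the multihash just in time", using only the standard library. Assumptions: only files under an allowlisted `static/` directory count as cacheable static content, the CID is a raw-codec CIDv1 over sha2-256 (which may differ from what `ipfs add` would produce with its default chunking), and it is returned in a hypothetical `X-Ipfs-Cid` header. No IPFS node is involved, matching the point above. It rereads the file on every request; a real framework would cache the hash.

```python
import base64
import hashlib
import http.server
import os
from urllib.parse import unquote

CID_HEADER = "X-Ipfs-Cid"  # hypothetical header name, not a standard


def raw_cid_v1(data: bytes) -> str:
    """CIDv1: version 0x01, raw codec 0x55, sha2-256 multihash (0x12, length 0x20), base32."""
    prefix = bytes([0x01, 0x55, 0x12, 0x20])
    encoded = base64.b32encode(prefix + hashlib.sha256(data).digest())
    return "b" + encoded.decode("ascii").lower().rstrip("=")


class StaticCidHandler(http.server.BaseHTTPRequestHandler):
    """Serves files from static_root and advertises a CID for each on HEAD and GET."""

    static_root = os.path.abspath("static")  # only this tree is treated as cacheable

    def _resolve(self):
        # Drop the query string, percent-decode, and map the path into static_root.
        rel = unquote(self.path.split("?", 1)[0]).lstrip("/")
        full = os.path.abspath(os.path.join(self.static_root, rel))
        # Refuse paths that escape static_root (e.g. "/../secret").
        if os.path.commonpath([full, self.static_root]) != self.static_root:
            return None
        return full if os.path.isfile(full) else None

    def _send_headers(self, data: bytes) -> None:
        self.send_response(200)
        self.send_header("Content-Length", str(len(data)))
        self.send_header("Content-Type", "application/octet-stream")
        self.send_header(CID_HEADER, raw_cid_v1(data))  # computed just in time
        self.end_headers()

    def do_HEAD(self):
        path = self._resolve()
        if path is None:
            self.send_error(404)
            return
        with open(path, "rb") as f:
            self._send_headers(f.read())

    def do_GET(self):
        path = self._resolve()
        if path is None:
            self.send_error(404)
            return
        with open(path, "rb") as f:
            data = f.read()
        self._send_headers(data)
        self.wfile.write(data)
```

To try it, run `http.server.ThreadingHTTPServer(("127.0.0.1", 8000), StaticCidHandler).serve_forever()` from a directory containing `static/`.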

tglea27 · March 24, 2018, 9:14pm

Having worked in the CDN space for some time, I am all too familiar with some of the limitations of browser caches and HTTP/1.1 (so IPFS + browser caches could be awesome!).

The HTTP working group did think about people wanting to use different ports and protocols for some pieces of content, and they came up with HTTP Alternative Services (RFC 7838), advertised via the Alt-Svc header or, in HTTP/2, the ALTSVC frame.

This could be a more elegant way to handle this than a redirect-based solution; however, you still need demarcation of the shared cache as indicated above (there is no reason an HTTP header cannot indicate whether the IPFS node will or won't seed content). Also, this functionality hasn't been very widely implemented or adopted; from what I understand, it was a future-proofing mechanism. In practice, browsers also only implement HTTP/2 over TLS.
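
For reference, an Alt-Svc value looks like `h2="alt.example.com:443"; ma=3600`: an ALPN protocol ID, a quoted alternative authority, and parameters such as `ma` (max age in seconds), or the single token `clear`. Below is a simplified parser sketch that assumes no commas or semicolons inside quoted strings. An IPFS-specific alternative would need a protocol ID that browsers understand; none is registered, so that use stays hypothetical.

```python
from typing import Dict, List, NamedTuple


class AltService(NamedTuple):
    protocol_id: str
    authority: str
    params: Dict[str, str]


def parse_alt_svc(value: str) -> List[AltService]:
    """Simplified Alt-Svc (RFC 7838) parser; assumes no ',' or ';' inside quoted strings."""
    value = value.strip()
    if value == "clear":
        return []  # "clear" tells the client to drop previously advertised alternatives
    services = []
    for entry in value.split(","):
        fields = [f.strip() for f in entry.split(";") if f.strip()]
        if not fields or "=" not in fields[0]:
            continue  # skip malformed entries
        proto, authority = fields[0].split("=", 1)
        params = {}
        for field in fields[1:]:
            if "=" in field:
                key, val = field.split("=", 1)
                params[key.strip()] = val.strip().strip('"')
        services.append(AltService(proto.strip(), authority.strip().strip('"'), params))
    return services
```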

There is a philosophical issue here though… most end users are very used to being clients, not content providers. Hopefully, we can change hearts and minds about that!

jamiew · March 25, 2018, 1:13pm

The beta version of the IPFS Companion browser extension does something close to this.

lidel · April 1, 2018, 7:20pm

Small clarification: we are experimenting with running an embedded js-ipfs node within the browser extension, but as a means of exposing an API as window.ipfs on every page.
Things discussed in this topic are not implemented, but we are tracking related ideas in a dedicated issue:

github.com/ipfs/ipfs-companion: Mirroring Web to IPFS (lidel; opened 26 Mar 2016, 04:28 PM UTC; closed 24 Jul 2018, 01:28 PM UTC)

Meta-issue tracking related work and discussions got moved to https://github.com…/ipfs/in-web-browsers/issues/94

Historical notes before the move:

Ready to Implement:
- [x] Integrate js-ipfs library to handle multipart upload to API (since resolved: there were issues with the browserified version that needed to be resolved first: missing `os` module, and when all shims are enabled `global.XMLHttpRequest` is missing)
- [x] Image Rehosting via HTTP API (#59)
- [ ] Save whole page to IPFS (creating a one-time shareable mirror/snapshot) (#91)

More Design Work Required:
- Automatic mirroring of standard websites to IPFS as you browse them
  - IMMUTABLE assets: very limited feasibility; so far only two types of immutable resources exist on the web:
    - JS, CSS etc. marked with an SRI hash (Subresource Integrity, https://www.srihash.org/) (mapping SRI→CID) (see discussion from 2016-03-26 below)
    - URLs for things explicitly marked as immutable via `Cache-Control: public, (..) immutable` (mapping URL→CID)
  - MUTABLE assets: what if we add every page to IPFS, storing a mapping between URL and CID, so that if the page disappears we could fall back to the IPFS version?
    - a can of worms: a safe version would be like web.archive.org, but limited to the local machine. Sharing the cache with other people would require a centralized mapping service (single point of failure, vector for privacy leaks)
  - So what is needed to make it "right"?
    - keep it simple but robust: no HTTP, no centralization, no single point of failure
    - Ideally, URL2IPFS lookups would not rely on a centralized index.
    - rough idea (https://github.com/ipfs-shipyard/ipfs-companion/issues/535#issuecomment-407046442): what if we create a pubsub-based room per URL? For example:
      - When you open a website, you subscribe to the pubsub room unique to that URL
      - If the pubsub room has entries under the "keepalive" threshold, grab the latest one
      - If the room is empty or the keepalive timeout is hit, fall back to HTTP, but in the background add the HTTP page to IPFS and announce the updated hash on pubsub (with a new timestamp) for the next visitor
      - There are still pubsub performance and privacy problems to solve (e.g. publishing banking pages), but at least we don't rely on the HTTP server anymore.
- Other notes
  - The "webpackage" standard proposal (https://github.com/WICG/webpackage/blob/master/explainer.md#web-packaging-format-explainer) surfaced recently; among other things, it aims to address the website snapshotting use case in a safe and reproducible manner:
    - webpackage: Save and share a web page (https://github.com/WICG/webpackage/blob/master/explainer.md#save-and-share-a-web-page) (Use Case: https://wicg.github.io/webpackage/draft-yasskin-webpackage-use-cases.html#snapshot)
    - Sounds super relevant to what we want as the endgame here

Related Discussions:

2016-03-26: IRC log about mirroring SRI2IPFS
```
165958 geir_ │ lgierth: The web sites would have to link to ipfs content for this plugin to work. What i propose is a proxy that works like a transparent proxy and puts content into ipfs if it's not already there
170124 ed_t │ anyone know anything about ipfs-boards
170141 ed_t │ it keeps telling me I am in limited mode
170202 ed_t │ a full ipfs 0.40-rc3 node is running on localhost:5001
170217 ed_t │ but it does not seem to see it using the demo link
170228 +lgierth │ geir_: ah got what you wanna do -- i'm not sure you can easily just rewrite anything
170253 +lgierth │ for completely static pages, yes, but for slightly more dynamic stuff?
170303 +lgierth │ i'll be back in a bit, getting some coffee
170422 geir_ │ lgierth: I mean only for the static stuff like images, libs and so on. Should be pretty strait forward to implement. And a big bandwidth save for big networks
171542 lidel │ geir_, we are planning to add "host to ipfs" feature to the addon
171614 lidel │ when that is done, it should be easy to add option to automatically add every visited page
171634 lidel │ not sure how addon would do lookups tho
171734 lidel │ (meaning, how do i know the multihash of the page, how do we handle ipfs-cache expiration when page gets updated, etc)
171831 geir_ │ lidel: I see, thanks for the info. I still like the idea of a transparent proxy so every user/device on the network will use the "cdn" automatically
171852 lidel │ perhaps we could start with mirroring static assets that have SRI hash (https://www.srihash.org/)
171920 lidel │ and come up with a way for doing SRI2IPFS lookups
```

2018-01-14
- https://discuss.ipfs.io/t/web-browser-with-integrated-ipfs-node-support-for-browser-cache/1799/5

2018-03-08
- [Suggestion] : IPFS browser extension as lite-node? https://github.com/ipfs/ipfs/issues/310

2018-07-09
- https://discuss.ipfs.io/t/mirroring-standard-websites-to-ipfs-as-you-browse-them/3355

2018-07-23
- http->ipfs translator proposal #535
- webpackage standard draft
  - https://github.com/WICG/webpackage/blob/master/explainer.md#save-and-share-a-web-page
  - https://wicg.github.io/webpackage/draft-yasskin-webpackage-use-cases.html#snapshot
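
On the "SRI2IPFS lookups" idea: an SRI value such as `sha256-<base64>` is the sha2-256 digest of the exact bytes, which is the same digest a raw-codec CIDv1 with a sha2-256 multihash carries. So, as a sketch under assumptions not stated in the thread, a CID can be derived from the `integrity` attribute without downloading anything. It only helps if the asset was added to IPFS as a single raw block hashed with sha2-256 (not the default chunked UnixFS layout). This handles a single sha256 value only; sha384/sha512 SRI values would need other multihash codes.

```python
import base64
import hashlib


def sri_sha256_to_cid(integrity: str) -> str:
    """Convert a single SRI 'sha256-<base64>' value into a raw-codec CIDv1 (base32)."""
    algo, _, b64 = integrity.strip().partition("-")
    if algo != "sha256":
        raise ValueError("this sketch only handles sha256 SRI values")
    digest = base64.b64decode(b64)
    if len(digest) != 32:
        raise ValueError("a sha256 digest must be 32 bytes")
    # CIDv1 (0x01) + raw codec (0x55) + multihash sha2-256 (0x12) with length 32 (0x20)
    cid_bytes = bytes([0x01, 0x55, 0x12, 0x20]) + digest
    return "b" + base64.b32encode(cid_bytes).decode("ascii").lower().rstrip("=")


def sri_sha256(data: bytes) -> str:
    """Compute the value a page would put in integrity="..." for these bytes."""
    return "sha256-" + base64.b64encode(hashlib.sha256(data).digest()).decode("ascii")
```

For example, `sri_sha256_to_cid(sri_sha256(data))` gives the same CID as hashing `data` directly into a raw CIDv1, which is what makes the lookup possible from page markup alone.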
