Hey guys, hope you’re all doing well!
We have the great features DNSLink and IPNS, which allow us to upgrade domains in the browser (and URLs, to a degree) to IPFS CIDs that the browser can access via the IPFS network.
But there are certain limitations that currently hinder a smooth transition from a regular web server to a site running fully on IPFS. The main limitations come from the query part that follows the path in the URL, as well as the missing support for any non-http(s) scheme.
The idea to fix this is simple:
- Hash the URL
- Write the CID information to a DHT under the URL hash
- Sign the information with the IPNS key for the Domain
- Store the IPNS key for the Domain in the DNS
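The first step could be sketched roughly like this. It is only an illustration: the choice of SHA-256, the light normalization, and the function name `url_hash` are my assumptions, not part of the proposal.

```python
import hashlib
from urllib.parse import urlsplit


def url_hash(url: str) -> str:
    """Return a hex SHA-256 digest of a lightly normalized URL.

    Assumption: scheme and authority are lowercased before hashing so that
    "HTTP://Example.com/" and "http://example.com/" map to the same DHT key.
    A real design would need a much more careful normalization (userinfo,
    ports, percent-encoding).
    """
    parts = urlsplit(url)
    normalized = parts._replace(
        scheme=parts.scheme.lower(),
        netloc=parts.netloc.lower(),
    ).geturl()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()
```

A client and a publisher would both have to apply exactly the same normalization, otherwise they would compute different keys for the same URL.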
To keep false information from flooding the DHT, every entry added should be verified individually by the nodes storing it: they should ask the DNS for the IPNS key and check the signature before storing the entry and offering it to the network. The timestamps should also be checked against the host’s clock. This way no entries dated in the future or the past can be published.
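A storing node's check might look something like the sketch below. The field names `valid-since` and `valid-until` are borrowed from the example entries further down; `should_store` and the `signature_ok` callback are hypothetical names, and the actual signature check against the IPNS key from DNS is deliberately left to the caller.

```python
from datetime import datetime, timezone
from typing import Callable, Optional


def parse_ts(value: str) -> datetime:
    # The example entries use UTC timestamps with a trailing "Z".
    return datetime.strptime(value, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)


def should_store(
    record: dict,
    signature_ok: Callable[[dict], bool],
    now: Optional[datetime] = None,
) -> bool:
    """Accept a record only if it is signed correctly and currently valid."""
    now = now or datetime.now(timezone.utc)
    # Hypothetical hook: verifies the signature with the IPNS key found in DNS.
    if not signature_ok(record):
        return False
    # Reject entries that are not yet valid or have already expired.
    return parse_ts(record["valid-since"]) <= now < parse_ts(record["valid-until"])
```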
Rationale
We currently assume that the Web consists only of http(s) URLs without a query part, and can’t support the rest.
Using hashes to inform a client about the availability of the information in the IPFS network reduces the need for workarounds and helps to reduce the complexity of a transition between web servers and IPFS libraries.
Since this approach isn’t limited to the http/https scheme we can extend this in the future to other URIs.
Additionally, we can use the URIs to resolve to p2p services inside the IPFS network. This would allow us to extend clients in the future to route something like IRC, SIP, or SNMP traffic to a p2p service inside IPFS instead of natively over the internet. This approach allows for interesting failover, mobility, and encryption possibilities while also extending the usability of IPFS beyond storing data.
Technical specification
I haven’t given the technical details that much thought yet, so sorry for all the rough edges here. I just want to outline how it might work, not how it should work!
Redirects
Redirects are often used in web servers to move clients from old URLs to new ones or to move certain links to other locations.
IPFS could support this feature natively, so that no web server has to be contacted to do the redirect before the client can upgrade to an IPFS path.
An example of what the data stored in the DHT could look like:
<ipns-pubkey>
---
type: "redirect"
from:
  scheme: "http"
  authority:
    host: "example.com"
  path: "/old-link/"
  query: ""
  fragment: ""
to:
  scheme: "ipns"
  authority: "example.com"
  path: "/home/"
  query: ""
  fragment: ""
valid-since: "2020-12-12T00:00:00Z"
valid-until: "2021-01-15T00:00:00Z"
...
<signature>
Wildcard URLs
If the DHT doesn’t contain a valid result for the full URL, the client might drop certain parts of the URL to find a matching entry - for example, the fragment-part might not be necessary to fetch this data from IPFS. As a last resort, the client can ask the DHT for entries just for the scheme and authority part of the URL, to find matching entries.
This not only makes it possible to specify the same information for multiple URLs, but also to specify a 404 page if the URL isn’t valid.
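The fallback lookup described above could be sketched as a list of progressively shorter URLs to try. The order in which parts are dropped (fragment, then query, then path) and the name `lookup_candidates` are my assumptions; each candidate would then be hashed and looked up in the DHT.

```python
from typing import List
from urllib.parse import urlsplit, urlunsplit


def lookup_candidates(url: str) -> List[str]:
    """Return the URLs a client could try, from most to least specific."""
    s = urlsplit(url)
    candidates = [
        urlunsplit((s.scheme, s.netloc, s.path, s.query, s.fragment)),  # full URL
        urlunsplit((s.scheme, s.netloc, s.path, s.query, "")),  # drop fragment
        urlunsplit((s.scheme, s.netloc, s.path, "", "")),  # drop query too
        urlunsplit((s.scheme, s.netloc, "", "", "")),  # scheme and authority only
    ]
    # Remove duplicates (e.g. when the URL has no fragment) but keep the order.
    unique: List[str] = []
    for candidate in candidates:
        if candidate not in unique:
            unique.append(candidate)
    return unique
```

The client would stop at the first candidate for which the DHT returns a valid, correctly signed entry.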
An example entry for a redirect with URL wildcard:
<ipns-pubkey>
---
type: "redirect"
settings:
  from:
    wildcard-path: true
    wildcard-query: true
    wildcard-fragment: true
from:
  scheme: "http"
  authority:
    host: "example.com"
  path: "*"
  query: "*"
  fragment: "*"
to:
  scheme: "ipns"
  authority: "example.com"
  path: "/404.html"
  query: ""
  fragment: ""
valid-since: "2020-12-12T00:00:00Z"
valid-until: "2021-01-15T00:00:00Z"
...
<signature>
This entry would be published in the DHT under the hash of the scheme and authority, to avoid having to publish it under every possible URL hash.
Here’s an example entry that identifies the source while ignoring the fragment part:
<ipns-pubkey>
---
type: "cid"
settings:
  from:
    wildcard-fragment: true
from:
  scheme: "http"
  authority:
    host: "example.com"
  path: "/welcome-page/"
  query: "moreinfo=false"
  fragment: "*"
content:
  id: "QmPZ9gcCEpqKTo6aq61g2nXGUhM4iCL3ewB6LDXZCtioEB"
  address-hint: [
    "/ip4/6.7.8.9/tcp/46147/p2p/QmZHrtsCdrkfTkq56Q96vCbN16rEkzWogN7P58w9ytgWAj",
    "/ip4/6.7.8.9/udp/47187/quic/p2p/QmZHrtsCdrkfTkq56Q96vCbN16rEkzWogN7P58w9ytgWAj",
  ]
valid-since: "2020-12-12T00:00:00Z"
valid-until: "2021-01-15T00:00:00Z"
...
<signature>
CID entries for URLs
As already seen above, the content of a URL can be linked to a content ID, optionally adding address hints to accelerate further network operations, provided those nodes are online.
The simplest entry for a file stored on an FTP-server would look like this:
<ipns-pubkey>
---
type: "cid"
from:
  scheme: "ftp"
  authority:
    host: "ftp.example.com"
  path: "/demo-file.txt"
  query: ""
  fragment: ""
content:
  id: "QmPZ9gcCEpqKTo6aq61g2nXGUhM4iCL3ewB6LDXZCtioEB"
valid-since: "2020-12-12T00:00:00Z"
valid-until: "2021-01-15T00:00:00Z"
...
<signature>
Note that specifying all parts of the URL is mandatory.
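A node or client could enforce this rule with a small check like the one below. It assumes the entry has already been parsed into a dictionary; the name `missing_from_parts` and the tuple of required keys are just taken from the examples above for illustration.

```python
from typing import List

# Keys every "from" block carries in the example entries, empty or not.
REQUIRED_FROM_KEYS = ("scheme", "authority", "path", "query", "fragment")


def missing_from_parts(entry: dict) -> List[str]:
    """Return the names of the URL parts missing from an entry's "from" block."""
    source = entry.get("from", {})
    missing = [key for key in REQUIRED_FROM_KEYS if key not in source]
    # In the examples the source authority is a mapping with a "host" key.
    authority = source.get("authority")
    if isinstance(authority, dict) and "host" not in authority:
        missing.append("authority.host")
    return missing
```

An empty list means the entry names every part; empty strings such as `query: ""` still count as specified.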
IPNS entries for URLs
Apart from permanently static files, the user might want to specify a dedicated IPNS-key to publish new versions of a file under the same URL without having to update the URL-DHT entries every time.
This type of entry allows just that:
<ipns-pubkey>
---
type: "ipns"
from:
  scheme: "ftp"
  authority:
    host: "ftp.example.com"
  path: "/demo-file.txt"
  query: ""
  fragment: ""
ipns:
  pubkey: "QmSrPmbaUKA3ZodhzPWZnpFgcPMFWF4QsxXbkWfEptTBJd"
valid-since: "2020-12-12T00:00:00Z"
valid-until: "2021-01-15T00:00:00Z"
...
<signature>
Storing the information in the DHT
I think it might be best to create CIDs from this data, using something like a folder/file structure. That would keep updates space-efficient even when many, many elements are stored under one hash and clients have to refresh the data to fetch the next URL from the network.
This way the DHT could be asked either for the current CID only, or for the CID and the data in one request if the node has no information yet. This reduces the number of round trips needed to fetch the first byte of content, while updates with many items would remain very efficient, since the CID could be fetched via the regular network, with the DHT nodes holding the data temporarily as if it were pinned.