IPFS Records for URN/URI resolving via a DHT

This is a proposal to implement an additional DHT for Content-ID labeling. The goal here is to make URIs in general and URNs in particular resolvable to a CID. This would let IPFS clients ask the network for the Content-ID(s) matching a known URI/URN.

A record in this DHT is signed by an authority, by default either an IPNS-key or a Node-ID. While this alone isn’t establishing any trust in the record, there are several ways to establish a trusted anchor to the record.

One way is to include the Node-ID or the IPNS-key in a DNS record as a key authorized to publish IPFS records for this domain. As a practical example, the IETF could publish an RFC under the URN urn:ietf:rfc:1149 and sign it with an IPNS key. The IPNS key can be shared between multiple nodes, avoiding a single point of failure. The IPNS key would be added to DNS, allowing DHT nodes to verify the source individually by checking the signing key of the record against DNS. This measure is only meant to avoid flooding the DHT with false entries.

Example

The node interested in resolving urn:ietf:rfc:1149 hashes the string and sends a request to the network. If the node is only interested in a certain type of verification, like DNS, or in a certain domain, this can be added as a filter to avoid transferring large numbers of records which would be discarded anyway.

The DHT will respond with the matching record(s), and the client can ask its DNS server for the public key of the domain ietf.org to establish the trusted anchor.
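A minimal sketch of the client side of this lookup, to make the "hash the string, add a filter" step concrete. The proposal does not name a hash function or a message format, so SHA-256 and the `RecordQuery` type below are assumptions made only for illustration:

```python
import hashlib
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class RecordQuery:
    """Hypothetical lookup request; not an existing IPFS message type."""
    key: bytes                         # hash of the clear-text identifier
    anchor_type: Optional[str] = None  # e.g. "dns"; None means no filter
    authority: Optional[str] = None    # e.g. "ietf.org"; None means no filter


def lookup_key(identifier: str) -> bytes:
    # Assumption: SHA-256 over the UTF-8 bytes of the identifier.
    return hashlib.sha256(identifier.encode("utf-8")).digest()


def build_query(identifier: str,
                anchor_type: Optional[str] = None,
                authority: Optional[str] = None) -> RecordQuery:
    # The filters travel with the query so that the serving node can drop
    # non-matching records before sending anything back.
    return RecordQuery(lookup_key(identifier), anchor_type, authority)


query = build_query("urn:ietf:rfc:1149", anchor_type="dns", authority="ietf.org")
```

Leaving both filters unset would ask for every record stored under the key.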

Possible applications

Since the authority publishing a record and the string identifying it are decoupled, anyone can publish any string to be searched for. This allows resolving, for example, digital object identifiers and bibcodes, but also regular URLs. This is useful for archiving purposes, where an original source might go offline but IPFS keeps a copy of the content for historical purposes. A prime example is archive.org, which could publish the URLs they offer with a signature and store them in the DHT.

As a domain owner you could obviously also publish URLs for your domain in this DHT, if you migrated to IPFS and don’t want to offer an HTTP server anymore. This would allow browsers to resolve any old http links to your domain.

Web-Of-Trust / ION integration

On a person-to-person level, those records could be used with a web of trust where people add each other's Decentralized Identifiers to their contact lists and use them to filter the records received from the DHT. The prime example here would be ION, which is built with IPFS and Bitcoin. This allows people to publish CIDs under human-readable identifiers among themselves without having to rely on any external system.

Metadata of records:

  • Identifier in clear text → Must match the hash
  • Record valid from date
  • Record valid until date
  • Epoch number (unsigned integer as version number)
  • MIME type
  • IPFS-Datatype
  • Size
  • Encoding Quality (string)
  • language in ISO 639-1, -2 or -3
  • Content creation datetime*
  • Content capturing datetime*
  • Content release/publish datetime*
  • Content last modified datetime*
  • Geo-Location: lat/lon
  • Altitude
  • Attitude
  • Projection, coordinate system, ellipsoids etc GeoTIFF-Style
  • Title
  • List: Author(s)
  • List: License(s) with their corresponding CID
  • List: Content-Hashtags
  • List: Parent-Objects Record-Identifier
  • List: Alternative Versions Record-Identifiers
  • List: Quoted Sources Record-Identifiers
  • List: Related Objects Record-Identifiers
  • Signature methods:
    • Self
    • IPNS
    • DNS
    • ION
    • […]

Mandatory fields are bold.

* String to allow for scientific dates
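To illustrate how such a record might look in code, here is a rough sketch covering a subset of the fields above. The class name, the field names, the explicit `cid` field and the use of SHA-256 for the "must match the hash" rule are all assumptions; the sketch also makes no claim about which fields are mandatory and only requires the identifier, its key and the CID:

```python
import hashlib
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class IpfsRecord:
    """Hypothetical record layout; only a subset of the listed metadata."""
    identifier: str                    # clear-text URI/URN
    key: bytes                         # DHT key the record is stored under
    cid: str                           # content the identifier resolves to
    epoch: int = 0                     # unsigned integer used as version number
    valid_from: Optional[str] = None
    valid_until: Optional[str] = None
    mime_type: Optional[str] = None
    title: Optional[str] = None
    created: Optional[str] = None      # kept as a string to allow scientific dates
    authors: List[str] = field(default_factory=list)
    related: List[str] = field(default_factory=list)  # record identifiers
    signature_method: str = "self"     # "self", "ipns", "dns", "ion", ...

    def identifier_matches_key(self) -> bool:
        # Assumption: the key is SHA-256 over the UTF-8 identifier.
        digest = hashlib.sha256(self.identifier.encode("utf-8")).digest()
        return digest == self.key
```

A node could call `identifier_matches_key()` before storing or trusting a record, which enforces the first rule in the list.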

Reverse records

Records could optionally also be published in the reverse direction. This allows CIDs to be resolved to identifiers and authorities. In this case, the DHT already needs to have the forward record with the same metadata and a clear text identifier stored. The DHT nodes which are requested to store the reverse entries will verify this before storing and serving the reverse entries.
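The acceptance rule for reverse entries could be sketched roughly as follows. `RecordStore` and its methods are hypothetical names, and the forward lookup is a local dictionary here, whereas a real node would have to query the DHT for the forward record:

```python
from typing import Dict, List, NamedTuple


class Record(NamedTuple):
    identifier: str  # clear-text URI/URN
    cid: str
    signer: str      # authority that signed the record


class RecordStore:
    """In-memory stand-in for the forward and reverse tables."""

    def __init__(self) -> None:
        self.forward: Dict[str, List[Record]] = {}
        self.reverse: Dict[str, List[Record]] = {}

    def put_forward(self, rec: Record) -> None:
        self.forward.setdefault(rec.identifier, []).append(rec)

    def put_reverse(self, rec: Record) -> bool:
        # Only store and serve a reverse entry if an identical forward
        # record (same identifier, CID and signer) is already known.
        if rec not in self.forward.get(rec.identifier, []):
            return False
        self.reverse.setdefault(rec.cid, []).append(rec)
        return True
```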

Interface

As a simple interface, pins on the ipfs-webinterface could have these records as tags, so a user could attach them and select how to authorize this information. This would allow IPFS to store tags on pins in a decentralized way, not only locally on a single node, and without changing the CID.

If the user chooses to add a reverse record other nodes could optionally fetch those tags automatically while fetching a CID.

Flooding prevention

Since there’s some overhead on the DHT clients to process the received records for publishing we need a way to have clients who want to publish stuff to “pay” for this with processing time - to avoid flooding and DoS.

As a possible solution we could generate a list of Argon2 hashes for random strings on each DHT node and add new ones/remove old ones from time to time.

If a client wants to publish a record, the DHT node would send it a challenge: a random string from the node's list, for which the client has to calculate the Argon2 hash.

This way publishing records isn’t cheap for the clients and will cost some processing time and memory usage for a while.
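A rough sketch of this challenge-response flow, under stated assumptions. Python's standard library has no Argon2, so `hashlib.scrypt` (also a memory-hard function) is used here purely as a stand-in; the cost parameters, the fixed salt and the `ChallengePool` class are illustrative and not part of the proposal:

```python
import hashlib
import hmac
import secrets


def costly_hash(challenge: bytes) -> bytes:
    # Stand-in for Argon2: scrypt with illustrative cost parameters.
    return hashlib.scrypt(challenge, salt=b"ipfs-records",
                          n=2**12, r=8, p=1, dklen=32)


class ChallengePool:
    """Precomputed challenges kept by a DHT node (hypothetical helper)."""

    def __init__(self, size: int = 4) -> None:
        self._expected = {}
        for _ in range(size):
            challenge = secrets.token_bytes(16)
            self._expected[challenge] = costly_hash(challenge)

    def issue(self) -> bytes:
        # Hand out a random string from the precomputed list.
        return secrets.choice(list(self._expected))

    def redeem(self, challenge: bytes, response: bytes) -> bool:
        # Design choice of this sketch: each challenge is single-use.
        expected = self._expected.pop(challenge, None)
        if expected is None:
            return False
        return hmac.compare_digest(expected, response)
```

The publishing client would call `costly_hash` on the issued string and send the result back; the node only has to compare it with the stored value.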

Potential users/use-cases: LBRY

LBRY/Odysee states in their FAQ that IPFS is missing certain features they need, and that this is why they use a dedicated network which offers them.

Lack of discovery

With IPFS records channels could be implemented by having an identity sign a record under “lbry:channellist”.

While the number of entries would be very large it would be fetched only once per day or so into a local database and queried from there.

This way every identity, e.g. an ION identity, can create a channel without any way for it to be censored.

The CID/IPNS in this case would point to a current list of videos, which are referenced by their IPFS Record identifiers.

No identity

ION does exactly this. And IPFS Records makes it possible to cryptographically publish stuff via an ION identity.

Ugly URLs

IPFS Records can be used to resolve beautiful URIs/URNs to CIDs.

Potential users/use-cases: IPFS-Search.com

A search engine can use the additional metadata to better present or index the data stored on IPFS.

If IPFS resources are linked in documents and websites, search engines can use the reverse entries to find the corresponding metadata.

Potential users/use-cases: The IPFS-Companion

CIDs on the web

Links to CIDs/IPNS currently carry no verification information and no information about where the link leads.

The IPFS-Companion could fetch all available information on the fly via a reverse lookup of the IPFS records and show in a mouse-over popup which entities have signed which metadata about a CID.

The companion could also let the user choose a name from the metadata to store the CID on disk or in the MFS under something more useful than a CID.

Archiving of Websites

Users could save websites via the companion and publish an IPFS record with the URL of the website and the date of the fetch.

If the website goes down the users could choose older versions of the websites provided by other users or entities which have signed it.

Linkify Identifiers

The IPFS-Companion could search the IPFS-Records on-the-fly for certain URNs/URIs when found in the text of a website, like it does currently for CIDs.

Potential users/use-cases: Internet Archive

The Internet Archive could publish and sign records for URLs and URN/URIs they index. This way the IPFS-Companion app could find older versions of a website if the content changed or the site has gone down.

Potential users/use-cases: Wikipedia

Wikipedia could publish their articles (as Wikitext) via URNs like “urn:wikipedia:en:Mainpage”. Via the epoch-number the individual versions could be easily sorted or filtered on a query.

With ‘Related Objects’ the linked media files and templates etc. could also be saved in the meta-info.

This way a program could easily fetch all necessary media files and the wikitext for an article without having to parse the article first.

This way the rendering latency can be minimized.
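As a small illustration of how the epoch number and 'Related Objects' could be used together, a client might pick the newest record and read off everything it needs to fetch. The `ArticleRecord` type and the related identifier used in the example are made up for this sketch:

```python
from typing import List, NamedTuple, Tuple


class ArticleRecord(NamedTuple):
    identifier: str
    epoch: int                    # version number from the record metadata
    cid: str                      # CID of the wikitext
    related: Tuple[str, ...] = () # record identifiers of media, templates, ...


def latest(records: List[ArticleRecord]) -> ArticleRecord:
    # The highest epoch number is treated as the newest version.
    return max(records, key=lambda r: r.epoch)


def fetch_plan(records: List[ArticleRecord]) -> Tuple[str, List[str]]:
    # Everything needed to render the article, without parsing the wikitext.
    newest = latest(records)
    return newest.cid, list(newest.related)
```

The related identifiers would then be resolved through the same DHT, so the media files can be requested in parallel with the article text.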

Via the URN ‘urn:wikipedia:en’ Wikipedia could also publish a full list of all articles. This way a client could fetch it daily to update its search index.

Potential users/use-cases: Linux Foundation's sigstore

The newly announced project sigstore could leverage IPFS Records to publish signatures for unique identifiers for software, like ‘urn:sigstore:go-ipfs:0.8.0:binary:linux:x86_64’ or under the download URL.

The CID stored there would allow users to

  • Verify the content
  • Download the file(s) via IPFS if the known source is down
  • Do a reverse search to find other signing parties for the file.

The last point allows the user to rely on multiple trusted anchors, e.g. an ION signature of the author and an OpenID check by the Linux Foundation as well as a review of the software by Red Hat.

@lidel, @stebalien so what do you think?

TLDR: IMO working with data that’s not self-certified is a bad fit for IPFS. Nonetheless, you are actually able to implement your scheme on top of the existing IPFS public DHT without any modifications.


@RubenKelevra IMO asking a public resource (volunteer DHT server nodes) to store arbitrary amounts of data and/or perform arbitrary computations to ascertain which record is “more valid” when two users submit competing records for a key is problematic.

In my experience when people look at the DHT they see two things:

  • A way of coordinating and finding other peers with some shared interest (e.g. IPFS content, IPNS records, PubSub topics, etc.)
  • Storing data for a little while, e.g. a dropoff point for data between two nodes who are not online at the same time (e.g. IPNS records).

IMO the second utility is a side-effect of the DHT and should be kept to a minimum to prevent abuse.

If so then your proposal can be implemented today without requiring any modifications to how the IPFS public DHT functions. I described how you might do this here, it’s basically to just do what IPNS over PubSub uses.

cc @petar, you might be interested in this proposal.

@adin sure, this would increase the amount of data stored in the DHT, but on the other hand those records are tiny, basically the same size as a list of IP addresses and ports a node shares when it comes online.

Maybe you misunderstood something here. There’s no selection of what’s “more valid”. On a push to the DHT, the receiving nodes will try to verify the anchor to avoid storing obviously wrong information.

Since this is costly, the nodes pushing need to answer a costly challenge-response from a list of challenge responses which the DHT node generated in advance.

Once the request to insert data is “paid for” by providing the right Argon2 checksum for the challenge and the data verification is completed, the data will be added to a list.

The querying nodes will just get all entries and need to decide for themselves which one is valid.

Transferring all entries is often unnecessary, so simple filters like “only dns anchors” and “only from domain archive.org” make sense for requests. The processing time for these filters should be minimal.

Right, but most operating systems these days have a powerful search engine built-in, to find the stuff you’re looking for.

This feature is completely lacking in IPFS, there’s no way to put distributed metadata to a CID and there’s no way to sign this content in machine readable form.

With IPFS records the user can fetch metadata for a CID he/she has stored, and select from the “providers”, like different domains.

This way a user can search locally in the data. Network searches, except for specific URIs/URNs would still be reserved to search engine projects like ipfs-search.com.

Tried to think about the value added by having URI2CID lookup over DHT, and the proposed system feels off to me on multiple layers (trust, complexity). I am struggling to articulate why, but off the top of my head:

  • I feel I don’t understand how the trust anchoring is supposed to work.
    • Publishing “URI2CID mappings I have no authority over” is allowed.
    • The authority problem is brushed over, and relies on vague “web-of-trust/ION” solution(s) that will take care of trust anchoring.
    • This is a huge, complex rabbit hole. Without solving this, entire proposition is not useful.
  • Examples look a bit like solution looking for problems, but don’t want to derail this topic by going into details, as the trust anchoring is the biggest problem.
  • Ignoring the trust problem, it still has limited utility in the real world. Each URI type requires a custom resolver to be useful. This means implementing this on the application layer per URI type is way more practical (you can then model trust anchoring in a way specific to the URI type, which sounds way more realistic than a “generic web of trust”).

URIs can’t be self-certified. Due to the dependency on a third-party validator (vagueness and fuzziness of “web of trust”) I agree with @adin that this is not a good fit for the IPFS core protocol, but if someone wishes to explore this further to find answers to the trust anchoring problems, a proof of concept can be built on the application level (running on top of an IPFS node).

Let’s break the trust anchoring down on an example from Wikipedia:

  • User A is organization IETF and wants to publish the RFC2648.
  • User A adds the RFC as Textfile and as PDF to IPFS.
  • User B wants to read the RFC2648 from IETF.
  • User C wants to read the RFC2648 from IETF, but the IETF isn’t publishing this RFC anymore.

Without IPFS Records:

  • User A now got two CIDs which are not self-describing, not human readable and not signed which need to be shared somewhere on a website by User A for User B. This is hardly machine readable.

With IPFS Records:

  • User A creates an ipfs-key (currently used for IPNS)
  • The key gets added to a domain record like TXT _ipfs_records_authorization.ietf.org
  • User A fills in the metadata for each of the files
  • User A selects the ipfs-key for signing the ipfs record

The node of User A will now contact DHT nodes and send them the signed record, with the domain it is signed for.

Each of these DHT nodes will ask its DNS server for the TXT record, check the signature of the record against the published public key, and save and start serving the record once the signature is validated.
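The check a DHT node performs here could look roughly like this. The DNS lookup and the signature verification are passed in as callables, because the proposal fixes neither a resolver nor a signature scheme; `SignedRecord`, `accept_record` and both callables are hypothetical. Only the TXT record name comes from the example above:

```python
from typing import Callable, List, NamedTuple


class SignedRecord(NamedTuple):
    payload: bytes    # serialized record content
    signature: bytes
    domain: str       # domain the record claims to be signed for


TXT_PREFIX = "_ipfs_records_authorization."


def accept_record(
    record: SignedRecord,
    lookup_txt: Callable[[str], List[str]],
    verify_signature: Callable[[str, bytes, bytes], bool],
) -> bool:
    # Fetch the keys the domain owner has authorized via DNS ...
    authorized_keys = lookup_txt(TXT_PREFIX + record.domain)
    # ... and store the record only if one of them validates the signature.
    return any(
        verify_signature(key, record.payload, record.signature)
        for key in authorized_keys
    )
```

A client resolving a URN could run the same function on every record it receives, since the DHT nodes' check is only meant to keep obviously wrong entries out.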

(the flood prevention step is excluded here)

The node of User A will now publish the reverse records as well; the DHT nodes responsible will just check whether the forward record exists.

Now we got a URN (urn:ietf:rfc:2648) with two CIDs (one for the pdf one for the txt file) published.

This is human readable in comparison to a CID, and the trust anchor in the domain allows checking its authenticity.

User B

If User B now wants to read RFC2648 from the IETF, he/she lets their IPFS node resolve the URN (urn:ietf:rfc:2648) with a filter for the verification type DNS and the verification authority ietf.org.

The DHT nodes now respond with their content for this record, so they should return two records: one for the PDF one for the txt file.

The node of User B now asks its DNS servers for the TXT record for domain ietf.org and checks the signatures for both entries with the published public key.

Both entries are then returned to the user who can fetch and view the CID.

User C

A domain called archive.org fetched the RFC while it was available and is still publishing the record for it, with its own signature.

User C wants to read the RFC2648 from the IETF, but the IETF isn’t publishing any record for it anymore.

Since User C isn’t getting any result for the filters User B used, he/she removes the domain name filter and gets two results for the verification type DNS which are signed by the domain archive.org.

User C can use the records from archive.org (if he/she trusts archive.org not to ship forged data) to resolve the URN to the corresponding CID.
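The difference between User B and User C boils down to relaxing one filter. A small sketch of that logic, with hypothetical names (`Entry`, `select`, `resolve_with_fallback`) and the filtering done locally rather than on the DHT nodes:

```python
from typing import List, NamedTuple, Optional


class Entry(NamedTuple):
    cid: str
    anchor_type: str  # verification type, e.g. "dns"
    authority: str    # verification authority, e.g. "ietf.org"


def select(entries: List[Entry], anchor_type: str,
           authority: Optional[str] = None) -> List[Entry]:
    return [
        e for e in entries
        if e.anchor_type == anchor_type
        and (authority is None or e.authority == authority)
    ]


def resolve_with_fallback(entries: List[Entry], anchor_type: str,
                          preferred_authority: str) -> List[Entry]:
    # User B's case: the preferred authority still publishes the record.
    strict = select(entries, anchor_type, preferred_authority)
    if strict:
        return strict
    # User C's case: drop the domain filter, keep the verification type.
    return select(entries, anchor_type)
```

Whether the fallback results are acceptable remains the user's decision, since they are signed by a different authority than the one originally asked for.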