The IPFS DHT Reader Privacy Upgrade

yiannis · April 14, 2023, 5:14pm

One of the long-standing requests of the IPFS and libp2p communities has been more privacy, at the protocol level, in the interactions between network entities. We have heard the community: the IPFS and libp2p teams at Protocol Labs EngRes have been working on an approach to libp2p privacy that will be integrated into IPFS. The approach won’t cover the entire surface of the IPFS protocol stack or every content routing option, but it does cover a central component of content routing in IPFS: the IPFS DHT. It also doesn’t cover all forms of interaction between network players. For now it focuses on “reader privacy”, i.e., the act of requesting content from the IPFS public DHT network. “Writer privacy”, i.e., the act of publishing content to the IPFS public DHT network, is not a design goal for now.

In summary, the scope of this DHT Privacy Upgrade is:

In scope:

DHT reader privacy, i.e., the act of requesting content from the IPFS public DHT network.

Out of scope:

DHT writer privacy, the act of publishing content to the IPFS public DHT network.
Other content routing options like IPNIs (although there are efforts underway to improve reader privacy for IPNI using similar techniques, which you can track here).

IPFS DHT Reader Privacy Upgrade - Summary

The approach that we have been developing is based on double-hashing of IPFS CIDs. Although the details are still being ironed out, we wanted to give the community an early heads-up, to receive feedback and start the discussion. Here is a summary of the double-hashing design rationale. More details can be found in the spec.

In today’s DHT, Content Providers publish a provider record under the plain CID. Anyone can request and resolve that record, and can therefore associate the PeerID of the requestor with the content itself, as well as with the PeerID of the provider hosting the content. In a double-hashed DHT, Content Providers instead publish the record under a hashed version of the CID. Given that the CID, or the Multihash to be precise, is already a hash of the content, the approach has been called “Double Hashing” until now. The double-hash DHT provides the following user benefits:

Provider records are encrypted: in the “IPFS DHT Reader Privacy Upgrade” provider records are encrypted by the content provider with the CID of the content. With encrypted provider records, an intermediate DHT Server who is asked for a provider record cannot associate the PeerID of the client requesting the record with the PeerID of the provider storing the content.
CID replay is not possible: in the current DHT design, any DHT Server node can replay requests for CIDs they have heard of and identify which peer has requested which content (i.e., CID). They can also find which content provider is serving the request. In a double-hashed world, the intermediate DHT Server can still replay the request (for the double-hashed CID), but they can neither read the provider record content nor retrieve the content (since they can’t read the record) and therefore, cannot link clients (i.e., requestors) to content. They also cannot see what the content actually is.
Prefix requests are made possible: in the current DHT design, clients have to request the entire CID (i.e., the Kademlia key). With double-hashed CIDs, clients can instead ask for a prefix of the double-hashed Kademlia key. Multiple provider records match the prefix, so multiple provider records are returned to the client. This doesn’t make it impossible, but it does make it significantly more difficult (see k-anonymity), for an intermediate DHT Server to associate the double-hashed CID that a client is requesting with the client’s PeerID.
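
To make the double-hashing and prefix-request ideas concrete, here is a minimal sketch in Python. It is not the construction from the spec: the use of SHA-256, the `DOMAIN` label, the 4-bit prefix length, and the helper names `double_hash`, `bit_prefix` and `lookup_by_prefix` are all assumptions made for illustration, and plain SHA-256 digests stand in for real multihashes. Encryption of the provider records is left out; the stored values are just opaque placeholders.

```python
import hashlib

# Assumption: SHA-256 and this domain-separation label are illustrative only;
# the spec is the source of truth for the real construction.
DOMAIN = b"example-double-hash:"


def double_hash(multihash: bytes) -> bytes:
    """Hash the multihash (itself a hash of the content) a second time."""
    return hashlib.sha256(DOMAIN + multihash).digest()


def bit_prefix(key: bytes, nbits: int) -> str:
    """Return the first nbits of key as a string of '0'/'1' characters."""
    return "".join(f"{byte:08b}" for byte in key)[:nbits]


def lookup_by_prefix(store: dict, prefix: str) -> dict:
    """Server side: return every record whose key starts with the prefix."""
    return {k: v for k, v in store.items() if bit_prefix(k, len(prefix)) == prefix}


# A toy server store: 256 records keyed by double hash. The values stand in
# for encrypted provider records, which the server cannot read.
multihashes = [hashlib.sha256(f"content-{i}".encode()).digest() for i in range(256)]
store = {double_hash(mh): f"opaque-record-{i}".encode() for i, mh in enumerate(multihashes)}

# Client side: reveal only a 4-bit prefix of the key it actually wants.
target_key = double_hash(multihashes[42])
matches = lookup_by_prefix(store, bit_prefix(target_key, 4))

# The client filters locally; the server only learns the short prefix.
wanted = matches[target_key]
print(f"{len(matches)} records matched the prefix; client keeps 1")
```

The shorter the prefix, the more records come back and the larger the set of keys the client could plausibly be interested in, at the cost of more bandwidth per lookup.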

Clearly, this is a breaking change to the way the DHT works today and therefore a migration plan needs to be in place. Our current line of thinking is the following.

Migration Plan Summary

We hardcode an IPNS key into every IPFS implementation that will participate (e.g., Kubo), and nodes that upgrade to that release request this key periodically. In its simplest form, the IPNS record published under this key contains the switch date and time. This way, everyone gets to “hear” about the switch date, given that they fetch and check the record regularly. On the switch date, everyone needs to follow their migration plan, depending on their role(s) in the network (e.g., DHT server, DHT client, Content Provider). There is also a Transition period, during which peers in the network will have the option to use both DHTs.
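
As a rough illustration of how a node might act on the fetched record, here is a minimal Python sketch. The record payload format (an ISO 8601 timestamp), the 90-day transition length, the assumption that the transition period starts at the switch date, and the names `parse_switch_record` and `dht_mode` are all hypothetical; fetching and verifying the IPNS record itself is out of scope here.

```python
from datetime import datetime, timedelta, timezone


def parse_switch_record(payload: str) -> datetime:
    """Parse the switch date from a record payload (hypothetical format)."""
    return datetime.fromisoformat(payload)


def dht_mode(now: datetime, switch_at: datetime, transition: timedelta) -> str:
    """Decide which DHT(s) a node should use at time `now`."""
    if now < switch_at:
        return "old"
    if now < switch_at + transition:
        return "both"
    return "new"


# Hypothetical record payload and transition length, for illustration only.
switch_at = parse_switch_record("2030-01-01T00:00:00+00:00")
transition = timedelta(days=90)

for day_offset in (-1, 1, 91):
    now = switch_at + timedelta(days=day_offset)
    print(day_offset, dht_mode(now, switch_at, transition))
```

A node would re-run a check like this each time it refreshes the record, so that a change to the published switch date is picked up before the switch happens.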

Some example migration plans for each of the players are as follows:

DHT Clients: are able to retrieve content from both the old and the new DHT for the transition period. They prioritise the new DHT in order to maximise lookup privacy. There are different options available for DHT Clients, which will be considered and communicated in the next iteration of our migration plan.

DHT Servers: run both DHTs for some period of time, until they deprecate the old DHT and continue with the new one only. This is so that they can store and serve records that have been provided to either the old or the new DHT.

Content Providers: by default they switch to the new DHT, but have the option to stay on the old one too, if they do so manually. Operating on both DHTs means that they have the extra load of publishing the same content twice.
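
The DHT Client behaviour during the transition period could look roughly like the following Python sketch. It only covers one possible option (try the new DHT first, fall back to the old one), and the `find_providers` helper, the placeholder `double_hash`, and the in-memory tables standing in for the two networks are assumptions for illustration, not part of any implementation.

```python
import hashlib
from typing import Callable, List, Tuple

Lookup = Callable[[bytes], List[str]]


def double_hash(multihash: bytes) -> bytes:
    # Placeholder second hash; the spec defines the real construction.
    return hashlib.sha256(multihash).digest()


def find_providers(multihash: bytes, new_dht: Lookup, old_dht: Lookup,
                   in_transition: bool) -> Tuple[List[str], str]:
    """Try the new (double-hashed) DHT first, then optionally the old one."""
    providers = new_dht(double_hash(multihash))
    if providers:
        return providers, "new"
    if in_transition:
        # Falling back exposes the plain multihash to old-DHT servers.
        providers = old_dht(multihash)
        if providers:
            return providers, "old"
    return [], "none"


# Toy in-memory tables standing in for the two networks.
mh_a, mh_b = b"multihash-a", b"multihash-b"
new_table = {double_hash(mh_a): ["peer-1"]}
old_table = {mh_b: ["peer-2"]}
new_dht = lambda key: new_table.get(key, [])
old_dht = lambda key: old_table.get(key, [])

print(find_providers(mh_a, new_dht, old_dht, in_transition=True))
print(find_providers(mh_b, new_dht, old_dht, in_transition=True))
print(find_providers(mh_b, new_dht, old_dht, in_transition=False))
```

Note that in this sketch a fallback to the old DHT sends the plain multihash, so that particular lookup does not get the reader-privacy benefit.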

Callout

All of the above is subject to change, although the PL EngRes IPFS and libp2p teams have already put significant effort into finalising the spec and the implementation, and we believe we’re on a good track to keep most of the above as discussed.

With this post, we would like to call for reviews and contributions to the IPFS DHT Reader Privacy Upgrade. There will be more communications, articles and calls for contributions, but this post will be the point of reference as we will link to all of the updates from here.

The Double-Hash DHT and the migration plan will be a topic of discussion during the upcoming IPFS Thing in Brussels, 15-19 April 2023. If you are there, come talk to us.

Apart from continuing the discussion in this post, other ways to get in touch are the well-known ones:

IPFS Discord Server [invite link]: #libp2p-privacy, #libp2p-implementers, #probe-lab
Filecoin Slack [invite link]: #libp2p-implementers, #probe-lab

Rough Timeline

(Subject to significant changes, depending on all the moving parts and shifting timelines of the different teams involved.)

Spec is finalised
First draft of migration plan - this post.
Wider announcement of plan to move to double-hashing (IPFS Blogpost) - end of April 2023
Second draft of migration plan - end of April 2023.
Implementation and testing finalised - ETA announcement: 2023-04-30
Final migration plan (if changes needed after implementation complete) - ETA announcement: 2023-04-30
Kubo release that includes Double-Hashing - ETA announcement: 2023-04-30
Migration triggered - ETA announcement: 2023-04-30
Migration complete and old DHT deprecated - ETA announcement: 2023-04-30

Useful Pointers

Spec: [GH IPFS spec repo] - source of truth and most up-to-date
IPFS Thing 2022 [video] - might contain outdated information
IPFS Camp 2022 [video] - might contain outdated information

Privacy with other Content Routing options

Note that this post and effort are focused on the public IPFS DHT. Other content routing options like IPNIs are not in scope. There are ongoing efforts for adding reader privacy to IPNIs here.

hsn10 · April 23, 2023, 12:53pm

This will increase CPU load. I assume that very large installations with CPU already maxed out will continue to run the old DHT version for years.

Because of this, I propose packing more DHT changes into the DHT2 upgrade package:

Analyse what crypto the Tor network uses in v3 services to address the v2 weaknesses (collision and enumeration attacks).
Because this change effectively disables ipfs-search.com, there needs to be support in Kubo to submit all locally stored CIDs (probably unencrypted, raw) to an index provider, signed by the node key. Some standard protocol needs to be defined.
Are some DHT performance improvements possible? They should be part of this breaking upgrade, and it’s worth waiting for them.

For example, one way to reduce DHT traffic might be to have a user-defined DHT lifetime in the config. The node would announce the hashed value plus the expected lifetime, and the DHT server would use it to clean outdated records less frequently. For example, if I know that I run GC every 3 months, I can announce my CIDs with a 3-month lifetime.

The DHT server can implement smarter cleaning: a daily check of whether the announcing node is still alive by doing a DHT lookup and, if not, dropping everything we have from this node.
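
A rough Python sketch of this proposal, to make the two ideas (announcer-chosen lifetime and liveness-based cleaning) concrete. Everything here is hypothetical: the `ProviderRecord` and `RecordStore` names, the `is_alive` callback standing in for a DHT lookup of the announcing node, and the use of seconds as the time unit.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class ProviderRecord:
    provider: str        # PeerID of the announcing node
    announced_at: float  # seconds since some epoch
    lifetime: float      # lifetime requested by the announcer, in seconds


class RecordStore:
    def __init__(self) -> None:
        self.records: Dict[bytes, List[ProviderRecord]] = {}

    def announce(self, key: bytes, provider: str, now: float, lifetime: float) -> None:
        self.records.setdefault(key, []).append(ProviderRecord(provider, now, lifetime))

    def sweep(self, now: float, is_alive: Callable[[str], bool]) -> int:
        """Drop expired records and all records from unreachable providers."""
        alive_cache: Dict[str, bool] = {}
        dropped = 0
        for key in list(self.records):
            kept = []
            for rec in self.records[key]:
                # Check each provider at most once per sweep.
                if rec.provider not in alive_cache:
                    alive_cache[rec.provider] = is_alive(rec.provider)
                expired = now >= rec.announced_at + rec.lifetime
                if expired or not alive_cache[rec.provider]:
                    dropped += 1
                else:
                    kept.append(rec)
            if kept:
                self.records[key] = kept
            else:
                del self.records[key]
        return dropped


DAY = 86400.0
store = RecordStore()
store.announce(b"key-1", "peer-long", now=0.0, lifetime=90 * DAY)
store.announce(b"key-1", "peer-short", now=0.0, lifetime=1 * DAY)
store.announce(b"key-2", "peer-gone", now=0.0, lifetime=90 * DAY)

# Daily sweep on day 2: "peer-short" has expired, "peer-gone" is unreachable.
dropped = store.sweep(now=2 * DAY, is_alive=lambda peer: peer != "peer-gone")
print(dropped, sorted(store.records))
```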
