Encryption, private data, and private swarms with IPFS

danieln · October 18, 2022, 5:11pm

Apps that store user data on IPFS typically need encryption. Since IPFS does not provide encryption natively, developers have to find their own solution.

Encryption of user data is a broad topic, and with user-owned data, this is even more challenging as it typically requires some form of key management.

Tools, libraries and services offering encryption over IPFS

In the last couple of months, I came across several tools and approaches to this problem, so I’m trying to curate a list of tools, libraries, and services that address this.

Libraries and tooling built on top of IPFS/IPLD

WNFS is a filesystem built on top of IPFS by the Fission team. WNFS is pretty clever and uses a unique symmetric encryption key for each file and directory while encapsulating the encryption keys in the actual IPLD nodes. This concept is known as a Cryptree (more on this in this talk: https://youtu.be/3se17NAS-Lw?t=435).
Peergos is another filesystem built on top of IPFS that also implements Cryptrees, the same pattern used by WNFS.
Ceramic created the dag-jose codec for IPLD which allows storing encrypted data.
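
To illustrate the Cryptree idea that whoever holds the key for a directory can reach every key beneath it, here is a minimal sketch using only Python's standard library. It rests on loud assumptions: real Cryptrees (as described above) store each child's key encrypted inside the parent node, whereas this sketch swaps that for HKDF-SHA256 derivation from the parent key. The function names and path labels are hypothetical and are not the API of WNFS or Peergos.

```python
import hashlib
import hmac
import secrets


def hkdf_sha256(key_material: bytes, info: bytes, length: int = 32) -> bytes:
    """HKDF (RFC 5869) with SHA-256 and no salt."""
    # Extract step: with no salt, RFC 5869 uses HashLen zero bytes as the HMAC key.
    prk = hmac.new(b"\x00" * 32, key_material, hashlib.sha256).digest()
    # Expand step: chain HMAC blocks until enough output has been produced.
    okm, block, counter = b"", b"", 1
    while len(okm) < length:
        block = hmac.new(prk, block + info + bytes([counter]), hashlib.sha256).digest()
        okm += block
        counter += 1
    return okm[:length]


def child_key(parent_key: bytes, name: str) -> bytes:
    """Derive the key of a file or directory from its parent's key (stand-in for a wrapped key)."""
    return hkdf_sha256(parent_key, b"child:" + name.encode("utf-8"))


def key_for_path(root_key: bytes, path: str) -> bytes:
    """Walk down from the root key, one path segment at a time."""
    key = root_key
    for part in filter(None, path.split("/")):
        key = child_key(key, part)
    return key


root_key = secrets.token_bytes(32)
photos_key = key_for_path(root_key, "photos")
cat_key = key_for_path(root_key, "photos/cat.jpg")

# Handing out photos_key is enough to reach everything under photos/ ...
assert child_key(photos_key, "cat.jpg") == cat_key
# ... but it is a different key from that of a sibling directory.
assert key_for_path(root_key, "documents") != photos_key
```

In this toy model, sharing a subtree means handing over one key instead of one key per file; revoking access still requires re-encrypting under fresh keys.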

Usability

WNFS is the only one I’ve tried, thanks to some help I got from the Fission team.

The nice thing about WNFS is that it works in browsers, and Fission’s work on WalletAuth means that the encryption keys for the private filesystem are derived from the user’s blockchain wallet, e.g. via MetaMask.

As far as I know, WNFS is agnostic to where you derive the root key from, and they also have an example that uses the WebCrypto API to generate non-extractable keys in the browser. In this case, when using multiple devices, each device has its own key and access is delegated with UCANs.

APIs and services offering encryption with IPFS

Unlike the projects above, the following are hosted services that handle encryption for you. I haven’t actually tried any of these, but they seem suitable for trusted setups where you want to ensure data isn’t public by default, but are willing to delegate trust to the service to manage encryption (in a somewhat similar way to how you use Google Drive and Dropbox):

Fileverse: an encrypted user storage and sharing app built on top of IPFS that integrates with crypto wallets.
Chainsafe Files: an encrypted user storage/sharing app built on top of IPFS.
Lighthouse storage: a hosted IPFS service with encryption. They manage the keys for you. They have an SDK, which means you can use this programmatically in apps.

Private IPFS Swarms

Another thing that’s worth flagging is private IPFS swarms.

Private IPFS swarms do not encrypt the data itself; instead, they limit network participation and communication to nodes that share a secret key.

From the config docs:

It allows ipfs to only connect to other peers who have a shared secret key.

Some more details about private IPFS swarms from a GitHub comment:

Basically, everyone on the network uses the same symmetric key to encrypt all traffic (on top of the other encryption we do). This means you can’t join without this symmetric key.
Forward secrecy: connections are already encrypted and secured with a Diffie-Hellman handshake before they’re re-encrypted with this shared secret. So yes, it does have forward secrecy.
However, if you leak the secret key, anyone with access to the secret key can now join the network unless you rotate the secret key first.
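
For reference, here is a minimal sketch of generating the shared secret for a private swarm. It assumes the `swarm.key` layout from the libp2p PSK v1 spec (a `/key/swarm/psk/1.0.0/` header line, a `/base16/` encoding line, then 32 random bytes as hex). The helper names are hypothetical, and the target directory is whatever your node uses as its repo (commonly `~/.ipfs` for Kubo).

```python
import secrets
from pathlib import Path

PSK_HEADER = "/key/swarm/psk/1.0.0/"
PSK_ENCODING = "/base16/"


def generate_swarm_key() -> str:
    """Return the contents of a swarm.key file holding a fresh 256-bit PSK."""
    key_hex = secrets.token_bytes(32).hex()  # 32 random bytes -> 64 hex characters
    return "\n".join([PSK_HEADER, PSK_ENCODING, key_hex])


def write_swarm_key(repo_dir: Path, contents: str) -> Path:
    """Write swarm.key into the given repo directory, readable by the owner only."""
    target = repo_dir / "swarm.key"
    target.write_text(contents, encoding="ascii")
    target.chmod(0o600)
    return target


if __name__ == "__main__":
    print(generate_swarm_key())
```

Every node in the swarm needs an identical copy of this file, which is exactly why leaking it, or wanting to remove a member, means distributing a new key to everyone else.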

Any services, tools, or libraries I missed? Let me know by replying! I’d love to know.

danieln · October 21, 2022, 5:12pm

Another user app that I forgot to mention is https://fileverse.io/

reload · October 23, 2022, 7:41pm

WNFS seems really nice. Thanks for sharing, this is a great topic. Have you found any information out there on how to use dag-jose with Kubo? I saw this, but haven’t yet found an example of how it’s done with IPLD.

The link to the ceramic.network blog post on dag-jose doesn’t open here (other links work), copying the URL manually works though, thanks.

danieln · October 25, 2022, 12:36pm

Just came across this guide and example repo. It delegates encryption to GPG.

danieln · October 25, 2022, 12:37pm

I haven’t yet, but perhaps someone from the Ceramic team could chime in.

MarkH · October 25, 2022, 6:49pm

Hi Daniel, the frameworks look very interesting, thanks for researching. Do you perhaps also know whether these frameworks allow changing the encryption?

We are looking for ways to host private data on IPFS where ownership of the data can change. We are thinking of changing the encryption once the data changes owner, so that the new owner can access the data while the old owner no longer can.

Looking forward to your thoughts.

TJKoury · October 26, 2022, 11:18pm

My company is working on a solution that posts a “key file” to IPFS, which is a binary blob of asymmetrically encrypted keys that allow one-to-many / many-to-many encryption without payload replication. Each file is encrypted with a symmetric key, and that key is then asymmetrically encrypted using a few different schemes that allow one or many users to decrypt the symmetric key. Rekeying is as easy as re-encrypting the file with a new key, then regenerating the key file based on the new rule sets / keys.

MarkH · October 27, 2022, 9:32pm

Very interesting. Do you have a link where I can find more information about it? You can also send me a PM if you prefer that.

TJKoury · October 28, 2022, 7:10pm

Some of the libraries we are developing are at:

Anything with “key” in the repo name. Unfortunately they are un[der]documented, but sometime in the near future we will be pulling it all together.

danieln · October 28, 2022, 7:20pm

Dropping this here following a conversation for further investigation:

Privy
Medusa
Arcana

dietrich · February 3, 2023, 1:07pm

This is pure gold, thanks so much for collecting resources here.

Broadly, seems like there are two options today:

Putting encrypted data on public IPFS network
IPFS private swarm (persistent, transient, whatever)

The former needs a comprehensive write-up on risks, trade-offs and threat model - it was a non-starter for some folks designing standards for personal data stores in the W3C+DIF collaboration group.

The latter needs comprehensive documentation and examples that are not solely Kubo. E.g. you could have web pages on two separate computers connected over an E2E encrypted connection (WebSocket, WebRTC, etc.) sharing unencrypted content-addressed data… and is that an IPFS private swarm if it never went on the IPFS DHT?

danieln · February 3, 2023, 1:26pm

That’s a good summary as far as my understanding goes.

Regarding IPFS private swarms, it’s worth pointing out that they also come with many trade-offs. There was recently a proposal to deprecate pnet, and the discussion includes many interesting insights about private swarms.

github.com/libp2p/specs

Proposal: deprecate pnet / PSK

opened 10:02PM - 01 Dec 22 UTC

marten-seemann

## What is pnet / PSK?

Users can configure a PSK to create a “private network…”. This works by first encrypting the underlying TCP connection using Salsa20, and then running a libp2p TCP connection (i.e. TCP transport or WebSocket transport) on top of that.

Specification: https://github.com/libp2p/specs/blob/master/pnet/Private-Networks-PSK-V1.md

## What is it intended to achieve?

The [spec says](https://github.com/libp2p/specs/blob/master/pnet/Private-Networks-PSK-V1.md#security-guarantees) that nodes in different “private networks” should not be able to connect to each other. This statement has been interpreted in different ways, see for example the discussion around QUIC and pnet: https://github.com/libp2p/go-libp2p/issues/1432.

One property that one _could_ aim for is that if a node doesn’t know the PSK, it is not able to learn that another node speaks libp2p (the way pnet works on TCP has that property). Another interpretation would be that we only care about the handshake not completing successfully.

A lot of times, pnet is used to make sure that nodes don’t advertise to the public IPFS DHT.

## What’s the problem with this?

1. It double-encrypts all data sent on TCP. This is slow.
2. Using a PSK for access control to any network (larger than a tiny number of nodes) has a lot of unappealing properties: There’s no way to revoke access, there’s no way to handle compromise of the PSK (which is virtually guaranteed given a large enough number of nodes), there’s no way to roll to a new key, etc.
3. It only works on TCP and WebSocket. It doesn’t work on QUIC, WebTransport and WebRTC (see issue linked above for our failed attempts to make it work). QUIC is the transport we’re optimizing for, and it already handles > 75% of the traffic on the IPFS network.

## What are the alternatives?

* For building a private network that hides the fact that libp2p is spoken: use a VPN. VPNs are quite literally built for this.
* For making sure that the IPFS DHT is not used: use a different protocol name for Kademlia.

The TL;DR of that thread (from my understanding):

Double encryption is inefficient
Key management of a symmetric key can be a pain as the number of nodes in the swarm grows
It isn’t supported by all transports (only TCP and WebSockets)

Agreed

I think private swarm is the term for this specific feature. But I agree that there are other ways to exchange CIDs privately that we should maybe address at some point in the docs.

FWIW, we know that not all pinning services publish to the DHT but their content is still discoverable with the help of Bitswap WANTs.
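
To make the discoverability point concrete: a CID is computed purely from a block's bytes, so anyone who learns the CID can ask peers for that block, regardless of whether a provider record was ever published. Below is a minimal sketch, assuming a CIDv1 with the `raw` codec and a SHA-256 multihash over a single block. Larger files are chunked into a DAG, so their root CID will differ, and the function name is hypothetical.

```python
import base64
import hashlib


def raw_block_cid(data: bytes) -> str:
    """Compute a CIDv1 (raw codec, sha2-256) string for one block of bytes."""
    digest = hashlib.sha256(data).digest()
    # 0x01 = CID version 1, 0x55 = raw codec,
    # 0x12 = sha2-256 multihash code, 0x20 = digest length (32 bytes).
    cid_bytes = bytes([0x01, 0x55, 0x12, 0x20]) + digest
    # Multibase prefix "b" = lowercase base32 without padding.
    encoded = base64.b32encode(cid_bytes).decode("ascii").lower().rstrip("=")
    return "b" + encoded


ciphertext = b"example ciphertext bytes"
print(raw_block_cid(ciphertext))
```

The same ciphertext always maps to the same CID, so encrypting data hides its contents but not the fact that a particular block exists or who is requesting it.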

danieln · February 23, 2023, 11:12am

Here’s another npm package that uses AES encryption to store files on IPFS

ianopolous · February 27, 2023, 9:12am

One thing I’d like to prefix this with: if you are writing an app, don’t use an encryption library that hasn’t been audited by cryptographers. Look for public audits. Also, if you’re doing anything non-trivial cryptographically, get it audited before telling people to use it.

I can also expand on Peergos, as you’ve mentioned us.

We implement cryptree+, which has much better privacy properties than the original cryptree. In particular, we don’t make ciphertext public; you need to authenticate to retrieve it. This is enforced with an extension to bitswap, which you can read about here:
https://peergos.org/posts/bats

This means we create a third category which is:

private encrypted data available with auth via the public DHT

We also have a unique sandbox that lets anyone write apps using Peergos, without needing to worry about storage, encryption, identity or access control. An app has a simple REST API which is handled locally in the browser using service workers. This means users own their identity and data, and can take data between apps easily. The apps are also sandboxed to protect against data exfiltration. The idea is that you should be able to run an untrusted app over your private data and not worry about it stealing the data or tracking you. You can read more about this here:
https://peergos.org/posts/a-better-web

All our data is stored in ipld (dag-cbor). You have 2 encryption keys for each file or directory (one for the data, and one for the metadata). We have a super cool feature which is zero-IO seeking within a file. This means that you can seek within an arbitrarily large file without doing any intermediate IO. This is achieved without the server knowing which encrypted chunks are part of the same file.

Peergos works in browsers, and the browser client treats the server as untrusted - verifying hashes and signatures on anything it receives. You can log in through any device. Your keys are derived at login.

You can read more in our tech book:
https://book.peergos.org

If you want, I can get you a free account on peergos.net to try it out, just email me.

0xVikasRushi · March 2, 2023, 4:58pm

@ianopolous, thanks for the suggestions regarding the ipfs-encrypted Node.js module. I’d love to know more about Peergos.

Yudai · June 4, 2023, 4:54am

I have written about some of the questions I had as I was researching Cryptree.
Problems-with-access-control-mechanisms
I’d like to discuss them.
