One-to-one private data exchange via IPFS

Hi there,
I’m working on a new concept: a client for one-to-one private data exchange over IPFS. I have started implementing it, but I’m stuck on a few dilemmas.

Consider the following scenario. Two people (a data seller and a data buyer) want to exchange a large dataset (call it data A) via IPFS. Data A is classified, so it must not be exposed. The seller uploads the whole of data A to the IPFS network and sends its IPFS hash to the buyer. The buyer downloads data A using that hash. The only people who know data A’s hash are the seller and the buyer. After the download, both buyer and seller immediately run gc (garbage collection) to remove data A from their local IPFS storage. Presumably no blocks of data A then remain in the IPFS network, because no other peers requested them (nobody else knows the hash). In this situation, I have a few questions:

  1. Is there any possibility that data A could be exposed to (or downloaded by) someone else after the seller and buyer have removed it from their local storage?
  2. The seller and buyer use IPFS only as a private exchange channel. Is this harmful to the IPFS network? (It is not illegal, but would the seller and buyer be regarded as leeches?) And is it desirable for the network?
  3. Through the bitswap protocol, some portion of the data may spread to the IPFS network. In that case, is there any possibility that someone could restore the whole of data A? (Because the data is private, exposing it would be fatal.)

Thanks in advance for your opinions.

In case you’re not aware of them, there are two currently experimental features that might be of interest to you.

  1. ipfs p2p
  • Allows tunneling TCP connections through p2p streams between peers.
  2. private networks
  • Allows private IPFS networks to be set up where only peers with a pre-shared private key can connect to each other.
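To make the pre-shared key idea concrete, here is a minimal sketch of generating one. It assumes the `swarm.key` layout used by go-ipfs private networks (a protocol header line, an encoding line, then 32 random bytes in hex) and that the file is placed in each peer's IPFS repo directory; please check both against the current private network documentation before relying on them. The function name `make_swarm_key` is made up for this example.

```python
import secrets


def make_swarm_key() -> str:
    """Return the text of a swarm.key file holding a fresh 256-bit pre-shared key.

    Assumed layout (verify against the IPFS private network docs):
        /key/swarm/psk/1.0.0/
        /base16/
        <64 hex characters>
    """
    psk_hex = secrets.token_hex(32)  # 32 random bytes -> 64 hex characters
    return f"/key/swarm/psk/1.0.0/\n/base16/\n{psk_hex}\n"


if __name__ == "__main__":
    # Redirect the output to a file and copy that same file to every peer.
    print(make_swarm_key(), end="")
```

Every peer in the private network needs an identical copy of this file, so it has to be shared over a channel you already trust.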

For your questions:

  1. If you added the content to your node while connected to the public swarm, then there’s a small chance someone could have downloaded at least some of the content you added. This shouldn’t be the case if the buyer and seller are in a private swarm, the data is added, data transferred, and then gc is run on both nodes.
  2. Private networks (#2 above) aren’t inherently harmful to the network. Trying to transfer “private” data using the public network seems unnecessary.
  3. Maybe, but if you use a private network for the buyer and seller and you don’t have the data in your repo while connected to the public swarm you should be good.

Somebody could listen in on DHT traffic and simply download every hash they see. That would let them observe whatever is happening in the network. It is still reasonably unlikely that somebody would get all of the data, but if you want the transfer to be really secure you would have to use encryption. That is not a big deal, since encrypting or decrypting a large chunk of data with a stream cipher is far less expensive than sending it over the wire.

You just need some way to safely transfer a small amount of private data (the key for the stream cipher), but you need that in any case.
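As a rough illustration of "encrypt first, then add to IPFS", here is a sketch that uses only the Python standard library. The keystream construction (HMAC-SHA256 in counter mode) is a toy stand-in chosen so the example runs without extra packages; it is not a vetted cipher and it provides no integrity protection. For real data you would use an established authenticated cipher such as ChaCha20-Poly1305 or AES-GCM from a maintained library. The names `keystream`, `encrypt` and `decrypt` are hypothetical.

```python
import hashlib
import hmac
import secrets

NONCE_LEN = 16  # bytes of random nonce stored in front of the ciphertext


def keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    """Toy keystream: HMAC-SHA256(key, nonce || counter) blocks, truncated to length."""
    out = bytearray()
    counter = 0
    while len(out) < length:
        block = hmac.new(key, nonce + counter.to_bytes(8, "big"), hashlib.sha256)
        out += block.digest()
        counter += 1
    return bytes(out[:length])


def encrypt(key: bytes, plaintext: bytes) -> bytes:
    """XOR the plaintext with the keystream; a fresh nonce is prepended to the result."""
    nonce = secrets.token_bytes(NONCE_LEN)
    ks = keystream(key, nonce, len(plaintext))
    return nonce + bytes(a ^ b for a, b in zip(plaintext, ks))


def decrypt(key: bytes, blob: bytes) -> bytes:
    """Split off the nonce, regenerate the keystream and XOR it back out."""
    nonce, body = blob[:NONCE_LEN], blob[NONCE_LEN:]
    ks = keystream(key, nonce, len(body))
    return bytes(a ^ b for a, b in zip(body, ks))


if __name__ == "__main__":
    key = secrets.token_bytes(32)       # the small secret to hand over safely
    blob = encrypt(key, b"data A ...")  # this is what would be added to IPFS
    assert decrypt(key, blob) == b"data A ..."
```

Only `blob` would ever be added to IPFS, so anyone who picks up its blocks sees ciphertext; the 32-byte key is the "small amount of private data" that has to travel over a separate secure channel.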

Even if the buyer and seller only transfer data that is private, and thus of no use to anybody else, they still participate in maintaining the DHT. So if they are online permanently, they perform a valuable service for the network. At least that is how I would see it.

As for restoring the whole of data A from blocks spread by bitswap: very theoretically, yes. But if the data is encrypted, it is not useful to anybody.


@leerspace Thanks for your kind answer. It gave me the clue I was looking for! I have one more question, about private networks:

  • Assume three nodes (A, B and C) share the same private key, and only two of them (A and B) have registered each other’s multiaddr using the bootstrap command. In this situation, how can node C find the other nodes’ addresses in the private network?

Encrypting files before adding them to the network is the key point. Thanks @rklaehn !

It can’t. Node C needs to know about at least one of the other nodes. Probably the easiest solution is to have one node act as the bootstrap node, then share its one multiaddr with any other node that wants to join the private swarm. In this scenario let’s say that Node A is your bootstrap node. When setting up Nodes B and C if you tell them each about Node A’s multiaddr then they should all eventually be able to learn about and connect to each other.
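The difference between the two setups can be sketched with a small simulation. This is a deliberately simplified model, not how libp2p peer discovery is actually implemented: it assumes that dialing a peer makes both sides aware of each other, and that connected peers keep sharing the addresses they know until nothing new is learned. The function name `discover` and the node names are made up for the example.

```python
from __future__ import annotations


def discover(bootstrap: dict[str, set[str]]) -> dict[str, set[str]]:
    """Map each node to the peers it ends up knowing.

    `bootstrap` maps a node name to the peers configured as its bootstrap list.
    """
    nodes = set(bootstrap) | {p for peers in bootstrap.values() for p in peers}
    known = {n: set(bootstrap.get(n, ())) for n in nodes}

    # Dialing a bootstrap peer also tells that peer about the dialer.
    for node, peers in bootstrap.items():
        for peer in peers:
            known[peer].add(node)

    # Peers exchange what they know until no node learns anything new.
    changed = True
    while changed:
        changed = False
        for node in nodes:
            for peer in list(known[node]):
                new = known[peer] - known[node] - {node}
                if new:
                    known[node] |= new
                    changed = True
    return known


if __name__ == "__main__":
    # Only A and B list each other: C has no entry point and stays isolated.
    print(discover({"A": {"B"}, "B": {"A"}, "C": set()}))
    # A is the bootstrap node for B and C: everyone learns about everyone.
    print(discover({"A": set(), "B": {"A"}, "C": {"A"}}))
```

In the first run C's peer set stays empty, while in the second run B and C find each other through A even though neither was told about the other directly.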

Perfect explanation, I understand totally. Thanks!