How can I be certain that my IPFS node is not hosting content X short of `rm -rf ~/.ipfs`?

What did I do?

  1. ipfs add -w sensitive.pdf
  2. pin the file on pinata.cloud
  3. actually look at the file, regret it, and unpin it on pinata.cloud
  4. How can I remove a directory and all its contents?

What is the deal with sensitive.pdf?

I volunteer in an association. We had an issue with our rules in service Y being out of date, and sensitive.pdf (not its real name, by the way) was uploaded to our discussion channel. I needed a publicly accessible place to host it, the file is not supposed to change, and I thought it was meant to be public, so I uploaded it to IPFS.

When I looked at it more closely, I found it contains the full names, street addresses, phone numbers and email addresses of two of our activists, so it would have been sensitive and in violation of privacy laws even before GDPR.

How I wish to go forward

I understand that if the content has been pinned by someone or something else, it will be on IPFS permanently. I still wish to know as surely as possible that my IPFS nodes are not hosting sensitive.pdf (without having to `rm -rf ~/.ipfs`, which is the nuclear option and would also take down a lot of non-offending content), so that if there were legal consequences, I could say that I have done what I can to take the offending content down. I don’t think legal consequences are very likely, though, as sensitive.pdf looks formal and makes me think it may be available from a public registry for a fee.

I also find the knowledge of what to do in this case important for the future as I doubt I am the only IPFS user ever to make this mistake, even if I may be the first to come out on what I did wrong in so much detail.

I have heard of content blacklists, but I don’t know if a GDPR one exists yet. I would like to avoid it, as it would itself be a place to find personal data, but in case I change my mind, it would be nice to know where it is.

How do I know the content is still online?

I originally uploaded and (hopefully) unpinned it three days ago. Since then I have been checking it infrequently through Tor Browser (which definitely doesn’t have IPFS Companion installed) and the ipfs.io gateway, hoping to see something other than the file as a hint that it has been GC:ed and is out of the network.

Make sure to unpin the CID, and also remove the file from the MFS ("Files" in the GUI).

Then run `ipfs repo gc`. This will remove everything from your node which is not pinned and not part of the MFS (files).

Running GC might degrade your performance and will take a long time if you have a large amount of pinned / unpinned content in your repo.

After completing this, you can check the DHT to see whether someone is still providing the CID:

ipfs dht findprovs $CID

Don’t panic if a provider is found. Check whether it is one of your own nodes: the DHT doesn’t know about your garbage collection run, so they might still show up.
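To make the "is that provider just me?" check less manual, here is a minimal sketch. The function name `other_providers` is made up for illustration, and it assumes `ipfs dht findprovs` prints one peer ID per line and that `ipfs id -f '<id>'` prints this node's peer ID.

```shell
# Hypothetical helper: list providers of a CID, excluding this node.
# Assumes findprovs prints one peer ID per line.
other_providers() {
  cid="$1"
  # Peer ID of the local node
  self="$(ipfs id -f '<id>')"
  # Drop lines that are exactly our own peer ID; whatever is left
  # is some other node still announcing the CID.
  ipfs dht findprovs "$cid" | grep -v -x -F "$self"
}

# Usage (replace the placeholder with the real CID):
#   other_providers <cid>
```

If this prints nothing, the only provider the DHT still lists (if any) is your own node. If you run several nodes, you would have to filter out each of their peer IDs the same way.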


ipfs pin rm <cid> if you haven’t yet. If it’s on MFS, ipfs files rm <path/to/file> as well.

ipfs repo gc to clear up your datastore from any blocks.

ipfs refs local | grep <cid> should NOT show anything.

ipfs dht findprovs <cid> shows which peers in the DHT are still providing the file, if it can find any.

You can write to abuse@ipfs.io indicating the CID and it can be blacklisted from the gateways in case it is cached. Otherwise it should be wiped out after a few days, assuming no one is requesting it through the gateway and no other peer is providing it.
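The local part of this reply (unpin, remove from MFS, GC, then verify) could be wrapped up roughly like this. It is only a sketch: `purge_cid` is a made-up name, and the final `grep` is a plain string match, so it assumes `ipfs refs local` prints the CID in the same encoding you pass in, which may not hold on every version.

```shell
# Hypothetical helper: purge_cid CID [MFS_PATH]
purge_cid() {
  cid="$1"
  mfs_path="${2:-}"

  # Unpin; an error here usually means it was not pinned directly.
  ipfs pin rm "$cid" 2>/dev/null \
    || echo "note: $cid was not pinned directly" >&2

  # Remove from MFS only if a path was given.
  if [ -n "$mfs_path" ]; then
    ipfs files rm "$mfs_path" \
      || echo "note: could not remove $mfs_path from MFS" >&2
  fi

  # Drop every block that is neither pinned nor referenced from MFS.
  ipfs repo gc >/dev/null

  # The CID should no longer be among the locally stored blocks.
  if ipfs refs local | grep -q -F "$cid"; then
    echo "STILL PRESENT: $cid"
    return 1
  fi
  echo "not found locally: $cid"
}

# Usage (placeholders, not real values):
#   purge_cid <cid> <path/to/file>
```

Note that a file added with `ipfs add -w` also has a wrapping directory with its own CID, so you may need to run this for the directory CID as well.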


Thank you for your fast responses,

Error: not pinned or pinned indirectly and it’s not in files.

It shows nothing, so I take it that it’s not my node which is a relief.

It seems to be taking a long time, so I hope it’s not finding any.

Thank you, I will, presuming the blacklist is not publicly available.

I guess this is my problem: I somehow believed that the cache gets wiped after two hours.

Thank you again

The cache never gets wiped as long as there’s enough space left. The idea is that you probably need most stuff more than once. So when you request it one time, you can help to spread the content until you need it the second time.

With the standard settings, your cache will be garbage collected when usage hits 90% of the configured storage size.

Standard storage size is 10 GB.
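If you want to see what your own node is set to, something like the following should work. It is a sketch: `gc_threshold_info` is a made-up name, and it assumes the two values live under the `Datastore.StorageMax` and `Datastore.StorageGCWatermark` config keys. As far as I know, automatic GC also only runs if the daemon was started with `--enable-gc`, so treat the output as "when it would trigger", not proof that it will.

```shell
# Hypothetical helper: print the configured automatic-GC threshold.
gc_threshold_info() {
  # Maximum repo size, e.g. "10GB"
  max="$(ipfs config Datastore.StorageMax)"
  # Percentage of StorageMax at which GC kicks in, e.g. "90"
  mark="$(ipfs config Datastore.StorageGCWatermark)"
  echo "automatic GC would trigger at ${mark}% of ${max}"
}

# Usage:
#   gc_threshold_info
#   ipfs repo stat --human   # current repo size, for comparison
```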

If you use IPFS just lightly to cache IPFS websites and share some smaller files now and then, you probably won’t hit the garbage collection limit for a long time - which is a good thing. :slight_smile:

On the gateways the settings might be different.

Edit:

Yeah. Actually I don’t know if it will return at all, since there might be no timeout for this operation.

As long as it doesn’t show a hit, there’s no one providing this CID. So maybe check again later, just to make sure that a computer which got the CID wasn’t simply offline at the time.
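Since the lookup might not return on its own, one option is to put a time limit on it so it can be re-run later. This is a sketch only: `check_providers` is a made-up name, and it assumes the coreutils `timeout` command is available (it is on most Linux systems, not necessarily elsewhere).

```shell
# Hypothetical helper: check_providers CID [SECONDS]
check_providers() {
  cid="$1"
  secs="${2:-60}"
  # Give up after $secs seconds; running into the limit is not an error here.
  provs="$(timeout "$secs" ipfs dht findprovs "$cid" || true)"
  if [ -z "$provs" ]; then
    echo "no providers found for $cid"
  else
    echo "providers found for $cid:"
    echo "$provs"
    return 1
  fi
}

# Usage (placeholder CID, 120 second limit):
#   check_providers <cid> 120
```

An empty result within the time limit is not a guarantee, only a sign that nobody reachable was announcing the CID at that moment, so running it again on another day is still worthwhile.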


It did come back empty :slight_smile:

Alright :slight_smile: