A straightforward censorship attack against IPFS

I’ve recently started wrapping my head around IPFS’s Kad-DHT implementation, and I’ve noticed what appears to be a fairly straightforward censorship attack using Sybil identities. If I’m understanding correctly, pointers from an IPFS CID to the IPFS peers that have that content available are stored on the 20 peers whose key hash most closely matches the CID hash, according to the XOR distance metric. I fear this is too simplistic, because it is too easy for a single adversary to control all 20 of those peers for any chosen CID.
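For concreteness, here’s a minimal Python sketch of the lookup rule as I understand it (the function and variable names are mine for illustration, not taken from the actual go-libp2p-kad-dht code): CIDs and PeerIDs are both hashed with SHA-256 into one key space, and provider records land on the 20 peers closest to the CID’s key under XOR distance.

```python
import hashlib

def kad_distance(a: bytes, b: bytes) -> int:
    """XOR distance between two Kademlia keys, viewed as big integers."""
    return int.from_bytes(a, "big") ^ int.from_bytes(b, "big")

def dht_key(raw: bytes) -> bytes:
    """Both CIDs and PeerIDs are mapped into the key space via SHA-256."""
    return hashlib.sha256(raw).digest()

def closest_peers(cid_key: bytes, peer_keys, k: int = 20):
    """Provider records for a CID are stored on the k peers whose keys
    are closest to the CID's key under the XOR metric."""
    return sorted(peer_keys, key=lambda p: kad_distance(cid_key, p))[:k]
```

An adversary who can place 20 of their own keys inside that sorted prefix for a target CID owns every replica of its provider record.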

According to IPFS KPIs | ProbeLab, there are currently about 15k IPFS servers that are “online” or “mostly online”. Therefore, an adversary who generates 20 * 15k = 300k identities will likely have some 20 of them closer to any chosen hash value than any existing server on the network. Establishing a new server is as simple as generating an Ed25519 key, which according to Measurements of public-key signature systems, indexed by machine takes around 43k CPU cycles. Controlling all 20 servers for a chosen CID therefore requires only 20 * 15e3 * 43e3 = 12.9 billion cycles, or just a few core-seconds. The adversary can then censor that CID by pointing clients to tarpit servers that send bogus content as slowly as the client will tolerate, or simply by reporting that they do not know of the content at all.
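Spelling the arithmetic out (same figures as above; the 3 GHz clock is my own assumption for converting cycles to core-seconds):

```python
# Back-of-envelope cost of the Sybil attack, using the figures cited above.
ONLINE_SERVERS = 15_000   # ~15k "online"/"mostly online" DHT servers (ProbeLab)
K = 20                    # replication factor for provider records
KEYGEN_CYCLES = 43_000    # ~CPU cycles per Ed25519 keypair (eBATS measurements)

# With K * N random identities, roughly K of them are expected to land
# closer to any chosen key than the nearest of the N honest servers.
identities = K * ONLINE_SERVERS            # 300,000 Sybil identities
total_cycles = identities * KEYGEN_CYCLES  # 12.9 billion cycles
core_seconds = total_cycles / 3e9          # ~4.3 core-seconds at 3 GHz
```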

I am not suggesting that this attack is particularly novel; Sybil attacks against DHT protocols are a well-known problem in the literature. In 2020, [2011.00874] Total Eclipse of the Heart -- Disrupting the InterPlanetary File System described a Sybil attack against peer routing to isolate particular peers from the network or cause broader disruption, and libp2p implemented some mitigations. DIAL : download document, a master’s thesis which seems to have been sitting in obscurity for a couple of years, also discusses attacks similar to this one. However, I can’t find any previous discussion in which someone has pointed out the arithmetic of just how cheap it is to censor any single piece of content from the network, or suggested any practical countermeasures.

I agree with most of the analysis. This is a well-known problem in almost all DHT implementations.
However, I’m not sure I would call this attack straightforward:

  • the DHT has some hardening against one computer having a lot of PeerIDs (Code search results · GitHub). I don’t remember the exact details and thresholds, but you might need to run your attack from a handful of machines in different ASes to be able to control the 20 final peers.
    • I am not sure whether this has been implemented for the AcceleratedDHTClient, which is used by some publishers with lots of data.
  • the DHT isn’t the only way Kubo finds content:
    • Kubo now ships with the https://cid.contact/ endpoint by default. This uses a completely different system (the query itself is centralised), so you would need to find another attack.
    • If downloading takes too long (~1s, if my memory is correct), Kubo will start broadcasting bitswap WANT_HAVE requests to all connected peers. This “works” in more situations than you would expect, and it is very hard to censor without a complete routing-table takeover, which should have been made really hard to do since 2020, although it’s also unreliable.
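To illustrate the kind of hardening mentioned in the first bullet, here is a rough sketch of a diversity filter that caps how many of the selected closest peers may share one group, such as an IP prefix or AS. This is my own illustrative simplification; the actual go-libp2p filter differs in its details and thresholds.

```python
from collections import defaultdict

def diversity_filtered_closest(candidates, max_per_group: int, k: int = 20):
    """Pick up to k closest peers while capping how many may come from
    the same group (e.g. the same IP prefix or ASN).

    candidates: iterable of (distance, peer_id, group) tuples.
    Illustrative only -- not the real go-libp2p implementation.
    """
    picked, per_group = [], defaultdict(int)
    for dist, peer, group in sorted(candidates):
        if per_group[group] >= max_per_group:
            continue  # this AS/prefix already holds its quota of slots
        per_group[group] += 1
        picked.append(peer)
        if len(picked) == k:
            break
    return picked
```

Under such a filter, an attacker who generates all their Sybil identities from one AS can claim at most `max_per_group` of the 20 slots, which is what pushes the attack toward needing machines in several networks.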

I am not maintaining this code anymore. My understanding of this problem at the time was that it is a treadmill problem: the current situation should be pretty good with the combined DHT + indexers, and if it turns out it’s not, the plan was to keep improving the situation until whatever attack was being used or presented to me no longer worked.

To your first point, that doesn’t seem like much of an obstacle considering how cheap it is, through various cloud providers, to spin up a bunch of small VMs in geographically diverse regions, or to connect one machine to a bunch of different VPN relays. Your second point is more convincing. These different methods each have their own weaknesses, but the intersection of the sets of real-world adversaries willing and able to exploit those respective weaknesses is a lot smaller than for any method individually. So it’s solid defense in depth.

Following you guys’ progress now… I put a few links below this post.
I was reading into this yesterday; here are some supporting docs on the subject. Mind you, they have a little dust on them at this point. I was oblivious to the kad implementation other than it obviously showing on the peers screen, because I have enough problems trying to figure out why my companion node runs like a champion, unless of course I try to interact with it through Brave lol. And back to the censorship topic here… I was able to confirm, only about 3 weeks ago, that my winDOHz client has been blocking, but not announcing a block on, any shell/terminal activity that isn’t run through the active VS build I have. Since Microsoft does offer great built-in and cloud-based security, it would have been much appreciated had they thought to inform me at ANY time, or even in VS. I’m already a noob with anything involving building from source or active debugging at the command line, and I’ve ruined at least 3 ipfs desktop packages trying to figure out why the system has no idea what “ipfs” or “ipfs daemon” is when I’m nestled up right in line with my already established path.
So I can’t help but think this is some sort of shadow censorship game they’ve got going. But through all this banging my head against the wall, I can offer this… Using identity-based private keys for a handful of ipfs nodes, and relaying those locally with my original native node.js as Big Pappi of course, is very secure. I have one set up that drives all my “open in” add-ons through their browser extensions. The OS, secure identity-related apps, and even my vpn clients have been debo’d from startup on when they try to move across the browser’s edge by Node-nado, where they had well-established routes prior lol. So as you can imagine, before I made more custom rules in my cloudflare config, servers my browser was trying to do its normal handshakes with were refusing to connect, or getting shellacked by my browser nodes. I’ve got them set up to utilize udp as much as possible locally, with completely custom, node-independent ports, and they switch ports through the transition to tcp. I wanted that to happen before async, but I had to settle for remote vs local port mapping.
It’s pretty resource intensive, but I haven’t seen any evidence of unwanted connections or slick pirates tally-ho’ing my node lads on the open seas. I mean like zero. There are also really strict, well-established rules for my network adapters that won’t allow ipv4 connections to the local lan if it’s used outside the gateway for http routes, or vice versa: if that adapter is using ip6 on the interwebs, it must switch in house.
So, sorry about the long story… But! I may not know what the hell I’m doing half the time, but I’m positive this method I’m using is an unattractive target to whatever flocks of kad botnets might be roaming around looking to cause disruptions or mosey on in and take a commanding stance with my happy little swarm.

Relaying with multiple adapters, that all reverse proxy inbound routes. It’s pretty solid. :smiley: :smiley: