TL;DR
R.I.P. DHT Hydras
Recently, we found that the number of Hydra peers was declining, and on 2023-04-20 the last Hydra node stopped providing network bridging to IPNI (the InterPlanetary Network Indexer). Internal studies show that Hydras provided less than a ~10% performance gain in the IPFS DHT network. Given the substantial cost of operating the Hydras, Protocol Labs conducted a "Hydra Dial Down" experiment in December 2022. The experiment showed low risk and low impact, so we have decided to leave the Hydras non-functional and, as of 2023-05-04, replace their bridging function with a direct path to IPNI from popular gateways like ipfs.io/dweb.link.
The exciting news is that the Hydra drawdown went largely unnoticed by the majority of network participants, thanks to the redundant content routing systems now in place. In fact, in some areas network performance may have even improved!
Pointers
- Results of the ProbeLab Hydra network performance impact study: https://github.com/protocol/network-measurements/blob/master/results/rfm21-hydras-performance-contribution.md
- IPFS Camp 2022 Talk on “The IPFS Network From The Hydra’s PoV”: https://youtu.be/zhzxJGoLTg0
- Hydra dial-down discussion from Content Routing WG #9 (Brussels, April 16, 2023)
Details
Hydras have served three primary purposes, but they no longer provide enough performance advantage to justify their cost of operation:
- Faster access to the Provider Records, i.e., reduced latency.
  - As observed in ProbeLab’s Hydra dial-down study, the impact at the 95th percentile is a ~12% slower time to first byte (TTFB).
- Availability of Provider Records, given that Hydras are generally stable and online, whereas normal DHT nodes churn (i.e., if all 20 peers where the Provider Record is stored go offline, the content becomes unreachable).
  - According to another of ProbeLab’s studies, on the Liveness of Provider Records in the IPFS network, provider records stay available in the network for longer than the “reprovide interval” regardless of the Hydras’ presence (see Section 4.3 of the relevant report).
- A “bridge” to services such as InterPlanetary Network Indexers (IPNIs) like cid.contact, which advertise content from large providers like web3.storage and Filecoin nodes.
  - The need for this bridge has been mitigated now that IPFS implementations like Kubo query cid.contact in parallel with the DHT by default since version 0.18 [1]. Popular HTTP gateways like ipfs.io/dweb.link have also enabled this dual querying of the DHT and cid.contact; a minimal sketch of the direct indexer lookup this relies on appears after this list.
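For illustration, here is a minimal Go sketch of what a direct lookup against an indexer looks like over HTTP. It targets cid.contact’s delegated routing (“Routing V1”) endpoint, GET /routing/v1/providers/{cid}; the path and the response field names reflect our reading of that spec and may vary across indexer versions, so treat this as a sketch rather than a reference client.

```go
// Minimal sketch: query the IPNI delegated-routing HTTP API directly,
// which is what Kubo >= 0.18 and the public gateways now do in parallel
// with the DHT. The /routing/v1/providers/{cid} path and the response
// field names follow our reading of the Routing V1 spec (assumption).
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
	"os"
	"time"
)

type providerRecord struct {
	Schema string   `json:"Schema"`
	ID     string   `json:"ID"`
	Addrs  []string `json:"Addrs"`
}

type providersResponse struct {
	Providers []providerRecord `json:"Providers"`
}

func main() {
	if len(os.Args) < 2 {
		fmt.Fprintln(os.Stderr, "usage: findprovs <cid>")
		os.Exit(1)
	}
	cid := os.Args[1]

	client := &http.Client{Timeout: 10 * time.Second}
	req, err := http.NewRequest(http.MethodGet,
		"https://cid.contact/routing/v1/providers/"+cid, nil)
	if err != nil {
		panic(err)
	}
	req.Header.Set("Accept", "application/json")

	resp, err := client.Do(req)
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()

	if resp.StatusCode == http.StatusNotFound {
		fmt.Println("indexer knows no providers for", cid)
		return
	}
	if resp.StatusCode != http.StatusOK {
		fmt.Fprintln(os.Stderr, "unexpected status:", resp.Status)
		os.Exit(1)
	}

	var out providersResponse
	if err := json.NewDecoder(resp.Body).Decode(&out); err != nil {
		panic(err)
	}
	for _, p := range out.Providers {
		fmt.Printf("provider %s (%d addrs)\n", p.ID, len(p.Addrs))
	}
}
```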
Timeline
On March 18, 2023, we encountered a problem where replacement Hydra nodes could not start due to a Docker credential issue. This resulted in a steady decrease in the total number of Hydra nodes available on our network.
By April 20, 2023, no Hydra nodes were running, which affected the availability of certain content on the network. Because the impact on overall network traffic was low, this went undetected until May 2. We understand the inconvenience this may have caused our users, and we sincerely apologize for it.
Today, we are pleased to announce that we have implemented solutions to negate the impact; by the time you read this, the fix will have been tested and deployed. In particular, and as noted earlier, Kubo v0.18 and later releases, as well as popular gateways such as ipfs.io and dweb.link, have been configured (as of 2023-05-04) to query the IPFS DHT and cid.contact in parallel so that bridging happens transparently.
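To make the mitigation concrete: “querying in parallel” means a single content lookup fans out to both routing systems at once and merges whatever providers each of them returns, so a miss or failure on one path does not block the other. The Go sketch below illustrates that idea only; it is not Kubo’s actual implementation, and the two backends are stand-ins for the DHT and an indexer such as cid.contact.

```go
// Illustrative sketch of parallel content routing: ask several routing
// backends (e.g. the DHT and an IPNI endpoint) concurrently and merge
// the providers each one returns. Not Kubo's actual implementation.
package main

import (
	"context"
	"fmt"
	"sync"
	"time"
)

// findProvidersFunc stands in for any content-routing backend.
type findProvidersFunc func(ctx context.Context, cid string) ([]string, error)

// findProvidersParallel fans a lookup out to all backends and merges results,
// deduplicating providers and ignoring individual backend failures.
func findProvidersParallel(ctx context.Context, cid string, backends ...findProvidersFunc) []string {
	var (
		mu   sync.Mutex
		seen = map[string]bool{}
		out  []string
		wg   sync.WaitGroup
	)
	for _, backend := range backends {
		wg.Add(1)
		go func(f findProvidersFunc) {
			defer wg.Done()
			provs, err := f(ctx, cid)
			if err != nil {
				return // one backend failing must not fail the whole lookup
			}
			mu.Lock()
			defer mu.Unlock()
			for _, p := range provs {
				if !seen[p] {
					seen[p] = true
					out = append(out, p)
				}
			}
		}(backend)
	}
	wg.Wait()
	return out
}

func main() {
	// Fake backends standing in for the DHT and an indexer.
	dht := func(ctx context.Context, cid string) ([]string, error) {
		time.Sleep(200 * time.Millisecond) // simulate a slower DHT walk
		return []string{"peerA"}, nil
	}
	indexer := func(ctx context.Context, cid string) ([]string, error) {
		return []string{"peerA", "peerB"}, nil
	}

	ctx, cancel := context.WithTimeout(context.Background(), time.Second)
	defer cancel()
	fmt.Println(findProvidersParallel(ctx, "bafy...", dht, indexer))
}
```

The important property is that the two systems act as redundant paths: the indexer path replaces the bridging the Hydras used to provide, while the DHT continues to serve everything else.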
Why wasn’t this recognized or communicated in advance?
Being unaware of the outage was a miss, but shutting off the Hydras was always an intended work item for 2023; this incident simply accelerated the process. A few things we will consider in our retrospective analysis:
- Gap in our monitoring: We didn’t have alarms configured for a decline in Hydra node availability. This oversight has prompted us to reevaluate and enhance our monitoring tools so that we can identify and respond to similar issues in the future.
- Metrics not affected initially: The key metrics we typically monitor were not immediately impacted by the Docker credential issue and the subsequent Hydra node downtime. As a result, our team was not fully aware of the issue’s severity until later. We are now reviewing and updating our metrics and monitoring processes to capture and respond to all relevant indicators of network performance more effectively.
What has been the impact of the Hydras being down?
The primary effect was on the availability of certain content. Content that was only known to cid.contact was impacted when fetched by clients that didn’t query cid.contact. For example, this applies to Kubo versions before 0.18, and even to certain Kubo configurations before 0.19.2 (see footnote 1). The most notable example was the ipfs.io/dweb.link HTTP gateway, which was affected until the issue was resolved on May 4, 2023, when the gateways were configured to query cid.contact in parallel with the IPFS DHT.
Where should I go if I have questions or am observing unexpected behavior?
Feel free to:
- Comment on this post.
- Show up to the Content Routing Working Group.
- Engage in the #content-routing-wg or the #hydra-dial-down channels on FIL Slack, both of which are actively monitored by PL team members.
Footnotes
[1] Kubo nodes with Accelerated DHT Client: Kubo nodes that enabled the Accelerated DHT Client feature were not querying cid.contact, even in versions 0.18.0 through 0.19.1 where that querying is otherwise the default. This issue has been fixed in versions 0.19.2 and 0.20.0. The ipfs.io/dweb.link HTTP gateways were among the affected services.