IPFS Metrics & KPIs

As part of a revamp of the way we look at metrics for the IPFS Network, I’d like to open the discussion on what KPIs would be useful to report for developers and the wider community. The metrics we’re converging towards as KPIs are listed below. Please feel free to comment on the list or suggest new ones that we can work towards.

Network Size & Stability

  • Overall number of unique peers seen in the network (currently as seen by PL-operated bootstrapper nodes)
    • Number of unique DHT Server nodes
      • PeerIDs and unique IP addresses seen
    • Number of unique DHT Client nodes
    • Stability of DHT Server nodes (see the sketch after this list)
      • Classification:
        • Online: > 80% of time seen online
        • Mostly Online: 40%–80% of time seen online
        • Mostly Offline: 10%–40% of time seen online
        • Offline: < 10% of time seen online
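As a minimal sketch, the classification above could be computed from the fraction of crawl rounds in which a peer was seen online (the function name and crawl cadence below are illustrative, not part of any existing tooling):

```typescript
type Stability = "Online" | "Mostly Online" | "Mostly Offline" | "Offline";

// fractionOnline: share of crawl rounds (0..1) in which the peer responded.
function classifyStability(fractionOnline: number): Stability {
  if (fractionOnline > 0.8) return "Online";
  if (fractionOnline > 0.4) return "Mostly Online";
  if (fractionOnline > 0.1) return "Mostly Offline";
  return "Offline";
}

// e.g. a peer seen in 30 of 48 half-hourly crawls (62.5%) falls into "Mostly Online"
console.log(classifyStability(30 / 48));
```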

Performance

  • DHT Lookup Latency - random content
    • Time to First Provider Record
  • Latency to load sample websites through a browser (using PL websites for now)
  • Future: e2e performance - random content
    • TTFB
    • TTLB

Traffic

  • Number of requests to the public, PL-operated IPFS Gateways
  • Future: Number of provider records published to the network
  • Future: Number of requests to specific vantage points we control through Bitswap and through the DHT.

Abnormalities

[Reported when needed]

  • Increased number of rotating PeerIDs
  • Increased number of unresponsive nodes
  • Increased number of nodes from some geographic location

Developer Activity

  • GitHub activity in IPFS-related GitHub orgs
  • GitHub activity in ipfs/specs

For visibility: the metrics above that are reported in the PL EngRes monthly all-hands are explained more thoroughly in Notion.

Time To Last Byte?

Yup. Clarifying for others too:

  • TTFB: Time To First Byte
  • TTLB: Time To Last Byte

Performance: Agree with TTFB and TTLB. I would also like to see performance related to adding a document to IPFS across several document sizes, with TTFB and TTLB. This would not be an HTTP gateway operation, but rather a direct add to a local IPFS node, perhaps through the Kubo interface (a rough sketch follows below). Perhaps also define the platform in terms of number of cores, number of threads, etc. The idea would be to give some indication of the expected performance and to get a view of scalability. Perhaps these KPIs already exist?
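To sketch what I mean, something along these lines against a local Kubo node would already produce the numbers (this assumes the default HTTP RPC API address and Node 18+; the document sizes are purely illustrative):

```typescript
// Time "add" operations against a local Kubo node for several document sizes.
// Assumes Kubo's HTTP RPC API at its default address and Node 18+ (global fetch/FormData).
const KUBO_API = "http://127.0.0.1:5001";
const SIZES = [1_000, 100_000, 10_000_000]; // bytes; purely illustrative

async function timeAdd(size: number): Promise<number> {
  const body = new FormData();
  body.append("file", new Blob([new Uint8Array(size)]), `sample-${size}.bin`);
  const start = performance.now();
  const res = await fetch(`${KUBO_API}/api/v0/add`, { method: "POST", body });
  await res.text(); // wait for the full response, i.e. until the add has completed
  return performance.now() - start;
}

async function main() {
  for (const size of SIZES) {
    console.log(`add of ${size} bytes took ${(await timeAdd(size)).toFixed(1)} ms`);
  }
}
main();
```

The same loop could be repeated on machines with different core/thread counts to get the scalability view mentioned above.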

Developer Activity / Concept Proliferation

Since “Developer Activity” was mentioned as a potential KPI, I’d like us to include something related to proliferation of foundational primitives seen in the wild.

The rationale is that we may not see all the projects that use our code (people may run private DHTs and manually peered swarms), but we should still see relative growth in mentions of content paths, URIs, CIDs, and content types.

What exactly do we track as “proliferation”? That’s a good question. Some ideas/examples:

If we add them all up, we will get some number. It is meaningless on its own, but by tracking it over time we could see the +/- trend month to month (a rough counting sketch follows below). The value here is that all these mentions go beyond people using our libraries or the public DHT.
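To make that less abstract, here is roughly what such counting could look like; the regexes are loose approximations of CID shapes, content paths, and ipfs/ipns URIs, not a proper parser, and the corpus to scan (scraped pages, package manifests, chat exports, …) is left open:

```typescript
// Rough "proliferation" counter: count CID-like strings, content paths, and ipfs/ipns URIs
// in a chunk of text. The regexes are approximations, not a full CID parser, and a path
// containing a CID is counted twice here -- fine for a trend line.
const CID_V0 = /\bQm[1-9A-HJ-NP-Za-km-z]{44}\b/g;   // CIDv0: "Qm" + 44 base58 chars
const CID_V1_BASE32 = /\bbaf[a-z2-7]{56,}\b/g;      // common base32 CIDv1 shape
const CONTENT_PATH = /\/ip[fn]s\/[^\s"')]+/g;       // /ipfs/... and /ipns/... paths
const CONTENT_URI = /\bip[fn]s:\/\/[^\s"')]+/g;     // ipfs:// and ipns:// URIs

function countMentions(text: string): number {
  return [CID_V0, CID_V1_BASE32, CONTENT_PATH, CONTENT_URI]
    .flatMap((re) => text.match(re) ?? [])
    .length;
}

// Feed a month's worth of collected text through countMentions() and plot the totals
// over time to see the month-to-month trend described above.
```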


Regarding latency to load sample websites:

Until now, we focused on the TTFB and domContentLoaded metrics. While working on our website monitoring infrastructure last week, I read up on how to measure website performance and came across this list:

To quote the website:

Performance metrics

There is no single metric or test that can be run on a site to evaluate how a user “feels”. However, there are a number of metrics that can be “helpful indicators”:

First paint
The time to start of first paint operation. Note that this change may not be visible; it can be a simple background color update or something even less noticeable.

First Contentful Paint (FCP)
The time until first significant rendering (e.g. of text, foreground or background image, canvas or SVG, etc.). Note that this content is not necessarily useful or meaningful.

First Meaningful Paint (FMP)
The time at which useful content is rendered to the screen.

Largest Contentful Paint (LCP)
The render time of the largest content element visible in the viewport.

Speed index
Measures the average time for pixels on the visible screen to be painted.

Time to interactive
Time until the UI is available for user interaction (i.e. the last long task of the load process finishes).

I think the relevant metrics on this list for us are First Contentful Paint, Largest Contentful Paint, and Time to interactive. First Meaningful Paint is deprecated (you can see that if you follow the link) and they recommend: “[…] consider using the LargestContentfulPaint API instead.”.

First paint would include changes that “may not be visible”, so I’m not particularly fond of this metric.

Speed index seems to be very much “website-specific”. By “website-specific” I mean that the IPFS network wouldn’t play a role in this metric; we would be measuring the performance of the website itself. I would argue that this is not something we want.

Besides the above metrics, we should still measure timeToFirstByte. Following the usual definition of Time to First Byte (TTFB), the metric would be the time difference between startTime and responseStart.

The Navigation Timing timeline also includes the two timestamps domContentLoadedEventStart and domContentLoadedEventEnd, so one might define the domContentLoaded metric as just the difference between the two. However, that only accounts for the processing time of the HTML (plus deferred JS scripts).

We could instead define domContentLoaded as the time difference between startTime and domContentLoadedEventEnd.
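In code, the two definitions above would look roughly like this via the Navigation Timing API (a sketch meant to run in the page context, e.g. injected through Puppeteer or Playwright; the tooling choice is just an assumption):

```typescript
// Derive the two metrics discussed above from the Navigation Timing API.
// Intended to run in the page context after the load event has fired.
const [nav] = performance.getEntriesByType("navigation") as PerformanceNavigationTiming[];

// Time to First Byte: startTime -> responseStart
const timeToFirstByte = nav.responseStart - nav.startTime;

// domContentLoaded as defined above: startTime -> domContentLoadedEventEnd
// (domContentLoadedEventEnd - domContentLoadedEventStart would only cover HTML processing)
const domContentLoaded = nav.domContentLoadedEventEnd - nav.startTime;

console.log({ timeToFirstByte, domContentLoaded });
```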


The current website measurement setup gathers the following data:

We could also include:

  • Time to interactive
  • domContentLoaded - as defined above

I believe we won’t be able to report all of the above metrics, so if I had to choose only two, I would pick timeToFirstByte and largestContentfulPaint (a small collection sketch for the latter follows below).
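For largestContentfulPaint, collection would look roughly like the sketch below, using a PerformanceObserver in the page context (the reporting trigger is illustrative; the final LCP candidate only settles once user input or page unload stops further updates):

```typescript
// Collect Largest Contentful Paint via a PerformanceObserver in the page context.
let largestContentfulPaint = 0;

const observer = new PerformanceObserver((list) => {
  for (const entry of list.getEntries()) {
    // Later entries supersede earlier candidates; keep the most recent render time.
    largestContentfulPaint = entry.startTime;
  }
});
observer.observe({ type: "largest-contentful-paint", buffered: true });

// Report the last candidate when the page is backgrounded or unloaded (illustrative).
document.addEventListener("visibilitychange", () => {
  if (document.visibilityState === "hidden") {
    console.log({ largestContentfulPaint });
  }
});
```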

@K8hwS IIUC you refer to the PUT operation (aka Provide operation) to the DHT, right? I agree this is an important item to have on the list of KPIs and we’ve started including relevant results in our weekly reports as of this week - please see: network-measurements/README.md at master · protocol/network-measurements · GitHub.

The plots on the left-hand side of the “DHT performance” section refer to content publication (essentially the provide operation). We do not test with different file sizes, but there shouldn’t be any difference anyway, since IIUC we’re only publishing a provider record. Can you clarify if you mean something else?

Can you provide some clarification with regard to the number of cores, number of threads, and scalability? Do you mean the number of cores and threads needed when providing large volumes of content to the network?

Here’s an update on the set of KPIs we’re leaning towards, after discussing with several people and teams. Feel free to add or contribute further to the discussion.

Timeline

  • Decide on final KPIs on 24th March (1 week from today)
  • Start reporting the majority of those by the first week of April
  • Present and get further feedback at IPFS Thing 2023 (15th-19th April)
  • Revise and finalise in May.

KPIs we’re leaning towards

Network Size & Stability

  • Overall number of unique peers seen in the network (currently as seen by bootstrapper + preload nodes)
    • Number of unique DHT Server nodes
      • PeerIDs and unique IP addresses seen
    • Number of unique DHT Client nodes
    • Stability of DHT Server nodes
      • Classification:
        • Online: > 80% of time seen online
        • Mostly Online: 40%–80% of time seen online
        • Mostly Offline: 10%–40% of time seen online
        • Offline: < 10% of time seen online

Performance

  • DHT Latency - random content
    • Publication Latency: Time to PUT/Provide
    • Lookup Latency: Time to First Provider Record
  • Fetch Latency (see the measurement sketch after this list)
    • Time to First Byte (TTFB)
    • Time to Last Byte (TTLB)
  • e2e Latency:
    • Sum of DHT Latency + Fetch Latency
  • e2e Error Rate
    • Percentage of requests for which delivery of content failed
    • Report on the “leg” of the process where the failure occurred, e.g., provider record discovery failure vs. content fetch failure
  • Website Load Latency
    • Latency to load sample websites through a browser (using PL websites for now)
  • Some or all of the above need to be measured for both long-running and short-running nodes
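To illustrate the Fetch Latency items above, TTFB and TTLB for a single CID fetched over an HTTP gateway could be measured roughly as in the sketch below (the gateway URL is a placeholder and the CID is left to the caller; Node 18+ assumed):

```typescript
// Measure TTFB and TTLB for fetching one CID over an HTTP gateway.
const GATEWAY = "https://ipfs.io/ipfs/"; // placeholder; any gateway works

async function fetchLatency(cid: string): Promise<{ ttfb: number; ttlb: number }> {
  const start = performance.now();
  const res = await fetch(GATEWAY + cid);
  const reader = res.body!.getReader();

  const { done } = await reader.read();    // first chunk received -> Time to First Byte
  const ttfb = performance.now() - start;
  if (!done) {
    while (!(await reader.read()).done) { /* drain the rest of the body */ }
  }
  const ttlb = performance.now() - start;  // body fully received -> Time to Last Byte
  return { ttfb, ttlb };
}
```

Note that through a gateway the DHT leg is hidden inside the request, so the DHT Latency items above would still be measured separately from an instrumented node.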

Traffic

  • Number of requests to the public, PL-operated IPFS Gateways
  • Number of requests to specific vantage points we control through Bitswap and through the DHT.
  • Number of provider records published to the network
  • Number of unique CIDs seen through the Gateways

Abnormalities (could also be Health)

Examples include:

  • Increased number of rotating PeerIDs
  • Increased number of unresponsive nodes
  • Increased number of nodes from a particular geographic location

We should define alert levels for each one of these. For instance:

  • :green_circle: Good Health. No abnormalities.
  • :yellow_circle: Functioning Normally. E.g., an increased number of rotating PeerIDs.
  • :orange_circle: Concerning Signs / Investigation Needed. E.g., an increased number of nodes from a particular geographic location.
  • :red_circle: Red Alert. E.g., an increased number of unresponsive nodes; performance disrupted.

Developer Activity

[More to be added by @lidel and @guseggert]

  • GitHub activity in IPFS-related GitHub orgs
  • GitHub activity in ipfs/specs

IMPORTANT NOTE

These are the high-level KPIs, primarily targeting one component of the IPFS system, i.e., the DHT. If you’re interested in lower-level metrics for your application or implementation project, please bring them up. We do gather and will be reporting lots of lower-level, protocol-specific metrics. The current set of metrics we’re looking at can be found in our weekly reports: network-measurements/reports/2023 at master · protocol/network-measurements · GitHub.
Furthermore, we’ll be developing “project-wide KPIs” to get a broader view of the network - this will come as a separate post later in the year.

Hi, I think I may be able to contribute to this project. I remember listening to a talk in Lisbon and got alerted again yesterday while watching the implementers sync on YouTube.

My applicable web2 skill is building dashboards with d3.js. It’s a very versatile way of designing and building graphs, visualising data by binding it to two-dimensional shape attributes. Have a look:

https://www.schadedoormijnbouw.nl/dashboard
https://www.schadedoormijnbouw.nl/dashboard?topic=fysieke_schade
https://www.schadedoormijnbouw.nl/dashboard?topic=voortgang

My web3 interests are p2p networks and decentralised DBs. So this project feels like a good fit.

Would you mind if I had a go at building a dashboard with these data?

If so, what are your plans to publish these data besides on GitHub? Are crawler results stored somewhere? Could I try to put them in a decentralised/distributed DB myself? Or an IPLD DAG? I am happy to collaborate or experiment myself.

Looking forward, Joera

joera@joeramulders.com

Help is definitely welcome - thanks for offering! :raised_hands:

We do have some of our dashboarding plans in place (mostly using plotly), but would be happy to discuss and see if we can find common interests. Will reach out separately.

As per my earlier message (IPFS Metrics & KPIs - #9 by yiannis), here’s an update to the timeline reported there.

Timeline

It is worth noting that these are the high-level KPIs, primarily targeting one component of the IPFS system, i.e., the DHT. We’ll be developing “project-wide KPIs” to get a broader view of the network - this will come as a separate post and call for feedback later in the year.