IPFS Metrics & KPIs

Here’s an update on the set of KPIs we’re leaning towards, after discussing with several people and teams. Feel free to add or contribute further to the discussion.

Timeline

  • Decide on final KPIs on 24th March (1 week from today)
  • Start reporting the majority of those by the first week of April
  • Present and get further feedback at IPFS Thing 2023 (15th-19th April)
  • Revise and finalise in May.

KPIs we’re leaning towards

Network Size & Stability

  • Overall number of unique peers seen in the network (currently measured by bootstrapper + preload nodes)
    • Unique number of DHT Server nodes
      • PeerIDs and Unique IP Addresses seen
    • Unique number of DHT Client nodes
    • Stability of DHT Server nodes
      • Classification:
        • Online: seen online > 80% of the time
        • Mostly Online: seen online 40%–80% of the time
        • Mostly Offline: seen online 10%–40% of the time
        • Offline: seen online < 10% of the time
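As a sketch, the stability classification above could be computed from the fraction of liveness probes in which a node was seen online. The function name and the probe-fraction input are illustrative, not part of any existing tool:

```python
def classify_stability(online_fraction: float) -> str:
    """Map the share of liveness probes in which a DHT server node
    was seen online to one of the four stability classes above.
    `online_fraction` is assumed to be in [0.0, 1.0]."""
    if online_fraction > 0.8:
        return "Online"
    if online_fraction > 0.4:
        return "Mostly Online"
    if online_fraction > 0.1:
        return "Mostly Offline"
    return "Offline"
```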

Performance

  • DHT Latency - random content
    • Publication Latency: Time to PUT/Provide
    • Lookup Latency: Time to First Provider Record
  • Fetch Latency
    • Time to First Byte (TTFB)
    • Time to Last Byte (TTLB)
  • e2e Latency:
    • Sum of DHT Latency + Fetch Latency
  • e2e Error Rate
    • Percentage of requests for which delivery of content failed
    • Report the “leg” of the process where the failure occurred, e.g., provider record discovery failure vs. content fetch failure
  • Website Load Latency
    • Latency to load sample websites through browser (using PL websites for now)
  • Some or all of the above should be measured from both long-running and short-running nodes
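The latency and error-rate definitions above can be sketched as follows. The `RetrievalSample` record and its field names are hypothetical, assuming each measurement records per-leg timings and marks a failed leg as `None`:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RetrievalSample:
    # Per-leg latencies in seconds; None means that leg failed.
    dht_lookup: Optional[float]  # time to first provider record
    fetch_ttfb: Optional[float]  # time to first byte
    fetch_ttlb: Optional[float]  # time to last byte


def e2e_latency(s: RetrievalSample) -> Optional[float]:
    """e2e latency = DHT lookup latency + fetch latency (to last byte),
    defined only when every leg succeeded."""
    if s.dht_lookup is None or s.fetch_ttlb is None:
        return None
    return s.dht_lookup + s.fetch_ttlb


def failed_leg(s: RetrievalSample) -> Optional[str]:
    """Attribute a failed retrieval to the first leg that did not
    complete, for per-leg error-rate reporting."""
    if s.dht_lookup is None:
        return "provider-record-discovery"
    if s.fetch_ttfb is None or s.fetch_ttlb is None:
        return "content-fetch"
    return None
```

The e2e error rate is then the share of samples for which `failed_leg` is not `None`, broken down by the leg it returns.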

Traffic

  • Number of requests to the public, PL-operated IPFS Gateways
  • Number of requests arriving at specific vantage points we control, via Bitswap and via the DHT
  • Number of provider records published to the network
  • Number of unique CIDs seen through the Gateways
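As an illustration of the unique-CID metric, a naive deduplication over gateway access logs might look like the sketch below. The log format and field positions are assumptions; at real gateway scale an approximate counter such as HyperLogLog would be more practical than an in-memory set:

```python
def count_unique_cids(log_lines):
    """Count distinct CIDs appearing in gateway access-log lines.
    Assumes (hypothetically) that the request path is the second
    whitespace-separated field and starts with /ipfs/<cid>."""
    cids = set()
    for line in log_lines:
        fields = line.split()
        if len(fields) < 2:
            continue
        path = fields[1]
        if path.startswith("/ipfs/"):
            # /ipfs/<cid>[/...]: the CID is the second path segment.
            cids.add(path.split("/")[2])
    return len(cids)
```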

Abnormalities (could also be Health)

We should define alert levels for each type of abnormality we track. For instance:

  • :green_circle: Good Health. No abnormalities.
  • :yellow_circle: Functioning Normally. E.g., in case of increased number of Rotated PeerIDs
  • :orange_circle: Concerning Signs/Investigation Needed. E.g., increased number of nodes from a particular geographic location
  • :red_circle: Red Alert. E.g., increased number of unresponsive nodes; performance disrupted
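A toy sketch of how abnormality signals could be mapped to the four alert levels; the signal names and every threshold below are illustrative placeholders, not proposed values:

```python
GREEN, YELLOW, ORANGE, RED = "green", "yellow", "orange", "red"


def alert_level(rotated_peerid_ratio: float,
                geo_concentration: float,
                unresponsive_ratio: float) -> str:
    """Pick the most severe alert level triggered by the given
    (hypothetical) abnormality signals. All thresholds are
    placeholders for illustration only."""
    if unresponsive_ratio > 0.2:    # performance disrupted
        return RED
    if geo_concentration > 0.5:     # many nodes from one location
        return ORANGE
    if rotated_peerid_ratio > 0.1:  # elevated PeerID rotation
        return YELLOW
    return GREEN
```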

Developer Activity

[More to be added by @lidel and @guseggert]

  • Github Activity in IPFS related Github Orgs
  • Github Activity in ipfs/specs

IMPORTANT NOTE

These are the high-level KPIs, primarily targeting one component of the IPFS system, i.e., the DHT. If you’re interested in lower-level metrics for your application or implementation project, please bring them up. We do gather and will be reporting lots of lower-level, protocol-specific metrics. The current set of metrics we’re looking at can be found in our weekly reports: network-measurements/reports/2023 at master · protocol/network-measurements · GitHub.
Furthermore, we’ll be developing “project-wide KPIs” to get a broader view of the network - this will come as a separate post later in the year.