Optimizing Private IPFS: Building a Private Indexer for Swarmkey'd Networks

Over the past few months, we’ve been working on an IPFS optimization focused on private networks.

At Neova, we encountered several limitations tied to our infrastructure needs and real-world enterprise use cases. After discussing these challenges with the IPFS Foundation, we designed and developed a new component:

Private Indexer — Technical Guide

A complete specification for the Private Bitswap Indexer: a content discovery service for private IPFS networks that replaces vanilla IPNI with an explicit, HTTP-based announce/lookup protocol.

1. Purpose

Private IPFS networks (swarm.key v1) cannot rely on the public DHT or the public IPNI (InterPlanetary Network Indexer) infrastructure for content discovery. The Private Indexer solves this by providing:

- **Fast content discovery** — O(1) lookup of CID-to-provider mappings via a centralized HTTP registry, eliminating DHT walk latency.
- **Explicit announce protocol** — A fully specified, HTTP+JSON announce/lookup protocol with clear request/response schemas, authentication, and batch semantics. This replaces the unmaintained announce protocol in vanilla IPNI that has historically caused interoperability failures across IPFS implementations.
- **Provider ranking** — Weighted scoring (freshness, latency, success rate) so clients receive providers sorted by quality, not just availability.
- **Zero coupling to IPFS implementation** — The indexer speaks plain HTTP/JSON. Any IPFS implementation (Kubo, Helia, Iroh, a custom node) can participate by implementing the announce client, which is a single POST /announce call (a minimal sketch follows this list).
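
As a minimal sketch, not an authoritative schema: the request below uses the field names listed in section 2.2, while `<indexer-host>`, the token, and all values are placeholders, and the metadata fields are shown purely for illustration.

```bash
# Hypothetical announce call; endpoint and auth scheme as described in sections 2.2/2.3,
# all identifiers and values below are placeholders.
curl -X POST "http://<indexer-host>:8090/announce" \
  -H "Authorization: Bearer ${INDEXER_AUTH_TOKEN}" \
  -H "Content-Type: application/json" \
  -d '{
        "peer_id": "12D3KooW...",
        "addrs": ["/ip4/10.0.0.12/tcp/4001"],
        "cids": ["bafy...cid1", "bafy...cid2"],
        "provider_name": "example-provider",
        "node_name": "node-01"
      }'
```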

2. Delta from Vanilla IPNI

This section explicitly documents what changes from the standard IPNI model and why.

2.1 Status Quo: How Vanilla IPNI Works

In the public IPFS/Filecoin ecosystem, IPNI works roughly as follows: a provider builds an IPLD Advertisement DAG (a chain of entry chunks carrying multihashes plus provider metadata), announces the head of that chain to a public indexer (HTTP PUT to /ingest/announce or libp2p gossipsub), the indexer then pulls and ingests the advertisement chain, and clients look up providers via GET /cid/{cid} or the delegated routing endpoint /routing/v1/providers/{cid}.

Problems in private networks:

| Problem | Detail |
| --- | --- |
| Announce is underspecified | The vanilla IPNI announce message format, transport selection (HTTP vs gossipsub), and polling semantics are not rigorously specified. Behavior varies across Kubo versions and forks. |
| Advertisement chain is complex | Providers must construct IPLD DAG advertisements with entry chunks, contextual metadata, and chain links. This is heavyweight for a private network with dozens of nodes. |
| Public infrastructure dependency | Vanilla IPNI assumes publicly reachable endpoints (cid.contact, IPNI HTTP endpoints). Private swarms behind NAT/firewall cannot use these. |
| No provider quality signals | Vanilla IPNI returns provider records without latency, freshness ranking, or health information. |
| DHT fallback coupling | Many IPFS implementations fall back to DHT provider records, which are disabled in private swarms. |
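
For contrast, content routing in the vanilla setup ultimately resolves against public endpoints such as the one below (cid.contact and the delegated routing path both appear in the tables in this section); from an isolated, swarm-keyed network this call either cannot be made or returns providers the private nodes cannot dial.

```bash
# Public IPNI delegated routing lookup used in the vanilla setup
# (unusable from a private swarm); <cid> is a placeholder.
curl "https://cid.contact/routing/v1/providers/<cid>"
```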

2.2 What the Private Indexer Changes

| Aspect | Vanilla IPNI | Private Indexer | Rationale |
| --- | --- | --- | --- |
| Announce transport | HTTP PUT to /ingest/announce or libp2p gossipsub | HTTP POST to /announce with JSON body | Single, explicit transport. No gossipsub dependency. Any HTTP client can announce. |
| Announce payload | IPLD Advertisement DAG (linked list of entry chunks with provider/metadata) | Flat JSON: {peer_id, addrs, cids, …metadata} | No IPLD dependency. Schema is a single JSON object. Trivial to implement in any language. |
| Ingestion model | Indexer pulls advertisement chain from provider | Provider pushes CID sets to indexer | Push model eliminates polling complexity. Announcer sidecar handles batching. |
| Entry format | Multihash entry chunks (IPLD linked lists, each chunk up to 16384 entries) | Flat CID string array in JSON body, batched by announcer | No chunking protocol needed. Batch size is configurable (default 1000). |
| Metadata | IPLD-encoded extended provider metadata (transport, piece CID, etc.) | JSON fields: evm_address, provider_name, node_name, ipfs_storage_max | Application-specific metadata in plain JSON, extensible without schema changes. |
| Lookup | GET /cid/{cid} or delegated routing /routing/v1/providers/{cid} | GET /providers?cid={cid} | Returns provider records with scores, latency, and availability. |
| Provider ranking | None (unordered list) | Weighted composite score: freshness (0.3) + latency (0.3) + success rate (0.4) | Clients get best provider first. |
| Record lifetime | Advertisements form an append-only chain; no native TTL | Explicit TTL per record (default 24h) with background GC | Stale records automatically expire. Re-announce refreshes TTL. |
| Authentication | None (public by design) | Bearer token + peer ID allowlist | Private network access control. |
| Health probing | None | Background TCP RTT probes to announced multiaddrs | Feeds into provider scoring; detects unreachable peers. |
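
To make the lookup side concrete, a query against the indexer could look like the sketch below (assuming lookups use the same bearer token as announcements; the response field names are illustrative, not an authoritative schema).

```bash
# Hypothetical lookup; <indexer-host> and the CID are placeholders.
curl -H "Authorization: Bearer ${INDEXER_AUTH_TOKEN}" \
  "http://<indexer-host>:8090/providers?cid=bafy...cid1"

# Illustrative shape of a ranked response (field names assumed):
# {
#   "cid": "bafy...cid1",
#   "providers": [
#     { "peer_id": "12D3KooW...", "addrs": ["/ip4/10.0.0.12/tcp/4001"],
#       "score": 0.89, "latency_ms": 12, "last_seen": "2025-06-01T12:00:00Z" }
#   ]
# }
```

With the weights above, a provider with a freshness score of 0.9, a latency score of 0.8, and a success rate of 0.95 would rank at 0.3 × 0.9 + 0.3 × 0.8 + 0.4 × 0.95 = 0.89.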

2.3 Configuration Delta per Component

To move from a vanilla IPNI setup to the Private Indexer, each component needs:

On each Kubo node:

# Already required for private swarm:
LIBP2P_FORCE_PNET=1
IPFS_SWARM_KEY_FILE=/data/ipfs/swarm.key

# Required config changes (applied by ipfs-init.sh):
ipfs bootstrap rm --all               # No public bootstraps
ipfs config Routing.Type dhtclient     # Disable DHT server mode
ipfs config --bool Swarm.RelayClient.Enabled false
ipfs config --bool Swarm.RelayService.Enabled false
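
A quick way to sanity-check the result on a node (plain Kubo CLI, nothing indexer-specific):

```bash
ipfs bootstrap list                     # should print nothing
ipfs config Routing.Type                # should print dhtclient
ipfs config Swarm.RelayClient.Enabled   # should print false
ipfs swarm peers                        # should only list peers sharing the swarm.key
```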

New process — Announcer sidecar (one per Kubo node):

announcer \
-kubo-api    http://localhost:5001 \
-indexer-url http://<indexer-host>:8090 \
-auth-token  "${INDEXER_AUTH_TOKEN}" \
-poll-interval 30s \
-reannounce-interval 12h \
-batch-size 1000
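
Functionally, one announcer cycle boils down to something like the sketch below. This is a simplified, hypothetical equivalent, not the actual implementation: the real sidecar tracks pin deltas, splits announcements at -batch-size, and re-announces everything every -reannounce-interval, and the hard-coded multiaddr here is a placeholder.

```bash
# Roughly one poll cycle of the announcer, expressed with the Kubo CLI, jq, and curl.
PEER_ID=$(ipfs id -f '<id>')
CIDS_JSON=$(ipfs pin ls --type=recursive --quiet | jq -R . | jq -s .)

curl -X POST "http://<indexer-host>:8090/announce" \
  -H "Authorization: Bearer ${INDEXER_AUTH_TOKEN}" \
  -H "Content-Type: application/json" \
  -d "{\"peer_id\": \"${PEER_ID}\", \"addrs\": [\"/ip4/10.0.0.12/tcp/4001\"], \"cids\": ${CIDS_JSON}}"
```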

New process — Private Indexer (one per network):

private-indexer \
-listen      :8090 \
-ttl         24h \
-auth-token  "${INDEXER_AUTH_TOKEN}" \
-allowed-peers "QmPeer1,QmPeer2,…" \
-probe-interval 30s \
-probe-timeout  5s

3. Architecture

3.1 High-Level Architecture Diagram

→ In the first comment of this topic.

3.2 Component Roles

| Component | Binary | Role | Instances |
| --- | --- | --- | --- |
| Private Indexer | private-indexer | Central HTTP registry. Accepts announcements, answers provider queries, runs background probes, exposes metrics. | 1 per network |
| Announcer | announcer | Sidecar process. Polls local Kubo for pinned CIDs, pushes deltas to the indexer over HTTP. Handles batching, backfill, and periodic re-announce. | 1 per Kubo node |
| Kubo Node | ipfs (Kubo v0.32.1) | IPFS node running in private mode (LIBP2P_FORCE_PNET=1). Stores and serves blocks via Bitswap. Provides HTTP API for pin listing and identity. | N per network |
| Swarm Accelerator | swarm-accelerator | (Planned) CDN-like gateway layer with parallel fetch and caching. Skeleton only in current codebase. | |

3.3 Internal Data Model

MemoryStore
─ providers: map[CID] → map[PeerID] → providerEntry
     providerEntry { PeerID, Addrs, LastSeen, ExpiresAt }

─ peers: map[PeerID] → peerState
     peerState {
       PeerID, Addrs, FirstSeen, LastSeen,
       CIDCount, AgentVersion,
       EVMAddress, ProviderName, NodeName, IPFSStorageMax,
       successCount, failureCount,
       latencies [max 10 samples], lastProbe,
       announceCount
     }

─ TTL:  configurable (default 24h)
     Records with ExpiresAt < now are filtered from queries

─ GC:   background goroutine sweeps expired entries every gcInterval (default 5m)

If you are facing the same limitations with swarm-keyed networks, please reach out to us and we will be happy to dig deeper into your specific use cases.