IPFS Cluster scalability and observability deep dive with @hsanjuan. Some topics:
- What should we be doing to ensure our IPFS node(s) don’t fall over from load, either via direct requests or pubsub?
- How can we detect a buildup of problems before it becomes critical?
- How should we respond if a node does start having trouble handling the load?
Also:
- The cluster swarm
- The configuration file
- Cluster peer startup
- The API
- The pinning process
- The allocations
- The adding process
- The pin batching
- The CRDT DAG
- CRDT DAG sync
- Pin queue management
- Adding a new peer
- Skipping state sync
- Removing an existing peer
- Metrics dashboard
- Host management, e.g. growing or shrinking the cluster safely
- APIs
- Problems, how to detect them, and how to proceed:
  - Too many uploads
  - Faulty disk
  - Corrupted Badger datastore
  - How to remove a node from DNS
  - How to prevent allocations to a node
  - Caveats: what NOT to do when solving issues