Troubleshooting Help - Unable to add a peer to existing cluster

I started a private IPFS network with three cloud servers and it worked as intended. I then tried to cluster them, but only two of them see each other. I set up a fourth server and successfully got it to join the private IPFS network, but it cannot join the cluster either.

I’m at a loss on the issue and looking to see if anyone can assist. After installing ipfs-cluster-service and ipfs-cluster-ctl:

  1. I’ve added the cluster secret from Node0 to /.ipfs-cluster/.bashrc and to .ipfs-cluster/service.json in the new peer
  2. I pulled the ip from Node0 with hostname -I
  3. I pulled the cluster id from Node0 from ipfs-cluster/identity.json
  4. I then bootstrapped the new peer with ipfs-cluster-service daemon --bootstrap /ip4/[ip from step 2]/tcp/9096/ipfs/[id from step 3]

I had also opened port 9096 on the new peer.
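
For reference, steps 2 to 4 can be sketched as a small shell helper. This is only an illustration: the function name and the NODE0_IP / NODE0_ID variables are placeholders, not part of ipfs-cluster. It uses the /p2p/ form that the daemon prints in its own log, with no spaces anywhere in the multiaddress. Note also that hostname -I prints every address configured on the machine, so on a cloud server the value to use is the one the new peer can actually reach.

```shell
#!/bin/sh
# Build the bootstrap multiaddress from Node0's IP and its cluster peer ID.
# Hypothetical helper: $1 is the IP from step 2, $2 is the ID from step 3.
# Port 9096 is the cluster listen port used throughout this thread.
bootstrap_multiaddr() {
    printf '/ip4/%s/tcp/9096/p2p/%s\n' "$1" "$2"
}

# Usage on the new peer (NODE0_IP and NODE0_ID are placeholders):
#   ipfs-cluster-service daemon --bootstrap "$(bootstrap_multiaddr "$NODE0_IP" "$NODE0_ID")"
```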

I’m not sure why the first peer clustered fine but no subsequent peer will. Any guidance or tips would be appreciated.

Please post the output of:

$ ipfs-cluster-service --loglevel cluster:debug,service:debug,pstoremgr:debug daemon --bootstrap <multiaddress_of_your_other_node>

Should I run that on the peer that does connect, the peer that won’t connect, or the bootstrap node? Thank you.

The peer that does not connect.

2021-10-11T13:26:34.563Z        INFO    service ipfs-cluster-service/daemon.go:46       Initializing. For verbose output run with "-l debug". Please wait...
2021-10-11T13:26:34.566Z        DEBUG   service ipfs-cluster-service/lock.go:36 checking lock
2021-10-11T13:26:34.566Z        DEBUG   service ipfs-cluster-service/lock.go:54 ipfs-cluster-service execution lock acquired
2021-10-11T13:26:34.576Z        INFO    badger  badger@v1.6.2/logger.go:46      All 3 tables opened in 1ms

2021-10-11T13:26:34.578Z        INFO    badger  badger@v1.6.2/logger.go:46      Replaying file id: 0 at offset: 1918

2021-10-11T13:26:34.578Z        INFO    badger  badger@v1.6.2/logger.go:46      Replay took: 2.778µs

2021-10-11T13:26:34.579Z        INFO    service ipfs-cluster-service/daemon.go:233      Datastore backend: badger
2021-10-11T13:26:34.592Z        DEBUG   cluster ipfs-cluster@v0.14.1/clusterhost.go:146 enabling DHT record persistence to datastore
2021-10-11T13:26:34.594Z        DEBUG   service ipfs-cluster-service/daemon.go:126      Configuration:
{
  "cluster": {
    "id": "",
    "peername": "Cluster-India",
    "private_key": "XXX_hidden_XXX",
    "secret": "XXX_hidden_XXX",
    "leave_on_shutdown": false,
    "listen_multiaddress": [
      "/ip4/0.0.0.0/tcp/9096",
      "/ip4/0.0.0.0/udp/9096/quic"
    ],
    "enable_relay_hop": true,
    "connection_manager": {
      "high_water": 400,
      "low_water": 100,
      "grace_period": "2m0s"
    },
    "dial_peer_timeout": "3s",
    "state_sync_interval": "5m0s",
    "pin_recover_interval": "12m0s",
    "replication_factor_min": -1,
    "replication_factor_max": -1,
    "monitor_ping_interval": "15s",
    "peer_watch_interval": "5s",
    "mdns_interval": "10s",
    "disable_repinning": true,
    "follower_mode": false,
    "peerstore_file": "",
    "peer_addresses": []
  },
  "consensus": {
    "crdt": {
      "cluster_name": "ipfs-cluster",
      "trusted_peers": [
        "*"
      ],
      "batching": {
        "max_batch_size": 0,
        "max_batch_age": "0s"
      },
      "rebroadcast_interval": "",
      "peerset_metric": "",
      "datastore_namespace": ""
    },
    "raft": {
      "data_folder": "",
      "init_peerset": [],
      "wait_for_leader_timeout": "15s",
      "network_timeout": "10s",
      "commit_retries": 1,
      "commit_retry_delay": "200ms",
      "backups_rotate": 6,
      "datastore_namespace": "",
      "heartbeat_timeout": "1s",
      "election_timeout": "1s",
      "commit_timeout": "50ms",
      "max_append_entries": 64,
      "trailing_logs": 10240,
      "snapshot_interval": "2m0s",
      "snapshot_threshold": 8192,
      "leader_lease_timeout": "500ms"
    }
  },
  "api": {
    "ipfsproxy": {
      "listen_multiaddress": "/ip4/127.0.0.1/tcp/9095",
      "node_multiaddress": "/ip4/127.0.0.1/tcp/5001",
      "node_https": false,
      "log_file": "",
      "read_timeout": "0s",
      "read_header_timeout": "5s",
      "write_timeout": "0s",
      "idle_timeout": "1m0s",
      "max_header_bytes": 4096,
      "extract_headers_extra": null,
      "extract_headers_path": "",
      "extract_headers_ttl": ""
    },
    "restapi": {
      "http_listen_multiaddress": "/ip4/127.0.0.1/tcp/9094",
      "ssl_cert_file": "",
      "ssl_key_file": "",
      "read_timeout": "0s",
      "read_header_timeout": "5s",
      "write_timeout": "0s",
      "idle_timeout": "2m0s",
      "max_header_bytes": 4096,
      "libp2p_listen_multiaddress": null,
      "id": "",
      "private_key": "XXX_hidden_XXX",
      "basic_auth_credentials": "XXX_hidden_XXX",
      "http_log_file": "",
      "headers": {},
      "cors_allowed_origins": [
        "*"
      ],
      "cors_allowed_methods": [
        "GET"
      ],
      "cors_allowed_headers": [],
      "cors_exposed_headers": [
        "Content-Type",
        "X-Stream-Output",
        "X-Chunked-Output",
        "X-Content-Length"
      ],
      "cors_allow_credentials": true,
      "cors_max_age": "0s"
    }
  },
  "ipfs_connector": {
    "ipfshttp": {
      "node_multiaddress": "/ip4/127.0.0.1/tcp/5001",
      "connect_swarms_delay": "30s",
      "ipfs_request_timeout": "5m0s",
      "pin_timeout": "2m0s",
      "unpin_timeout": "3h0m0s",
      "repogc_timeout": "24h0m0s",
      "unpin_disable": false
    }
  },
  "pin_tracker": {
    "stateless": {
      "max_pin_queue_size": 0,
      "concurrent_pins": 10
    }
  },
  "monitor": {
    "pubsubmon": {
      "check_interval": "15s",
      "failure_threshold": 3
    }
  },
  "informer": {
    "disk": {
      "metric_ttl": "30s",
      "metric_type": "freespace"
    }
  },
  "observations": {
    "metrics": {
      "enable_stats": false,
      "prometheus_endpoint": "/ip4/127.0.0.1/tcp/8888",
      "reporting_interval": "2s"
    },
    "tracing": {
      "enable_tracing": false,
      "jaeger_agent_endpoint": "/ip4/0.0.0.0/udp/6831",
      "sampling_prob": 0.3,
      "service_name": "cluster-daemon"
    }
  },
  "datastore": {
    "badger": {
      "folder": "",
      "gc_discard_ratio": 0.2,
      "gc_interval": "15m0s",
      "gc_sleep": "10s",
      "badger_options": {
        "dir": "",
        "value_dir": "",
        "sync_writes": true,
        "table_loading_mode": 2,
        "value_log_loading_mode": 0,
        "num_versions_to_keep": 1,
        "max_table_size": 16777216,
        "level_size_multiplier": 10,
        "max_levels": 7,
        "value_threshold": 32,
        "num_memtables": 5,
        "num_level_zero_tables": 5,
        "num_level_zero_tables_stall": 10,
        "level_one_size": 268435456,
        "value_log_file_size": 1073741823,
        "value_log_max_entries": 1000000,
        "num_compactors": 2,
        "compact_l_0_on_close": false,
        "read_only": false,
        "truncate": true
      }
    },
    "leveldb": {
      "folder": "",
      "leveldb_options": {
        "block_cache_capacity": 0,
        "block_cache_evict_removed": false,
        "block_restart_interval": 0,
        "block_size": 0,
        "compaction_expand_limit_factor": 0,
        "compaction_gp_overlaps_factor": 0,
        "compaction_l0_trigger": 0,
        "compaction_source_limit_factor": 0,
        "compaction_table_size": 0,
        "compaction_table_size_multiplier": 0,
        "compaction_table_size_multiplier_per_level": null,
        "compaction_total_size": 0,
        "compaction_total_size_multiplier": 0,
        "compaction_total_size_multiplier_per_level": null,
        "compression": 0,
        "disable_buffer_pool": false,
        "disable_block_cache": false,
        "disable_compaction_backoff": false,
        "disable_large_batch_transaction": false,
        "iterator_sampling_rate": 0,
        "no_sync": false,
        "no_write_merge": false,
        "open_files_cache_capacity": 0,
        "read_only": false,
        "strict": 0,
        "write_buffer": 0,
        "write_l0_pause_trigger": 0,
        "write_l0_slowdown_trigger": 0
      }
    }
  }
}

2021-10-11T13:26:34.616Z        DEBUG   service ipfs-cluster-service/daemon.go:189      stateless pintracker loaded
2021-10-11T13:26:34.617Z        INFO    cluster ipfs-cluster@v0.14.1/cluster.go:134     IPFS Cluster v0.14.1 listening on:
        /ip4/[this node]/tcp/9096/p2p/12D3KooWQ8Nzex9svhaP6aGRvddMKs4BT9BVViibmEAZnh8GvdSi
        /ip4/127.0.0.1/tcp/9096/p2p/12D3KooWQ8Nzex9svhaP6aGRvddMKs4BT9BVViibmEAZnh8GvdSi


2021-10-11T13:26:34.617Z        DEBUG   pstoremgr       pstoremgr/pstoremgr.go:102      adding peer address /ip4/[Node0]/tcp/9096/p2p/12D3KooWRVXuczRAT4LCdtWK6GP1f2dVBBxSXQtxs3TVSc9YbdVb
2021-10-11T13:26:34.625Z        DEBUG   pstoremgr       pstoremgr/pstoremgr.go:324      connecting to 12D3KooWRVXuczRAT4LCdtWK6GP1f2dVBBxSXQtxs3TVSc9YbdVb
2021-10-11T13:26:37.626Z        DEBUG   pstoremgr       pstoremgr/pstoremgr.go:327      context deadline exceeded
2021-10-11T13:26:37.627Z        DEBUG   pstoremgr       pstoremgr/pstoremgr.go:324      connecting to 12D3KooWQ8Nzex9svhaP6aGRvddMKs4BT9BVViibmEAZnh8GvdSi
2021-10-11T13:26:37.627Z        DEBUG   pstoremgr       pstoremgr/pstoremgr.go:327      dial to self attempted
2021-10-11T13:26:37.627Z        DEBUG   cluster ipfs-cluster@v0.14.1/cluster.go:182     bootstrap count 0
2021-10-11T13:26:37.628Z        INFO    service ipfs-cluster-service/daemon.go:218      Bootstrapping to /ip4/[Node0]/tcp/9096/p2p/12D3KooWRVXuczRAT4LCdtWK6GP1f2dVBBxSXQtxs3TVSc9YbdVb
2021-10-11T13:26:37.628Z        DEBUG   cluster ipfs-cluster@v0.14.1/cluster.go:930     Join(/ip4/[Node0]/tcp/9096/p2p/12D3KooWRVXuczRAT4LCdtWK6GP1f2dVBBxSXQtxs3TVSc9YbdVb)
2021-10-11T13:26:37.628Z        DEBUG   pstoremgr       pstoremgr/pstoremgr.go:102      adding peer address /ip4/[Node0]/tcp/9096/p2p/12D3KooWRVXuczRAT4LCdtWK6GP1f2dVBBxSXQtxs3TVSc9YbdVb
2021-10-11T13:26:37.628Z        INFO    restapi rest/restapi.go:549     REST API (HTTP): /ip4/127.0.0.1/tcp/9094
2021-10-11T13:26:37.628Z        INFO    ipfsproxy       ipfsproxy/ipfsproxy.go:317      IPFS Proxy: /ip4/127.0.0.1/tcp/9095 -> /ip4/127.0.0.1/tcp/5001
2021-10-11T13:26:37.628Z        INFO    crdt    go-ds-crdt@v0.1.22/crdt.go:278  crdt Datastore created. Number of heads: 0. Current max-height: 0
2021-10-11T13:26:37.628Z        INFO    crdt    crdt/consensus.go:300   'trust all' mode enabled. Any peer in the cluster can modify the pinset.
2021-10-11T13:26:37.629Z        INFO    cluster ipfs-cluster@v0.14.1/cluster.go:656     Cluster Peers (without including ourselves):
2021-10-11T13:26:37.629Z        INFO    cluster ipfs-cluster@v0.14.1/cluster.go:658         - No other peers
2021-10-11T13:26:37.629Z        INFO    cluster ipfs-cluster@v0.14.1/cluster.go:671     ** IPFS Cluster is READY **
2021-10-11T13:26:37.629Z        DEBUG   cluster ipfs-cluster@v0.14.1/cluster.go:270     auto-triggering RecoverAllLocal()
2021-10-11T13:26:40.628Z        ERROR   p2p-gorpc       go-libp2p-gorpc@v0.1.3/call.go:63       context deadline exceeded
2021-10-11T13:26:40.628Z        ERROR   cluster ipfs-cluster@v0.14.1/cluster.go:954     context deadline exceeded
2021-10-11T13:26:40.628Z        ERROR   service ipfs-cluster-service/daemon.go:221      bootstrap to /ip4/[Node0]/tcp/9096/p2p/12D3KooWRVXuczRAT4LCdtWK6GP1f2dVBBxSXQtxs3TVSc9YbdVb failed: context deadline exceeded
2021-10-11T13:27:07.630Z        DEBUG   pstoremgr       pstoremgr/pstoremgr.go:324      connecting to 12D3KooWRVXuczRAT4LCdtWK6GP1f2dVBBxSXQtxs3TVSc9YbdVb
2021-10-11T13:27:07.891Z        DEBUG   pstoremgr       pstoremgr/pstoremgr.go:327      failed to dial 12D3KooWRVXuczRAT4LCdtWK6GP1f2dVBBxSXQtxs3TVSc9YbdVb:
  * [/ip4/[Node0]/tcp/9096] failed to negotiate security protocol: read tcp4 [this node]:9096->[Node0]:9096: read: connection reset by peer
2021-10-11T13:27:07.892Z        DEBUG   pstoremgr       pstoremgr/pstoremgr.go:324      connecting to 12D3KooWQ8Nzex9svhaP6aGRvddMKs4BT9BVViibmEAZnh8GvdSi
2021-10-11T13:27:07.892Z        DEBUG   pstoremgr       pstoremgr/pstoremgr.go:327      dial to self attempted
2021-10-11T13:27:37.629Z        DEBUG   pstoremgr       pstoremgr/pstoremgr.go:324      connecting to 12D3KooWRVXuczRAT4LCdtWK6GP1f2dVBBxSXQtxs3TVSc9YbdVb
2021-10-11T13:27:37.891Z        DEBUG   pstoremgr       pstoremgr/pstoremgr.go:327      failed to dial 12D3KooWRVXuczRAT4LCdtWK6GP1f2dVBBxSXQtxs3TVSc9YbdVb:
  * [/ip4/[Node0]/tcp/9096] failed to negotiate security protocol: message did not have trailing newline

…and then it continues to “dial to self attempted”

It looks to me like there is a connectivity issue with Node0. Make sure that your machines can reach each other on port 9096 and that you are using the right IP. Also double-check that the cluster secret is the same on every peer.
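
One possible way to run both checks is sketched below. It assumes a Linux host with GNU coreutils (sha256sum) and the default config location ~/.ipfs-cluster/service.json; the function name is made up for this example. The later dial attempts in the log do reach Node0 over TCP and then fail while negotiating the security protocol, so comparing the secrets seems worth doing alongside the port check.

```shell
#!/bin/sh
# Print a short fingerprint of the "secret" value found in a service.json
# file, so secrets can be compared across peers without pasting them.
# Hypothetical helper: $1 is the path to service.json.
secret_fingerprint() {
    sed -n 's/.*"secret": *"\([^"]*\)".*/\1/p' "$1" | head -n 1 | sha256sum | cut -c1-16
}

# Usage, run on every peer and compare the output (path is the assumed default):
#   secret_fingerprint ~/.ipfs-cluster/service.json
#
# Reachability of the cluster port, run from the peer that will not join
# (NODE0_IP is a placeholder; netcat flags can differ between variants):
#   nc -vz "$NODE0_IP" 9096
```

If the fingerprints differ between Node0 and the new peer, the secrets do not match. If the nc check fails, look at the firewall or security group on Node0 rather than on the new peer, since Node0 is the side that has to accept the incoming connection.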