Ipfs-cluster connection problem

I deployed three servers on a LAN to test the ipfs-cluster service, and everything worked fine. But when I deployed three servers (A, B, C) on Amazon, with A as the default node, B fails when it tries to connect to A. The error is:

.......
12:06:45.153  INFO    service: Bootstrapping to /ip4/xxx.xxx.xxx.xxx/tcp/9096/ipfs/QmSNij94E6hFM6Y48LDna94PL3YYkr5ZirgTFPkXJTnsPc asm_amd64.s:2361
12:06:45.154  INFO   ipfshttp: IPFS Proxy: /ip4/127.0.0.1/tcp/9095 -> /ip4/127.0.0.1/tcp/5001 asm_amd64.s:2361
12:07:05.153 ERROR    cluster: ***** ipfs-cluster consensus start timed out (tips below) ***** cluster.go:133
12:07:05.154 ERROR    cluster:
**************************************************
This peer was not able to become part of the cluster.
This might be due to one or several causes:
  - Check the logs above this message for errors
  - Check that there is connectivity to the "peers" multiaddresses
  - Check that all cluster peers are using the same "secret"
  - Check that this peer is reachable on its "listen_multiaddress" by all peers
  - Check that the current cluster is healthy (has a leader). Otherwise make
    sure to start enough peers so that a leader election can happen.
  - Check that the peer(s) you are trying to connect to is running the
    same version of IPFS-cluster.
**************************************************
 cluster.go:133
12:07:05.154  INFO    cluster: shutting down Cluster cluster.go:369
12:07:05.154  INFO  consensus: stopping Consensus component cluster.go:440
12:07:05.154 ERROR       raft: NOTICE: Some RAFT log messages repeat and will only be logged once logging.go:71
12:07:05.154 ERROR       raft: Failed to take snapshot: nothing new to snapshot logging.go:48
12:07:05.154  INFO    monitor: stopping Monitor cluster.go:461
12:07:05.154  INFO    restapi: stopping Cluster API cluster.go:466
12:07:05.154  INFO   ipfshttp: stopping IPFS Proxy cluster.go:470
12:07:05.155  INFO pintracker: stopping MapPinTracker cluster.go:475

Comment:
xxx.xxx.xxx.xxx is the public IP address of A

Later, I found that A’s ipfs-cluster service listens only on the loopback address and the LAN (private) address. There is no public IP address among the listen addresses, as follows:

ipfs-cluster-service daemon
11:57:40.518  INFO    service: Initializing. For verbose output run with "-l debug". Please wait... app.go:485
11:57:40.522  INFO    cluster: IPFS Cluster v0.4.0 listening on:
        /ip4/127.0.0.1/tcp/9096/ipfs/QmSNij94E6hFM6Y48LDna94PL3YYkr5ZirgTFPkXJTnsPc
        /ip4/172.31.26.154/tcp/9096/ipfs/QmSNij94E6hFM6Y48LDna94PL3YYkr5ZirgTFPkXJTnsPc

 daemon.go:134
11:57:40.523  INFO    restapi: REST API (HTTP): /ip4/127.0.0.1/tcp/9094 asm_amd64.s:2361
11:57:40.523  INFO    restapi: REST API (libp2p-http): ENABLED. Listening on:
        /ip4/127.0.0.1/tcp/9096/ipfs/QmSNij94E6hFM6Y48LDna94PL3YYkr5ZirgTFPkXJTnsPc
        /ip4/172.31.26.154/tcp/9096/ipfs/QmSNij94E6hFM6Y48LDna94PL3YYkr5ZirgTFPkXJTnsPc

 asm_amd64.s:2361
11:57:40.524  INFO  consensus: existing Raft state found! raft.InitPeerset will be ignored consensus.go:152
11:57:40.524  INFO   ipfshttp: IPFS Proxy: /ip4/127.0.0.1/tcp/9095 -> /ip4/127.0.0.1/tcp/5001 asm_amd64.s:2361
11:57:42.524  INFO  consensus: Current Raft Leader: QmSNij94E6hFM6Y48LDna94PL3YYkr5ZirgTFPkXJTnsPc consensus.go:120
11:57:42.524  INFO    cluster: Cluster Peers (without including ourselves): cluster.go:133
11:57:42.524  INFO    cluster:     - No other peers cluster.go:133
11:57:42.524  INFO    cluster: ** IPFS Cluster is READY ** cluster.go:133
11:57:47.525  INFO    cluster: peerset change detected. Saving peers addresses asm_amd64.s:2361

I think the service is not exposed to the public network, but I don’t know what to do about it. If that is not the cause, I hope someone can point out my problem. Thank you!
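For readers comparing their own startup log against the one above, here is a minimal sketch that labels the two listen addresses using Python's standard `ipaddress` module. The helper name `classify` is made up for this example and is not part of ipfs-cluster. On EC2 it is typically expected that only the private address shows up, because the public IP is mapped to the instance through NAT and the operating system never sees it, so this output alone does not show that the service is unreachable from outside.

```python
import ipaddress

def classify(addresses):
    """Label each IP address as loopback, private, or public.

    `classify` is a hypothetical helper for illustration only.
    """
    labels = {}
    for addr in addresses:
        ip = ipaddress.ip_address(addr)
        if ip.is_loopback:
            # Checked first, because loopback addresses also count as private.
            labels[addr] = "loopback"
        elif ip.is_private:
            labels[addr] = "private"
        else:
            labels[addr] = "public"
    return labels

# The two listen addresses reported in the log above.
print(classify(["127.0.0.1", "172.31.26.154"]))
```

Whether other peers can actually reach port 9096 on the public IP is a separate question that depends on the firewall and security-group rules.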

Ports 9094, 9095, and 9096 are open on all nodes, and I can connect to them remotely with telnet. They work normally, but the ipfs-cluster connection still times out.

What about port 4001 (or whichever port the IPFS daemon is listening on if you’ve changed the default)?

There is no problem now; maybe a firewall had previously been set up between the nodes. But I still have two questions.

There are three nodes: A, B, and C. When I tested ipfs earlier, B blocked the IP address of C, and C blocked the IP address of B. That is, B and C were each connected only to A and could not connect to each other, so A was connected to both B and C as a bridge. In this setup, if B uploads the file x.txt and C tries to download x.txt, C just keeps waiting. This is the first question: A does not forward C’s request or act as a download relay for C. Why is the data not transferred through A?

The second question: the ipfs-cluster service was deployed on top of that earlier test environment, and then the problem described in this topic appeared: B’s connection to A timed out. After your reminder, I removed the firewall rules between B and C, and the ipfs-cluster connection between B and A went back to normal. I don’t know why. Why does the firewall between B and C affect the connection between B and A? My firewall rules block all requests from a single IP.

Hello,

it is very hard to read all the information you provide in a single paragraph (in your last message).

All in all, you need to make sure that A, B and C can all connect to each other on the cluster listen address (TCP/9096).
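A quick way to test this from each machine is a plain TCP connection attempt to the other peers on port 9096. The following is only a sketch using Python's standard `socket` module; the function names `can_connect` and `check_peers` are made up for this example. It tests TCP reachability only, so a successful result does not rule out the other causes listed in the error message, such as a mismatched "secret".

```python
import socket

def can_connect(host, port=9096, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False

def check_peers(hosts, port=9096, timeout=3.0):
    """Map each host to whether its cluster port is reachable from this machine."""
    return {host: can_connect(host, port, timeout) for host in hosts}
```

Run it on every peer with the addresses of the other two, for example `check_peers(["<address of A>", "<address of C>"])` on B. Any `False` entry points at a pair of peers that cannot reach each other on the cluster port.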

Peer A is not going to bridge and forward requests from B to C automatically. This is not how cluster works. In order to work well, every peer needs connectivity to the others. In some cases, you might get NAT hole punching, but I cannot guarantee that it works in 100% of cases (particularly on Amazon).

We have an ipfs-cluster-ctl health graph command that outputs a .dot graph file and might help you see where connections among your peers are missing. Let us know how it goes!

Thank you!
I think I understand what you mean: nodes must be able to reach each other directly. I ran the experiment to test whether a node would forward a request so that it eventually succeeds. In a network with a very large number of nodes (say 10000000), not all nodes can be connected to each other, and the connection between two nodes may be blocked by a firewall (such as China’s GFW). In that case you can’t get the data, even if you know that a node has the data you need.

Hey @jiejuezhisi, IPFS Cluster does not scale to 10000000 cluster peers. As it is now (using Raft for the consensus layer), it works well with 10.

We are working on providing an alternative consensus layer that will allow scaling to hundreds (or thousands) of cluster peers. We call it the “collaborative pinsets” feature, and it’s in the mid-term section of our roadmap.

Hello, I ran into the same problem. Could you share the specific steps you took to solve it?

You need to open the relevant ports: Security and ports - Pinset orchestration for IPFS

As a side note, we shipped collaborative clusters and an alternative consensus layer (CRDT) long ago.