I have 4 peers running ipfs-cluster that are reachable from the Internet. I would like to add other peers by bootstrapping them to the leader. Those peers will sit behind a firewall that I cannot change: inbound ports will certainly be filtered, while outbound ports should be open. I cannot guarantee that the multiaddresses of those peers will be reachable from the Internet.
Currently, the cluster is able to see the peer asking to connect, but I quickly get messages like this:
Failed to heartbeat to Qmc****: dial backoff logging.go:105
and the service shuts down on the peer.
Is there any configuration on this peer that would make ipfs-cluster work?
Can you post the logs of the Firewalled peer?
Are you bootstrapping to the same peer that shows the "Failed to heartbeat" messages? If not, can you try that?
If NAT hole-punching works, and you bootstrap directly to the cluster Raft leader, I think that the firewalled peer should then manage to punch holes to the rest of the peers. But if you bootstrap to a peer that is not the leader, the leader will not be able to heartbeat the firewalled peer.
If you are indeed already bootstrapping to the leader, then libp2p’s NAT hole-punching may not work in your environment. We might explore other options like QUIC or libp2p circuits then, but I haven’t tried them myself.
Are you bootstrapping to the same peer that shows the Failed to heartbeat messages? If not, can you try that?
I tried and here is the log
Can you post the logs of the Firewalled peer?
17:38:27.707 INFO consensus: Current Raft Leader: QmcWtMUskZ7cANVnrABm7sBdEpmyv8iPsxST3YwMpTURSw raft.go:293
17:38:27.708 INFO cluster: QmcnEty1HpQzzXq2X4tXPh2foFAVH152wQPhwdNvkceh3g: joined QmcWtMUskZ7cANVnrABm7sBdEpmyv8iPsxST3YwMpTURSw's cluster cluster.go:692
17:38:47.800 ERROR cluster: no state has been agreed upon yet cluster.go:856
17:38:59.868 ERROR p2p-gorpc: dial attempt failed: <peer.ID Qm*kceh3g> --> <peer.ID Qm*bvJHsz> dial attempt failed: context deadline exceeded call.go:63
17:39:00.082 ERROR p2p-gorpc: dial attempt failed: <peer.ID Qm*kceh3g> --> <peer.ID Qm*gXD7iQ> dial attempt failed: context deadline exceeded call.go:63
17:39:00.234 ERROR p2p-gorpc: dial attempt failed: <peer.ID Qm*kceh3g> --> <peer.ID Qm*Gu2RHf> dial attempt failed: context deadline exceeded call.go:63
17:39:29.371 ERROR cluster: no state has been agreed upon yet cluster.go:856
Here is the log from the leader:
déc. 06 17:38:26 ipfs-amazon ipfs-cluster-service: 17:38:26.489 INFO consensus: peer added to Raft: QmcnEty1HpQzzXq2X4tXPh2foFAVH152wQPhwdNvkceh3g consensus.go:355
déc. 06 17:38:27 ipfs-amazon ipfs-cluster-service: 17:38:27.074 INFO cluster: Peer added QmcnEty1HpQzzXq2X4tXPh2foFAVH152wQPhwdNvkceh3g cluster.go:602
And from another node:
Dec 06 17:41:03 ipfs-tutu ipfs-cluster-service: 17:41:03.467 ERROR p2p-gorpc: dial attempt failed: <peer.ID Qm*bvJHsz> --> <peer.ID Qm*kceh3g> dial attempt failed: context deadline exceeded call.go:63
Dec 06 17:41:03 ipfs-tutu ipfs-cluster-service: 17:41:03.467 ERROR cluster: <peer.ID Qm*bvJHsz>: error in broadcast response from <peer.ID Qm*kceh3g>: dial attempt failed: <peer.ID Qm*bvJHsz> --> <peer.ID Qm*kceh3g> dial attempt failed: context deadline exceeded cluster.go:1180
If I try to pin a file from one of the cluster peers, I get the following message from
<peer.ID Qm*kceh3g> : CLUSTER_ERROR: dial attempt failed: <peer.ID Qm*bvJHsz> --> <peer.ID Qm*kceh3g> dial attempt failed: context deadline exceeded | 2018-12-06T16:41:03Z
It looks like the peer cannot be reached from the Internet.
Ah, ok, but the firewalled peer is not dying on bootstrap, right?
What happens if you run ipfs-cluster-ctl peers ls from that peer? This will open connections from the peer to the rest. Can you pin afterwards?
Right, it is not dying anymore, thanks!
Indeed, the cluster peers do not complain anymore, and the logs stay quiet on the firewalled peer. Do you know why I need to run peers ls before doing anything else?
When pinning a file (42MB) from one of the members of the cluster, I had a timeout issue:
Dec 07 10:25:40 ipfs-tutu ipfs-cluster-service: 10:25:40.088 ERROR adder: error adding to cluster: read tcp4 127.0.0.1:9094->127.0.0.1:37032: i/o timeout adder.go:146
It looks like the file is too big to synchronise to all members of the cluster.
I tried with a smaller file (1.2MB) and it worked! I was able to download the file from the firewalled peer's gateway.
My plan is to synchronize a large number of files from a cluster to firewalled peers. I am wondering how I will synchronize files over 1GB. Is there any tweak to be made?
Many thanks for your help!
Either this is a network issue unrelated to cluster, or your peer configuration has timeouts set (see write_timeout at https://cluster.ipfs.io/documentation/configuration/#restapi; they should be set to 0).
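To make the timeout check concrete, here is a minimal sketch that scans a parsed configuration for REST API timeouts that are set to something other than zero. It assumes the timeouts live under `api.restapi` in the JSON configuration; only `write_timeout` is named in this thread, so `read_timeout` and the function names are illustrative assumptions.

```python
import json
import re

# Assumption: keys to inspect. Only write_timeout is named in the thread;
# read_timeout is included here purely for illustration.
TIMEOUT_KEYS = ("read_timeout", "write_timeout")


def is_zero_duration(value):
    """Return True for values such as 0, "0", "0s" or "0m0s"."""
    if isinstance(value, (int, float)):
        return value == 0
    numbers = re.findall(r"\d+(?:\.\d+)?", str(value))
    return bool(numbers) and all(float(n) == 0 for n in numbers)


def nonzero_restapi_timeouts(config, keys=TIMEOUT_KEYS):
    """Return {key: value} for REST API timeouts that are set and not zero.

    Assumes the section is located at config["api"]["restapi"].
    """
    restapi = config.get("api", {}).get("restapi", {})
    return {
        key: restapi[key]
        for key in keys
        if key in restapi and not is_zero_duration(restapi[key])
    }


def check_file(path):
    """Load a JSON configuration file and report its non-zero timeouts."""
    with open(path) as fh:
        return nonzero_restapi_timeouts(json.load(fh))
```

Calling `check_file` on the peer's configuration file returns an empty dict when none of the inspected timeouts would cut off a long-running add.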
Running peers ls first is a workaround: it forces the firewalled peer to open connections to every other peer. Once those connections are established, the other peers can use them to contact that peer. What operating system are you using, though?
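Until there is a cleaner solution, the workaround could be scripted on the firewalled peer so that it runs right after the service starts. The sketch below simply retries `ipfs-cluster-ctl peers ls` until it succeeds. The function name, retry count and delay are hypothetical, and it assumes that a non-zero exit code means the command failed.

```python
import subprocess
import time


def open_outbound_connections(attempts=5, delay=10.0, run=subprocess.run):
    """Run `ipfs-cluster-ctl peers ls` until it succeeds or attempts run out.

    `run` is injectable so the function can be exercised without a cluster.
    Assumption: a non-zero return code signals that the command failed.
    """
    for attempt in range(1, attempts + 1):
        result = run(
            ["ipfs-cluster-ctl", "peers", "ls"],
            capture_output=True,
            text=True,
        )
        if result.returncode == 0:
            # The listing itself is not needed; the side effect is that this
            # peer has dialed out to the other cluster peers.
            return result.stdout
        if attempt < attempts:
            time.sleep(delay)
    raise RuntimeError("ipfs-cluster-ctl peers ls did not succeed")
```

This only automates the manual step; it does not keep the connections alive if they are later dropped.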
I have opened an issue here https://github.com/ipfs/ipfs-cluster/issues/614 so that we can figure out what’s the best approach to make the process painless.
Nice! I was able to pin a 700MB file; my settings were wrong. They came from an Ansible playbook. I will make a PR about it.
Should I replicate all the default settings from https://cluster.ipfs.io/documentation/configuration/ ?
I noticed the issue, thanks, I’ll follow it!
Yes, I think so (or better, the defaults from ipfs-cluster-service init). Thanks for catching that. I am manually overwriting those everywhere so I didn’t notice.
Thanks, it works well now!
I wish to build an ipfs-cluster topology with mixed architectures: a public amd64 leader and firewalled nodes (arm, arm64).
I saw your project https://github.com/hsanjuan/ansible-ipfs-cluster, but I am not yet familiar with Ansible…
I am preparing a workshop to teach how to build the New Internet during a resilience learning festival.
I am willing to experiment with IPFS and CJDNS and write a detailed step-by-step guide.
I wonder if I could rely on your help?