Anneka
December 20, 2018, 9:38am
1
Hi there,
I managed to set up an IPFS Cluster with three peers which are seeing each other and able to replicate and pin files.
Yesterday an error occurred on the first node and I couldn’t find helpful information on the internet, so I’m posting it here:
I have three Amazon Linux 2 Nodes which are running ipfs version 0.4.19-dev and ipfs-cluster-ctl version 0.7.0. They form a cluster and are able to see each other via bootstrap.
ipfs-cluster-ctl peers ls
QmRCbZCyhFjNxPwzyXqUSj8FU5P121uVyQaY6QfkBKeNjC | ip-172-16-0-22.eu-central-1.compute.internal | Sees 2 other peers
> Addresses:
- /ip4/127.0.0.1/tcp/9096/ipfs/QmRCbZCyhFjNxPwzyXqUSj8FU5P121uVyQaY6QfkBKeNjC
- /ip4/172.16.0.22/tcp/9096/ipfs/QmRCbZCyhFjNxPwzyXqUSj8FU5P121uVyQaY6QfkBKeNjC
- /p2p-circuit/ipfs/QmRCbZCyhFjNxPwzyXqUSj8FU5P121uVyQaY6QfkBKeNjC
> IPFS: QmdsFtVEhuTYcQ5f75TDqHmN2Wk8Q2rTegUWd9uLMu3M36
- /ip4/127.0.0.1/tcp/4001/ipfs/QmdsFtVEhuTYcQ5f75TDqHmN2Wk8Q2rTegUWd9uLMu3M36
- /ip4/3.122.101.162/tcp/4001/ipfs/QmdsFtVEhuTYcQ5f75TDqHmN2Wk8Q2rTegUWd9uLMu3M36
- /ip6/::1/tcp/4001/ipfs/QmdsFtVEhuTYcQ5f75TDqHmN2Wk8Q2rTegUWd9uLMu3M36
QmZ1Mv5C85b5KY7Lq3LoKPydnQtm93bHMyLuoRaE5DC3db | ip-172-16-0-83.eu-central-1.compute.internal | Sees 2 other peers
> Addresses:
- /ip4/127.0.0.1/tcp/9096/ipfs/QmZ1Mv5C85b5KY7Lq3LoKPydnQtm93bHMyLuoRaE5DC3db
- /ip4/172.16.0.83/tcp/9096/ipfs/QmZ1Mv5C85b5KY7Lq3LoKPydnQtm93bHMyLuoRaE5DC3db
- /p2p-circuit/ipfs/QmZ1Mv5C85b5KY7Lq3LoKPydnQtm93bHMyLuoRaE5DC3db
> IPFS: QmdDNPUp4B8Y2B4xrAF6K6UEKbaAscgcAgpDj8yegZBpN3
- /ip4/127.0.0.1/tcp/4001/ipfs/QmdDNPUp4B8Y2B4xrAF6K6UEKbaAscgcAgpDj8yegZBpN3
- /ip6/::1/tcp/4001/ipfs/QmdDNPUp4B8Y2B4xrAF6K6UEKbaAscgcAgpDj8yegZBpN3
Qmaf5QBT34GCYDHRsukZznwhTQ8J9hfuxDB9sqtPwHwryt | ip-172-16-0-17.eu-central-1.compute.internal | Sees 2 other peers
> Addresses:
- /ip4/127.0.0.1/tcp/9096/ipfs/Qmaf5QBT34GCYDHRsukZznwhTQ8J9hfuxDB9sqtPwHwryt
- /ip4/172.16.0.17/tcp/9096/ipfs/Qmaf5QBT34GCYDHRsukZznwhTQ8J9hfuxDB9sqtPwHwryt
- /p2p-circuit/ipfs/Qmaf5QBT34GCYDHRsukZznwhTQ8J9hfuxDB9sqtPwHwryt
> IPFS: QmXpV7QPJvejXtRnY9F8aprYcfxApp2n1w6TszBb6C2rK5
- /ip4/127.0.0.1/tcp/4001/ipfs/QmXpV7QPJvejXtRnY9F8aprYcfxApp2n1w6TszBb6C2rK5
- /ip4/3.122.104.101/tcp/4001/ipfs/QmXpV7QPJvejXtRnY9F8aprYcfxApp2n1w6TszBb6C2rK5
- /ip6/::1/tcp/4001/ipfs/QmXpV7QPJvejXtRnY9F8aprYcfxApp2n1w6TszBb6C2rK5
The files that I uploaded are replicated and pinned among all nodes.
At first everything worked as I expected: I was able to add files from all three nodes and cat them from the other two. But then I got an error on the first node saying that the merkledag could not be found.
What I don’t understand is that if I run pin ls on the first node, it seems to work just fine, since all the pins are listed correctly. But when I pick a multihash from the list of pinned files and cat it, it throws the described error.
ipfs-cluster-ctl pin ls
QmfCrmuqpsmcuje7qZtuMctBASGt5iN8p5L5fahNAUJh7S | | PIN | Repl. Factor: -1 | Allocations: [everywhere] | Recursive
QmPQb7rQcC4cim3YQKgsAqrvxbfwShF78WEmXSohmBhR4t | | PIN | Repl. Factor: -1 | Allocations: [everywhere] | Recursive
QmeomffUNfmQy76CQGy9NdmqEnnHU9soCexBnGU3ezPHVH | | PIN | Repl. Factor: -1 | Allocations: [everywhere] | Recursive
QmYxBJjYvb2LmcfbZMLNegD9WJxf8EMpdaKhxTj1iW21ym | | PIN | Repl. Factor: -1 | Allocations: [everywhere] | Recursive
ipfs cat QmPQb7rQcC4cim3YQKgsAqrvxbfwShF78WEmXSohmBhR4t
Error: merkledag: not found
But this problem only occurred on the first node. The other two nodes are still able to cat the pinned files.
The log of the first node might also be helpful. It seems that the ipfs-cluster daemon is shut down by systemd, but when I check it with systemctl status ipfs-cluster, the service is still active. Stopping and restarting the service didn’t change the error.
ipfs-cluster-service[4021]: 14:06:24.374 INFO cluster: ** IPFS Cluster is READY ** cluster.go:420
systemd[1]: Stopping ipfs-cluster-service daemon...
INFO cluster: shutting down Cluster cluster.go:439
INFO consensus: stopping Consensus component consensus.go:176
ERROR raft: NOTICE: Some RAFT log messages repeat and will only be logged once logging.go:105
ERROR raft: Failed to take snapshot: nothing new to snapshot logging.go:105
INFO monitor: stopping Monitor pubsubmon.go:155
INFO restapi: stopping Cluster API restapi.go:449
INFO ipfsproxy: stopping IPFS Proxy ipfsproxy.go:180
INFO ipfshttp: stopping IPFS Connector ipfshttp.go:184
INFO pintracker: stopping MapPinTracker maptracker.go:119
systemd[1]: Stopped ipfs-cluster-service daemon.
-- Reboot --
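For context on the systemd behaviour: whether the daemon comes back after systemd stops it depends on the Restart= setting in its unit file. A typical unit for ipfs-cluster-service might look like the following sketch; the paths and user here are assumptions for illustration, not taken from this setup:

```ini
# /etc/systemd/system/ipfs-cluster.service (hypothetical example)
[Unit]
Description=ipfs-cluster-service daemon
After=network.target ipfs.service

[Service]
User=ec2-user
ExecStart=/usr/local/bin/ipfs-cluster-service daemon
# on-failure restarts after crashes but not after a clean `systemctl stop`
Restart=on-failure

[Install]
WantedBy=multi-user.target
```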
Can somebody please help me?
hector
December 20, 2018, 12:24pm
2
Error: merkledag: not found
means your ipfs daemon is not running and the hash is not available locally (I think). ipfs-cluster-ctl status should give information on whether things are actually pinned in their destinations or whether an error happened. Also, it is sometimes necessary to run ipfs commands as the same user that the ipfs daemon runs as, so that the CLI sees the right ipfs repository and configuration (sudo -u ipfs -i ipfs cat ..., assuming ipfs runs as the ipfs user).
I’m not sure about the peer being shut down by systemd. If it’s running now, those might just be older log entries (-- Reboot -- means they are from before a reboot?).
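To illustrate the repo/user point: the ipfs CLI decides which repository to use from the IPFS_PATH environment variable, falling back to ~/.ipfs for the current user, so a shell running as a different user silently looks at a different (possibly empty) repo. A minimal sketch of that resolution rule:

```shell
# The ipfs CLI reads its repository location from IPFS_PATH, defaulting
# to ~/.ipfs of the *current* user. If the daemon was started by another
# user, this resolves to a different repo, and local lookups such as
# `ipfs cat` or `ipfs pin ls` will not see the daemon's data.
repo="${IPFS_PATH:-$HOME/.ipfs}"
echo "ipfs commands in this shell would use: $repo"
```

This is why running the command via `sudo -u ipfs -i` can change the result: `-i` picks up that user's HOME and environment.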
Anneka
December 20, 2018, 1:40pm
3
Thanks for your fast answer!
I ran the status command and there was no error shown:
ipfs-cluster-ctl status
QmYxBJjYvb2LmcfbZMLNegD9WJxf8EMpdaKhxTj1iW21ym :
> ip-172-16-0-22.eu-central-1.compute.internal : PINNED | 2018-12-18T12:15:08Z
> ip-172-16-0-83.eu-central-1.compute.internal : PINNED | 2018-12-19T14:08:54Z
> ip-172-16-0-17.eu-central-1.compute.internal : PINNED | 2018-12-18T12:15:08Z
QmfCrmuqpsmcuje7qZtuMctBASGt5iN8p5L5fahNAUJh7S :
> ip-172-16-0-22.eu-central-1.compute.internal : PINNED | 2018-12-19T10:08:49Z
> ip-172-16-0-83.eu-central-1.compute.internal : PINNED | 2018-12-19T14:08:54Z
> ip-172-16-0-17.eu-central-1.compute.internal : PINNED | 2018-12-19T10:08:49Z
QmPQb7rQcC4cim3YQKgsAqrvxbfwShF78WEmXSohmBhR4t :
> ip-172-16-0-22.eu-central-1.compute.internal : PINNED | 2018-12-19T12:28:33Z
> ip-172-16-0-83.eu-central-1.compute.internal : PINNED | 2018-12-19T14:08:54Z
> ip-172-16-0-17.eu-central-1.compute.internal : PINNED | 2018-12-19T12:28:33Z
QmeomffUNfmQy76CQGy9NdmqEnnHU9soCexBnGU3ezPHVH :
> ip-172-16-0-22.eu-central-1.compute.internal : PINNED | 2018-12-19T13:15:08Z
> ip-172-16-0-83.eu-central-1.compute.internal : PINNED | 2018-12-19T14:08:54Z
> ip-172-16-0-17.eu-central-1.compute.internal : PINNED | 2018-12-19T13:15:08Z
I also ran the cat command as the ec2-user user, under which my ipfs daemon is running, but got the same error:
sudo -u ec2-user -i ipfs cat QmPQb7rQcC4cim3YQKgsAqrvxbfwShF78WEmXSohmBhR4t
Error: merkledag: not found
Here are the latest logs that I got with journalctl. If I understand them correctly, systemd restarts the ipfs-cluster daemon.
Dez 19 13:29:39 ip-172-16-0-83.eu-central-1.compute.internal ipfs-cluster-service[3853]: 13:29:39.320 INFO cluster: ** IPFS Cluster is READY ** cluster.go:420
Dez 19 14:05:41 ip-172-16-0-83.eu-central-1.compute.internal systemd[1]: Stopping ipfs-cluster-service daemon...
Dez 19 14:05:41 ip-172-16-0-83.eu-central-1.compute.internal ipfs-cluster-service[3853]: 14:05:41.047 INFO cluster: shutting down Cluster cluster.go:439
Dez 19 14:05:41 ip-172-16-0-83.eu-central-1.compute.internal ipfs-cluster-service[3853]: 14:05:41.047 INFO consensus: stopping Consensus component consensus.go:176
Dez 19 14:05:41 ip-172-16-0-83.eu-central-1.compute.internal ipfs-cluster-service[3853]: 14:05:41.047 ERROR raft: NOTICE: Some RAFT log messages repeat and will only be logged once logging.go:105
Dez 19 14:05:41 ip-172-16-0-83.eu-central-1.compute.internal ipfs-cluster-service[3853]: 14:05:41.047 ERROR raft: Failed to take snapshot: nothing new to snapshot logging.go:105
Dez 19 14:05:41 ip-172-16-0-83.eu-central-1.compute.internal ipfs-cluster-service[3853]: 14:05:41.047 INFO monitor: stopping Monitor pubsubmon.go:155
Dez 19 14:05:41 ip-172-16-0-83.eu-central-1.compute.internal ipfs-cluster-service[3853]: 14:05:41.047 INFO restapi: stopping Cluster API restapi.go:449
Dez 19 14:05:41 ip-172-16-0-83.eu-central-1.compute.internal ipfs-cluster-service[3853]: 14:05:41.047 INFO ipfsproxy: stopping IPFS Proxy ipfsproxy.go:180
Dez 19 14:05:41 ip-172-16-0-83.eu-central-1.compute.internal ipfs-cluster-service[3853]: 14:05:41.047 INFO ipfshttp: stopping IPFS Connector ipfshttp.go:184
Dez 19 14:05:41 ip-172-16-0-83.eu-central-1.compute.internal ipfs-cluster-service[3853]: 14:05:41.047 INFO pintracker: stopping MapPinTracker maptracker.go:119
Dez 19 14:05:41 ip-172-16-0-83.eu-central-1.compute.internal ipfs-cluster-service[3853]: 14:05:41.048 ERROR libp2p-raf: Failed to decode incoming command: stream reset transport.go:37
Dez 19 14:05:41 ip-172-16-0-83.eu-central-1.compute.internal systemd[1]: Stopped ipfs-cluster-service daemon.
Dez 19 14:06:19 ip-172-16-0-83.eu-central-1.compute.internal systemd[1]: Started ipfs-cluster-service daemon.
Dez 19 14:06:19 ip-172-16-0-83.eu-central-1.compute.internal systemd[1]: Starting ipfs-cluster-service daemon...
Dez 19 14:06:19 ip-172-16-0-83.eu-central-1.compute.internal ipfs-cluster-service[4021]: 14:06:19.865 INFO service: Initializing. For verbose output run with "-l debug". Please wait... daemon.go:44
Dez 19 14:06:19 ip-172-16-0-83.eu-central-1.compute.internal ipfs-cluster-service[4021]: 14:06:19.871 INFO cluster: IPFS Cluster v0.7.0+gitfdfe8def9467893d451e1fcb8ea3fb980c8c1389 listening on:
Dez 19 14:06:19 ip-172-16-0-83.eu-central-1.compute.internal ipfs-cluster-service[4021]: /p2p-circuit/ipfs/QmZ1Mv5C85b5KY7Lq3LoKPydnQtm93bHMyLuoRaE5DC3db
Dez 19 14:06:19 ip-172-16-0-83.eu-central-1.compute.internal ipfs-cluster-service[4021]: /ip4/127.0.0.1/tcp/9096/ipfs/QmZ1Mv5C85b5KY7Lq3LoKPydnQtm93bHMyLuoRaE5DC3db
Dez 19 14:06:19 ip-172-16-0-83.eu-central-1.compute.internal ipfs-cluster-service[4021]: /ip4/172.16.0.83/tcp/9096/ipfs/QmZ1Mv5C85b5KY7Lq3LoKPydnQtm93bHMyLuoRaE5DC3db
Dez 19 14:06:19 ip-172-16-0-83.eu-central-1.compute.internal ipfs-cluster-service[4021]: cluster.go:107
Dez 19 14:06:19 ip-172-16-0-83.eu-central-1.compute.internal ipfs-cluster-service[4021]: 14:06:19.872 INFO restapi: REST API (HTTP): /ip4/127.0.0.1/tcp/9094 restapi.go:414
Dez 19 14:06:19 ip-172-16-0-83.eu-central-1.compute.internal ipfs-cluster-service[4021]: 14:06:19.872 INFO restapi: REST API (libp2p-http): ENABLED. Listening on:
Dez 19 14:06:19 ip-172-16-0-83.eu-central-1.compute.internal ipfs-cluster-service[4021]: /p2p-circuit/ipfs/QmZ1Mv5C85b5KY7Lq3LoKPydnQtm93bHMyLuoRaE5DC3db
Dez 19 14:06:19 ip-172-16-0-83.eu-central-1.compute.internal ipfs-cluster-service[4021]: /ip4/127.0.0.1/tcp/9096/ipfs/QmZ1Mv5C85b5KY7Lq3LoKPydnQtm93bHMyLuoRaE5DC3db
Dez 19 14:06:19 ip-172-16-0-83.eu-central-1.compute.internal ipfs-cluster-service[4021]: /ip4/172.16.0.83/tcp/9096/ipfs/QmZ1Mv5C85b5KY7Lq3LoKPydnQtm93bHMyLuoRaE5DC3db
Dez 19 14:06:19 ip-172-16-0-83.eu-central-1.compute.internal ipfs-cluster-service[4021]: restapi.go:431
Dez 19 14:06:19 ip-172-16-0-83.eu-central-1.compute.internal ipfs-cluster-service[4021]: 14:06:19.872 INFO ipfsproxy: IPFS Proxy: /ip4/127.0.0.1/tcp/9095 -> /ip4/127.0.0.1/tcp/5001 ipfsproxy.go:205
Dez 19 14:06:19 ip-172-16-0-83.eu-central-1.compute.internal ipfs-cluster-service[4021]: 14:06:19.872 INFO consensus: existing Raft state found! raft.InitPeerset will be ignored raft.go:203
Dez 19 14:06:24 ip-172-16-0-83.eu-central-1.compute.internal ipfs-cluster-service[4021]: 14:06:24.372 INFO consensus: Current Raft Leader: QmZ1Mv5C85b5KY7Lq3LoKPydnQtm93bHMyLuoRaE5DC3db raft.go:293
Dez 19 14:06:24 ip-172-16-0-83.eu-central-1.compute.internal ipfs-cluster-service[4021]: 14:06:24.374 INFO cluster: Cluster Peers (without including ourselves): cluster.go:405
Dez 19 14:06:24 ip-172-16-0-83.eu-central-1.compute.internal ipfs-cluster-service[4021]: 14:06:24.374 INFO cluster: - QmRCbZCyhFjNxPwzyXqUSj8FU5P121uVyQaY6QfkBKeNjC cluster.go:412
Dez 19 14:06:24 ip-172-16-0-83.eu-central-1.compute.internal ipfs-cluster-service[4021]: 14:06:24.374 INFO cluster: - Qmaf5QBT34GCYDHRsukZznwhTQ8J9hfuxDB9sqtPwHwryt cluster.go:412
Dez 19 14:06:24 ip-172-16-0-83.eu-central-1.compute.internal ipfs-cluster-service[4021]: 14:06:24.374 INFO cluster: ** IPFS Cluster is READY ** cluster.go:420
Dez 19 14:07:06 ip-172-16-0-83.eu-central-1.compute.internal systemd[1]: Stopping ipfs-cluster-service daemon...
Dez 19 14:07:06 ip-172-16-0-83.eu-central-1.compute.internal ipfs-cluster-service[4021]: 14:07:06.231 INFO cluster: shutting down Cluster cluster.go:439
Dez 19 14:07:06 ip-172-16-0-83.eu-central-1.compute.internal ipfs-cluster-service[4021]: 14:07:06.231 INFO consensus: stopping Consensus component consensus.go:176
Dez 19 14:07:06 ip-172-16-0-83.eu-central-1.compute.internal ipfs-cluster-service[4021]: 14:07:06.231 ERROR raft: NOTICE: Some RAFT log messages repeat and will only be logged once logging.go:105
Dez 19 14:07:06 ip-172-16-0-83.eu-central-1.compute.internal ipfs-cluster-service[4021]: 14:07:06.231 ERROR raft: Failed to take snapshot: nothing new to snapshot logging.go:105
Dez 19 14:07:06 ip-172-16-0-83.eu-central-1.compute.internal ipfs-cluster-service[4021]: 14:07:06.231 INFO monitor: stopping Monitor pubsubmon.go:155
Dez 19 14:07:06 ip-172-16-0-83.eu-central-1.compute.internal ipfs-cluster-service[4021]: 14:07:06.231 INFO restapi: stopping Cluster API restapi.go:449
Dez 19 14:07:06 ip-172-16-0-83.eu-central-1.compute.internal ipfs-cluster-service[4021]: 14:07:06.231 INFO ipfsproxy: stopping IPFS Proxy ipfsproxy.go:180
Dez 19 14:07:06 ip-172-16-0-83.eu-central-1.compute.internal ipfs-cluster-service[4021]: 14:07:06.231 INFO ipfshttp: stopping IPFS Connector ipfshttp.go:184
Dez 19 14:07:06 ip-172-16-0-83.eu-central-1.compute.internal ipfs-cluster-service[4021]: 14:07:06.231 INFO pintracker: stopping MapPinTracker maptracker.go:119
Dez 19 14:07:06 ip-172-16-0-83.eu-central-1.compute.internal systemd[1]: Stopped ipfs-cluster-service daemon.
-- Reboot --
Dez 19 14:08:47 ip-172-16-0-83.eu-central-1.compute.internal systemd[1]: Started ipfs-cluster-service daemon.
Dez 19 14:08:47 ip-172-16-0-83.eu-central-1.compute.internal systemd[1]: Starting ipfs-cluster-service daemon...
Dez 19 14:08:48 ip-172-16-0-83.eu-central-1.compute.internal ipfs-cluster-service[2656]: 14:08:48.306 INFO service: Initializing. For verbose output run with "-l debug". Please wait... daemon.go:44
Dez 19 14:08:48 ip-172-16-0-83.eu-central-1.compute.internal ipfs-cluster-service[2656]: 14:08:48.331 INFO cluster: IPFS Cluster v0.7.0+gitfdfe8def9467893d451e1fcb8ea3fb980c8c1389 listening on:
Dez 19 14:08:48 ip-172-16-0-83.eu-central-1.compute.internal ipfs-cluster-service[2656]: /p2p-circuit/ipfs/QmZ1Mv5C85b5KY7Lq3LoKPydnQtm93bHMyLuoRaE5DC3db
Could there be another cause for my problem?
hector
December 20, 2018, 2:09pm
4
Regarding the restart logs: does it do that all the time? (They are from yesterday, and one looks like a system reboot.)
Regarding merkledag: not found… hmm, I’m not sure. Does ipfs pin ls --type=recursive show that hash?
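A quick way to check that without eyeballing the list is to exact-match the first column of the pin output. Sketched here against a captured sample of the output (on the node you would use the live command, e.g. pins=$(ipfs pin ls --type=recursive); the variable names are just for illustration):

```shell
# Captured sample of `ipfs pin ls --type=recursive` output.
pins='QmYxBJjYvb2LmcfbZMLNegD9WJxf8EMpdaKhxTj1iW21ym recursive
QmauaRtCNg9kEAHCoXH1Bcd25BbqBMGY6FpruVhks5ycNX recursive'
# The CID that fails to cat on the first node.
cid='QmPQb7rQcC4cim3YQKgsAqrvxbfwShF78WEmXSohmBhR4t'
# Exact-match the first column (the CID) against the hash in question.
if printf '%s\n' "$pins" | awk '{print $1}' | grep -qx "$cid"; then
  echo "pinned locally"
else
  echo "not pinned locally"   # this sample prints "not pinned locally"
fi
```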
Anneka
December 20, 2018, 2:33pm
5
No, the hash is not appearing in the list when I execute ipfs pin ls --type=recursive
ipfs pin ls --type=recursive
QmYxBJjYvb2LmcfbZMLNegD9WJxf8EMpdaKhxTj1iW21ym recursive
QmauaRtCNg9kEAHCoXH1Bcd25BbqBMGY6FpruVhks5ycNX recursive
QmS4ustL54uo8FzR9455qaxZwuMiUhyvMcX9Ba8nUH4uVv recursive
QmUNLLsPACCz1vLxQVkXqqLX5R1X345qqfHbsf67hvA3Nn recursive
Those logs are the latest that I got with the journalctl command. No, it doesn’t do it all the time.
You are right, that’s my fault: those logs are from yesterday, and I produced them myself by stopping and starting the ipfs and ipfs-cluster services while trying to clear the error. I thought they might help, but they ended up confusing the situation - sorry!
hector
December 21, 2018, 11:50am
6
It is contradictory that ipfs pin ls does not show the pin while ipfs-cluster-ctl status says it’s pinned.
merkledag: not found is consistent with the pin not being available locally and the ipfs daemon not running on that peer.
I suggest you double-check your setup: make sure that the cluster peers are talking to the right ipfs daemons (hopefully running locally on the same box), and that the node from which you are running ipfs cat is running one of those ipfs daemons.
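One concrete way to check the "right ipfs daemon" point: ipfs-cluster-ctl id reports the IPFS peer id the cluster peer is attached to, and ipfs id -f="&lt;id&gt;" reports the local daemon's id; the two should match on each box. Sketched here with captured sample values (substitute the live command output on each node):

```shell
# From the "IPFS:" line of `ipfs-cluster-ctl id` -- sample value:
cluster_reported='QmdsFtVEhuTYcQ5f75TDqHmN2Wk8Q2rTegUWd9uLMu3M36'
# From `ipfs id -f="<id>"` run locally as the daemon's user -- sample value:
local_daemon='QmdsFtVEhuTYcQ5f75TDqHmN2Wk8Q2rTegUWd9uLMu3M36'
if [ "$cluster_reported" = "$local_daemon" ]; then
  echo "cluster peer is attached to this local daemon"
else
  echo "cluster peer is attached to a DIFFERENT ipfs daemon"
fi
```

If the ids differ, the cluster peer is pinning on some other daemon, which would explain status saying PINNED while the local ipfs cat fails.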