Hang when trying to transfer files between nodes

I’m trying to transfer a lot of files between two nodes. I wrote a script that uses the API to list the pins on the source node, connect the two nodes to each other as peers, and then pin add each file on the destination. This works well at first, but after a seemingly random number of pin adds, the command hangs indefinitely and persistently. Restarting the script does nothing. Restarting the receiver (Node B) usually allows the process to continue.
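
For reference, the workflow described above could be sketched roughly as follows. This is not the original script, only a minimal illustration under assumptions: the function names `api_call` and `transfer_pins` are hypothetical, both nodes are assumed to expose the HTTP API at a URL the script can reach (for example on the default API port 5001), and a per-request socket timeout is added so that a stalled pin add is reported instead of blocking the whole run.

```python
import json
import urllib.parse
import urllib.request


def api_call(base_url, command, timeout, **params):
    """POST to an IPFS HTTP API endpoint and return the decoded JSON reply."""
    query = urllib.parse.urlencode(params)
    url = "{}/api/v0/{}?{}".format(base_url, command, query)
    request = urllib.request.Request(url, data=b"", method="POST")
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


def transfer_pins(source_api, dest_api, source_addr, call=api_call, pin_timeout=300):
    """Pin every recursive pin of the source node on the destination node.

    Returns (pinned, stalled): the hashes that were pinned, and the hashes
    whose pin add request timed out or failed. `call` is injectable so the
    loop can be exercised without a running daemon.
    """
    # 1. List the recursive pins on the source node (Node A).
    pins = call(source_api, "pin/ls", 60, type="recursive")["Keys"]

    # 2. Ask the destination node (Node B) to connect to the source.
    call(dest_api, "swarm/connect", 60, arg=source_addr)

    # 3. Pin each hash on the destination, recording the ones that stall.
    pinned, stalled = [], []
    for cid in sorted(pins):
        try:
            call(dest_api, "pin/add", pin_timeout, arg=cid)
            pinned.append(cid)
        except OSError:
            # Socket timeouts and urllib's URLError are both OSError subclasses.
            stalled.append(cid)
    return pinned, stalled
```

A call might look like `transfer_pins("http://127.0.0.1:5001", "http://127.0.0.1:5002", "/ip4/1.2.3.4/tcp/4001/ipfs/QmSscuC3tyLSCjWDviuff422gb1mmsp1yo1sbt6KrSwFdH")`, where both API URLs are placeholders. The timeout only makes the hang visible; it does not resolve whatever causes it.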

Both nodes are running go-ipfs version 0.4.13 using the IPFS Docker image on Ubuntu Linux 16.04 on AWS EC2 instances.

Here is some brief diagnostic output gathered after the hang occurs:

Node A (source):

/ # ipfs id
{
    [...]
    "Addresses": [
        [...]
        "/ip4/1.2.3.4/tcp/4001/ipfs/QmSscuC3tyLSCjWDviuff422gb1mmsp1yo1sbt6KrSwFdH"
    ],
    "AgentVersion": "go-ipfs/0.4.13/3b16b74",
    "ProtocolVersion": "ipfs/0.1.0"
}
/ # ipfs version --all
go-ipfs version: 0.4.13-3b16b74
Repo version: 6
System version: amd64/linux
Golang version: go1.9.2

Node B (destination):

/ # ipfs id
{
    [...]
    "Addresses": [
        [...]
        "/ip4/4.3.2.1/tcp/4001/ipfs/QmQENeAYCR6AL8sQUx2XodXMam44BuyAhLcJeXsdQMwToU"
    ],
    "AgentVersion": "go-ipfs/0.4.13/3b16b74",
    "ProtocolVersion": "ipfs/0.1.0"
}
/ # ipfs version --all
go-ipfs version: 0.4.13-3b16b74
Repo version: 6
System version: amd64/linux
Golang version: go1.9.2

B was then connected to A:

/ # ipfs swarm connect /ip4/1.2.3.4/tcp/4001/ipfs/QmSscuC3tyLSCjWDviuff422gb1mmsp1yo1sbt6KrSwFdH
connect QmSscuC3tyLSCjWDviuff422gb1mmsp1yo1sbt6KrSwFdH success

Confirming on B, it is in fact connected to A:

/ # ipfs swarm peers | grep 1.2.3.4
/ip4/1.2.3.4/tcp/4001/ipfs/QmSscuC3tyLSCjWDviuff422gb1mmsp1yo1sbt6KrSwFdH

Port appears open from B to A:

/ # nc 1.2.3.4 4001
/multistream/1.0.0

And from A to B:

/ # nc 4.3.2.1 4001
/multistream/1.0.0

A has the pinned file:

/ # ipfs pin ls | grep QmNLerXsk1kp98tDbnArAchg4jwwvw2cLoHMwZteV8NXc5
QmNLerXsk1kp98tDbnArAchg4jwwvw2cLoHMwZteV8NXc5 recursive

When trying to fetch the file on Node B, the command hangs indefinitely:

/ # ipfs pin add QmNLerXsk1kp98tDbnArAchg4jwwvw2cLoHMwZteV8NXc5

These are the only relevant log messages on Node B, but they don’t seem to necessarily correlate with the start of the hangs:

00:04:24.601 ERROR    bitswap: couldnt open sender again after SendMsg(<peer.ID SscuC3>) failed: session shutdown wantmanager.go:237
00:23:07.123 ERROR    bitswap: couldnt open sender again after SendMsg(<peer.ID SscuC3>) failed: session shutdown wantmanager.go:237
00:30:29.622 ERROR    bitswap: couldnt open sender again after SendMsg(<peer.ID SscuC3>) failed: session shutdown wantmanager.go:237
01:18:30.184 ERROR    bitswap: couldnt open sender again after SendMsg(<peer.ID SscuC3>) failed: session shutdown wantmanager.go:237

I’m not really sure how to troubleshoot this further. Is there a way to get more debug output? Would a GitHub issue be more appropriate? Any suggestions?
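
One possible way to gather more debug output while a pin add is stuck is to raise the bitswap log level and snapshot the bitswap and swarm state on Node B. The sketch below is only an illustration: the helper names `run_command` and `collect_diagnostics` are hypothetical, and it assumes the `ipfs` binary is on the PATH of the machine or container where it runs.

```python
import subprocess

# CLI commands to run on the node that hangs (Node B in this thread).
DIAGNOSTIC_COMMANDS = [
    ["ipfs", "log", "level", "bitswap", "debug"],  # more verbose bitswap logging
    ["ipfs", "bitswap", "wantlist"],               # blocks the node is still waiting for
    ["ipfs", "bitswap", "stat"],                   # bitswap counters and partners
    ["ipfs", "swarm", "peers"],                    # currently connected peers
]


def run_command(argv, timeout=30):
    """Run one command and return its combined stdout and stderr as text."""
    completed = subprocess.run(
        argv,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        universal_newlines=True,
        timeout=timeout,
    )
    return completed.stdout


def collect_diagnostics(run=run_command):
    """Run every diagnostic command and return a {command: output} report."""
    report = {}
    for argv in DIAGNOSTIC_COMMANDS:
        key = " ".join(argv)
        try:
            report[key] = run(argv)
        except (OSError, subprocess.TimeoutExpired) as error:
            # Keep going so one failing command does not hide the others.
            report[key] = "failed: {}".format(error)
    return report
```

Printing the returned report for a hung run and for a healthy run, then comparing the two, may show whether the stuck hash is still on the wantlist and whether Node A is still listed as a peer.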

Can you try with v0.4.14? I suspect you might be hitting this issue, which was fixed after the release of v0.4.13.

@leerspace It looks like you pegged that one right, thanks!
