The dataset is published as a CAR file to an IPFS cluster of two nodes.
The CAR file is 12 MB in size, but it contains around 20K files in a directory tree.
Publishing currently takes more than 30 minutes, which I think is excessive, and lately it has also been failing with the following error:
2022-08-22T20:02:05.008Z ERROR adder adder/util.go:58 BlockPut on 12D3KooWAggMRPC2f6Khe5FUc4tPEguBC2J95BbQWJgku7eNFWgH: Post "http://172.19.0.3:5001/api/v0/block/put?format=protobuf&mhlen=32&mhtype=sha2-256": context canceled
What is interesting about this is that the context seems to be cancelled by the IPFS daemon when IPFS Cluster makes requests to it, inside a docker-compose setup on the same host.
Why does publishing take so long for a CAR of only 12 MB?
Why is the context being cancelled?
@hector it is not nginx but HAProxy that is doing the SSL termination and reverse proxying.
Yes, I noticed the other issue, but I thought the message was complaining about the context of the request to the IPFS server, because of the URL in the message. So you think the context of the request to the cluster is the one being aborted?
The request to IPFS is aborted because of a context cancellation, and the only way that can happen is if the original request to the cluster API is cancelled (I think). Try adding with ?stream-channels=false. Some proxies do not handle well the fact that progress responses are sent before the upload is finished, but I’m not sure if that is the case here. The HAProxy logs might give some insight too: what return codes do they show for these requests?
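For reference, a minimal sketch of what that could look like with curl against the cluster REST API (host, port, and the CAR format parameter are assumptions for your setup):

# force non-streamed responses from the cluster /add endpoint
curl -X POST -F file=@dataset.car \
  "https://cluster.example.org:9094/add?format=car&stream-channels=false"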
There is also the possibility that block/put calls to IPFS just hang (i.e. a disk-writing issue), so I would verify that the disk is healthy and performs well.
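A quick sketch of how one might check that, assuming the IPFS repo lives under /data/ipfs and sits on /dev/sda (both paths are assumptions):

# raw write throughput on the datastore disk, bypassing the page cache
dd if=/dev/zero of=/data/ipfs/ddtest bs=1M count=512 oflag=direct status=progress
rm /data/ipfs/ddtest
# SMART health of the underlying disk
smartctl -a /dev/sda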
I have applied this timeout only to the specific backend, because I fear that setting it globally could lead to HAProxy running out of resources if any other service does not properly terminate its connections.
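For reference, this is roughly what such a backend-scoped timeout could look like in the HAProxy config (the backend and server names here are made up):

# long timeout scoped to the cluster API backend only, so slow
# /add uploads are not cut off while other backends keep defaults
backend ipfs_cluster_api
    timeout server 60m
    server cluster1 172.19.0.2:9094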
@hector do you have any idea why adding this CAR is so slow? Is there anything that can be done about it?
Yes, it seems to be very slow. What is the replication factor? Do you think one of those peers has very low bandwidth?
The CAR is streamed to all nodes that will pin the content, so if it takes very long, it usually means one of those streams is very slow (either the bandwidth is very low, or writing to disk is very slow).
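In case it helps, a quick way to check both the replication settings and the peers (the service.json location is an assumption for your deployment):

# replication settings live in the cluster section of service.json
grep -E '"replication_factor_(min|max)"' /data/ipfs-cluster/service.json
# confirm both peers are connected and healthy
ipfs-cluster-ctl peers ls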
One workaround is to add with ?local=true (ipfs-cluster-ctl add --local…). This will upload directly only to the local node; the content is then replicated by pinning it in the other places and retrieved via bitswap.
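A sketch of that workaround (the file name and the CAR format flag are assumptions for your setup):

# add only through the local peer; other peers pin and fetch via bitswap
ipfs-cluster-ctl add --local --format car dataset.car
# then watch the pin propagate to the other peer
ipfs-cluster-ctl status <cid>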
@hector actually the HAProxy setting is not 100% working; our service publishes daily and I am still getting failures. I am starting to think that it gets stuck and fails no matter which timeout I set: none of the failures happen past the 60m+ mark, even though a successful publish takes around 30 minutes.
I am tempted to test delivering without SSL termination, and see what happens.
The cluster is only two nodes, datacenter and lab, and I don’t think we have bandwidth issues; the CAR is very small. If I scp the CAR file to the datacenter it takes only seconds.
/ # ipfs-cluster-ctl -v
ipfs-cluster-ctl version 0.14.5
I am aware of the local=true option, but the idea is to publish and then update DNS entries knowing that the nodes are ready to receive traffic for content that is already pinned. It is a nice piece of functionality.
The CAR is actually 50 MB (I did not realize that it was compressed by GitLab; the 12 MB figure is the compressed size). Once un-CAR’ed it is 199 MB, with 25K files in 24K directories, mainly JSON files.
If you want to browse an example, use CID bafybeihhaxdfv6lnbyln7z2nwwo4wzb5myhnrfv5afayqr3c2jobfyvrqq
This is a very old version. Back then, adding a file meant one block/put request for each block in it, and your files are especially block-heavy; that is probably why it takes forever. Upgrade to the latest cluster version, try again, and see how long it takes.
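To see just how block-heavy the DAG is, something like ipfs dag stat (available in recent go-ipfs versions) reports the total block count; with ~25K small files in ~24K directories you can expect roughly one block per file and per directory, i.e. on the order of 50K block/put calls in the old per-block code path:

# count blocks in the DAG, on a node that already has the content
ipfs dag stat bafybeihhaxdfv6lnbyln7z2nwwo4wzb5myhnrfv5afayqr3c2jobfyvrqq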