The dataset is published as a CAR file to an IPFS cluster of two nodes.
The CAR file is 12 MB in size, but it contains around 20K files in a directory tree.
Publishing currently takes more than 30 minutes, which I think is excessive, and lately it has also been failing with the following error:
2022-08-22T20:02:05.008Z ERROR adder adder/util.go:58 BlockPut on 12D3KooWAggMRPC2f6Khe5FUc4tPEguBC2J95BbQWJgku7eNFWgH: Post "http://172.19.0.3:5001/api/v0/block/put?format=protobuf&mhlen=32&mhtype=sha2-256": context canceled
What is interesting about this is that the context seems to be cancelled by the IPFS daemon when IPFS Cluster makes requests to it, inside a docker-compose setup on the same host.
Why does publishing take so long for a CAR of only 12 MB?
Why is the context being cancelled?
@hector it is not nginx but HAProxy that is doing the SSL termination and reverse proxying.
Yes, I noticed the other issue, but I thought the message was complaining about the context of the request to the IPFS server, because of the URL in the message. So you think the context of the request to the cluster is the one being aborted?
The request to IPFS is aborted because of a context cancellation, and the only way that can happen is if the original request to the cluster API is cancelled (I think). Try adding with ?stream-channels=false. Some proxies do not handle well the fact that progress responses are sent before the upload is finished, but I’m not sure if that is the case here. The HAProxy logs might give some insight too: what return codes do they show for these requests?
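For reference, a minimal sketch of what that could look like with curl against the cluster REST API (host, port, and the CAR format parameter are assumptions for your setup):

# force non-streamed responses from the cluster /add endpoint
curl -X POST -F file=@dataset.car \
  "https://cluster.example.org:9094/add?format=car&stream-channels=false"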
There is also the possibility that block/put calls to IPFS just hang (i.e. a disk-writing issue), so I would verify that the disk is healthy and performs well.
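A quick sketch of how one might check that, assuming the IPFS repo lives under /data/ipfs and sits on /dev/sda (both paths are assumptions):

# raw write throughput on the datastore disk, bypassing the page cache
dd if=/dev/zero of=/data/ipfs/ddtest bs=1M count=512 oflag=direct status=progress
rm /data/ipfs/ddtest
# SMART health of the underlying disk
smartctl -a /dev/sda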
I have applied this timeout only to the specific backend, because I fear that setting it globally could lead to HAProxy running out of resources if any other service does not properly terminate its connections.
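For reference, this is roughly what such a backend-scoped timeout could look like in the HAProxy config (the backend and server names here are made up):

# long timeout scoped to the cluster API backend only, so slow
# /add uploads are not cut off while other backends keep defaults
backend ipfs_cluster_api
    timeout server 60m
    server cluster1 172.19.0.2:9094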
@hector do you have any idea why adding this CAR is so slow? Is there anything that can be done about it?
Yes, it seems to be very slow. What is the replication factor? Do you think one of those peers has very low bandwidth?
The CAR is streamed to all nodes that will pin the content, so if it takes very long, it usually means one of those streams is very slow (either the bandwidth is very low, or writing to disk is very slow).
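In case it helps, a quick way to check both the replication settings and the peers (the service.json location is an assumption for your deployment):

# replication settings live in the cluster section of service.json
grep -E '"replication_factor_(min|max)"' /data/ipfs-cluster/service.json
# confirm both peers are connected and healthy
ipfs-cluster-ctl peers ls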
One workaround is to add with ?local=true (ipfs-cluster-ctl add --local…). This will upload directly only to the local node; the content is then replicated by pinning it in the other places and retrieved via bitswap.
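A sketch of that workaround (the file name and the CAR format flag are assumptions for your setup):

# add only through the local peer; other peers pin and fetch via bitswap
ipfs-cluster-ctl add --local --format car dataset.car
# then watch the pin propagate to the other peer
ipfs-cluster-ctl status <cid>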
@hector actually the HAProxy setting is not 100% working; our service publishes daily and I am still getting failures. I am starting to think that it gets stuck and fails no matter which timeout I set: none of the failures happen past the 60m+ mark, even though a successful publish takes around 30 minutes.
I am tempted to test delivering without SSL termination, and see what happens.
The cluster is only two nodes, datacenter and lab, and I don’t think we have bandwidth issues; the CAR is very small. If I scp the CAR file to the datacenter it takes only seconds.
/ # ipfs-cluster-ctl -v
ipfs-cluster-ctl version 0.14.5
I am aware of the local=true option, but the idea is to publish and then update DNS entries knowing that the nodes are ready to receive traffic for content that is already pinned. It is a nice piece of functionality.
The CAR is actually 50 MB (I did not realize that it was compressed by GitLab; the 12 MB figure is the compressed size). Once un-CAR’ed it is 199 MB, with 25K files in 24K directories, mainly JSON files.
If you want to browse an example, use CID bafybeihhaxdfv6lnbyln7z2nwwo4wzb5myhnrfv5afayqr3c2jobfyvrqq
This is a very old version. Back then, adding a file meant one block/put request for each block in it, and your files are especially block-heavy; that is probably why it takes forever. Upgrade to the latest cluster version, try again, and see how long it takes.
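To see just how block-heavy the DAG is, something like ipfs dag stat (available in recent go-ipfs versions) reports the total block count; with ~25K small files in ~24K directories you can expect roughly one block per file and per directory, i.e. on the order of 50K block/put calls in the old per-block code path:

# count blocks in the DAG, on a node that already has the content
ipfs dag stat bafybeihhaxdfv6lnbyln7z2nwwo4wzb5myhnrfv5afayqr3c2jobfyvrqq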