I recently encountered this error after operating IPFS for a few weeks. It only started occurring recently (we have not deployed any updates lately). Everything had been working fine since the beginning.
This error means that block/put hangs and times out on IPFS. Perhaps ipfs is in the middle of a long GC round, or it is so busy that it cannot write blocks to disk. In any case, it is an ipfs error.
Cluster timeouts for ipfs operations are adjustable in the cluster config (ipfshttp section), but I think defaults are more than enough.
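As a rough sketch of where those timeouts live: the snippet below pulls every entry ending in "_timeout" out of the ipfshttp section of a cluster service.json. The "ipfs_connector" nesting and the sample key names and values are assumptions for illustration only, so check your own config file for the real ones.

```python
import json


def ipfshttp_timeouts(config: dict) -> dict:
    """Return the *_timeout entries found in the ipfshttp section."""
    # Assumption: the ipfshttp section sits under "ipfs_connector".
    section = config.get("ipfs_connector", {}).get("ipfshttp", {})
    return {k: v for k, v in section.items() if k.endswith("_timeout")}


# Hypothetical excerpt of a service.json; key names and values are illustrative.
SAMPLE = json.loads("""
{
  "ipfs_connector": {
    "ipfshttp": {
      "node_multiaddress": "/ip4/127.0.0.1/tcp/5001",
      "ipfs_request_timeout": "5m0s",
      "pin_timeout": "2m0s"
    }
  }
}
""")

if __name__ == "__main__":
    for name, value in ipfshttp_timeouts(SAMPLE).items():
        print(f"{name} = {value}")
```

To inspect a real config, load the file with json.load and pass the resulting dict to ipfshttp_timeouts instead of SAMPLE.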
That error is probably unrelated. I’m thinking perhaps the request is being aborted and that causes the context.Canceled error. Do you have nginx in front of the cluster API or something like that?
We have an API gateway for our backend APIs. The backend internally makes requests to the cluster API. The cluster API here is just exposed internally (only the backend can make requests to it).
Thanks for the hint. Let me test it further. But personally I do not think this is the reason.
If you come up with any ideas, please let me know. Many thanks, Hector.
Do they fail immediately? Do they hang and fail later? Do they fail when block/putting the first block, a random block, the last block? Is it always when “finalizing”?
What puzzles me is that your log says “error when finalizing”, but that step is just doing the pin, and I believe the context is already cancelled by then, which can only happen if the request died.
You can run ipfs-cluster-service --log-level ipfshttp:debug,adder:debug daemon and get more info in the logs.
Also, make sure your application calling the /add endpoint is prepared to both write the request and read responses at the same time. Nginx breaks the moment the server emits a response, aborting requests that have not fully sent the multiparts.
Do they fail immediately? Do they hang and fail later? Do they fail when block/putting the first block, a random block, the last block? Is it always when “finalizing”?
It fails immediately (right after the /add request), and the error messages are always the ones in my screenshot. The errors are the same for all request failures.
======
Some new info: We have some unpinned CIDs, and the cluster keeps retrying to pin them without any success (this also started recently, at the same time as all the errors in this post. Everything was fine at the beginning). The only logs we have are ERROR core/commands/cmdenv pin/pin.go:133 context canceled on the IPFS side. There are no error logs on the cluster side.
The error seems to come from the DNS setup. We are checking it further. Thanks a lot for the hint.
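For anyone who wants to check the same thing, here is a minimal sketch that times a single DNS lookup from the machine that calls the cluster API. The host name "localhost" is a placeholder and port 9094 is only an assumed API port; substitute whatever your backend actually uses. A slow or failing lookup here would point at name resolution rather than at cluster or ipfs.

```python
import socket
import time


def time_dns_lookup(host: str, port: int = 9094):
    """Resolve host once and return (elapsed_seconds, sorted addresses)."""
    start = time.monotonic()
    infos = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    elapsed = time.monotonic() - start
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the IP.
    addresses = sorted({info[4][0] for info in infos})
    return elapsed, addresses


if __name__ == "__main__":
    # "localhost" is a placeholder for the internal cluster API host name.
    try:
        elapsed, addresses = time_dns_lookup("localhost")
        print(f"resolved in {elapsed:.3f}s: {addresses}")
    except socket.gaierror as err:
        print(f"lookup failed: {err}")
```

Running it several times in a row helps show whether lookups are intermittently slow or failing.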
====
I have a new question. Currently we receive different API responses for the same API call (the /add request). I wonder which config or setup results in the following differences.