go-ipfs HTTP gateway connection resets
We’ve been having an issue that makes go-ipfs just about unusable as an HTTP gateway. We’re seeing connection resets that seem to occur intermittently between OpenResty and go-ipfs. We don’t know what triggers them, as they don’t line up with anything we can measure (load, resource usage, etc.). The setup looks roughly like this:
[Kubernetes LoadBalancer] -> OpenResty (+lua-resty-auto-ssl) -> [Kubernetes NodePort Service] -> go-ipfs HTTP gateway port (8080)
The reset appears to happen between OpenResty and go-ipfs; the OpenResty instance logs errors like these:
2020/07/31 17:35:58 [error] 20#20: *56 recv() failed (104: Connection reset by peer) while sending to client, client: 10.138.0.12, server: , request: "GET /app.ace7f8e8.css HTTP/1.1", upstream: "http://10.11.250.149:8080/app.ace7f8e8.css", host: "example.my.app"
2020/07/31 17:36:44 [error] 20#20: *74 recv() failed (104: Connection reset by peer) while sending to client, client: 10.8.0.1, server: , request: "GET /app.ace7f8e8.css HTTP/1.1", upstream: "http://10.11.250.149:8080/app.ace7f8e8.css", host: "example.my.app"
2020/07/31 17:36:45 [error] 20#20: *77 recv() failed (104: Connection reset by peer) while sending to client, client: 10.138.0.12, server: , request: "GET /app.ace7f8e8.css HTTP/1.1", upstream: "http://10.11.250.149:8080/app.ace7f8e8.css", host: "example.my.app"
2020/07/31 17:36:52 [error] 20#20: *92 recv() failed (104: Connection reset by peer) while sending to client, client: 10.138.0.17, server: , request: "GET /app.ace7f8e8.css HTTP/1.1", upstream: "http://10.11.250.149:8080/app.ace7f8e8.css", host: "example.my.app"
The resets seem to happen randomly, and the files get cut off at random points during the transfer. Here’s a sampling of the sizes returned for the truncated file (actual size: 214465 bytes):
52571
56667
56584
52488
157824
162048
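For anyone trying to reproduce this, here is a minimal Go sketch of a probe that requests the file repeatedly and prints how many bytes arrived each time. It is not the tool that produced the numbers above. The function names (fetchSize, isTruncated) and the attempt count are made up for illustration, and the optional Host override assumes the gateway picks the site based on the Host header, which may not match your setup.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"os"
	"time"
)

// expectedSize is the full size of app.ace7f8e8.css quoted in the post.
const expectedSize int64 = 214465

// fetchSize (hypothetical helper) downloads url and returns how many body
// bytes arrived before the body ended or the connection failed.
// If host is non-empty it is sent as the Host header (an assumption about
// how the gateway resolves the site).
func fetchSize(client *http.Client, url, host string) (int64, error) {
	req, err := http.NewRequest(http.MethodGet, url, nil)
	if err != nil {
		return 0, err
	}
	if host != "" {
		req.Host = host
	}
	resp, err := client.Do(req)
	if err != nil {
		return 0, err
	}
	defer resp.Body.Close()
	// io.Copy returns the bytes copied so far even when the read fails,
	// for example on "connection reset by peer".
	return io.Copy(io.Discard, resp.Body)
}

// isTruncated reports whether a transfer came up short of the expected size.
func isTruncated(got, want int64) bool {
	return got < want
}

func main() {
	if len(os.Args) < 2 {
		fmt.Println("usage: probe <url> [host-header]")
		return
	}
	host := ""
	if len(os.Args) > 2 {
		host = os.Args[2]
	}
	client := &http.Client{Timeout: 30 * time.Second}
	for i := 1; i <= 20; i++ { // 20 attempts is an arbitrary choice
		n, err := fetchSize(client, os.Args[1], host)
		fmt.Printf("attempt %d: bytes=%d truncated=%v err=%v\n",
			i, n, isTruncated(n, expectedSize), err)
	}
}
```

Pointing it at the upstream address from the logs (http://10.11.250.149:8080/app.ace7f8e8.css, with example.my.app as the second argument) would bypass OpenResty. If truncated transfers still show up there, the reset is coming from go-ipfs or the NodePort path rather than from the proxy.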
So far the go-ipfs daemon debug logs don’t seem to show anything relevant. Which modules should we pay attention to in go-ipfs debug logging? Has anyone seen this before, or does anyone have ideas for further debugging to help get to the bottom of it?