Kubo (go-ipfs) closes db randomly

I use ipfs/go-ipfs:latest to serve IPFS data to a Helia client. The server runs under Docker. Sometimes the server behaves strangely: it stops delivering data to Helia. When I look at datastore/LOG, I see the following at the time the server stops delivering data:
09:45:04.523321 db@close closing
09:45:04.523681 db@close done T·334.55µs
This is strange, because the Docker container is still running at that time. When I restart it the error disappears, but then it reappears at a random time.

Here is my configuration:
Dockerfile.ipfs

# Dockerfile.ipfs
FROM ipfs/go-ipfs:latest

# Initialize IPFS with the server profile
RUN ipfs init --profile=server && \
    # Allow all origins for API access
    ipfs config --json API.HTTPHeaders.Access-Control-Allow-Origin '["*"]' && \
    # Set the maximum storage limit to 40GB
    ipfs config Datastore.StorageMax "40GB"

docker-compose.yml

version: '3.7'

services:
  ipfs:
    build: 
      context: .
      dockerfile: Dockerfile.ipfs
    container_name: ipfs_my_site_com
    volumes:
      - $HOME/ipfs/export:/export
      - $HOME/ipfs/data:/data/ipfs
    environment:
      - VIRTUAL_HOST=my.site.com
      - VIRTUAL_PORT=5001
      - LETSENCRYPT_HOST=my.site.com
      - LETSENCRYPT_EMAIL=my.email@gmail.com
    restart: unless-stopped
    command: ["daemon", "--migrate=true", "--enable-gc"]
networks:
  default:
    external:
      name: nginx-proxy
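
Note: /data/ipfs is bind-mounted at runtime, so the ipfs init / ipfs config done at image build time may be shadowed by whatever repo already exists in $HOME/ipfs/data. If in doubt, the effective settings can be read back from the running container, e.g. (using the container_name from the compose file above):

docker exec ipfs_my_site_com ipfs config Datastore.StorageMax
docker exec ipfs_my_site_com ipfs config API.HTTPHeaders.Access-Control-Allow-Origin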

Your go-ipfs must be shutting down. I see no other reason to attempt to close the DB. Perhaps someone is trying to kill your container (OOM?).
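If you want to check the OOM theory, something along these lines should show it (standard Docker/Linux commands, container name taken from your compose file; adjust as needed):

# did Docker see the container get OOM-killed, and with what exit code?
docker inspect --format 'OOMKilled={{.State.OOMKilled}} ExitCode={{.State.ExitCode}}' ipfs_my_site_com
# kernel messages about the OOM killer on the host
dmesg -T | grep -i -E 'killed process|out of memory'
journalctl -k | grep -i 'oom'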

Thank you very much for the reply. I'm not sure about the previous case, as I didn't check whether the ipfs daemon was running. But a similar issue appeared yesterday: this time there was no note about the DB closing in LOG and the daemon was running, yet Helia wasn't able to connect to go-ipfs.
Here is the most recent log before I restarted the container and Helia reconnected:

WARN[0000] network default: network.external.name is deprecated in favor of network.name 
ipfs_slonig_org  | Changing user to ipfs
ipfs_slonig_org  | ipfs version 0.27.0
ipfs_slonig_org  | Found IPFS fs-repo at /data/ipfs
ipfs_slonig_org  | Initializing daemon...
ipfs_slonig_org  | Kubo version: 0.27.0-59bcea8
ipfs_slonig_org  | Repo version: 15
ipfs_slonig_org  | System version: amd64/linux
ipfs_slonig_org  | Golang version: go1.21.7
ipfs_slonig_org  | 2024/03/15 01:00:03 failed to sufficiently increase receive buffer size (was: 208 kiB, wanted: 2048 kiB, got: 416 kiB). See https://github.com/quic-go/quic-go/wiki/UDP-Buffer-Sizes for details.
ipfs_slonig_org  | Swarm listening on /ip4/127.0.0.1/tcp/4001
ipfs_slonig_org  | Swarm listening on /ip4/127.0.0.1/udp/4001/quic-v1
ipfs_slonig_org  | Swarm listening on /ip4/127.0.0.1/udp/4001/quic-v1/webtransport/certhash/uEiDhxeqn95zUHSZKgmxghEW2hALByB5HaKgdfRVcMTsFZQ/certhash/uEiDGlzl6dkqqkkj6AHRc0gGLUJlCvl6xCnqR_WYhP1SDJg
ipfs_slonig_org  | Swarm listening on /ip4/192.168.128.2/tcp/4001
ipfs_slonig_org  | Swarm listening on /ip4/192.168.128.2/udp/4001/quic-v1
ipfs_slonig_org  | Swarm listening on /ip4/192.168.128.2/udp/4001/quic-v1/webtransport/certhash/uEiDhxeqn95zUHSZKgmxghEW2hALByB5HaKgdfRVcMTsFZQ/certhash/uEiDGlzl6dkqqkkj6AHRc0gGLUJlCvl6xCnqR_WYhP1SDJg
ipfs_slonig_org  | Swarm listening on /p2p-circuit
ipfs_slonig_org  | Swarm announcing /ip4/127.0.0.1/tcp/4001
ipfs_slonig_org  | Swarm announcing /ip4/127.0.0.1/udp/4001/quic-v1
ipfs_slonig_org  | Swarm announcing /ip4/127.0.0.1/udp/4001/quic-v1/webtransport/certhash/uEiDhxeqn95zUHSZKgmxghEW2hALByB5HaKgdfRVcMTsFZQ/certhash/uEiDGlzl6dkqqkkj6AHRc0gGLUJlCvl6xCnqR_WYhP1SDJg
ipfs_slonig_org  | Swarm announcing /ip4/192.168.128.2/tcp/4001
ipfs_slonig_org  | Swarm announcing /ip4/192.168.128.2/udp/4001/quic-v1
ipfs_slonig_org  | Swarm announcing /ip4/192.168.128.2/udp/4001/quic-v1/webtransport/certhash/uEiDhxeqn95zUHSZKgmxghEW2hALByB5HaKgdfRVcMTsFZQ/certhash/uEiDGlzl6dkqqkkj6AHRc0gGLUJlCvl6xCnqR_WYhP1SDJg
ipfs_slonig_org  | Swarm announcing /ip4/65.109.58.6/udp/4001/quic-v1
ipfs_slonig_org  | Swarm announcing /ip4/65.109.58.6/udp/4001/quic-v1/webtransport/certhash/uEiDhxeqn95zUHSZKgmxghEW2hALByB5HaKgdfRVcMTsFZQ/certhash/uEiDGlzl6dkqqkkj6AHRc0gGLUJlCvl6xCnqR_WYhP1SDJg
ipfs_slonig_org  | RPC API server listening on /ip4/0.0.0.0/tcp/5001
ipfs_slonig_org  | WebUI: http://0.0.0.0:5001/webui
ipfs_slonig_org  | Gateway server listening on /ip4/0.0.0.0/tcp/8080
ipfs_slonig_org  | Daemon is ready
ipfs_slonig_org  | 2024/03/15 01:20:32 websocket: failed to close network connection: close tcp 192.168.128.2:39916->147.75.87.27:443: use of closed network connection
ipfs_slonig_org  | 2024/03/15 01:20:32 websocket: failed to close network connection: close tcp 192.168.128.2:39904->147.75.87.27:443: use of closed network connection
ipfs_slonig_org  | 2024/03/15 02:36:08 websocket: failed to close network connection: close tcp 192.168.128.2:49330->192.227.67.185:4002: use of closed network connection
ipfs_slonig_org  | 2024/03/15 06:20:21 websocket: failed to close network connection: close tcp 192.168.128.2:37438->139.178.91.71:443: use of closed network connection
ipfs_slonig_org  | 2024/03/15 06:35:34 websocket: failed to close network connection: close tcp 192.168.128.2:41596->139.178.88.95:443: use of closed network connection
ipfs_slonig_org  | 2024/03/15 06:50:36 websocket: failed to close network connection: close tcp 192.168.128.2:44002->139.178.91.71:443: use of closed network connection
ipfs_slonig_org  | 2024/03/15 07:05:53 websocket: failed to close network connection: close tcp 192.168.128.2:49284->139.178.88.95:443: use of closed network connection
ipfs_slonig_org  | 2024/03/15 17:25:54 websocket: failed to close network connection: close tcp 192.168.128.2:54096->192.227.67.185:4002: use of closed network connection
ipfs_slonig_org  | 2024/03/15 17:52:52 websocket: failed to close network connection: close tcp 192.168.128.2:50586->139.178.88.95:443: use of closed network connection
ipfs_slonig_org  | 2024/03/15 18:22:09 websocket: failed to close network connection: close tcp 192.168.128.2:36062->139.178.88.95:443: use of closed network connection
ipfs_slonig_org  | 2024/03/15 18:35:42 websocket: failed to close network connection: close tcp 192.168.128.2:46374->139.178.91.71:443: use of closed network connection
ipfs_slonig_org  | 2024/03/15 18:51:37 websocket: failed to close network connection: close tcp 192.168.128.2:46062->139.178.88.95:443: use of closed network connection
ipfs_slonig_org  | 2024/03/15 19:21:30 websocket: failed to close network connection: close tcp 192.168.128.2:55856->139.178.88.95:443: use of closed network connection
ipfs_slonig_org  | 2024/03/15 19:39:50 websocket: failed to close network connection: close tcp 192.168.128.2:38258->147.75.87.27:443: use of closed network connection
ipfs_slonig_org  | 2024/03/15 19:40:22 websocket: failed to close network connection: close tcp 192.168.128.2:56382->139.178.91.71:443: use of closed network connection
ipfs_slonig_org  | 2024/03/15 21:37:03 websocket: failed to close network connection: close tcp 192.168.128.2:58404->147.75.87.27:443: use of closed network connection

As I say, I think the container is getting killed from the outside.

How could it be killed if I can log in to it and see the ipfs daemon running?

Because docker automatically restarts the container?
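docker inspect can tell you whether Docker restarted it, e.g.:

# restart count (incremented by the restart policy) plus last start/stop times and exit code
docker inspect --format 'restarts={{.RestartCount}} started={{.State.StartedAt}} finished={{.State.FinishedAt}} exit={{.State.ExitCode}}' ipfs_slonig_org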

How much memory is available in this machine?

Why is it that if Docker restarts the container I can't connect to it via Helia, but when I restart it manually I can?
Here is the memory data:

free -g -h -t
               total        used        free      shared  buff/cache   available
Mem:            62Gi        10Gi       554Mi       8.0Mi        51Gi        51Gi
Swap:           31Gi       623Mi        31Gi
Total:          94Gi        11Gi        31Gi

Is it possible to not let Docker restart it and get the logs before it actually dies? Is it running out of file descriptors? Do the system logs say anything around that time? Is Docker actually restarting it? Can you check the container's uptime?
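Something along these lines would answer most of that (rough sketch; it assumes a single ipfs process and that pidof is available in the busybox-based image):

# open file descriptors and the fd limit of the daemon inside the container
docker exec ipfs_slonig_org sh -c 'pid=$(pidof ipfs); echo "open fds: $(ls /proc/$pid/fd | wc -l)"; grep "open files" /proc/$pid/limits'
# uptime/status as Docker sees it
docker ps --filter name=ipfs_slonig_org
# lifecycle events (die, oom, restart) for that container over the last day, then streaming
docker events --filter container=ipfs_slonig_org --since 24h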

I've collected the following stats:

docker exec -it ipfs_slonig_org /bin/sh
date
Fri Mar 15 22:13:29 UTC 2024

top
Mem: 63651296K used, 2106104K free, 8204K shrd, 4135340K buff, 42111900K cached
CPU:  0.2% usr  0.1% sys  0.0% nic 99.4% idle  0.1% io  0.0% irq  0.0% sirq
Load average: 1.01 0.83 0.86 1/1711 61311
  PID  PPID USER     STAT   VSZ %VSZ CPU %CPU COMMAND
    7     1 ipfs     S    8238m 12.4   3  0.0 ipfs daemon --migrate=true --enable-gc
61222     0 root     S     4404  0.0   8  0.0 /bin/sh
61311 61222 root     R     4404  0.0   7  0.0 top
    1     0 root     S     2472  0.0  11  0.0 /sbin/tini -- /usr/local/bin/start_ipfs daemon --migrate=true --enable-gc

As you can see, "ipfs daemon" had been running for about 6 days until Helia stopped connecting.
Could you recommend which log data to collect? I use Ubuntu 22.04.

Isn’t that supposed to just show the current date?

Your Docker host must have a syslog (journalctl). What does it mean that Helia stops connecting, though? What error? What does docker ps say about the container?
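For the syslog part, something like this (the time window is only an example, around when you ran top; it assumes Docker runs as the usual systemd service on Ubuntu 22.04):

# kernel messages (OOM killer etc.) around the incident
journalctl -k --since "2024-03-15 21:00" --until "2024-03-15 23:00"
# the Docker daemon's own log, which records container exits and restarts
journalctl -u docker.service --since "2024-03-15 21:00" --until "2024-03-15 23:00"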

I mean that "8238m 12.4 3 0.0 ipfs daemon" shows that the container had been running for a prolonged time and wasn't restarted. journalctl didn't show any notable info at the time. I didn't run docker ps last time, but as I've mentioned, top inside the container showed that the container had not just been restarted.
"Helia stops connecting" means that get requests take more than 1 minute and don't finish.

Sorry, nothing in that line indicates how long the program has been running…

Sorry, my bad. In that case, let's wait until the next time I face the same error, and I will check docker ps.

I got the same issue today. This is what docker shows:
docker compose ps

NAME                COMMAND                  SERVICE             STATUS                PORTS
ipfs_slonig_org     "/sbin/tini -- /usr/…"   ipfs                running (unhealthy)   4001/tcp, 5001/tcp, 8080-8081/tcp, 4001/udp

docker compose logs

ipfs_slonig_org  | Changing user to ipfs
ipfs_slonig_org  | ipfs version 0.27.0
ipfs_slonig_org  | Found IPFS fs-repo at /data/ipfs
ipfs_slonig_org  | Initializing daemon...
ipfs_slonig_org  | Kubo version: 0.27.0-59bcea8
ipfs_slonig_org  | Repo version: 15
ipfs_slonig_org  | System version: amd64/linux
ipfs_slonig_org  | Golang version: go1.21.7
ipfs_slonig_org  | 2024/04/05 01:00:03 failed to sufficiently increase receive buffer size (was: 208 kiB, wanted: 2048 kiB, got: 416 kiB). See https://github.com/quic-go/quic-go/wiki/UDP-Buffer-Sizes for details.
ipfs_slonig_org  | Swarm listening on /ip4/127.0.0.1/tcp/4001
ipfs_slonig_org  | Swarm listening on /ip4/127.0.0.1/udp/4001/quic-v1
ipfs_slonig_org  | Swarm listening on /ip4/127.0.0.1/udp/4001/quic-v1/webtransport/certhash/uEiDGlzl6dkqqkkj6AHRc0gGLUJlCvl6xCnqR_WYhP1SDJg/certhash/uEiAWxfr0-ocKKygw256kD1bGuZZSDWcBps4z_Pg4_u_waA
ipfs_slonig_org  | Swarm listening on /ip4/192.168.128.2/tcp/4001
ipfs_slonig_org  | Swarm listening on /ip4/192.168.128.2/udp/4001/quic-v1
ipfs_slonig_org  | Swarm listening on /ip4/192.168.128.2/udp/4001/quic-v1/webtransport/certhash/uEiDGlzl6dkqqkkj6AHRc0gGLUJlCvl6xCnqR_WYhP1SDJg/certhash/uEiAWxfr0-ocKKygw256kD1bGuZZSDWcBps4z_Pg4_u_waA
ipfs_slonig_org  | Swarm listening on /p2p-circuit
ipfs_slonig_org  | Swarm announcing /ip4/127.0.0.1/tcp/4001
ipfs_slonig_org  | Swarm announcing /ip4/127.0.0.1/udp/4001/quic-v1
ipfs_slonig_org  | Swarm announcing /ip4/127.0.0.1/udp/4001/quic-v1/webtransport/certhash/uEiDGlzl6dkqqkkj6AHRc0gGLUJlCvl6xCnqR_WYhP1SDJg/certhash/uEiAWxfr0-ocKKygw256kD1bGuZZSDWcBps4z_Pg4_u_waA
ipfs_slonig_org  | Swarm announcing /ip4/192.168.128.2/tcp/4001
ipfs_slonig_org  | Swarm announcing /ip4/192.168.128.2/udp/4001/quic-v1
ipfs_slonig_org  | Swarm announcing /ip4/192.168.128.2/udp/4001/quic-v1/webtransport/certhash/uEiDGlzl6dkqqkkj6AHRc0gGLUJlCvl6xCnqR_WYhP1SDJg/certhash/uEiAWxfr0-ocKKygw256kD1bGuZZSDWcBps4z_Pg4_u_waA
ipfs_slonig_org  | Swarm announcing /ip4/65.109.58.6/udp/4001/quic-v1
ipfs_slonig_org  | Swarm announcing /ip4/65.109.58.6/udp/4001/quic-v1/webtransport/certhash/uEiDGlzl6dkqqkkj6AHRc0gGLUJlCvl6xCnqR_WYhP1SDJg/certhash/uEiAWxfr0-ocKKygw256kD1bGuZZSDWcBps4z_Pg4_u_waA
ipfs_slonig_org  | RPC API server listening on /ip4/0.0.0.0/tcp/5001
ipfs_slonig_org  | WebUI: http://0.0.0.0:5001/webui
ipfs_slonig_org  | Gateway server listening on /ip4/0.0.0.0/tcp/8080
ipfs_slonig_org  | Daemon is ready
ipfs_slonig_org  | 2024/04/05 07:15:09 websocket: failed to close network connection: close tcp 192.168.128.2:45722->147.75.87.27:443: use of closed network connection

What does docker ps say the container uptime was?

The logs you are pasting are not timestamped, so I cannot tell whether the container had just started when it gave that error or whether it gave the error days after starting.
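Passing -t makes Docker prepend its own timestamp to every line, e.g.:

docker compose logs -t ipfs
# or for the single container
docker logs --timestamps ipfs_slonig_org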

90ed1a7331ca   ipfsslonigorg_ipfs                                   "/sbin/tini -- /usr/…"   7 hours ago     Up 7 hours (unhealthy)   4001/tcp, 5001/tcp, 8080-8081/tcp, 4001/udp                                                                                                                                                                      ipfs_slonig_org

Your container has only lived 7 hours, so it was likely restarted after dying? Or did you restart it? I think you need to get more familiar with running applications in Docker and with system administration; I'm sure there is some log on your system that shows the issue, if there is one.

And what do you mean by "stops delivering data to Helia"? You are also apparently running some nginx proxy on top. You have sent several unrelated errors too, since the "closing db" thing hasn't been seen again. The other errors are not fatal AFAIK, but you don't even say whether they caused the process to die.

I have no crystal ball, but I think the problem, if any, is in your application, your proxy, or your setup. If you cannot do basic troubleshooting, like actually showing logs from the moment Kubo dies, it is going to be very difficult to help you further.
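One way to make sure the relevant lines get captured next time (untested sketch using standard Docker flags):

# keep the container down after a crash instead of auto-restarting it, so it can be inspected
docker update --restart=no ipfs_slonig_org
# continuously append timestamped container logs to a file on the host
docker logs -f -t ipfs_slonig_org >> ~/kubo-container.log 2>&1 &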

Hector, thank you very much for the prompt reply. The container had only lived 7 hours because of a planned backup operation: I stop it every midnight. Could you be more specific about which system log you would like to see?