Kubo versions > 0.15.0 gateway and slow IPNS performance

eth-limo · February 20, 2023, 11:07pm

Hello IPFS community,

We’re testing Kubo upgrades in our dev environment. When upgrading to anything above v0.15.0 we’ve noticed that gateway requests for IPNS names now take anywhere from 3-4m to resolve. Rolling back to v0.15.0 resolves the issue.

In the release notes for v0.16.0 and above, Kubo now ignores IPNS v1 signatures and is only compatible with v2:

Records that do not have a valid IPNS V2 signature, or exceed the max size
limit, will no longer pass verification, and will be ignored by Kubo when
resolving /ipns/{libp2p-key} content paths.

Could this be related to the slow performance we’re seeing with newer versions of Kubo?

The slow behavior can be re-created by running Kubo > 0.15.0 with the following:

$ ipfs config --json DNS.Resolvers '{"eth.": "https://dns.eth.limo/dns-query"}'


# Slow IPNS example: /ipns/k51qzi5uqu5dmjnxaa3ed22c0a6xaoyw5d75uacizdpc4u9rmxnqialvwat8y9/
$ time curl http://localhost:8080/ipns/simple.eth/ -s -o /dev/null

real	3m0.025s
user	0m0.021s
sys	0m0.010s

# Working IPFS example: /ipfs/bafybeighxhsavoanjqkqvnnpbkvoweurybjt7gauunbg37ueahcbze5ise/
$ time curl http://localhost:8080/ipns/vitalik.eth/ -s -o /dev/null

real	0m0.223s
user	0m0.007s
sys	0m0.006s

The node on which these tests were performed is publicly accessible and works fine when downgrading to Kubo 0.15.0.

Has anyone noticed this or has experience troubleshooting similar issues? Are there any debugging utilities or logs that might shed some insight into what’s responsible for the slowness? A large percentage of ENS dWebsites use IPNS now, so obviously we don’t want to deploy an upgrade that will negatively impact user experience.

Any advice is much appreciated! Thank you.

Discordian · February 21, 2023, 4:26pm

Wow great post, I wish I had a direct answer for you. I do have a question though, are you using IPNS PubSub? And if not, does enabling that improve your situation at all?

eth-limo · February 21, 2023, 4:34pm

Hi @Discordian,

Yes, we do have IPNS Pub/Sub enabled. Here is a snippet from our Systemd unit file:

ExecStart=/usr/bin/ipfs daemon --enable-gc --migrate=true --enable-pubsub-experiment --enable-namesys-pubsub

I have not noticed any improvement using this flag.

Discordian · February 21, 2023, 4:36pm

Wow surprising to me, I found resolution times after the first to go much faster with that enabled.

My next idea is (if you have the resources for it, IIRC this increases your memory requirement) to see how the Accelerated DHT Client fares for you. As I found this to be critical for my VPS node.

eth-limo · February 21, 2023, 4:37pm

We have been using the AcceleratedDHT too!

/usr/bin/ipfs config --json Experimental.AcceleratedDHTClient true

Discordian · February 21, 2023, 4:38pm

Oh my! Perhaps @Jorropo has an idea (or maybe knows who might?). Very sorry for the inconvenience here.

eth-limo · February 21, 2023, 4:39pm

Thank you for looking into this. It’s been quite a headache for us. Hopefully we can get to the root cause and proceed with our upgrade.

Discordian · February 21, 2023, 5:50pm

Absolutely, I will raise this thread everywhere I can to try to get you answers

ylempereur · February 21, 2023, 6:11pm

Here is a bug I reported back in September that may (or may not) be related. The fact it happened for the same version and has to do with discovery when using the accelerated DHT is what makes me think they could be related.

github.com/ipfs/kubo

0.16.0-rc1: AcceleratedDHTClient possible issue?

opened 09:34PM - 26 Sep 22 UTC

ylempereur

kind/bug status/in-progress P1

### Checklist - [X] This is a bug report, not a question. Ask questions on [dis…cuss.ipfs.io](https://discuss.ipfs.io). - [X] I have searched on the [issue tracker](https://github.com/ipfs/kubo/issues?q=is%3Aissue) for my bug. - [X] I am running the latest [kubo version](https://dist.ipfs.tech/#kubo) or have an issue updating. ### Installation method ipfs-update or dist.ipfs.tech ### Version ```Text Kubo version: 0.16.0-rc1 Repo version: 12 System version: amd64/darwin Golang version: go1.19.1 ``` ### Config ```json "ResourceMgr": { "Enabled": false }, ``` ### Description Compared to previous versions (0.15 and prior), the hourly network scan done by the AcceleratedDHTClient causes a much higher CPU spike and finds far fewer peers (as reported by `ipfs stats dht wan`, 1K-3K vs 8K-10K, around the same time). Something has changed (not for the better), and I made sure to turn off ResourceMgr in both cases. Please advise.

Jorropo · February 21, 2023, 6:20pm

This should not be impacted by the signature change, it’s the same bytes under the hood except we use different fields from the protobuf.
I don’t know without investigating and I don’t really have time right now for this sorry, one thing I would check is if the ipfs dht get command also sees a slow down between 0.15 and 0.15+ (the point is to see if the slow down are in the dht or ipns codepaths).

eth-limo · February 21, 2023, 10:12pm

Here are some comparisons between each version and whether or not the AcceleratedDHT client is enabled. I’m happy to provide additional diagnostics.

v18.0.1

AcceleratedDHT: false

$ time ipfs dht get -v -- /ipns/k51qzi5uqu5dmjnxaa3ed22c0a6xaoyw5d75uacizdpc4u9rmxnqialvwat8y9 | grep says | wc -l
27

real	0m17.181s
user	0m0.009s
sys	    0m0.039s

$ time ipfs dht get -v -- /ipns/k51qzi5uqu5djwbl0zcd4g9onue26a8nq97c0m9wp6kir1gibuyjxpkqpoxwag | grep says | wc -l
0

real	0m0.429s
user	0m0.018s
sys	    0m0.022s

AcceleratedDHT: true

$ time ipfs dht get -v -- /ipns/k51qzi5uqu5dmjnxaa3ed22c0a6xaoyw5d75uacizdpc4u9rmxnqialvwat8y9 | grep says | wc -l
10

real	0m0.300s
user	0m0.012s
sys	    0m0.007s

$ time ipfs dht get -v -- /ipns/k51qzi5uqu5djwbl0zcd4g9onue26a8nq97c0m9wp6kir1gibuyjxpkqpoxwag | grep says | wc -l
0

real	0m0.085s
user	0m0.008s
sys	    0m0.009s

v15.0.0

AcceleratedDHT: true

$ time ipfs dht get -v -- /ipns/k51qzi5uqu5dmjnxaa3ed22c0a6xaoyw5d75uacizdpc4u9rmxnqialvwat8y9 | grep says | wc -l
10

real	0m0.581s
user	0m0.008s
sys	    0m0.008s

$ time ipfs dht get -v -- /ipns/k51qzi5uqu5djwbl0zcd4g9onue26a8nq97c0m9wp6kir1gibuyjxpkqpoxwag | grep says | wc -l
0

real	0m0.133s
user	0m0.004s
sys	    0m0.014s

AcceleratedDHT: false

$ time ipfs dht get -v -- /ipns/k51qzi5uqu5dmjnxaa3ed22c0a6xaoyw5d75uacizdpc4u9rmxnqialvwat8y9 | grep says | wc -l
88

real	0m20.587s
user	0m0.001s
sys	    0m0.080s

$ time ipfs dht get -v -- /ipns/k51qzi5uqu5djwbl0zcd4g9onue26a8nq97c0m9wp6kir1gibuyjxpkqpoxwag | grep says | wc -l
0

real	0m0.198s
user	0m0.011s
sys	    0m0.035s

I’m not sure if the full output from ipfs dht get is useful so I’ve omitted it for now. Please let me know and I’ll include it if necessary.

Are there any other diagnostics that could aid in troubleshooting this?

Jorropo · February 21, 2023, 10:56pm

That intresting so it seems to be dht related, I recently merged some fixes for a latency bug in the DHT, but I’m suprised this would effect this, could you please retry with 0.19-rc1 when it is released (planned for this week) ?

eth-limo · February 21, 2023, 11:17pm

Absolutely! I’ll keep an eye on the releases this week and report back with my findings. Thank you.

hector · February 22, 2023, 12:12pm

Can this be a resource manager issue?

Jorropo · February 22, 2023, 12:59pm

Maybe, we would expect the inverse behaviour tho (accelerated slowing down and normal client working more or less as usual).

guseggert · February 23, 2023, 4:11pm

The normal client does more work in a single DHT lookup than the accelerated client, so with no more info it still seems possible that libp2p resource manager is throttling and slowing things down.

Could you try running the same tests on v18.0.1 with resource manager disabled? That’s Swarm.ResourceMgr.Enabled = false. This will at least tell us if it’s RM or not, if it is then we can work on tweaking the RM limits.

eth-limo · February 27, 2023, 3:44pm

Disabling ResourceMgr appeared to partially alleviate some of the poor gateway performance for one of the test cases but the rest are still unusable:

v18.0.1
Swarm.ResourceMgr.Enabled = false

Unreachable/slow IPNS gateway request

$ time ipfs dht get -v -- /ipns/k51qzi5uqu5dmjnxaa3ed22c0a6xaoyw5d75uacizdpc4u9rmxnqialvwat8y9 | grep says | wc -l
10

real	0m0.371s
user	0m0.010s
sys	    0m0.009s

# Extremely slow
$ time curl -so /dev/null http://localhost:8080/ipns/k51qzi5uqu5dmjnxaa3ed22c0a6xaoyw5d75uacizdpc4u9rmxnqialvwat8y9/

real	3m0.016s
user	0m0.013s
sys	    0m0.006s

Unreachable/slow IPNS gateway request

$ time ipfs dht get -v -- /ipns/k51qzi5uqu5dm7gcgw5wxlf81aszjbihejohdv6pjewxwhtsjoimufognxl8fz | grep says | wc -l
10

real	0m0.324s
user	0m0.008s
sys	    0m0.009s

# Extremely slow
$ time curl -so /dev/null http://localhost:8080/ipns/k51qzi5uqu5dm7gcgw5wxlf81aszjbihejohdv6pjewxwhtsjoimufognxl8fz/

real	3m0.122s
user	0m0.006s
sys	    0m0.012s

Successful IPNS gateway request

$ time ipfs dht get -v -- /ipns/k51qzi5uqu5djwbl0zcd4g9onue26a8nq97c0m9wp6kir1gibuyjxpkqpoxwag | grep says | wc -l
0

real	0m0.124s
user	0m0.003s
sys	    0m0.018s

# Noticeable improvement
$ time curl -so /dev/null http://localhost:8080/ipns/k51qzi5uqu5djwbl0zcd4g9onue26a8nq97c0m9wp6kir1gibuyjxpkqpoxwag/

real	0m0.216s
user	0m0.000s
sys	    0m0.010s

I’m not really sure why one of the test sites is able to be retrieved but the others are not. I’ve got plenty of additional examples if those would be helpful.

eth-limo · May 16, 2023, 8:33pm

Just tried again with v0.20.0 and got the same results - IPNS performance is just incredibly slow. @Jorropo do you have any other suggestions that I can try?

edit - I should mention that almost every single IPNS gateway request completes approximately in 1m. Is this some kind of configurable timeout?

0xc0de4c0ffee · June 1, 2023, 7:09am

@eth-limo,
I’m still not sure what’s causing this issues back to testing multiple gateways, services and versions.

IPNS published from web3.storage/w3name service

dweb.link (504)
https://k51qzi5uqu5dmgi2q98r638vet29ef6up32m3rixduu32urkurm6kj4rq8hufy.ipns.dweb.link
.ipfs.io (504)
https://ipfs.io/ipns/k51qzi5uqu5dmgi2q98r638vet29ef6up32m3rixduu32urkurm6kj4rq8hufy
ipfs2.eth.limo https://k51qzi5uqu5dmgi2q98r638vet29ef6up32m3rixduu32urkurm6kj4rq8hufy.ipfs2.eth.limo
.cf-ipfs.com
https://k51qzi5uqu5dmgi2q98r638vet29ef6up32m3rixduu32urkurm6kj4rq8hufy.ipns.cf-ipfs.com

IPNS published from https://dwebservices.xyz/

I’m update this with results from local/browser Helia nodes and Fleek/other services soon…

eth-limo · June 1, 2023, 9:57pm

Testing again with v0.20.0 without --enable-pubsub-experiment and --enable-namesys-pubsub seemed to do the trick!

time ipfs resolve -r /ipns/k51qzi5uqu5djhw4rskl5h4ylujy8cwbhn3bcmgwxjzclrebis7uusb1gnfnmo/
/ipfs/QmXGSRcFfVhvwepRZFbg97AkTRo6kt23UDCiMYLcEzv8FC

real	0m0.704s
user	0m0.018s
sys	   0m0.033s

I followed the instructions here for debugging. I found many instances of hung goroutines relating to these flags:

goroutine 1435999 [select, 1542 minutes]:
github.com/libp2p/go-yamux/v4.(*Stream).Read(0xc00baae2a0, {0xc0093a0630, 0x1, 0x1})
        github.com/libp2p/go-yamux/v4@v4.0.0/stream.go:111 +0x1a9
github.com/libp2p/go-libp2p/p2p/muxer/yamux.(*stream).Read(0xe7707c?, {0xc0093a0630?, 0x100c00bc1d2c0?, 0x203000?})
        github.com/libp2p/go-libp2p@v0.27.3/p2p/muxer/yamux/stream.go:17 +0x1e
github.com/libp2p/go-libp2p/p2p/net/swarm.(*Stream).Read(0xc00976a700, {0xc0093a0630?, 0x80?, 0x7fad1e107430?})
        github.com/libp2p/go-libp2p@v0.27.3/p2p/net/swarm/swarm_stream.go:55 +0x33
github.com/multiformats/go-multistream.(*lazyClientConn[...]).Read(0xc0047f5ea0?, {0xc0093a0630?, 0x1?, 0x1?})
        github.com/multiformats/go-multistream@v0.4.1/lazyClient.go:68 +0xb8
github.com/libp2p/go-libp2p/p2p/host/basic.(*streamWrapper).Read(0xc0078c1530?, {0xc0093a0630?, 0x3c500?, 0xc710302babf?})
        github.com/libp2p/go-libp2p@v0.27.3/p2p/host/basic/basic_host.go:1156 +0x27
github.com/libp2p/go-libp2p-pubsub.(*PubSub).handlePeerDead(0x0?, {0x2c0f6c0, 0xc0084fd640})
        github.com/libp2p/go-libp2p-pubsub@v0.9.3/comm.go:147 +0x86
created by github.com/libp2p/go-libp2p-pubsub.(*PubSub).handleNewPeer
        github.com/libp2p/go-libp2p-pubsub@v0.9.3/comm.go:128 +0x398

goroutine 586 [select, 89 minutes]:
github.com/ipfs/boxo/namesys/republisher.(*Republisher).Run(0xc001100870, {0x2c060b8, 0xc0012a8480})
        github.com/ipfs/boxo@v0.8.1/namesys/republisher/repub.go:79 +0x12a
github.com/jbenet/goprocess.(*process).Go.func1()
        github.com/jbenet/goprocess@v0.1.4/impl-mutex.go:134 +0x36
created by github.com/jbenet/goprocess.(*process).Go
        github.com/jbenet/goprocess@v0.1.4/impl-mutex.go:133 +0x238

I’m not entirely sure what changed starting in v0.16.0 with regards to these features. I did find an open issue regarding the deprecation of namesys/pubsub in general, so it might be related. Thank you to everyone who assisted in troubleshooting this problem. I’m more than happy to provide additional findings if this can help anyone else.

Topic		Replies	Views
Unable to resolve IPNS name via remote IPFS nodes or public gateways (used to work fine) Help	15	632	March 13, 2024
IPNS Stopped Resolving Help ipns	10	628	June 13, 2023
Confusion about IPNS Help ipns	4	5657	November 16, 2017
Understanding guarantees of the improved IPNS over pubsub Help ipns	15	2356	September 7, 2020
Why ipns is so slow Kubo	10	6108	April 23, 2018

Kubo versions > 0.15.0 gateway and slow IPNS performance

Related topics