I've been trying to get Helia to connect directly to our nodes via websockets and request data, but it doesn't seem to work at all. My idea was to reject all connections except those to our own nodes and to only support bitswap, so that traffic is routed exclusively through websockets.
I created an MVP here to show it. You can do `pnpm install` and then `pnpm dev`. You will notice a blank screen on localhost:3000. If you remove the bitswap-only `blockBrokers: [bitswap()]` config, you can see it load the STAC metadata as normal over an HTTP connection. Any ideas here?
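For reference, the relevant bits of the setup look roughly like this (a sketch rather than the exact MVP code - `OUR_NODE_ADDRS` is a placeholder for our websocket multiaddrs, and the exact connection gater hooks may vary with your libp2p version):

```js
import { createHelia } from 'helia'
import { createLibp2p } from 'libp2p'
import { webSockets } from '@libp2p/websockets'
import { bitswap } from '@helia/block-brokers'

// placeholder - our nodes' websocket multiaddr strings
const OUR_NODE_ADDRS = ['/dns4/node1.example.com/tcp/443/wss/p2p/<peer-id>']

const libp2p = await createLibp2p({
  transports: [webSockets()],
  // ...connection encrypters, muxers etc. elided
  connectionGater: {
    // refuse to dial anything that isn't one of our nodes
    denyDialMultiaddr: (ma) => !OUR_NODE_ADDRS.some(a => ma.toString().startsWith(a))
  }
})

const helia = await createHelia({
  libp2p,
  blockBrokers: [bitswap()] // bitswap only, no trustless HTTP gateways
})
```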
Oh, I forgot to add a couple of things:
- With a local Kubo node swarm-connected to the 3 nodes, you can pull the CID (Helia isn't doing this).
- When running the CID check, the CID does exist on the node.
Link for easy viewing
To get this working, the TL;DR is to disable bitswap sessions:

```js
const response = await verifiedFetchFn(
  'ipfs://baguqeeraxym2g5iaecqzqc6uekitjaxn47dncq3o2mt36wqbqjt2q3ci42dq', {
    session: false
  }
)
```
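(Here `verifiedFetchFn` is presumably the fetch function returned by `createVerifiedFetch` from `@helia/verified-fetch`, which accepts per-request options like `session`.)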
This will send your wantlist to all connected peers - if the bootstrap nodes have the content they should then supply it.
The slightly longer answer is that by default Bitswap is very chatty - you send your wantlist to every connected peer, sometimes multiple times as new wants are added to it. Peers respond that they don't have the blocks, and you have to notify them again when you no longer want them.
Typically most of your peers won’t have the data you are after, so sessions are designed to mitigate this by only sending your wantlist to a subset of your peers - those that have already been resolved as providers of the root CID of the DAG being fetched (more peers will be added to the session if the current session peers do not have a block).
The initial and subsequent session peers are selected by running a `findProviders` routing query.
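That query is available on Helia's public routing API, so you can see for yourself what the session has to work with - roughly (a sketch, assuming a `helia` instance is in scope):

```js
import { CID } from 'multiformats/cid'

const cid = CID.parse('baguqeeraxym2g5iaecqzqc6uekitjaxn47dncq3o2mt36wqbqjt2q3ci42dq')

// session peers are drawn from the provider records this yields -
// if it yields nothing, the session has nobody to ask
for await (const provider of helia.routing.findProviders(cid)) {
  console.log(provider.id.toString(), provider.multiaddrs)
}
```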
Because the connection gater has been configured to deny all connections that aren’t to your bootstrap nodes, the query fails as the browser node can’t traverse the network.
However, since the bootstrap nodes have the data, they should reply that they are providers. I checked the CID being loaded on check.ipfs.network, using one of your bootstrap nodes to start the test - it couldn't resolve the provider record, so I guess you're not publishing it? If that's the case, the `findProviders` query will always fail.
I guess internally we could improve this by using the currently connected bitswap peers as the starting point for the session, but you should also publish a provider record for the CID if you want the content to be resolvable.
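(If the nodes holding the data run Kubo, something like `ipfs routing provide <CID>` should announce the record; on older Kubo versions the command was `ipfs dht provide`.)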
Thanks for looking into this!
That helps clarify things a bit. I do know we were doing a lot of `ipfs pin`; I had assumed that would broadcast to the DHT, but it's clear we also need to do `ipfs add`.
I was looking through the options for `createHelia`, and subsequently the `createLibp2p` options, and I don't see `session` among them. Can this setting be configured in Helia?
We use a HAMT structure for most of our calls, so a single failure in that chain slows things down a lot, which is why we went the route of preventing all connections except our own for now - ideally we minimize any wasted calls. The kind of prioritisation you touched on would be really beneficial here.
We found it worked fine using local Kubo nodes without having to do anything. Does that suggest a difference in how Helia operates? The Kubo nodes presumably weren't relying on the DHT, since it worked anyway, whereas Helia seems to prioritize the DHT over currently connected peers.
@achingbrain Looking more into this, I found some places where the behaviour diverges. Correct me if I am wrong.
Helia takes a more router-centric approach: it asks the routers for information on a specific CID, waits for them to return, and then makes parallel requests to the resolved providers, using whichever is quicker. It does not ask its connected peers in this process, even though for us those connected peers 100% have the data. So we are forced to publish to the DHT and accept peer connections we don't really care about, and that we know won't have the data. With our HAMT structure, any failure slows it down.
Kubo, on the other hand, seems to send Bitswap WANTs to already-connected peers while doing a provider lookup on the DHT in parallel. So Kubo leverages connected peers while Helia relies only on the DHT and routers. Is there a possibility that the Bitswap module could also send WANTs at the same time it does a DHT lookup (roughly as sketched below)? Then, from our perspective, we could optionally disable that DHT gateway and just force-connect to our peers.
I see there are issues with publishing millions of CIDs to the DHT, so ideally we can get this working in such a way that if the DHT fails we can still fall back to direct connections.
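Conceptually I mean something like this (a hypothetical sketch, not the actual `@helia/bitswap` internals; both helpers are made-up names for illustration):

```js
// Hypothetical sketch: race WANTs to already-connected peers against a
// DHT provider lookup, taking whichever produces the block first
async function getBlock (helia, cid) {
  return Promise.any([
    // send the WANT straight to helia.libp2p.getPeers() (made-up helper)
    wantFromConnectedPeers(helia, cid),
    // findProviders on the DHT, then WANT from those peers (made-up helper)
    wantFromDhtProviders(helia, cid)
  ])
}
```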
> Is there a possibility that the Bitswap module could also send WANTs at the same time it does a DHT lookup?
Yes, I think we could definitely add something like this to be more flexible.
I will look into what that entails and maybe submit a PR, assuming you don't have it on your near-term horizon yet. Any tips on what I should focus on?
Great news: I got it working while bypassing the DHT. With a direct node connection using bitswap over websockets I'm seeing 8 MB/s - 365 MB fetched in 46 seconds, with an immediate second request fetching it in 24 seconds (maybe some local caching). Either way this is working much better; previously it was in the range of KB/s. My current implementation is to prefer connected peers and reject everything but our own nodes, with a fallback that kicks in after 10 seconds to ask the DHT - but that never happens.
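Roughly this shape (a sketch of the idea rather than the actual implementation; it assumes a `libp2p` node constructed separately and passed to `createHelia`, with this object supplied via the `routers` option):

```js
// Hypothetical sketch of "prefer connected peers, ask the DHT only after 10s"
const connectedFirstRouter = {
  async * findProviders (cid, options) {
    // offer every currently-connected peer as a candidate provider first
    // (empty multiaddrs are fine here - we already hold open connections)
    for (const id of libp2p.getPeers()) {
      yield { id, multiaddrs: [] }
    }

    // only fall back to a real DHT lookup after a 10 second grace period
    await new Promise(resolve => setTimeout(resolve, 10_000))
    yield * libp2p.contentRouting.findProviders(cid, options)
  }
}
```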
Will submit a PR when I have it cleaned up.
Thinking about it more, another way to do this is just to configure a router that always yields your bootstrap nodes as providers for any given CID - that way you can still use sessions, you don't have to worry about remaining connected to the bootstrappers, and you don't even have to configure them as bootstrap nodes:
```js
import { createHelia } from 'helia'
import { multiaddr } from '@multiformats/multiaddr'
import { peerIdFromString } from '@libp2p/peer-id'

const helia = await createHelia({
  // other config
  routers: [{
    async * findProviders () {
      yield * [
        FLUORINE_WEBSOCKETS,
        BISMUTH_WEBSOCKETS,
        CERIUM_WEBSOCKETS
      ].map(ma => {
        const address = multiaddr(ma)
        const id = peerIdFromString(address.getPeerId() ?? '')

        return {
          id,
          multiaddrs: [
            address
          ]
        }
      })
    }
  }]
})
```
I think our goal with this PR is now to ask any connected peer, instead of just bootstrapped peers, for improved resiliency.