All-in-one Docker image with IPFS node best practices

boris · January 5, 2024, 6:51pm

I’ve been thinking about an IPFS node/gateway for getting started usage, with all the best practices built in.

Here’s what I think this looks like:

Kubo set up in a best practices way with config out of the box
clear instructions on either open gateway, or just hosting content / websites you have added to the node
nopfs setup for open gateway mode
nginx reverse proxy / caching setup
Saturn fits in here somehow? And/or Station? cc @hannahhoward
a skeleton config of hosting your own _dnslink based apps / websites – e.g. I should be able to CNAME a domain, add a _dnslink TXT record, point it at this gateway, and then example.com gets served up via the gateway (automated updates of DNS can be done out of band using the API of your DNS provider)
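To make the dnslink item above a bit more concrete, here is a minimal Python sketch of the two DNS records it describes. The helper name `dnslink_records`, the gateway hostname, and the CID string are illustrative assumptions, not anything an existing tool provides. Pushing the records to a DNS provider's API is left out of band, as the post suggests.

```python
from typing import List, NamedTuple


class DnsRecord(NamedTuple):
    name: str
    type: str
    value: str


def dnslink_records(domain: str, cid: str, gateway_host: str) -> List[DnsRecord]:
    """Records that point `domain` at a gateway and publish its current content root."""
    domain = domain.rstrip(".").lower()
    return [
        # Route HTTP traffic for the domain to the (hypothetical) gateway host.
        DnsRecord(domain, "CNAME", gateway_host.rstrip(".").lower()),
        # Tell the gateway which content to serve for that domain.
        DnsRecord(f"_dnslink.{domain}", "TXT", f"dnslink=/ipfs/{cid}"),
    ]
```

Updating a site would then mean regenerating only the TXT record with the new CID.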

I’m unclear what an out-of-the-box pinning API turned on looks like. The use case is your own pinning and publishing from remote systems, IPFS Desktop, etc.

I mean, obviously, I’d love something UCAN based / writeable gateway type stuff, but what’s the simplest way to securely set this up out of the box? Or do we just punt on this and have people connect to the server backend in some way? Something something Tailscale connection to a cloud box that I connect my own IPFS Desktop to.

I’d love to have something like this and am willing to help crowdfund it through Open Collective – and maybe Wovin team @tennox & Joshua would take this on.

I’d love to be able to deploy this to e.g. Cloudron, Synology NAS, or a variety of hosting providers.

Please add your own thoughts, and especially pointers to any docker images that might be starting points for this.

walkah · January 5, 2024, 7:02pm

You don’t need a separate nopfs setup as long as you’re using Kubo v0.24 or later.

boris · January 5, 2024, 7:28pm

Great! Ideally this is documented, configured, and connected into whatever shared block list governance is going on, so people know how to use it. There are likely some docs around this.

Jorropo · January 5, 2024, 8:12pm

It’s unclear to me why these would be best practices, or why one single solution is a good thing to begin with.

There aren’t any best practice settings in Kubo’s config you can turn on by default. The default config isn’t great, but all the ways to make it better have tradeoffs, else we would just do it. (I might be wrong, but then again, we would fix it in Kubo directly.)

If all you care about is running a gateway, a dedicated piece of code that does just that (Rainbow, cough cough) is about half as expensive to run while being faster.

There are no Saturn integrations in Kubo today, and I don’t see what such an integration would be doing given Kubo already does the P2P thing.
If the content I want to download is on your computer, and I can talk with your computer directly, why would I want to ask Saturn, which is then going to ask you?
I guess Saturn could be faster if the content is already cached, but there are many implementation details which make Saturn ill-suited here.
Mainly the lack of content routing: I can’t easily know which Saturn server hosts the content I want. I could send hundreds of HTTP requests to all the Saturn nodes in my local region, but then I would have so many connections that Saturn would compete for bandwidth with my own download.

None of this is unfixable. As a random example, Saturn nodes could use gossipsub to synchronise Bloom filters of the content they host, and if I try to download content from a Saturn node which doesn’t have it, that node could fetch it from another Saturn node (or 302 me).
But none of that has any code ready to be integrated into an all-in-one Docker image.
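For readers unfamiliar with the data structure mentioned here, this is a tiny Bloom filter sketch in Python. It only illustrates the "which node probably has this content" check; the class name, sizes, and hashing scheme are assumptions for illustration, and nothing here reflects actual Saturn or gossipsub code.

```python
import hashlib
from typing import Iterator


class BloomFilter:
    """Tiny Bloom filter: no false negatives, some false positives."""

    def __init__(self, size_bits: int = 8192, num_hashes: int = 4) -> None:
        self.size_bits = size_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(size_bits // 8 + 1)

    def _positions(self, item: bytes) -> Iterator[int]:
        # Derive one bit position per hash by salting SHA-256 with the hash index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(bytes([i]) + item).digest()
            yield int.from_bytes(digest[:8], "big") % self.size_bits

    def add(self, item: bytes) -> None:
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, item: bytes) -> bool:
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item))
```

A node would add the identifiers of the content it caches and share `bits` with its peers; a "no" answer is definitive, while a "yes" still has to be confirmed by actually asking the node.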

I can see Saturn being useful from browsers, which have limited P2P capabilities. Even if we can use WebTransport and WebRTC to do browser P2P, Saturn, which runs Go, C++ and Rust and streams the result to the browser, could still be faster.

Imo the scope you define is way too broad, and coupling it together in one all-in-one docker would make it worse. Instead I would like to see specialized dockers:

A gateway docker, which can use something like Rainbow plus nginx.
A pinning service docker with whatever upload API you want (UCAN, POST .car if you think this is best), which only runs a bitswap server and a non-recursive trustless gateway and uses IPNI.
A transient content adder docker (a small piece of code that reads a file and pushes a .car to your pinning service docker in a streaming fashion).
…

I’m not sure why this even needs to be a docker; that’s a shell script a couple of lines long.
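As a rough illustration of how small the "transient content adder" could be, here is a Python sketch that streams an existing .car file to an upload endpoint without loading it into memory. The upload URL and the idea that the pinning service accepts a plain POST are assumptions; no such endpoint is defined in this thread.

```python
import http.client
from typing import BinaryIO, Iterator
from urllib.parse import urlsplit


def iter_chunks(stream: BinaryIO, chunk_size: int = 1 << 20) -> Iterator[bytes]:
    """Yield the stream in fixed-size chunks so the whole CAR never sits in memory."""
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            return
        yield chunk


def push_car(car_path: str, upload_url: str) -> int:
    """POST a CAR file to a hypothetical upload endpoint and return the HTTP status."""
    url = urlsplit(upload_url)
    conn_cls = http.client.HTTPSConnection if url.scheme == "https" else http.client.HTTPConnection
    conn = conn_cls(url.netloc)
    try:
        with open(car_path, "rb") as car:
            # With an iterable body and no Content-Length, http.client
            # sends the request using chunked transfer encoding.
            conn.request(
                "POST",
                url.path or "/",
                body=iter_chunks(car),
                headers={"Content-Type": "application/vnd.ipld.car"},
            )
            return conn.getresponse().status
    finally:
        conn.close()
```

Authentication is deliberately left out; whatever scheme the pinning service uses would go into the request headers.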

TL;DR

I’m for:

Less is more
microservices-ish
unix philosophy
Whatever you want to call it.

I don’t think an all-in-one docker is a good solution.

If you really want to make an all-in-one docker based on Kubo, the prerequisite should be to remove the “best practice” knobs from the config.
The main problem currently is AcceleratedDHTClient: it should be removed and replaced with a reprovide sweep using an on-disk database sorted by multihash. This would have the same providing performance at a fraction of AcceleratedDHTClient’s CPU and memory costs.

This means you could host more than a couple of MiBs while still being able to provide them on the DHT so other nodes can find them.
probelab.io has qualified engineers to implement this feature, and they accept contracts.
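One way to read the "reprovide sweep" idea is sketched below in Python: sort the multihashes by their position in the DHT keyspace, then walk the keyspace once, announcing neighbouring keys together instead of doing an independent lookup per CID. This is an in-memory illustration only. The function names, the choice of SHA-256 over the multihash as the keyspace position, and the prefix-based batching are assumptions, not Kubo's actual implementation, and a real version would keep the sorted set on disk.

```python
import hashlib
from itertools import groupby
from typing import Dict, Iterable, List


def kad_key(multihash: bytes) -> bytes:
    """Assumed keyspace position of a record: SHA-256 of the multihash bytes."""
    return hashlib.sha256(multihash).digest()


def region(key: bytes, prefix_bits: int) -> int:
    """Top `prefix_bits` bits of the key, identifying a slice of the keyspace."""
    return int.from_bytes(key, "big") >> (len(key) * 8 - prefix_bits)


def plan_sweep(multihashes: Iterable[bytes], prefix_bits: int = 8) -> Dict[int, List[bytes]]:
    """Sort by keyspace position, then batch neighbours that share a region."""
    ordered = sorted(multihashes, key=kad_key)
    return {
        reg: list(batch)
        for reg, batch in groupby(ordered, key=lambda mh: region(kad_key(mh), prefix_bits))
    }
```

Each batch could then be announced to the peers closest to that region in one go, which is where the CPU and memory savings would be expected to come from.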

boris · January 5, 2024, 8:51pm

Hey Jorropo, thanks for your response.

No, I care about having a solution that bundles up some out of the box solutions for basic use cases, that I can direct people to. I don’t know what Rainbow is, do you have a link for that?

Saturn: yep, no idea if relevant, other than the general use case of not having to add (local) caching or a third party CDN. The website use case I list below in particular would I guess benefit from distributed caching / CDN services.

I have no idea how to run this, and I don’t know where to point people. “This is just a shell script”, without showing me how to do it in a concrete way, isn’t helpful. I know how to do the CNAME stuff.

Does nginx have to be configured in a certain way to respond to a host that is CNAME’d?
There’s probably some Let’s Encrypt stuff needed here to get https too

I’ll try and be less prescriptive and just re-list the goals:

caching enabled so it can serve lots of content without a lot of load (so my idea is nginx reverse proxy rather than serving from Kubo directly – I’m not an expert at this config)
a way to host your own content – so whatever the best config is for uploading / adding that, both individual pieces of content as well as the website use case that’s next
a way to add one or more web addresses with dnslink enabled (e.g. mywebsite1, mywebsite2)
guidelines on open vs closed gateway, and nopfs settings as needed

The target market is a technical user who wants to self host, from Synology NAS / home usage, to small always on cloud host. All-in-one docker is the preferred setup for this market, and also works well for various one-click installs of different hosting options - DO, Hetzner, Hostinger VPS, OVH, Vultr, Linode etc. etc. etc.

There are other hosting options – e.g. FlyIO, Railway, or RenderDotCom PaaS solutions that are a little more 12 Factor than all in one. But that’s a different kind of self hosting.

danieln · January 5, 2024, 9:21pm

There are some great ideas here. A while back I explored this with fly.io and made a guide: https://youtu.be/k1Hcg3B43Q4?si=QtUBK44iObXzQn08
The source code repo is linked there too, showing how the Docker image is built and configured.
It’s a cool idea to expand on this as you laid out, covering dnslink along with web hosting, and data onboarding.

As @Jorropo said, DHT advertising is one of the weak spots, which becomes a problem once a repo grows beyond a certain size.

Still, it’s good to have more resources on best practices for a given use-case.

Since you mentioned writeable gateways, I think it’s worth mentioning that we don’t have a broadly adopted data onboarding HTTP endpoint spec. The Pinning API spec isn’t broadly adopted either. And with pinning services each having their own approach to onboarding, it’s hard to get the interop you expect in an open ecosystem. Not to mention that it’s harder to encourage client side chunking and CID generation, because some services don’t support uploading CARs.

On another note, we should probably mention what Rainbow is in the docs and how it’s used to serve the ipfs.io gateway infrastructure.

boris · January 5, 2024, 10:03pm

This is great Daniel! https://github.com/2color/ipfs-deploy-flyio

So yeah, the whole “how do I upload / add” – that Fly proxy is a perfect example of something being a great start for technical users.

I’m sure there’s a Fly <> Tailscale recipe that would allow IPFS Desktop to connect to a hosted Fly node, too. CNAME + additional domain mapping is going to likely be pretty hosting provider specific, but the same set of features or service-specific ones could be figured out.

This is exactly what I’m talking about as a solution to be pointed to. Do you happen to know what the monthly cost of the setup is for something like this? (let’s ignore bandwidth etc – just the “get it up and running persistently”)

Re: writeable gateway / pinning API: the proxy solution is already perfectly fine for a v1. Let’s work on the pieces we have control over. IPFS Desktop + a hosted node via proxy (and command-line-level “ipfs add this website”) gets things going for individual users.

Out of the four things I mentioned – (1) caching, (2) host your own content, (3) dnslink websites, (4) guidelines – your write up covers (2), which is really the right starting point!

(3) and (4) are mostly config/recipes/writeups.

With like (2)b being “remote” APIs.

Jorropo · January 6, 2024, 12:34am

I’m fine with the goals, but I would rather have you do 4 different dockers than 1.
I don’t get why it has to be 1 docker.

boris · January 6, 2024, 12:40am

Because that’s common practice for the target user in how to deploy it.

If you want this to be easy to install, run, and maintain for self-host users, then a single image for platforms like Cloudron, Yunohost, and Synology NAS is going to get the biggest adoption (there are others, but this is a pretty common pattern).

There doesn’t have to just be one recipe. I’m much more of a fan of 12 Factor Apps and PaaS models. @danieln’s FlyIO example is closer to that, and designing the deployment to where things will be deployed is another method.

For the purposes of this thread, I’m going to look for help in defining and building the single install.

boris · January 6, 2024, 12:40am

Yeah, here’s an incomplete 4 year old project for Yunohost: YunoHost-Apps/ipfs_ynh on GitHub (IPFS package for YunoHost)

boris · January 6, 2024, 12:41am

Thanks for the link to Rainbow. This is sort of the opposite of what would be useful for a self-hoster, for whom an all-in-one “hobby” deploy is a good starting point.

Jorropo · January 6, 2024, 12:50am

I like fly.io but I don’t see why 4 clicks is worse than 1.
This also assumes you need all the features.

If we can make software that costs less to run, is faster and has fewer bugs by asking the user to do 4 clicks, then I think the user would rather do 4 clicks and have better software than do 1 click and have worse software.

boris · January 6, 2024, 12:55am

I’m trying to explain a use case and a wide range of end users. “The user” uses Cloudron, Yunohost, and other platforms that have a particular cost model, which charges per Docker image run. So for both platform compatibility and economic reasons, small scale hosting trends towards single Docker image based hosting.

It is not a cluster mode, a production scale mode, an AWS credits mode.

Can we agree that there is a class of users that has a single Docker image requirement?

Jorropo · January 6, 2024, 12:59am

I wasn’t aware of per docker image costs; in my mind there would be 4 processes sharing the same amount of CPU. RAM might be a touch higher since stuff like the libp2p host would be duplicated, but actually, due to bad code in bitswap, Rainbow uses less RAM than Kubo by doing some gateway-exclusive optimisations.

So in my mind CPU costs are the same if not lower (since the code is more efficient), and memory would be really similar or better.

What if I ran 4 processes inside 1 docker? Or did Docker-in-Docker and ran my 4 dockers in your 1 all-in-one docker image?
The point is that we can then tune each process for its workload, and if one of the processes has a bug and crashes, it doesn’t take down the other 3 (separation of concerns), …
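The "several processes inside one image" variant could be sketched as a tiny supervisor like the one below. It is a Python illustration of the isolation argument only: each service is its own child process, and one crashing gets restarted without touching the others. The function name and the polling approach are assumptions; a real image would more likely use an existing init or supervisor tool.

```python
import subprocess
import time
from typing import Dict, List


def supervise(commands: Dict[str, List[str]], rounds: int, poll_interval: float = 0.5) -> Dict[str, int]:
    """Run each command as its own child process and restart any that exit.

    Returns how many times each service was restarted during `rounds` polls.
    """
    procs = {name: subprocess.Popen(cmd) for name, cmd in commands.items()}
    restarts = {name: 0 for name in commands}
    try:
        for _ in range(rounds):
            time.sleep(poll_interval)
            for name, proc in procs.items():
                if proc.poll() is not None:
                    # Only this child exited; the others keep running untouched.
                    procs[name] = subprocess.Popen(commands[name])
                    restarts[name] += 1
    finally:
        # Stop every child when the supervisor itself shuts down.
        for proc in procs.values():
            proc.terminate()
            proc.wait()
    return restarts
```

Each entry in `commands` could carry its own flags and resource tuning, which is the per-workload tuning the post refers to.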

gotjoshua · January 6, 2024, 10:36am

I also don’t know about the per container cost issues, but in general I vastly prefer single docker compose files with separate services.

I think it may be possible and perhaps even reasonable to do an initial set of more microish docker images and set them up with well documented compose files, then later to create an all in one that basically glues them together via a multi stage docker build.

gotjoshua · January 6, 2024, 11:38am

Very grateful for the detailed issues you raised @Jorropo thanks…

gotjoshua · January 6, 2024, 11:42am

What to do about this?

@tennox and I have been experimenting with throwing car files full of ipld dag-json at various providers with very mixed results.

Anyone know of a more thorough review of how different services are offering “writable gateways” ?

boris · January 6, 2024, 3:43pm

Sure. That’s what Cloudron does. It’s the first request I’m going to make. It runs separate services.

I think I’m explaining myself badly and my “all in one” phrase is confusing people.

We need to design for deployment. What user types and what hosting services are there, and how do they expect to receive services for hosting?

And, how much do they cost? Can we make something that starts off in a “free tier”? Or be $5-$10 per month?

Here’s a list to get started of some services. I can gather the 1-click requirements.

Cloudron
Digital Ocean
CapRover
EasyPanel

PaaS Systems:

FlyIO
Railway
Render

Amazon AMIs could be on the list but my feeling is that designing a cluster solution for more professional / at scale hosting is a better fit for more complex systems like Amazon.

I’ll run some surveys and do research on what people want. There may be some programs to fund or reward submissions of new apps. I know that Railway did revenue share.

boris · January 6, 2024, 4:00pm

They don’t. I’d say this is one of the bigger actual research pieces. Most of the services just do http uploads other than Fission which does IPFS fetch of CIDs. There’s no sync solution.

Daniel’s FlyIO recipe “just” uses a proxy to give access to the regular Kubo API.

It’s not a solution for programmatic uploads at all, or for multi tenant.

We’ve got some CAR Mirror pieces (go plugin and some Rust client side work). Web3Storage have some UCAN tooling.

I think the tension here is that what I’m proposing as initial requirements is meant to be for single organization / user — just to get moving. Not multi tenant at all. I’ll think about this.

tennox · January 7, 2024, 2:04pm

Thanks for the initiative @boris !

I’m ready to throw some work at it as we were anyways planning to investigate how to self-host storage nodes for wovin.
A few thoughts:

1. deployment

I agree that docker-compose would be great.

we can just use existing containers (e.g. kubo), reduces maintenance and complexity
docker is meant that way (one process per container)
Easily enable parts you need - thinking that some of the parts you proposed would only concern a few users - but would be curious about your survey

The price-per-docker-container is new to me, but makes sense to then offer a combined container for the simplest use-case to have a low-cost starting point.

2. Kubo config

Thanks @Jorropo for the detailed thoughts about the DHT, which is also one of the issues I’m most worried about when running kubo. I ran into “Falling behind reprovides” at a few thousand CIDs, and for our use case (IPLD DAG with many small blocks) this is easy to reach.
One option seems to be to set the Reprovide strategy to e.g. “Roots”, as most of the time clients would be asking for root CIDs, but I’m not sure what this means (will I still be able to retrieve child CIDs? In which cases (not)?).
Otherwise the AcceleratedDHTClient must be used, which means more resource & bandwidth usage - but if I go by this example then it would be a few hundred megabytes of bandwidth per 24h for 10k CIDs, which seems fine. Not sure about CPU & memory - I would need to test that because I can’t find info online. (@Jorropo, or do you have better guidance on this?)

3. Gateway caching

A proxy (e.g. nginx) that caches gateway requests seems very straight-forward. Kubo gateway returns proper headers, e.g. Cache-Control: public, max-age=29030400, immutable
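To show what a caching proxy can derive from that header, here is a small Python sketch that extracts the cache lifetime. The helper name `cache_ttl_seconds` is made up for illustration, and a real proxy such as nginx does this parsing internally.

```python
from typing import Dict, Optional


def cache_ttl_seconds(cache_control: str) -> Optional[int]:
    """Return max-age in seconds if the response may be cached by a shared proxy."""
    directives: Dict[str, str] = {}
    for part in cache_control.split(","):
        name, _, value = part.strip().partition("=")
        if name:
            directives[name.lower()] = value.strip('"')
    if "no-store" in directives or "private" in directives:
        return None
    max_age = directives.get("max-age", "")
    return int(max_age) if max_age.isdigit() else None


ttl = cache_ttl_seconds("public, max-age=29030400, immutable")
print(ttl, ttl // 86400)  # 29030400 336
```

So the header quoted above allows a shared cache to keep the response for 336 days, which fits content-addressed data that never changes under a given CID.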

4. Upload / Writable GW / pinning

For our use-case, this is definitely part of the MVP. An in-browser client should be able to upload a CAR file using, ideally, a UCAN or some kind of old-school auth.
The golden path scenarios with & without pinning service are still WIP - so, our options:

Writable Gateway - removed from kubo in favor of having to do a custom boxo implementation or waiting for a pending & WIP IPIP - so I don’t think it’s useful for now.
Pinning API - mentioned above that the golden path stuff is WIP, but there is actually a library now, so I’m not sure whether that would work now. Testing needed.
Kubo RPC pinning - possible with some reverse-proxy auth, but a custom service in front would be better, so that authentication doesn’t also mean full admin access
(rust service that verifies a UCAN? )
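The "custom service in front" option could start from a check like the following Python sketch: a shared bearer token plus an allow-list of RPC paths, so that a valid token does not grant access to the whole admin API. The function name, the token scheme, and the particular set of allowed endpoints are assumptions for illustration. A UCAN verifier would replace the token comparison.

```python
import hmac
from typing import Mapping

# Assumption: only these Kubo RPC endpoints are exposed to remote uploaders.
ALLOWED_RPC_PATHS = frozenset({"/api/v0/dag/import", "/api/v0/pin/add", "/api/v0/pin/ls"})


def is_request_allowed(method: str, path: str, headers: Mapping[str, str], token: str) -> bool:
    """Allow a request only if it carries the shared token and targets an allow-listed path."""
    scheme, _, presented = headers.get("Authorization", "").partition(" ")
    # Constant-time comparison avoids leaking the token through timing.
    if scheme.lower() != "bearer" or not hmac.compare_digest(presented.encode(), token.encode()):
        return False
    # The Kubo RPC API expects POST; the query string is ignored for matching.
    return method.upper() == "POST" and path.split("?", 1)[0] in ALLOWED_RPC_PATHS
```

Requests that pass the check would be forwarded to the Kubo RPC port, which itself stays bound to localhost.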

5. _dnslink based hosting

This would require a custom server/plugin as, to my knowledge, resolving TXT records for the Host: header is not built into nginx/caddy/traefik/…
Would be a fun experiment though (but is it something a self-hosting user would need? )
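For what it's worth, the lookup logic such a custom server or plugin would need is small. The Python sketch below maps a Host header to a content path, with the DNS query injected as a function so the example stays self-contained. The names `content_path_for_host` and `lookup_txt` are hypothetical, and a real version would plug in an actual DNS resolver library.

```python
from typing import Callable, Iterable, Optional

# Takes a DNS name and returns the TXT strings found there (assumed interface).
TxtLookup = Callable[[str], Iterable[str]]


def content_path_for_host(host_header: str, lookup_txt: TxtLookup) -> Optional[str]:
    """Map an HTTP Host header to the content path published via DNSLink, if any."""
    # Drop an optional :port and a trailing dot; IPv6 literals are not handled here.
    host = host_header.strip().lower().split(":", 1)[0].rstrip(".")
    if not host:
        return None
    for txt in lookup_txt(f"_dnslink.{host}"):
        value = txt.strip().strip('"')
        if value.startswith("dnslink=/"):
            return value[len("dnslink="):]
    return None
```

The returned path (for example `/ipfs/<cid>`) is what the proxy would then request from the gateway backend on behalf of that host.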
