boris
January 7, 2024, 4:45pm
21
Custom domains are probably the least self-hosty part, you’re right.
A user would have at least one domain and create mygatewayaddress.com. Turning on subdomain support means that lots of sites could look like bafy1234.mygatewayaddress.com.
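To make the subdomain idea concrete, here is a minimal sketch of how a Host header could be mapped to a content identifier. It assumes the usual subdomain gateway layout `<cid>.ipfs.<gateway-host>` (the example above leaves out the `ipfs` label for brevity). The function names are hypothetical and this is not Kubo's actual code.

```python
from typing import Optional, Tuple


def subdomain_gateway_url(cid: str, gateway_host: str, namespace: str = "ipfs") -> str:
    """Build a subdomain-style gateway URL for a CID (hypothetical helper)."""
    return f"https://{cid}.{namespace}.{gateway_host}/"


def parse_subdomain_host(host: str, gateway_host: str) -> Optional[Tuple[str, str]]:
    """Return (namespace, identifier) if host looks like <id>.<ns>.<gateway_host>."""
    host = host.lower().split(":")[0]  # drop an optional port
    suffix = "." + gateway_host.lower()
    if not host.endswith(suffix):
        return None
    labels = host[: -len(suffix)].split(".")
    # Expect exactly two labels in front of the gateway host: identifier and namespace.
    if len(labels) != 2 or labels[1] not in ("ipfs", "ipns"):
        return None
    return labels[1], labels[0]
```

A request for `bafy1234.ipfs.mygatewayaddress.com` would then map to the `ipfs` namespace and the identifier `bafy1234`, while the bare gateway host yields `None`.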
I didn’t think the _DNSLink stuff was hard, but I was also thinking that Cloudron (for example) has Let’s Encrypt cert generation, so that would be done outside this app.
I think host gets passed to Kubo and Kubo looks it up, and nginx doesn’t have to do anything?
@walkah can you describe how this works?
1 Like
danieln
January 8, 2024, 10:09am
22
I think fly.io’s pricing has changed since. Based on my experience, Kubo needed a minimum of 512MB of RAM, so it was around $10/month (Fly.io Resource Pricing · Fly Docs).
1 Like
danieln
January 8, 2024, 10:19am
23
Have a look at this issue and feel free to share any thoughts:
opened 09:34PM - 11 Mar 22 UTC
P2
need/analysis
IPIP
## Problem statement
HTTP Gateways are the most successful way for retrieving… content-addressed data. Successful use of HTTP for retrieval use cases proves that IPFS does not replace HTTP, but augments it by providing variability and resiliency. IPFS over HTTP brings more value than the sum of its parts.
Removing the need for implementation-specific RPC APIs (like the one in Kubo) allowed not only faster adoption of CIDs on the web, but also enabled alternative implementations of IPFS (like Iroh in Rust) to test compliance and benchmark themselves against each other.
While we have HTTP Gateways as a standard HTTP-based answer to the retrieval of data stored with IPFS (including verifiable [application/vnd.ipld.raw](https://www.iana.org/assignments/media-types/application/vnd.ipld.raw) and [application/vnd.ipld.car](https://www.iana.org/assignments/media-types/application/vnd.ipld.car) responses), the data onboarding over HTTP is currently done with vendor-specific APIs.
The status quo at 2023 Q1 is pretty bad from the end user/developer’s perspective: every IPFS implementation, including online services providing storage and pinning, exposes its own custom, opinionated HTTP API for onboarding data to IPFS.
## Why we need IPIP for HTTP Data Onboarding
To illustrate, some prominent examples (2022 Q4):
<details>
<summary>Click to expand :see_no_evil: </summary>
- Implementations
- Kubo RPC (AKA legacy /api/v0/..)
- Is often used as a “standard HTTP API upload template” because it has commands for all onboarding needs:
- [https://docs.ipfs.tech/reference/kubo/rpc/#api-v0-add](https://web.archive.org/web/20221201011916/https://docs.ipfs.tech/reference/kubo/rpc/#api-v0-add) – files and directories
- FLAG: it uses custom form-data handling that requires special library for directory upload, which is an awful papercut for someone expecting simple upload with “curl” ([http://web.archive.org/web/20221201011916/https://docs.ipfs.tech/reference/kubo/rpc/#request-body](http://web.archive.org/web/20221201011916/https://docs.ipfs.tech/reference/kubo/rpc/#request-body))
- FLAG: Kubo RPC was never designed to be used in browser context, and there are known bugs around the way it handles uploads (example: [https://github.com/ipfs/kubo/issues/5168](https://github.com/ipfs/kubo/issues/5168))
- [https://docs.ipfs.tech/reference/kubo/rpc/#api-v0-block-put](https://docs.ipfs.tech/reference/kubo/rpc/#api-v0-block-put) – raw block
- [https://docs.ipfs.tech/reference/kubo/rpc/#api-v0-dag-put](https://docs.ipfs.tech/reference/kubo/rpc/#api-v0-dag-put) – JSON-like documents and custom DAGs (DAG-JSON and DAG-CBOR)
- [https://docs.ipfs.tech/reference/kubo/rpc/#api-v0-dag-import](https://docs.ipfs.tech/reference/kubo/rpc/#api-v0-dag-import) – arbitrary bags of blocks in CAR format
- JS-IPFS
- Reimplements most of the Kubo RPC and exposes it over HTTP, but diverged a long time ago and is not 1:1
- FLAG: In addition to HTTP, JS-IPFS exposes selected commands over gRPC-over-WebSockets, to work-around browser issues caused by Kubo RPC ([https://web.archive.org/web/20220528152743/https://github.com/ipfs/js-ipfs/tree/master/packages/ipfs-grpc-server#why](https://web.archive.org/web/20220528152743/https://github.com/ipfs/js-ipfs/tree/master/packages/ipfs-grpc-server#why))
- IPFS Cluster
- Acts as a reverse proxy for Kubo RPC, but has own commands too and provides special behavior on top of what Kubo RPC does:
- [https://web.archive.org/web/20220911053755/https://ipfscluster.io/documentation/reference/api/](http://web.archive.org/web/20220911053755/https://ipfscluster.io/documentation/reference/api/) – `/add` endpoint uses unixfs by default, but also accepts CARs when HTTP POST request is made with `?format=car` and it only accepts CARs with single root.
- Online services
- Pinata
- [https://web.archive.org/web/20220930091452/https://docs.pinata.cloud/pinata-api/pinning/pin-file-or-directory](https://web.archive.org/web/20220930091452/https://docs.pinata.cloud/pinata-api/pinning/pin-file-or-directory) – onboarding file or directory
- [https://web.archive.org/web/20220817122725/https://docs.pinata.cloud/pinata-api/pinning/pin-json](https://web.archive.org/web/20220817122725/https://docs.pinata.cloud/pinata-api/pinning/pin-json) – onboarding JSON document
- web3storage
- [http://web.archive.org/web/20220914153854/https://web3.storage/docs/reference/http-api/](http://web.archive.org/web/20220914153854/https://web3.storage/docs/reference/http-api/) – file and CAR uploads
- note: no block API (impossible to import DAG-CBOR without the overhead of single-block-CAR for every CID)
- Infura
- [http://web.archive.org/web/20220429202905/https://docs.infura.io/infura/networks/ipfs/http-api-methods/add](http://web.archive.org/web/20220429202905/https://docs.infura.io/infura/networks/ipfs/http-api-methods/add) – file and directory import API that is carbon-copy of Kubo’s internal RPC API
- [http://web.archive.org/web/20220429203039/https://docs.infura.io/infura/networks/ipfs/http-api-methods/block_put](http://web.archive.org/web/20220429203039/https://docs.infura.io/infura/networks/ipfs/http-api-methods/block_put) – raw block import that is carbon-copy of Kubo’s internal RPC API
- note: no CAR import
- TODO: source more examples
</details>
This state of things introduces an artificial barrier to adoption: the user needs to learn what APIs are available, and then “pick winners” – decide which implementations and services are the most future-proof. And even then, many choices are burdened by the legacy of Kubo RPC and its degraded performance and DX/UX in web browsers.
## Goal: create data onboarding protocol for both HTTP and native IPFS
The intention here is to create IPIP with a vendor-agnostic protocol for onboarding data that:
- is easy to use and implement in HTTP (`POST https://`)
- does not require any libraries or documentation,
- and is as easy to work with from JS with `fetch` API as it is in the command-line with `curl`
- follow the retrieval story, where `ipfs://` behavior is analogous to subdomain gateways
- :point_right: what we want, is to have a protocol that can be represented as both `POST https://` AND `POST ipfs://` APIs
## IPIP scope
We want two IPIPs: one for onboarding data with HTTP POST, and one for authoring (modifying/pathing) it with HTTP PUT.
This allows us to ship the most useful onboarding first, and then do authoring as an optional add-on, which services may support but don’t have to (if they are only onboarding to Filecoin etc).
For now, we are focusing on POST.
### POST Requests (Onboarding)
> 👉 This is the minimal scope we need to cover from the day one, ensuring every use case has a vendor-agnostic spec.
- **Delegated**
- Single File (UnixFS) or single (DAG-)CBOR/JSON document
- Arbitrary Directory tree (UnixFS)
- Option A: TAR stream
- open question: how does this handle interrupted upload? can server tell some data is missing?
- Option B: custom form-data? (think twice, we have lessons learned around RPC at `/api/v0/add` in Kubo)
- **Native**
- Raw block
- CAR stream
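As a rough illustration of the "no libraries needed" goal for the native payloads, the sketch below builds a plain HTTP POST using only the Python standard library. Everything about the wire format here is an assumption: no spec exists yet, the endpoint URL is hypothetical, and reusing the retrieval media types (`application/vnd.ipld.raw`, `application/vnd.ipld.car`) as the request `Content-Type` is only one possible design.

```python
import urllib.request

# Media types that already exist for verifiable gateway responses.
# Using them for uploads is an assumption made for this sketch.
RAW_BLOCK = "application/vnd.ipld.raw"
CAR_STREAM = "application/vnd.ipld.car"


def build_onboarding_request(endpoint: str, body: bytes, content_type: str) -> urllib.request.Request:
    """Build (but do not send) a POST request carrying a raw block or a CAR."""
    return urllib.request.Request(
        endpoint,
        data=body,
        method="POST",
        headers={"Content-Type": content_type},
    )


# Hypothetical endpoint; a real deployment would define its own.
request = build_onboarding_request("https://gateway.example/ipfs/", b"hello", RAW_BLOCK)
```

Sending it would be a single `urllib.request.urlopen(request)` call, which is roughly the same amount of effort as the equivalent `curl` or `fetch` invocation.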
The working code for this will be a reference implementation that replaces/updates the legacy [`Gateway.Writable` feature in Kubo](https://github.com/ipfs/kubo/blob/master/docs/config.md#gatewaywritable) with the above feature set.
### PUT/PATCH/DELETE Requests (Authoring)
This will be a separate IPIP, but we are flagging it here as a long-term plan that should feel idiomatic too.
- TBD: **Delegated** vs **Native**
- Critical: ensure no surprises, UX/DX is paramount. Needs research and analysis.
- One idea is to keep it limited to patching UnixFS paths and DAG-JSON/CBOR documents.
- Another idea is to have syntax parity with JSON-based [IPLD Patch](https://ipld.io/specs/patch/) and have the same JSON syntax as the [`dag diff`](https://github.com/ipfs/kubo/issues/4801) and [`dag patch`](https://github.com/ipfs/kubo/issues/4782) commands.
## References
- Revisit the [concept of Writable Gateways](https://discuss.ipfs.io/t/writeable-http-gateways/210?u=lidel)
- https://github.com/ipfs/go-ipfs/blob/master/docs/config.md#gatewaywritable
- https://discuss.ipfs.io/t/writeable-http-gateways/210
- https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Content-Location#pointing_to_a_new_document_http_201_created
- WIP private IPIP draft: https://www.notion.so/protocollabs/wip-IPIP-Data-Onboarding-with-HTTP-POST-4c394b8ebb774f2d87d34466019257fc
- Alex prototyped some REST APIs in https://github.com/ipfs/specs/pull/224/files (while this was intended to be an update to Kubo RPC, the document includes some ideas around patching files and directories)
danieln
January 8, 2024, 10:23am
24
boris:
When I compared some of these services a while back to host IPFS stuff, they didn’t support hosting UDP services which is needed for QUIC, WebTransport, and WebRTC.
This may have changed, but it’s worth considering depending on the use-case.
1 Like
danieln
January 8, 2024, 10:40am
25
tennox:
2. Kubo config
Thanks @Jorropo for the detailed thoughts about DHT, which is also one of the issues I’m most worried about when running kubo. I ran into “Falling behind reprovides” at a few thousand CIDs, and for our use case (IPLD DAG with many small blocks) this is easy to reach.
One option seems to be to set Reprovide strategy to e.g. “Roots” as most of the time clients would be asking for root CIDs, but I’m not sure what this means (will I still be able to retrieve child CIDs? In which cases (not)?).
Otherwise the AcceleratedDHTClient must be used, which means some more resource and bandwidth usage, but if I go by this example then it would be a few hundred megabytes of bandwidth per 24h for 10k CIDs, which seems fine. Not sure about CPU and memory; I would need to test that because I can’t find info online. (@Jorropo, or do you have better guidance on this?)
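For reference, the two options mentioned above correspond to the `Reprovider.Strategy` and `Routing.AcceleratedDHTClient` keys in the Kubo config documentation around the time of this thread; key names can differ between Kubo versions, so check the docs for yours. The sketch below only builds the config fragment and merges it into an existing config dict. The helper names and the sample `Interval` value are assumptions for illustration.

```python
import json


def reprovide_overrides(roots_only: bool = True, accelerated_dht: bool = False) -> dict:
    """Build a Kubo config fragment for the two options discussed above."""
    overrides: dict = {}
    if roots_only:
        # Announce only root CIDs instead of every block.
        overrides["Reprovider"] = {"Strategy": "roots"}
    if accelerated_dht:
        # Trades extra bandwidth and memory for faster provides.
        overrides["Routing"] = {"AcceleratedDHTClient": True}
    return overrides


def deep_merge(base: dict, overrides: dict) -> dict:
    """Recursively merge overrides into a copy of base."""
    merged = dict(base)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = value
    return merged


# Sample existing config section (values are illustrative only).
existing = {"Reprovider": {"Interval": "22h", "Strategy": "all"}}
print(json.dumps(deep_merge(existing, reprovide_overrides(True, True)), indent=2))
```

The same values can also be set one at a time with `ipfs config Reprovider.Strategy roots` and `ipfs config --json Routing.AcceleratedDHTClient true`.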
This is a great question and a topic with a lot of nuance.
@tennox To keep this on topic, would you mind opening a new topic for this?
2 Likes
tennox
January 8, 2024, 3:13pm
26
=> Moved to here
Repo
I set up a basic git repo (RFC). (If anyone wants write access, tell me.)
Nginx caching
Set up a basic test on a branch, but it seems to me there are no performance gains (hurray, kubo gateway, you are as efficient as an nginx cache). Further testing or other test cases could change that (or I misconfigured something).
1 Like
walkah
January 8, 2024, 4:45pm
27
boris:
I think host gets passed to Kubo and Kubo looks it up, and nginx doesn’t have to do anything?
@walkah can you describe how this works?
yeah, afaik there’s nothing required at the nginx/proxy layer - kubo itself will resolve DNSLinks (I always assumed based on Host: header but I guess I’ve never actually looked at that code).
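For anyone who hasn't looked at DNSLink before, here is a minimal sketch of the lookup itself, following the DNSLink convention of a TXT record on `_dnslink.<domain>` whose value starts with `dnslink=`. This is not Kubo's actual code, the function names are hypothetical, and the DNS query is left out so the example stays self-contained. It also assumes the proxy passes the original Host header through unchanged.

```python
from typing import Iterable, Optional


def dnslink_query_name(host: str) -> str:
    """Name to query for TXT records, derived from the request's Host header."""
    return "_dnslink." + host.split(":")[0].rstrip(".").lower()


def parse_dnslink(txt_records: Iterable[str]) -> Optional[str]:
    """Return the content path from the first dnslink TXT record, if any."""
    for record in txt_records:
        value = record.strip().strip('"')
        if value.startswith("dnslink=/"):
            return value[len("dnslink="):]
    return None
```

Given the TXT records for `_dnslink.example.com`, `parse_dnslink` returns a path such as `/ipfs/<cid>` that the gateway can then serve.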
1 Like
tennox
January 9, 2024, 12:01pm
28
Yep, correct. Somehow I only thought about static hosting and nginx, and didn’t think of kubo. Never mind then.
Ucan Store Proxy
Started a PoC for a service that checks whether a UCAN authorizes a store/pin request, and if so forwards that request to the Kubo RPC API (and later maybe other pinning services).
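Roughly, the check-then-forward flow could look like the sketch below. It is deliberately incomplete: signature and proof-chain verification, which a real UCAN validator must do, are omitted. It assumes a JWT-encoded UCAN with `exp` and `att` (`with`/`can`) fields, and the resource and ability names used here are hypothetical. The only Kubo-specific part is the `/api/v0/pin/add` RPC endpoint, which expects a POST.

```python
import base64
import json
import time
from typing import Optional
from urllib.parse import urlencode


def decode_payload(token: str) -> dict:
    """Decode the JWT payload WITHOUT verifying the signature (sketch only)."""
    payload_b64 = token.split(".")[1]
    padded = payload_b64 + "=" * (-len(payload_b64) % 4)
    return json.loads(base64.urlsafe_b64decode(padded))


def allows(payload: dict, resource: str, ability: str, now: Optional[float] = None) -> bool:
    """Check expiry and look for a matching capability in the 'att' list."""
    now = time.time() if now is None else now
    exp = payload.get("exp")
    if exp is not None and exp <= now:
        return False
    return any(
        cap.get("with") == resource and cap.get("can") == ability
        for cap in payload.get("att", [])
    )


def kubo_pin_url(rpc_base: str, cid: str) -> str:
    """URL the proxy would POST to once the request is authorized."""
    return f"{rpc_base.rstrip('/')}/api/v0/pin/add?{urlencode({'arg': cid})}"
```

The proxy would call `allows(...)` on the decoded token and only then issue the POST to the URL from `kubo_pin_url`, returning a 401 or 403 otherwise.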
Does that sound like a decent strategy?
2 Likes
tennox
January 11, 2024, 2:04am
29
3 Likes
boris
January 11, 2024, 2:15am
30
Woo hoo!
Y’all should create a Project in the Wovin OpenCollective so we can support this.
And supporter votes on features should be taken into consideration
2 Likes
rather here? or on the issues in gitlab?
tennox
January 12, 2024, 12:40pm
34
At your discretion
I guess for the general concept & topic here, and for specific features in the issues.
Also not sure if part of this discussion would better belong in a separate thread. (Anyone feeling spammed?)
New repos & Opencollective
Consolidated repos to new org:
And we have set up our opencollective sub-project in case anyone wants to help make it happen on a financial level.
Option 2: Use the gateway through the ucan proxy (a bit finicky, but might make some use-cases possible)
I thought: as we probably need a reverse proxy for Let’s Encrypt SSL anyway, we could add a forward_auth middleware in front of kubo.
But I’m not sure how different self-hosting situations would look and if that would make sense… @boris any idea on that?
1 Like
boris
January 12, 2024, 2:16pm
35
I’d love to support this financially. There are a couple of different targets.
Maybe @danieln or someone else could take a look at getting this running on FlyIO and see what it needs — those systems typically run their own nginx networking layer and then reverse proxy to the app that’s running.
Cloudron is one of the self-hosting platforms I’d love to see this running on. It’s got some specific requirements:
3 Likes
danieln
February 16, 2024, 11:24am
36
I just came across this guide: Enhancing IPFS Performance in Kubernetes Environments :: Terminal Thoughts which I thought might be relevant for this thread
2 Likes
boris
March 19, 2024, 2:54pm
38
Thanks for sharing! Had a quick look and these are mostly packaged as Docker Compose. Do you have any thoughts on other deployment config best practices?
I have deployed on rpi k3s, k8s, digitalocean, azure, aws, and gke. apologies for the late reply. let me know what u help with …
boris
May 4, 2024, 1:54am
40
Less about the tools, more about the config and setup of Kubo and other components: Nginx, caching settings, how badbits is configured, how it’s set up to use / auth APIs, etc.