What's the best way to handle uploads to IPFS from a web app?

Helia does not actually commit anything to the IPFS network, so it doesn’t look like an option. By default it stores blocks in memory.

The datastore can be configured with implementations from js-stores (ipfs/js-stores on GitHub), but none of them actually store anything on the IPFS network.

The Helia 201 storage example (helia-examples/examples/helia-101/201-storage.js in ipfs-examples/helia-examples on GitHub) also makes no mention of actually storing anything on the IPFS network.

So the question is: how do we actually upload the content? Is the answer to expose a Kubo node of our own (only its /add endpoint) that the app uploads and pins to? If so, having one server for pins makes us centralized and can become a bottleneck. How should that be dealt with? As far as I can tell, files are only distributed to other nodes on the network when they’re queried (the first request for a CID uploaded to our own Kubo instance takes an eternity to respond).
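For reference, with the "own Kubo node" approach the upload itself is a single multipart POST to Kubo's RPC endpoint `/api/v0/add`. Below is a minimal sketch, assuming a reverse proxy at a hypothetical `apiBase` that exposes only that endpoint. The function names are illustrative and not taken from any library.

```typescript
// Sketch: upload one file to a Kubo node through its RPC API (POST /api/v0/add).
// Assumption: `apiBase` is a hypothetical proxy URL that exposes only /api/v0/add.

interface AddResult {
  Name: string;
  Hash: string; // the CID of the added file
  Size: string;
}

// Build the request URL; pin the content and ask for a CIDv1.
function buildAddUrl(apiBase: string): string {
  const url = new URL("/api/v0/add", apiBase);
  url.searchParams.set("pin", "true");
  url.searchParams.set("cid-version", "1");
  return url.toString();
}

// Send the file as multipart/form-data and return the resulting CID.
// `fetchImpl` is injectable so the function can be exercised without a node.
async function addToKubo(
  apiBase: string,
  file: Blob,
  fileName: string,
  fetchImpl: typeof fetch = fetch,
): Promise<string> {
  const form = new FormData();
  form.append("file", file, fileName);
  const res = await fetchImpl(buildAddUrl(apiBase), { method: "POST", body: form });
  if (!res.ok) {
    throw new Error(`Kubo add failed: HTTP ${res.status}`);
  }
  const result = (await res.json()) as AddResult;
  return result.Hash;
}
```

From the web app this would be called as something like `addToKubo("https://ipfs-upload.example.com", file, file.name)`, where the hostname is a placeholder.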

A couple of options:

Generally speaking, once other IPFS peers in the network can retrieve a CID, the data can get replicated by multiple peers.

That, as I mentioned, creates a single choke point for incoming traffic. It also means pins are centralized unless someone queries the CID from another node (which I don’t expect any of our users to do). Is there a solution for this?

There’s another question regarding this: if we have staging and production environments for our app, does it make sense to have a separate Kubo instance for each environment?

I’m not quite sure I follow you here. Do you mean using Helia on a server as our own node (similar to Kubo)? This can also have the same problem of centralization: there’s one (our own) server that holds the data and pinning.

At this point, how is IPFS any more decentralized than AWS S3 or GCP Cloud Storage? There’s one server that handles the uploads, and the only practical way of retrieving data is through the gateway of that server. We don’t know when (or even if) some other node might have the file. As far as I can see (please correct me if I’m wrong), if it’s not queried, no other node will have the file.

Thanks for that suggestion. I’ll explore those. Do you know how propagation happens with these services? Is it the same as running our own Kubo node (the content is available on one node unless someone asks another node for it)?

Indeed. The meaning of decentralisation is context dependent and can be best thought of as a spectrum.

For example, would you consider the pins sufficiently decentralised if you pinned them to 3 IPFS nodes that you control? If so, you may want to look into IPFS Cluster which helps you coordinate pin sets across multiple Kubo nodes.
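To make the "3 nodes that you control" idea concrete, here is a hand-rolled sketch that pins one CID on several Kubo nodes through the RPC endpoint `/api/v0/pin/add`. IPFS Cluster automates this and also tracks and repairs pins, so treat the loop below as an illustration of the concept and not as a replacement. The node URLs and function names are hypothetical.

```typescript
// Sketch: pin one CID on several Kubo nodes that you control.
// Assumption: each entry in `apiBases` is the RPC address of one of your nodes.

function pinAddUrl(apiBase: string, cid: string): string {
  const url = new URL("/api/v0/pin/add", apiBase);
  url.searchParams.set("arg", cid);
  return url.toString();
}

// Returns the API base URLs of the nodes that confirmed the pin.
async function pinOnNodes(
  apiBases: string[],
  cid: string,
  fetchImpl: typeof fetch = fetch,
): Promise<string[]> {
  const attempts = apiBases.map(async (base) => {
    try {
      const res = await fetchImpl(pinAddUrl(base, cid), { method: "POST" });
      return res.ok ? base : null;
    } catch {
      return null; // node unreachable: count it as not pinned
    }
  });
  const results = await Promise.all(attempts);
  return results.filter((base): base is string => base !== null);
}

// True when at least `minReplicas` nodes hold the pin.
function isReplicated(pinnedOn: string[], minReplicas: number): boolean {
  return pinnedOn.length >= minReplicas;
}
```

A caller could retry or alert when `isReplicated(await pinOnNodes(nodes, cid), 3)` is false.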

If you want others to pin the CIDs, you could rely on pinning services to pin the CIDs once uploaded. However, for the pinning services to keep pinning they rely on you paying for their services. Would that make it sufficiently decentralised?
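Many pinning services implement the vendor-neutral IPFS Pinning Service API, where asking a service to pin an already-available CID is a `POST /pins` with a bearer token. A minimal sketch follows, assuming the service follows that spec. The endpoint and token are placeholders and the function names are illustrative.

```typescript
// Sketch: ask a remote pinning service to pin a CID (IPFS Pinning Service API).
// Assumption: `endpoint` and `token` come from whichever service you signed up with.

interface PinRequest {
  url: string;
  init: { method: string; headers: Record<string, string>; body: string };
}

function buildPinRequest(endpoint: string, token: string, cid: string, name?: string): PinRequest {
  const payload: { cid: string; name?: string } = { cid };
  if (name !== undefined) {
    payload.name = name;
  }
  return {
    url: `${endpoint.replace(/\/+$/, "")}/pins`, // tolerate a trailing slash
    init: {
      method: "POST",
      headers: {
        Authorization: `Bearer ${token}`,
        "Content-Type": "application/json",
      },
      body: JSON.stringify(payload),
    },
  };
}

// Returns the service's request id, which can be used to poll the pin status.
async function requestPin(
  endpoint: string,
  token: string,
  cid: string,
  name?: string,
  fetchImpl: typeof fetch = fetch,
): Promise<string> {
  const { url, init } = buildPinRequest(endpoint, token, cid, name);
  const res = await fetchImpl(url, init);
  if (!res.ok) {
    throw new Error(`pin request failed: HTTP ${res.status}`);
  }
  const status = (await res.json()) as { requestid: string; status: string };
  return status.requestid;
}
```

Since the token is a secret, a call like this belongs on your backend and not in the browser bundle.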

Then, of course, there are crypto-native approaches to this, including Saturn, but I don’t have enough experience with those to say how well they work.

Exactly. That was just an alternative suggestion to Kubo.

IPFS enshrines the ability for other peers to replicate CIDs and become providers for those CIDs. However, you are right about the choke point problem of your Kubo node.

That’s not entirely correct. Part of the magic of IPFS is the DHT, which allows other IPFS nodes to find all the nodes that provide a given CID and retrieve it over Bitswap. In other words, data can be fetched from your Kubo node using either the gateway or Bitswap.
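One way to see this in practice is to ask a Kubo node which peers currently provide a CID, via the RPC endpoint `/api/v0/routing/findprovs`. The response is a stream of newline-delimited JSON events. The sketch below assumes that events with `Type` 4 carry provider records in `Responses`, which is my reading of the routing event types, so verify it against your Kubo version. Function names are illustrative.

```typescript
// Sketch: list the peers that provide a CID, as reported by a Kubo node.
// Assumption: provider records arrive as events with Type === 4.

interface RoutingEvent {
  ID: string;
  Type: number;
  Responses: { ID: string; Addrs: string[] | null }[] | null;
  Extra: string;
}

const PROVIDER_EVENT = 4;

// Extract unique provider peer IDs from the newline-delimited JSON body.
function parseProviders(ndjson: string): string[] {
  const peers = new Set<string>();
  for (const line of ndjson.split("\n")) {
    const trimmed = line.trim();
    if (trimmed === "") continue;
    const event = JSON.parse(trimmed) as RoutingEvent;
    if (event.Type !== PROVIDER_EVENT || !event.Responses) continue;
    for (const response of event.Responses) {
      peers.add(response.ID);
    }
  }
  return Array.from(peers);
}

async function findProviders(
  apiBase: string,
  cid: string,
  fetchImpl: typeof fetch = fetch,
): Promise<string[]> {
  const url = new URL("/api/v0/routing/findprovs", apiBase);
  url.searchParams.set("arg", cid);
  const res = await fetchImpl(url.toString(), { method: "POST" });
  if (!res.ok) {
    throw new Error(`findprovs failed: HTTP ${res.status}`);
  }
  return parseProviders(await res.text());
}
```

If only your own node's peer ID comes back, no one else has replicated the content yet.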

That’s correct. If you add a file to your Kubo node, no other peer in the network will “magically” replicate it.

Depends on your use-case and amount of data, but generally it’s a good idea to keep environments separate.
If you decide to share a node across environments, you will probably want to keep track of CIDs in your app so that you can routinely and safely unpin CIDs associated with your staging environment (assuming that unpinning won’t break your staging environment).
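The bookkeeping for a shared node can be quite small. Here is a minimal in-memory sketch that records which environments reference each CID and lists the ones that only staging uses, so they can be unpinned with Kubo's `/api/v0/pin/rm`. The class and function names are hypothetical, and a real app would persist this in its database.

```typescript
// Sketch: track which environments reference each CID on a shared Kubo node.

type Env = "staging" | "production";

class CidRegistry {
  private readonly refs = new Map<string, Set<Env>>();

  track(cid: string, env: Env): void {
    const envs = this.refs.get(cid) ?? new Set<Env>();
    envs.add(env);
    this.refs.set(cid, envs);
  }

  // CIDs referenced by `env` and by no other environment.
  exclusiveTo(env: Env): string[] {
    const result: string[] = [];
    this.refs.forEach((envs, cid) => {
      if (envs.size === 1 && envs.has(env)) {
        result.push(cid);
      }
    });
    return result;
  }
}

// URL for unpinning a CID through the Kubo RPC API.
function pinRmUrl(apiBase: string, cid: string): string {
  const url = new URL("/api/v0/pin/rm", apiBase);
  url.searchParams.set("arg", cid);
  return url.toString();
}
```

A cleanup job would then POST to `pinRmUrl(apiBase, cid)` for each entry in `registry.exclusiveTo("staging")`, leaving anything production also references untouched.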

A better option is to use a decentralized pinning service like Crust Network. The perks are that pinning is cheap (as of June 2024, the price is $0.00014/GB/month) and that it is more decentralized than a centralized pinning service: a file pinned on Crust Network is replicated on ~33 nodes on average (as of June 2024). More data is available at Crust Subscan.

Crust is very decentralized in nature, and IPFS is at its core. To pin a file on Crust Network, you first need to make your file available on the IPFS network. Any publicly accessible IPFS hash can be pinned on Crust Network.

So, to answer your initial question.
You need a centralized component (host your own Kubo node and expose its API, or use any service that lets you receive the file from the client and make it available on the IPFS network), but this centralized component is only temporary storage for your files.
Once the file has gone from your web app to your own IPFS node and is publicly accessible, you can place a storage order on Crust Network (docs).

That way you end up with more decentralized and long-term sustainable IPFS storage.

Saturn is only a retrieval network. It provides services for retrieving a file that is already available on the IPFS network or Filecoin. It can be thought of as a decentralized gateway or, more precisely, a decentralized CDN that works on top of IPFS and Filecoin.

To make that magic happen you just have to say two magic words: “Crust Network” 🪄✨😅