IPFS in production - challenges, best practices, and use cases

I’m curious what production services people are running using IPFS.

What challenges did you encounter on your way to a production deployment? What best practices would you recommend to others? What other types of production services would you consider using IPFS for?

At Nori, we need to store and serve public data about carbon removal projects that can be linked to from the Ethereum blockchain. We are currently evaluating how IPFS could help us do that.

Currently, I’m thinking that we would:

  1. privately store this data in a centralized, queryable database such as MySQL or MongoDB, and
  2. mirror the same data to IPFS by pinning it in a cluster of public IPFS nodes that we host

Does this sound reasonable? Are others already doing this? Is this the right way to think about IPFS?


Three questions might be helpful to keep in mind:

  1. Do you need a blockchain? A Database?
    Especially given the context of the data in question. Given the briefing of your project, I’m sure you and your team have good reason to be thinking about blockchains, but it’s worth noting that there are ways to do digital provenance and accountability without a blockchain, so long as you don’t have the double-spending problem. I may be wrong, but a database, a blockchain, and IPFS sound like three potentially competing sources of truth that would need to be carefully managed to not make grinding noises. On that note…

  2. Is content-addressing suitable as your ground truth?
    IPFS plants deep roots in content-addressing, which is great, but it doesn’t play nicely with more than one representation of a given unit of information: if two copies of the same record differ by even one byte, they get two different addresses. If you’re storing data in a database that must interoperate with IPFS, a great deal of time (and tests) will go into preserving the exact bytes of any data that is mirrored between the database and the content-addressed system. Another common pattern is to use IPFS as the store of data, and keep hashes of the relevant information in a database for fast lookup and retrieval. I think it’s best to take the time to see how well your problem is suited to content addressing (from what I can glean, it seems like a solid fit!).

  3. Are you prepared to invest the time to grow with a new technology?
    Ethereum + IPFS means you have at least two bleeding-edge tech stacks to work with, so it’s worth knowing in advance that a lot of time will go into keeping up with the Joneses. Well worth it IMHO, but keep in mind that limiting the number of dependencies under rapid change might make life easier, if a little less exciting :slight_smile:
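
To make point 2 a bit more concrete, here is a minimal Python sketch of why the exact bytes matter when mirroring records out of a database. Everything in it is an assumption for illustration: the record fields are made up, and a plain SHA-256 hex digest stands in for a content address (a real IPFS hash is computed over chunked, wrapped data, so it will not equal this value).

```python
import hashlib
import json


def canonical_bytes(record: dict) -> bytes:
    """Serialize a record to one fixed byte form: sorted keys, no extra whitespace, UTF-8."""
    text = json.dumps(record, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return text.encode("utf-8")


def digest(data: bytes) -> str:
    """Stand-in for a content address: hex SHA-256 of the exact bytes."""
    return hashlib.sha256(data).hexdigest()


# Two hypothetical records holding the same information, with keys in a different order
# (as could happen when the same row is read back through two different code paths).
record_a = {"project_id": "demo-1", "tonnes_removed": 12.5}
record_b = {"tonnes_removed": 12.5, "project_id": "demo-1"}

# Naive serialization keeps each dict's own key order, so the bytes (and digests) differ.
naive_a = digest(json.dumps(record_a).encode("utf-8"))
naive_b = digest(json.dumps(record_b).encode("utf-8"))
print("naive digests match:    ", naive_a == naive_b)

# Canonical serialization yields identical bytes, hence identical digests.
canon_a = digest(canonical_bytes(record_a))
canon_b = digest(canonical_bytes(record_b))
print("canonical digests match:", canon_a == canon_b)
```

Pinning down one canonical serialization (and testing it) is what keeps a mirrored record resolving to the same address every time it is re-exported.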

The “production” side of it is less concerning in my opinion. We’ve had lots of fun deploying IPFS into K8s and keeping all sorts of data in there. IPFS works today, and is getting better all the time (IPNS, on the other hand, not so much). But so long as the data you’re storing can be broken up into chunks smaller than 1 GB, you’re in well-covered territory.

Hope that helps!

Thanks. This is helpful. Some thoughts and answers that come up after reading your response:

Do you need a blockchain? A Database?

We are “betting the farm” on proof-of-stake solving the energy problem in the near future. If that doesn’t happen, we will probably try bundling off-chain transactions, but that does indeed sound painful. Our long-term goal with blockchain is to remove ourselves (and everybody else for that matter) as gatekeepers to participation in carbon removal trading. Our thesis is that market forces will drive down costs and spur increased investment in carbon removal, and using blockchain as a global trading platform will inspire greater trust in the market.

Is content-addressing suitable as your ground truth?

I think large swaths of our data, though not all of it, are appropriate for content addressing. Specifically, we are creating a commodity token called a “Carbon Removal Credit” (CRC), which represents a certain amount of carbon having been removed. The data collected during the removal process serves as proof of removal, so each CRC will be linked to the data that proves it’s legit. Storing all that data in the Ethereum blockchain is cost-prohibitive, so IPFS, and eventually Filecoin, become interesting mechanisms for decentralizing access to that data. This actually brings up another implementation-detail question: what is the most storage-efficient way to link to data in IPFS from a blockchain?
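
One approach I have seen for the linking question, sketched here under assumptions rather than as a confirmed answer: a `Qm…` style hash (CIDv0) is the base58 encoding of 34 bytes, namely a fixed 2-byte prefix (`0x12` for sha2-256, `0x20` for a 32-byte length) followed by the 32-byte digest. If every link uses that format, only the 32-byte digest needs to go on-chain, which fits a single 32-byte storage word, and the prefix can be re-attached off-chain. The helper names below are hypothetical, and the sample digest is computed from arbitrary bytes, so the resulting string is not the address of any real IPFS object.

```python
import hashlib

B58_ALPHABET = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"
SHA256_PREFIX = bytes([0x12, 0x20])  # multihash header: sha2-256, 32-byte digest


def b58decode(text: str) -> bytes:
    """Decode a base58 (Bitcoin alphabet) string to bytes."""
    num = 0
    for ch in text:
        num = num * 58 + B58_ALPHABET.index(ch)
    body = num.to_bytes((num.bit_length() + 7) // 8, "big")
    pad = len(text) - len(text.lstrip("1"))  # leading '1's encode leading zero bytes
    return b"\x00" * pad + body


def b58encode(data: bytes) -> str:
    """Encode bytes as a base58 (Bitcoin alphabet) string."""
    num = int.from_bytes(data, "big")
    out = ""
    while num > 0:
        num, rem = divmod(num, 58)
        out = B58_ALPHABET[rem] + out
    pad = len(data) - len(data.lstrip(b"\x00"))
    return "1" * pad + out


def cid_v0_to_digest(cid: str) -> bytes:
    """Strip the 2-byte multihash header, leaving the 32 bytes to store on-chain."""
    raw = b58decode(cid)
    if len(raw) != 34 or raw[:2] != SHA256_PREFIX:
        raise ValueError("expected a CIDv0 (sha2-256, 32-byte digest)")
    return raw[2:]


def digest_to_cid_v0(digest: bytes) -> str:
    """Rebuild the Qm... string from a 32-byte digest read back from the chain."""
    if len(digest) != 32:
        raise ValueError("expected a 32-byte digest")
    return b58encode(SHA256_PREFIX + digest)


# Illustration only: a digest of arbitrary bytes, not a real IPFS object.
sample_digest = hashlib.sha256(b"illustrative bytes").digest()
sample_cid = digest_to_cid_v0(sample_digest)
print(sample_cid, len(cid_v0_to_digest(sample_cid)))
```

The trade-off is that this hard-codes the hash function and length. Storing the full multihash costs a couple of extra bytes but leaves room for other hash functions or CID versions later.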

Are you prepared to invest the time to grow with a new technology?

This is actually my biggest concern. Once our financing situation is taken care of, investing in IPFS becomes a lot more compelling. But while we are trying to keep lean, I’m reluctant to spend cycles on tech that is still baking. Part of my motivation for asking the original question was to suss out just how many cycles we might need to spend :slight_smile: