Real-world use of IPFS for scientific data?

As a research software engineer, I build systems that manage the storage of and access to scientific data. Over the years I have read plenty of statements about how IPFS could be a solution to many of the needs of scientific data storage and sharing, and I am interested in experimenting with it to see what it could do for my own research collaborations. However, the discussions on this forum related to this kind of use case are about five years old, and I am not finding concrete, real-world examples of scientists using IPFS to store and distribute their data. If anyone has such examples to share I would appreciate it.

Hi @manning-ncsa, are you interested in a specific scientific domain? We have this example: eurec4a/eurec4a-intake on GitHub (Intake catalogue for EUREC4A field campaign datasets), and another example from the Bacalhau.org case studies page: wesfloyd/how_to_eurec4a on GitHub (code examples to get you started with EUREC4A data).

Check out docs.bacalhau.org. This is a new compute-over-data project: basically, you send your compute to where the data lives! It's popular among researchers. Let me know if you have any questions :)

Hey @manning-ncsa,

This is a great question. Indeed, many of the threads on this are old, and the IPFS network has evolved significantly in the last couple of years.

At our most recent IPFS gathering, IPFS Camp, we had a track on Decentralised Science. You can check out the recordings here.

I’d also like to point out that there’s an interesting paper about the potential of IPFS and content addressing for scientific data.
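For anyone new to the term, here is a minimal sketch of what content addressing means, assuming a plain SHA-256 hex digest as a stand-in identifier. Real IPFS CIDs are built differently (multihash encoding, chunking into a DAG), and the names `content_address` and `ContentStore` are hypothetical, made up for this illustration. The point is only that the address is derived from the bytes, so anyone who retrieves a dataset can verify it against the address it was requested by.

```python
import hashlib


def content_address(data: bytes) -> str:
    """Return a SHA-256 hex digest, used here as a stand-in for a real IPFS CID."""
    return hashlib.sha256(data).hexdigest()


class ContentStore:
    """Hypothetical in-memory store keyed by the hash of the stored bytes."""

    def __init__(self) -> None:
        self._blocks: dict[str, bytes] = {}

    def put(self, data: bytes) -> str:
        # The key is computed from the content, not chosen by the caller.
        key = content_address(data)
        self._blocks[key] = data
        return key

    def get(self, key: str) -> bytes:
        data = self._blocks[key]
        # Re-hash on retrieval so corrupted or swapped data is detected.
        if content_address(data) != key:
            raise ValueError("stored data does not match its address")
        return data


store = ContentStore()
address = store.put(b"temperature,pressure\n21.5,1013\n")
same_address = store.put(b"temperature,pressure\n21.5,1013\n")
other_address = store.put(b"temperature,pressure\n21.6,1013\n")
```

Storing identical bytes twice yields the same address, while changing a single value yields a different one, which is what makes a content address usable as a stable, verifiable reference to a specific version of a dataset.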

If you’re interested in experimenting with IPFS and large datasets, I’d recommend checking out https://estuary.tech/ where you can store public data and make it available to everyone around the world via IPFS.

I work on the Estuary team and second everything Daniel said.

Estuary is currently working with https://opsci.io/ to onboard their data. Separately, we recently started working with The Cancer Imaging Archive to onboard their data as well. Additionally, the Bacalhau project, which works on compute over data, also uses Estuary as a backend.

A bit more about what Estuary does: you can think of it as an IPFS pinning service that also persists the data to Filecoin with 6x replication.

Happy to chat more if interested!

Thanks for your encouraging responses and all the links, @it09 @danieln @anjor. (Somehow I did not receive notification emails for your responses and so did not see them until now :confused:) My current projects are primarily in nuclear physics and astrophysics, where the astronomy surveys typically involve large data sets that require both public and private access. I am still interested in exploring IPFS for use in this domain. Without existing real-world examples to reference, it is challenging to promote IPFS in designs when writing proposals for new projects, which in turn makes it difficult to spend time experimenting with it and developing tooling for the scientific community. It is a kind of chicken-and-egg problem.

I will follow your links and come back with more questions.