What is IPFS actually good for?

Hello all.

I really, really appreciate the existence of such a well-supported project as IPFS.

However, I would like to ask something regarding the usability of the platform itself.

Say, hypothetically, we want to store a good amount of information in a decentralized manner,

and that information will be consistently updated and frequently accessed / referred to.

Is IPFS a good choice for such a use case?

Or am I better off using something else?

And what is the optimal usage of IPFS? What kind of use case would suit it best?

I am open to all types of suggestions and opinions.

I would be very thankful for any contribution!

We talk about this a lot on this forum; everyone is talking about the “dynamic website problem”: because IPFS uses content-based IDs, there is no obvious way to check whether something has been updated when you don’t know what the new thing is.

We have been building things like IPNS and pubsub/gossipsub to work together with the DHT to solve this problem, or at least trying to, but to me these solutions are large and clunky, whereas in a regular client-server environment there is an easy way to check whether something exists: you just ask for it.

The normal IPFS use case would be: store something in the network (by hosting it yourself or on a server) → get it back from anywhere using its CID (it’s usually cached on other people’s IPFS nodes after they request it).
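In CLI terms the round trip is just this (angle brackets are placeholders):

    $ ipfs add report.pdf          # prints: added <CID> report.pdf
    $ ipfs cat <CID> > copy.pdf    # works from any node; blocks are fetched and cached locally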

IPFS has many unique properties.

  • Decentralized
  • Immutable data
  • Transport agnostic

The downside of immutability is that updating data is done in a different way. Any transport can be used, but only a few are implemented and tested. As @Xyncgas said, browser support is not good.
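The usual way to get mutability back on top of immutable content is an IPNS name: the name stays fixed while the CID it points to changes. A minimal sketch with the ipfs CLI (angle brackets are placeholders):

    $ ipfs add -Q data-v2.json            # -Q prints only the CID of the new version
    $ ipfs name publish /ipfs/<newCID>    # repoint this node's IPNS name at it
    $ ipfs name resolve /ipns/<peerID>    # readers always resolve to the latest CID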

Tell us more about your project, what are you building?

Hey,

I can’t divulge too much, as this is supposed to be an academic publication,

but I can let you know that I was hoping to use IPFS as a database: something that can be updated and queried at will, at high speed / high frequency.

But going by what you and @Xyncgas have described, that’s not how IPFS works, nor what it is purpose-built for?

Sorry that I can’t help you more, but it just depends on your project.

If you’re trying to crowd-source data on a planetary scale, using IPFS could be the only way.

If you just need a DB, then use one?

Also, keep in mind, the only thing you can query on IPFS is CIDs.

I would, but we don’t want to risk any type of centralization.

I would daresay we are looking for a data-retention scheme which religiously abides by decentralization.

@Yaxir

I don’t know if you’ve seen this paper about Hadoop on IPFS? Maybe it helps.

Don’t forget to report back when it’s out 🙂

That’s technically true, but you can still build great things with small blocks.

@Yaxir I will list the projects built on IPFS that I know of which are loosely related to data management. As I’m not really in the field myself, a lot of it will be irrelevant, but some may help.

Now discontinued:

There may be more.

Depending on your model (does every peer want to edit and access the same piece of data? Is eventual consistency enough? etc.), you may want to use gossipsub with a topic per piece of data that interested peers follow, to get updates quickly and have real-time updates. CRDTs also come to mind.
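A rough sketch of the topic-per-dataset idea using kubo’s experimental pubsub commands (topic and file names are made up; on newer versions the publish payload is read from a file or stdin rather than passed inline):

    # the daemon must be started with: ipfs daemon --enable-pubsub-experiment
    $ ipfs pubsub sub dataset-weather-eu    # followers block here and print each announcement
    $ ipfs pubsub pub dataset-weather-eu "$(ipfs add -Q weather-v42.csv)"    # announce the new version's CID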

Your design will also depend on the frequency of updates.

Information being accessed and referred to is not a problem with IPFS. You can access the CID for a static version, or the CID pointed to by an IPNS record for an updatable version. See qri.io above for how they approach version management of datasets.
The “consistently updated” part will depend on whether the updates are frequent, how dynamic the group of peers with the right to update is, whether all pieces of data share the same group of peers able to update them, and whether it is OK for a peer to read non-super-fresh data (eventual consistency).
It’s hard to help you further without knowing a bit more.

Good luck!


“information will be consistently updated and frequently accessed / referred to”

It’s fine: you can update anything using gossipsub; it’s part of the IPFS protocols.

Then yes, IPFS is your best bet.

Also, take a look at Holium; it might help.

If you wanted to query data on IPFS, you would need to know the hashes of all the things you needed. The thing is, everything that you post to IPFS, in my understanding, isn’t picked up automatically by others; you would have to share it with them so they can hold onto it too. So technically, everything you post to IPFS is just stored on your machine? (Someone please correct me if I am wrong)
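Roughly, yes: another node only gets a copy once it fetches the content, and only keeps it past garbage collection if it pins it:

    # on any other node that wants to hold onto the data
    $ ipfs pin add <CID>    # fetches the blocks and pins them locally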

If you want to share data with others who are interested in storing it, you can use the pubsub feature to publish to a topic of your choosing and get that data very quickly to everyone subscribed to that topic.

If it’s of interest to you, I made something that kind of helped with this problem a while back. It basically kept track of and stored IPFS hashes published on a pubsub topic in the background, and then added the hashes and extracted the content into a local MongoDB, so you can query the data in a familiar way: True Citizen Science. My goal was to collect and analyze IPFS data in a familiar way, so I focused on the citizen-science model and how to decentralize it. Here is a video I did on how it works: video.
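The pipeline is roughly this shape (a hypothetical sketch; it assumes each pubsub message is a newline-terminated CID of a JSON document, and the topic, database, and collection names are made up):

    $ ipfs pubsub sub science-data | while read -r cid; do
        ipfs cat "$cid" | mongoimport --db citsci --collection observations
      done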

The DHT tells others that you have something.
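Concretely (in newer kubo these commands live under ipfs routing instead of ipfs dht):

    $ ipfs dht provide <CID>     # announce this node as a provider for <CID>
    $ ipfs dht findprovs <CID>   # list peers who claim to be able to serve <CID>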

I am working on a similar project in Postgres. I will need to rewrite some things, I think, but I am sure it is feasible.

As of right now, I don’t know if it is the best choice, but what I can tell you is that you will need a lot of space, and I mean a lot.

Running ipfs-cluster, maybe just on the primary database, using ipfs-cluster-ctl add <file> --expire-in 99h, and running the garbage collector (ipfs-cluster-ctl ipfs gc) frequently might be a good choice. But what about data files that are rarely or never changed, like the system database files? Those files would need to stay static, and the use of IPNS might work: whenever a system database file is edited by the DBMS, it would have to pin the new version and publish the new hash in association with the IPNS key.
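That update path could look something like this (the key name sysdb is made up):

    $ ipfs key gen sysdb                           # one-time: create a named IPNS key
    $ CID=$(ipfs add -Q system.db)                 # add (and pin) the edited file
    $ ipfs name publish --key=sysdb /ipfs/"$CID"   # repoint the IPNS name at the new version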

And then once you have added the new version, you can unpin the old one and cron the garbage collector every day.
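For instance (the old CID is a placeholder):

    $ ipfs pin rm <oldCID>    # drop the superseded version
    # crontab entry to run the garbage collector daily at 03:00 (the path is an assumption):
    0 3 * * * /usr/local/bin/ipfs repo gc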

But there are some things I think we can’t do with IPFS.

1 - When you run some queries on your DBMS, you sometimes need temporary files for the query to compute. They need to be really fast, and since they are temporary, I don’t think I would add them to an IPFS node.

2 - Assume you edit a tuple, and this tuple is stored in a data file. Once the file is edited, you will add it to your node and unpin the last version of the file (or not; it depends on your use case, e.g. for going-back-in-time scenarios). For a time you will have two versions of the file. What happens when you have 4 GB of data modified, or 400 GB in the same day? What if you run an IPFS cluster?

3 - I think performance will be a very big problem. First, the standbys (if they have not pinned the file) will need to download it from the primary node, and it might take some time to find it. If you are using a cluster this might not be a problem, but you will need to unpin the edited file not only on the primary but also on the standbys, run the garbage collector on all your nodes, and have some system for letting your nodes know which files to unpin. Second, on the primary: what if someone accesses different tuples that are stored in the same data file? You will need to add some type of locking mechanism, and that is for me the hardest part.
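With ipfs-cluster, at least the unpin and the GC can be driven from one place:

    $ ipfs-cluster-ctl pin rm <oldCID>    # unpins on every cluster peer
    $ ipfs-cluster-ctl ipfs gc            # trigger garbage collection across the cluster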

Transaction logs and database backups won’t be a problem.