What motivates people to use IPFS for large volumes of data?

@flyingzumwalt : I think that’s roughly what I mean and I would expect that as IPFS matures that will provide significant value for organizations. When it comes to the question about how to best speak about the point, I don’t know what the best way happens to be.

2 Likes

Most companies store such large workloads on EMC Isilon or NetApp, both of which have limitations on the four factors you listed above as reasons to use IPFS. I work on the sales side in storage, and I can say that almost all of my customers are looking to move large archive workloads to AWS or Azure; this is always the low-hanging fruit. So archive use cases could be an interesting play, especially in industries that generate petabytes of data, like media or research.

3 Likes

Just to add to the conversation:

I’ve just recently found out about IPFS and to me it seems like it can potentially be really positive for science reproducibility.

In my particular research community, large (up to around 10TB) binary files are generated through very time-consuming simulations. Storing them appropriately is a big deal (losing files means having to repeat simulations that can span several months). Sharing them with colleagues is of course also really important, and unfortunately it is not always doable in practice. For example, I can’t download simulation datasets of several terabytes that are hosted at Stanford’s repository, since I am based in Europe and it would take me an absurdly long time to do so.

From what I’ve gathered in my short time reading about IPFS, the whole point is to increase file sharing speed by talking to your nearest neighbour in the network, and not necessarily to a central repository. But I’ve also read that duplication is avoided, and that each node in the network stores only content it is ‘interested’ in. So, in the case I mentioned before, how would IPFS decide who stores these large datasets? Wouldn’t it be too costly to have them duplicated? If so, we would be back in the situation I am in now: downloading a huge dataset from across the globe is infeasible.

I’m interested in reading comments on this from more knowledgeable members of the IPFS community :slight_smile:

6 Likes

Hi, I work at a web user behavior analysis company; you can compare it to Google Analytics. Our tracking code generates several TB of data every day. We store it in AWS S3 with an expiration policy that limits the total volume to hundreds of terabytes. We are looking for ways to reduce duplication in the stored data so that we can save money.
There are millions of sessions per day, which means we would have millions of IPFS nodes (short-lived, from seconds to tens of minutes) across the web once we deploy js-ipfs in the tracking code. I believe that could unlock most of the potential of IPFS.

OK, back to the point. Basically, we watch and record all the DOM changes that happen on the page while users are visiting the site, so that we can restore the session later for analysis. Currently, we need the following things:

  1. Version control, or the Tree Object mentioned in section 3.6.3 of the IPFS white paper. Right now we use a diff algorithm to calculate the DOM changes, and we store both the original and the diffs in files. I believe that if the IPFS Tree Object is guaranteed, we would eliminate a lot of duplication and save a lot of space.
  2. A reliable push (or upload) method. I’ve tried PubSub for a demo, and it seems that receipt of the content is not guaranteed yet. Since the tab can be closed at any time, it’s very important for us to push the data to the backend within microseconds. (Well, there may be some workarounds.)
    (I’ll add more as I come up with them.)
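To make the deduplication idea in point 1 a bit more concrete, here is a minimal sketch of a content-addressed tree, assuming each DOM node is stored as a small JSON block keyed by its SHA-256 hash. The names `BlockStore` and `store_dom` are hypothetical, and this is not the actual IPFS object format or API; it only illustrates why unchanged subtrees cost nothing extra when a new snapshot is stored.

```python
import hashlib
import json


class BlockStore:
    """Hypothetical in-memory store: maps a content hash to the block bytes."""

    def __init__(self):
        self.blocks = {}

    def put(self, obj):
        # Serialize deterministically so equal objects always hash the same.
        data = json.dumps(obj, sort_keys=True).encode("utf-8")
        key = hashlib.sha256(data).hexdigest()
        # Identical content maps to the same key, so it is stored only once.
        self.blocks.setdefault(key, data)
        return key


def store_dom(store, node):
    """Store a DOM-like dict bottom-up and return the hash of its root.

    Assumed node shape: {"tag": str, "text": str, "children": [node, ...]}.
    A parent block refers to its children by hash, Merkle-tree style.
    """
    child_keys = [store_dom(store, child) for child in node.get("children", [])]
    return store.put({
        "tag": node["tag"],
        "text": node.get("text", ""),
        "children": child_keys,
    })


# Two snapshots of the same page that differ in a single paragraph.
header = {"tag": "header", "text": "Site"}
snapshot_1 = {"tag": "html", "children": [
    header, {"tag": "main", "children": [{"tag": "p", "text": "hello"}]}]}
snapshot_2 = {"tag": "html", "children": [
    header, {"tag": "main", "children": [{"tag": "p", "text": "hello world"}]}]}

store = BlockStore()
root_1 = store_dom(store, snapshot_1)
root_2 = store_dom(store, snapshot_2)
print(root_1 != root_2, len(store.blocks))
```

In this toy case the second snapshot only adds blocks along the path from the changed paragraph up to the root; the untouched header block is shared between both versions.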
3 Likes

@jeiros I think you’re pretty much correct in what you say. Maybe a few points for thought:

  • Depending on your workflow, it may be acceptable to retrieve only some of the dataset, e.g. for a given piece of analysis you only need to retrieve a subset of files from a given simulation. This is much easier with IPFS than some traditional data repositories. Also, if you add those files to your IPFS node, you automatically make it easier/faster for European colleagues to get those particular files. It sounds as if your data is a single binary file, though?
  • The intention is for there to be different importers for IPFS to optimize chunking of specific content types, e.g. video or HDF5 files. In theory this could help with de-duplication (i.e. de-duplicating content across multiple simulations) and streaming of content. I don’t know how this would apply in your case. https://github.com/ipfs/specs/tree/master/dex
  • Presumably the original data archive is duplicating or triplicating the data, i.e. through backup etc. A collaborative/co-operative approach to data archiving could meet these backup requirements while improving access. The tricky thing here is governance, but there is good precedent with things like LOCKSS (https://www.lockss.org/). One could imagine a bilateral undertaking between the European Open Science Cloud and the US equivalent, or between collaborating centres in a given domain of science. So, you’d have sponsored/trusted nodes pinning the content much as data repositories do now, supplemented by ephemeral nodes that temporarily pin content or pin content that is of interest to them (e.g. a research group pins a dataset it uses regularly; an institution pins content produced by its researchers, etc.). IPFS itself won’t help with the governance issues, but Filecoin might help incentivise third-party replication. Ultimately, though, archiving of scientific data is a public good, and different economics apply: https://www.biorxiv.org/content/biorxiv/early/2017/03/14/116756.full.pdf
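As a rough illustration of the first two points (partial retrieval and de-duplication through chunking), here is a small sketch. It assumes naive fixed-size chunks and SHA-256 keys; `add_file` and `read_range` are hypothetical helpers, not IPFS commands, and real importers choose chunk boundaries and sizes differently.

```python
import hashlib

CHUNK_SIZE = 4  # deliberately tiny so the example is easy to follow


def add_file(store, data, chunk_size=CHUNK_SIZE):
    """Split data into fixed-size chunks, store each under its hash,
    and return the ordered list of chunk hashes that describes the file."""
    keys = []
    for start in range(0, len(data), chunk_size):
        chunk = data[start:start + chunk_size]
        key = hashlib.sha256(chunk).hexdigest()
        store.setdefault(key, chunk)  # a chunk seen before is not stored again
        keys.append(key)
    return keys


def read_range(store, keys, start, end, chunk_size=CHUNK_SIZE):
    """Return bytes [start, end) of a file, touching only the chunks needed.
    Assumes 0 <= start < end <= file length."""
    first = start // chunk_size
    last = (end - 1) // chunk_size
    needed = b"".join(store[key] for key in keys[first:last + 1])
    offset = start - first * chunk_size
    return needed[offset:offset + (end - start)]


store = {}
sim_a = b"AAAABBBBCCCC"  # stand-ins for two simulation outputs
sim_b = b"AAAABBBBDDDD"  # that share their first two chunks
keys_a = add_file(store, sim_a)
keys_b = add_file(store, sim_b)

print(len(store))                        # unique chunks stored across both files
print(read_range(store, keys_a, 5, 10))  # a slice that needs only chunks 1 and 2
```

Six chunks are added in total but only four distinct ones are stored, and the range read never touches the first chunk of the file. Whether real simulation output shares chunks like this depends entirely on the file format and the chunker.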

Why GT Systems likes IPFS and is looking very seriously at using it. These comments should be read in conjunction with our initial community post here GT Systems: building Blust+SPA, the world’s first legal, approved, peer to peer movie sharing network; exploring IPFS, libp2p and Filecoin as a tool set

We developed our Secure Peer Assist (SPA) technology to overcome the issues with distributing video (movie) and other large files via the Internet. It is now approved by one of the biggest studios in Hollywood with more to come. We identified, very early on, the need for a file system. We were aware of some of the work around content addressing and new models for the Internet (although not specifically IPFS) but were very much aware of our limitations as a startup and felt these were outside our remit. So, we specified our own version and included its development in our budgets. It seems that IPFS has come along at the perfect time to meet that requirement. While it seems to be early days in its development, that is a good thing in that it enables us to contribute and influence its direction. We’re optimistic that recent developments in Filecoin and crypto-currencies in general will also help accelerate that significantly.

Why we like IPFS

  1. It fits our architecture, philosophy and values perfectly (assuming positive answers to our high level questions in the IPFS discussion forum here GT Systems IPFS and Filecoin questions)
  2. It supports hashes, and therefore content addressing and a DHT
  3. It scales – BIG – hopefully to Exabytes and beyond
  4. It becomes more efficient as it scales
  5. It has no single point of failure and shards can continue to function
  6. It ISN’T BitTorrent, which makes it more acceptable to the studios (but, again, see our questions around security)
  7. It encrypts files at rest. Currently, we use PlayReady 3 to do that because it is acceptable to the studios. Hopefully, as we continue to work with the studios and introduce IPFS, we may be able to use the native IPFS encryption. That will depend on how secure it is and will require an extension of the journey we have been on for 10 years with the studios. But, right now, our architecture (including PR3) is approved. If we can make it work with IPFS, we are good to go with one of the best catalogues in the world, with more to come.
  8. Combined with Filecoin and our technology, IPFS provides the perfect mechanism for our customers to share movies. It fits our business plan and business model perfectly. Using our relationships, it overcomes ALL the issues (tech and business) of distributing movies via the Internet. This is based on a VERY deep understanding of the real tech and business issues and motivators, gained from working with all the Hollywood studios and Indies on digital distribution for 10 years.
  9. Given certain assumptions, we think we may be able to significantly reduce the cost of movies to consumers, while keeping rights owners (studios and indies) happy.
  10. It will allow us to come to market MUCH quicker and requires MUCH less funding.
  11. Between us, we can change the way movies and TV are distributed and sold for the foreseeable future and help fix the Internet. We like that very much.

Rhett Sampson
GT Systems
8 February 2018

7 Likes

I’m working on a side project to create a database of learning materials, including large media files. Such a database could be pretty big, perhaps not dozens of terabytes but still sizeable. One thing I want is for it to be decentralized, so that many people can pitch in to host it, and versioned using a graph of trust (like the Linux kernel) rather than allow-edits-then-fix like Wikipedia. For that I’m developing a DVCS on top of IPFS.

2 Likes

Can video websites like bilibili or YouTube store their videos on the IPFS network at a lower cost?

  • Content Addressing
    If your data is important to you, it’s the only way to go.
  • Partition tolerance
    It is essential for us at actyx.io that all core features (IPFS, IPNS, pubsub) are partition tolerant. I hope it stays this way. E.g. attempts to use some kind of traditional blockchain for IPNS would be very unwelcome for us.
  • Small footprint
    We are running IPFS on industrial versions of the Raspberry Pi, as well as on industrial Android tablets, so we require it to work with a small footprint. Of course it should also be able to take advantage of more capable machines.
  • NAT traversal and tolerance of obscure network topologies
    IPFS currently makes a reasonable effort to traverse NATs and work under less than ideal conditions WRT connectivity. It’s not perfect and could definitely be expanded on (e.g. dual NAT, Android P2P Wi-Fi support). Ideally, if there is any way to get data from one device to another, IPFS should find it.
  • Single device operation
    This currently does not work. E.g. it is not possible to publish to IPNS when a device is not connected to any peers, which is fairly common for mobile devices.
3 Likes

We are interested in lots of different kinds of large data volumes because we work in video production. One of the biggest problems we face in our everyday work environments is hard drives crashing. Furthermore, direct collaboration during the creative process (even in the same room) is often hampered by poor uplinks and the varied partition formats of drives. Versioning is very important in this process, and a final issue is the availability of finished media assets.

IPFS in its ideal state would help us to solve all of these problems.

1 Like

What I love about IPFS is that, as long as you have 2 nodes with the same data, you don’t need complex backup systems. If one peer fails, fine: the same hash is on the other peer.

So simple, no need to have any backup management programs or redirections, not even a person watching out.

You can even have those 2 nodes in different buildings, or even in different countries. They don’t need to be interconnected or coordinated in any way to realize “one failed, so the other must step up”. They don’t even have to know the other exists.

Your tech support can be busy with something else and not need to rush to restore some interconnected system between backups. The network’s got your back.

And when you get around to repairing your original computer, you can restore the whole backup network with a couple of commands, in just a minute!

1: It makes backing up information (and keeping access to it) much simpler (and less stressful).
2: 0% downtime all year round for websites. No hosting company can guarantee you that nowadays; IPFS can.
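For what it’s worth, the “same hash on the other peer” idea can be sketched in a few lines. This is only a toy model, assuming each peer is a plain dict from hash to bytes and that the identifier is a bare SHA-256 digest; `content_id` and `fetch` are made-up names, and real IPFS peer discovery and CIDs are more involved.

```python
import hashlib


def content_id(data):
    """Toy content identifier: the SHA-256 hex digest of the bytes."""
    return hashlib.sha256(data).hexdigest()


def fetch(cid, peers):
    """Try each peer in turn and return the first copy whose hash matches.

    The requester needs no knowledge of which peer is the 'primary';
    any peer holding bytes that hash to cid is as good as any other.
    """
    for peer in peers:
        data = peer.get(cid)
        if data is not None and content_id(data) == cid:
            return data
    raise LookupError("no reachable peer holds content " + cid)


site = b"<html>my site</html>"
cid = content_id(site)

failed_peer = {}                   # lost its disk, holds nothing
corrupt_peer = {cid: b"tampered"}  # answers, but with the wrong bytes
healthy_peer = {cid: site}

print(fetch(cid, [failed_peer, corrupt_peer, healthy_peer]) == site)
```

The catch is that availability still depends on at least one reachable peer keeping the data, which is what the final `LookupError` stands for.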

On IPFS, only the apocalypse could bring down your sites lmao :joy:

1 Like

And if the apocalypse comes (barring a global EMP storm) with cluster peers you know that rebuilding the planet will be easier! :wink:

1 Like

LMAOOO hahahahahahaahh

Seriously, this is something that I have included in our whitepaper. What do people want to do if the world ends? Watch movies. This is one of the added benefits of including the sneakernet approach, where we leverage the inherent mobility of our project stakeholders to maximize cluster saturation during moments of massive ingest that would normally cripple consumer uplinks. We call it the SNEAKERBOX, and it is a custom live Ubuntu distro on an external HDD with persistence AND an IPFS partition.

And before someone says that if the world ends there won’t be any electricity or internet: I have to categorically disagree. The second thing I will build, after I have a sustainable power supply for my community, is a local network. Then we build mesh networks.

1 Like

Now that you mention it, on IPFS it could be really easy for a country hit by a natural disaster (or even a war, where computers might be seized) to keep its informational infrastructure “up and running” 24/7 as if nothing were happening.

This, along with some specialized DAPP, could help countries give medical assistance to huge numbers of refugees as if nothing had happened to their networks (education, healthcare…). As long as one node is in a safe place, all of the country’s data could be not only safe, but still usable as if nothing had happened.

The DAPP would just be an interface for public servants to access their usual programs/functions/data. And since the DAPP could be stored somewhere else, they’d just need a laptop to access the full functionality and paperwork of their profession.

This would, for example, allow a doctor in a refugee camp in Africa to access their patients’ medical records, even if a war or natural disaster has destroyed the hospital where those were kept (which is unfortunately common in those situations).

The records could even be safely stored outside the country, allowing doctors and other public workers to keep working even in worst-case scenarios.

This idea could have huge potential for governments, public bodies and NGOs; it would give a lot of resilience and agility to workers in disaster situations.

2 Likes

You could even make a “portable educational system” if you wanted. Just allowing the teacher in a refugee camp to access the class materials, exams, student IDs and grades… could help the teacher keep up with his/her classes and continue a course as if nothing were happening outside: schooling children according to their courses, administering the official examinations and even uploading each student’s official qualifications and grades to the “national education system”, even if the world is falling apart outside the tent.

It would allow kids to continue their education, and give them a sense of normality.
Help doctors and nurses access medical records and follow patients’ cases.
Track family members that have been separated, so they can reunite.

All it would need would be the DAPP, an internet connection and the patient remembering some official ID number…

But I think I’m derailing the conversation, so I will stop here. But hey, this thing has lots of potential for emergency situations…

1 Like

I think it is really important to efficiently recover from disasters, and I know that medical assistance and education are priorities in crises; but I also firmly believe that it is important to offer “cultural relief” too. Idle hands get in trouble, especially if there isn’t anything else to do / if one can’t leave the refugee center. Being creative and enjoying the fruits of cultural endeavor are what make us human. Medicine just keeps us alive and education, well I won’t even go there…

So, in my opinion, cultural activity is super important to restore as soon as possible. Fix what you can and then make more short films. :smiley:

2 Likes

Optimally there’s zero practical distance between these “educational” and “cultural” informational use cases. One example is our past partner that I mentioned above, Internet-In-A-Box, which enables offline or local-only mesh access to compressed caches of free Internet resources. One of the “educational” resources included is a large collection of Khan Academy videos. There are other online video-based courses, and I know of many people who watch them for fun, just like you might joyfully partake in any other nonfiction media or documentaries.

Preferably these compressed offline-capable caches would utilize some kind of binary DVCS for periodic updates and synchronization, and allow individuals and groups to decide which resources they personally find important for both offline and emergency use cases. This will create a natural redundancy and prioritization in aggregate for disaster scenarios, which exceeds the resource choice optimization that any one centralized institution can synthetically provide.

2 Likes

Got two factors when evaluating!
Cost: Open-source project, can leverage a multitude of infrastructures, OS-agnostic
Support: Who do you call when someone wins the lottery!?

Even though these factors have been loosely mentioned…

I’ve been using it to host videos that YouTube deletes because of censorship. I just want to watch Alex Jones while coding or gaming, is that too much to ask? It’s great that all these programmers are out there trying to save the world and make the world such a better place! I feel like that never happens though. I feel like all of our ideas get co-opted by the governments and the powers that be or by the sjw nazis and get turned into our own chains of enslavement. I use IPFS because it shows legit promise at defeating the evil fucks that run the internet. BitTorrent has yet to be stopped, the tech behind GitHub ain’t going anywhere, IPFS becoming the new internet is inevitable. I use IPFS because I’m sick of censorship, sick of people telling me what to think, and sick of people trying to control everyone’s lives.