I’d probably add:
- maturity and stability of specs
- degree of adoption and integration (e.g. with browsers and operating systems)
I would very much like to start using IPFS, but I am nervous of doing so as I am on a metered WiFi connection in the middle of the Southern Highlands of Scotland, where the phone company buried shit aluminium phone cable 50 years ago and I am so far from the switch that the impedance kills any chance of usable bandwidth. But it’s not all bad: at least I get 28 Mb/s down and 40 Mb/s up with my 4G connection. It’s just so expensive, and of course it’s not built for volume.
Matt - IPFS has huge potential.
I just went through a long and productive meeting with Jeremy from the Go team, going over some of the issues that hold up making large amounts of data available from archive.org’s perspective. I don’t know if he went over all the points, but I’ll outline them here, and we can cover them on Friday if your team will be around long enough and there aren’t too many other items on the agenda.
In brief: in our test architecture we put the data in on the server (we’d like to do it directly, but currently it goes through a local IPFS (Go) instance) and then try to access it in the browser.
We are assuming that many of these issues will eventually get solved; until then it’s hard to add large amounts (petabytes) of data to IPFS.
Architecture
Server side:
Browser/Javascript issues:
You don’t have code complexity in your list. I would guess that being able to write one software stack that serves both client and server, instead of one stack for the client and another for the server, would be an advantage.
Thank you to everyone who has posted so far in this thread. The discussion is bringing up even more good points than I expected. I hope I can funnel it all into the next round of UX planning and implementation. As conversation here slows down I’m gathering the info and I’ll try to post a digested rundown of observations people have offered.
It looks like @mitra’s post might have squashed the energy of contribution and dialogue that flared up in this thread. @mitra, I’m trying to hear people’s insights about motivations for putting large volumes of data on IPFS. I’m seeking feedback that’s speculative and forward-looking. We will use this info to ensure that we prioritize efforts around features, documentation, performance, etc. correctly. By contrast, your post is more like a report of the things you have found frustrating with the current implementation of IPFS. While it’s fine to discuss those things on the IPFS forums, and the feedback is relevant for our efforts to support large volumes on IPFS, it doesn’t fit in this particular thread. You’ve put me in an odd position because many of the conclusions you offer are either inaccurate, misleading in the way they’re worded, or downright wrong. I want to address that misinformation, but doing so would derail the focus of this thread, which is producing extremely useful and informative discussion.
Reading between the lines of @mitra’s post, I see some motivations and important features to note:
@ChristianKl one way I tend to think of this is that it allows us to switch to thinking of everything as nodes, services and workers in a broad system, where the location is incidental and changeable based on needs – for example it allows you to blur the distinction between server-side and client-side analysis. Instead of forcing a dichotomy of server-side vs client-side, it lets you think in terms of performing analysis on a device that’s close to the data, on a device that’s further away, or to replicate the data to a new location and analyze it there. In a way this simplifies your code base because it lets you write little libraries and services that can be reused in client applications, workers, etc. regardless of where they’re run.
Do you think that is a good way to talk about the point you’re making about code complexity, or does it confuse things?
Between the http gateways, which give you backwards-compatibility with www-based apps, and the emphasis on making the command line interfaces conform to unix and posix conventions, I tend to think that we have a high level of support for this kind of interoperability. Can you think of other ways people would want to integrate with something like IPFS, which operates at the data persistence layer?
Can you give an example of their alternatives? What kind of existing interfaces would they already have familiarity with?
All good points, Matt, and I certainly don’t want to shut down dialogue or derail the thread, but you did ask about “the factors that make IPFS more or less appealing for people who are dealing with data on that scale”, and these are all factors that have slowed down our attempts to put data, and the apps that use it, onto IPFS, something we’d very much like to do.
I don’t want to derail the thread, so let’s take detailed discussion offline. Note that I already emailed you to try to get some technical time on Friday prior to our broader meeting but didn’t hear back; I’ll resend, and if that time isn’t available, then I’d love to know (email off-thread is fine) anything I’ve got “downright wrong” above. One challenge has been that it’s been hard to get technical engagement to address these issues.
Not to step on tkklein’s good point, but one quick answer I have for this new question is: VCS client interfaces, similar to (or directly compatible with) tailor. WebDAV is a related and even more widely used interface, but I already see that on the gateway Issues list. I would think IPNS-FUSE already takes care of a lot of other potential *nix backend integrations.
@flyingzumwalt: I think that’s roughly what I mean, and I would expect that as IPFS matures it will provide significant value for organizations. As for how best to speak about the point, I don’t know what the best way is.
Most companies store such large workloads on EMC Isilon or NetApp, both of which have limitations on the four factors you listed above, which is an argument for IPFS. I work on the sales side in storage and can say that almost all of my customers are looking to dump large archive workloads to AWS or Azure; this is always the low-hanging fruit. So archive use cases could be an interesting play, especially in industries that generate PBs of data like media or research.
Just to add to the conversation:
I’ve just recently found out about IPFS and to me it seems like it can potentially be really positive for science reproducibility.
In my particular research community, large (up to around 10TB) binary files are generated through very time-consuming simulations. Storing them appropriately is a big deal (losing files means having to repeat simulations that can span several months). Sharing them with colleagues is of course also really important, and is something that is not always doable in practice, unfortunately. For example, I can’t download simulation datasets of several terabytes that are hosted at Stanford’s repository, since I am based in Europe and it would take me an absurdly long time to do so.
From what I’ve gathered in my short time reading about IPFS, the whole point is to increase file sharing speed through talking to your nearest neighbour in the network, and not necessarily a central repository. But I’ve also read that duplication is avoided, and that each node in the network stores only content it is ‘interested’ in. Therefore, in the case that I mentioned before, how would IPFS decide who stores these large datasets? Wouldn’t it be too costly to have them duplicated? If so, we would be back at the situation that I am now: downloading a huge dataset from across the globe is infeasible.
I’m interested in reading comments on this from more knowledgeable members of the IPFS community.
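To make the deduplication question above concrete, here is a toy sketch of content addressing (not IPFS’s actual chunking or CID format; the node names are hypothetical). Because an object’s address is derived from its bytes, two nodes that independently hold the same dataset automatically end up serving it under the same address, so a replica in Europe is found just as naturally as the original:

```python
import hashlib

class ToyStore:
    """A toy content-addressed blob store, a stand-in for one node."""
    def __init__(self):
        self.blocks = {}  # address -> bytes

    def add(self, data: bytes) -> str:
        # The address is derived from the content itself, so identical
        # data always maps to the same key and is stored only once.
        addr = hashlib.sha256(data).hexdigest()
        self.blocks[addr] = data
        return addr

    def get(self, addr: str) -> bytes:
        return self.blocks[addr]

stanford = ToyStore()  # hypothetical node hosting the original dataset
europe = ToyStore()    # a collaborator's node that chooses to keep a copy

dataset = b"10 TB of simulation output (toy stand-in)"
addr = stanford.add(dataset)

# "Pinning" on the second node is just adding the same bytes: it yields
# the identical address, so a peer can fetch from whichever replica is
# closer. Duplication happens only on nodes that opt in.
assert europe.add(dataset) == addr
```

In this model nothing is replicated automatically; a dataset lives on exactly the nodes that chose to add it, which is why explicit pinning by interested parties matters for large archives.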
Hi, I work at a web user-behavior analysis company, comparable to Google Analytics. The tracking code generates several TBs of data every day, and we store it in AWS S3 with an expiration policy that limits the total volume to hundreds of terabytes. We are seeking ways to reduce duplication in the stored data so that we can save money.
There are millions of sessions per day, which means we would have millions of IPFS nodes (short-lived, lasting from seconds to tens of minutes) across the web once we deploy js-ipfs in the tracking code. I believe that may unlock much of IPFS’s potential.
OK, back to the point. Basically, we watch and record all the DOM changes that happen on the page while users are visiting the site, so that we can replay the session later for analysis. Currently, we need the following things:
@jeiros I think you’re pretty much correct in what you say. Maybe a few points for thought:
Why GT Systems likes IPFS, and why we are looking very seriously at using it. These comments should be read in conjunction with our initial community post here: “GT Systems: building Blust+SPA, the world’s first legal, approved, peer-to-peer movie sharing network; exploring IPFS, libp2p and Filecoin as a tool set”.
We developed our Secure Peer Assist (SPA) technology to overcome the issues with distributing video (movie) and other large files via the Internet. It is now approved by one of the biggest studios in Hollywood with more to come. We identified, very early on, the need for a file system. We were aware of some of the work around content addressing and new models for the Internet (although not specifically IPFS) but were very much aware of our limitations as a startup and felt these were outside our remit. So, we specified our own version and included its development in our budgets. It seems that IPFS has come along at the perfect time to meet that requirement. While it seems to be early days in its development, that is a good thing in that it enables us to contribute and influence its direction. We’re optimistic that recent developments in Filecoin and crypto-currencies in general will also help accelerate that significantly.
Why we like IPFS
Rhett Sampson
GT Systems
8 February 2018
I’m working on a side project creating a database for learning materials, including large media files. Such a database could be pretty big, perhaps not dozens of terabytes but still sizeable. One thing I want for this database is for it to be decentralized, where many people can pitch in to host it, and versioned using a graph of trust (like the Linux kernel) rather than the allow-edits-then-fix model of Wikipedia. For that I’m developing a DVCS on top of IPFS.
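One way a DVCS over a content-addressed store can hang together, sketched as a toy (this is an illustration of the general Merkle-DAG idea, not the poster’s actual design; `put` and `commit` are hypothetical helpers): each commit stores the address of its snapshot and of its parent, so the address of the head commit covers the entire history, and a maintainer vouching for one head address in a graph of trust implicitly vouches for everything beneath it.

```python
import hashlib
import json

def put(store, obj):
    """Store a JSON object under the hash of its canonical encoding."""
    data = json.dumps(obj, sort_keys=True).encode()
    addr = hashlib.sha256(data).hexdigest()
    store[addr] = data
    return addr

def commit(store, snapshot, parent, author):
    # A commit references its snapshot and its parent *by address*,
    # so the head address transitively covers all earlier history.
    blob = put(store, {"blob": snapshot})
    return put(store, {"tree": blob, "parent": parent, "author": author})

store = {}
c1 = commit(store, "lesson v1", None, "alice")
c2 = commit(store, "lesson v2", c1, "bob")

# Changing any byte of history would change c2, so trusting/signing
# just the head address is enough to pin the whole version graph.
assert json.loads(store[c2])["parent"] == c1
```

The same property is what makes such a history easy to host collectively: anyone can mirror the blocks, and verification needs only the trusted head address.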
Could video websites like Bilibili or YouTube store their videos on the IPFS network at a lower cost?
We are interested in lots of different kinds of large data volumes because we work in video production. One of the biggest problems we face in our everyday work environments is the crashing of hard drives. Furthermore, direct collaboration during the creative process (even in the same room) is often challenged by poor uplinks and varied partition formats of drives. Versioning is very important in this process, and a final issue is the availability of finished media assets.
IPFS in its ideal state would help us to solve all of these problems.
What I love about IPFS is that, as long as you have two nodes with the same data, you don’t need complex backup systems. If one peer fails, okay, the same hash is on the other peer.
So simple: no need for backup management programs or redirections, not even a person watching out.
You can even have those two nodes in different buildings, even in different countries; they don’t need to be interconnected or coordinated in any form to realize “one failed, so the other must step up”. They don’t even have to know the other exists.
Your tech support can be busy with something else and not need to rush to restore some interconnected system between backups. The network’s got your back.
And when you get around to repairing your original computer, you can restore the whole backup network with a couple of commands, in just a minute!
1: It makes backing up information (and keeping access to it) much simpler (and less stressful).
2: 0% downtime all year round for websites. No hosting company can guarantee you that nowadays; IPFS can.
On IPFS, only the apocalypse could bring down your sites lmao 
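The no-coordination failover described above can be sketched as a toy model (illustrative only; the `Node`/`resolve` names are made up, and real IPFS peer routing is far more involved). The key property is that a reader asks for an address, not a machine, so whichever replica happens to be reachable serves the content:

```python
import hashlib

def address(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()

class Node:
    """A toy peer holding content-addressed blocks."""
    def __init__(self):
        self.blocks = {}
        self.online = True

    def pin(self, data: bytes) -> str:
        addr = address(data)
        self.blocks[addr] = data
        return addr

    def fetch(self, addr):
        if self.online and addr in self.blocks:
            return self.blocks[addr]
        return None

def resolve(addr, peers):
    # Ask each reachable peer in turn; the first one holding the
    # content serves it. No peer needs to know the others exist.
    for peer in peers:
        data = peer.fetch(addr)
        if data is not None:
            return data
    raise KeyError(addr)

office, offsite = Node(), Node()
site = b"<html>my site</html>"
addr = office.pin(site)
offsite.pin(site)        # independent copy, same address, no coordination

office.online = False    # the original machine dies...
assert resolve(addr, [office, offsite]) == site  # ...the other serves it
```

Restoring the “backup network” after a repair is then just pinning the same bytes again on the rebuilt node; it rejoins under the same addresses automatically.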