From @SpiritQuaddicted on Sat Sep 26 2015 20:34:37 GMT+0000 (UTC)
When hosting big files, latency is not an issue, but a use case often advertised is hosting one’s personal page or lots of small files. How does IPFS handle that, and how “fun” is it?
Let’s say I request 10 files that make up a webpage: some HTML, some CSS, some images. As I understand it, each hash is first looked up in the DHT, the DHT returns block hashes, we then ask the DHT for peers that have those blocks, then we ask those peers to send them to us, etc. This piles up. How much worse will it typically be compared to the same webpage hosted on the normal web?
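The “this piles up” intuition can be made concrete with a deliberately crude latency model. Every number and formula here is an assumption for illustration, not a measurement of IPFS: one round-trip time (RTT) per step, six parallel requests (roughly a typical HTTP/1.1 browser’s per-host connection limit), and about log2(N) sequential hops per DHT lookup in an N-peer network (pessimistic, since real Kademlia queries several peers in parallel and uses wider buckets).

```python
import math


def http_page_latency_ms(num_files: int, rtt_ms: float, parallel: int = 6) -> float:
    """Assumed model of a single web server: one RTT for DNS, one for the TCP
    handshake (TLS ignored), then files requested in rounds of `parallel`
    concurrent requests, one RTT per round."""
    setup = 2 * rtt_ms
    rounds = math.ceil(num_files / parallel)
    return setup + rounds * rtt_ms


def naive_dht_page_latency_ms(
    num_files: int, rtt_ms: float, num_peers: int, parallel: int = 6
) -> float:
    """Assumed model of the lookup chain in the question: per file, about
    log2(num_peers) sequential DHT hops to find a provider, then one RTT to
    connect to that peer and one RTT to fetch the block. Nothing is cached."""
    hops = math.ceil(math.log2(num_peers))
    per_file = hops * rtt_ms + 2 * rtt_ms
    rounds = math.ceil(num_files / parallel)
    return rounds * per_file


if __name__ == "__main__":
    print(http_page_latency_ms(10, 50))               # single server
    print(naive_dht_page_latency_ms(10, 50, 10_000))  # naive DHT path
```

With 10 files, a 50 ms RTT and 10,000 peers, this model gives 200 ms for the single server versus 1600 ms for the naive DHT path; the whole gap comes from the `hops * rtt_ms` term, which is what caching, locality and better routing would aim to shrink.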
A related question: how much bandwidth overhead is there?
Copied from original issue: https://github.com/ipfs/faq/issues/46
From @jbenet on Sun Sep 27 2015 00:24:46 GMT+0000 (UTC)
Yeah, this is an important but also a very long-to-answer question. I don’t have time right now to give you a full treatment (it might take multiple papers’ worth), but I can point you in the right direction to understand how we’re approaching this. I’m also just going to stream thoughts, so forgive the unorganized mess. It’s the sort of simple question that unravels a massive iceberg.
TL;DR: You have to sink yourself deep into the merkle-dag model to understand why IPFS is actually much faster than the traditional web, even though initial seeks may be slower. And regardless, the performance today is nothing compared to what it can be once it leverages all the properties correctly. We’re just starting somewhere and improving from there. Think of IPFS as the web over git and bittorrent.
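A minimal sketch of the merkle-dag idea, under stated assumptions: every node is named by the SHA-256 of its data plus its children’s names, so identical content always gets the same name, and any change to a leaf changes the name of every ancestor. The JSON encoding and the `MerkleDag` class are hypothetical and purely illustrative; IPFS’s real object format and hashing (multihash) differ.

```python
from __future__ import annotations

import hashlib
import json


def node_hash(data: bytes, links: list[str]) -> str:
    """Name a node by hashing its own data together with its children's names.
    The JSON encoding is an illustrative assumption, not IPFS's format."""
    payload = json.dumps({"data": data.hex(), "links": links}, sort_keys=True)
    return hashlib.sha256(payload.encode()).hexdigest()


class MerkleDag:
    """Hypothetical in-memory store: content hash -> (data, child hashes)."""

    def __init__(self) -> None:
        self.store: dict[str, tuple[bytes, list[str]]] = {}

    def put(self, data: bytes, links: list[str] | None = None) -> str:
        links = list(links or [])
        digest = node_hash(data, links)
        # Identical content always produces the same key, so it is stored once.
        self.store[digest] = (data, links)
        return digest


dag = MerkleDag()
css = dag.put(b"body{}")
page = dag.put(b"<html>...</html>", [css])

# Editing the stylesheet yields a new leaf name and therefore a new page name,
# while the old page remains addressable and unchanged.
css_v2 = dag.put(b"body{color:red}")
page_v2 = dag.put(b"<html>...</html>", [css_v2])
```

Because a name can only ever refer to one exact byte string, anything fetched under that name can be cached indefinitely.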
There are many things at play here:
- most importantly, immutability + content addressing.
- with these, you can cache things forever. infinity times faster!!!
- it turns out that disk space is getting cheaper much, much faster than bandwidth.
- bandwidth to the end user of the Web is bottlenecked at the last-mile ISPs, NOT on LANs, NOT on the backbone, and NOT across continents. This is not a trivial problem to solve, though people are working on it.
- in the future (nope, today), it is cheaper to ship hard drives preloaded with TBs of stuff than to stream them. Similarly, local storage is very large, and soon we’ll be able to keep entire replicas of the content people access physically close to them (in their home, say). It’s not just me talking: you can see that Google, Apple, and others are already moving in this direction. Netflix is shipping content caches to ISPs. You wait, they’ll ship them to your home next (though probably in some partner device owned by another company, not themselves; though who knows, Netflix has pulled off some great tech feats).
- the point here is that the content can be moved across the network to precisely where it is needed, adaptively. Even preemptively. And for non-large media (documents and so on) you can just replicate it all over the place (we already do this with things like Dropbox/Box/etc.).
- now, the really hard part for applications is being able to use that content on the web. This is one of the key improvements IPFS gives: a sane way to address and move content and make it natively accessible to web browsers.
- another very important part is native cryptographic operations: encryption, signing, record-crafting, certificates, and capabilities. Putting all these pieces directly on the application transport allows you to create content that can move over other transports that are untrusted, controlled by other entities, fully or temporarily disconnected, and so on. This means that anybody can build a protocol to move IPFS objects faster, or to fit their use case, without changing the security model. Try doing that with HTTP today while preserving the same assurances of naming and linking (i.e. try embedding content from websiteyoudontcontrol.com and shipping a webapp to a mobile device that will be offline).
- transports: the web today works mostly over HTTP/TCP:80/IPv4. A few differences exist, but they have barely moved the needle in regards to global traffic. The three major exceptions IMO are HTTP(2)/QUIC/IPv4 (IPv6 deserves an honorable mention, but considering how old it is and how little it has penetrated…). The amazing thing about QUIC, by the way, is that >60% of Chrome <---> Google Servers traffic is over QUIC, and they have a target of reaching 100%. Talk about awesome. (IPFS/QUIC is a must, and soon.)
- multiple transports: the important thing here is that IPFS is designed from the ground up to work over arbitrary transports. It means that you can just as easily do IPFS/TCP:*/IPv6 as you can do IPFS/HTTP-long-polling/... you get the idea. The point is that all IPFS content, the links, and how you get it are completely decoupled from these transports. WebRTC and WebSockets are patched into the web today, and you can’t natively expose links and hope they get downloaded over the transport that makes the most sense. That’s a very important piece here.
- content + peer routing differences: peer-to-peer systems have many different forms, and different algorithms are tuned for different use cases. For example, DHTs alone vary dramatically in latency properties, security (sybil resistance, etc.), scalability, and so on. And then there are other things like pub/sub (pub/sub is a content routing thing? Yeah, it used to be called “multicast”).
- but actually, this is not new at all. Forget about the web for a second and think just about “establishing a stream” between two hosts on the internet. “Just dial ‘foobar.com’ with TCP/IP” is another iceberg that hides dozens of protocols to make the simple interface work: ICMP, OSPF, ARP, RIP, BGP, DHCP, DNS, Ethernet, WiFi, … All of this is mostly hidden from the web developer because IP’s “thin-waist” model works so well. But when you try to build smart content routing systems on top, you end up with a variety of possibilities: “the right thing to do” is highly dependent on “who”, “what”, “where”, and “how” the thing is being done, and no one solution fits all. HTTP got away with this (for a long time), but if you look at any serious operation today, large companies spend hundreds of millions of dollars working around the fact that HTTP is not tuned to leverage the internet and its protocols as efficiently as it could. This is why BitTorrent and Git, which present two very different models of content distribution, were so successful: they abandoned the traditional web links/lookups and constructed data structures with properties and operators tuned to how the internet really works. (Why would you ever need git’s offline-first model if all computers were online and traffic moved instantaneously? Why would you ever need direct p2p connections if one HTTP server had infinite bandwidth or 0 latency?)
- The contributions of IPFS here are to not hide from the complexity, but instead to understand and compartmentalize it, build modules that make sense for each use case, and make really good interfaces that allow all these pieces to work together and layer nicely (like IP layers over the layer 2 protocols). For example, you could do pub/sub (multicast) on IPFS and avoid using a DHT completely; this can do much, much better than regular point-to-point HTTP. (A lot of this will be shipped as libp2p, which we’ll be releasing separately.)
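The “cache things forever” and “move content over untrusted transports” points can be sketched together as a content-addressed cache. This is a minimal illustration, not IPFS code: it assumes plain SHA-256 hex digests as names (IPFS actually uses multihashes) and hypothetical `fetch(digest)` callables standing in for whatever peers or transports happen to be available.

```python
from __future__ import annotations

import hashlib
from typing import Callable, Iterable, Optional

# Hypothetical source type: given a digest, return some bytes or None.
# A source may be any peer or transport, trusted or not.
Fetcher = Callable[[str], Optional[bytes]]


class ContentCache:
    """Cache keyed by the SHA-256 hex digest of the content itself.

    Since a digest names exactly one byte string, an entry can never go
    stale: there is no expiry and no invalidation logic."""

    def __init__(self) -> None:
        self._blocks: dict[str, bytes] = {}

    def get(self, digest: str, sources: Iterable[Fetcher]) -> bytes:
        if digest in self._blocks:
            return self._blocks[digest]  # hit: no network round trip at all
        for fetch in sources:
            data = fetch(digest)
            # Trust comes from the hash check, not from the source.
            if data is not None and hashlib.sha256(data).hexdigest() == digest:
                self._blocks[digest] = data
                return data
        raise KeyError(f"no source returned valid content for {digest}")


content = b"<h1>hello</h1>"
name = hashlib.sha256(content).hexdigest()


def tampering_peer(digest: str) -> Optional[bytes]:
    return b"<h1>pwned</h1>"  # wrong bytes: rejected by the hash check


def honest_peer(digest: str) -> Optional[bytes]:
    return content


cache = ContentCache()
first = cache.get(name, [tampering_peer, honest_peer])  # falls through to honest_peer
again = cache.get(name, [])  # served from the cache, no sources needed
```

The design choice worth noting is that the cache never asks where bytes came from; a hard drive in the mail, a LAN neighbor, or a stranger’s server are all equally acceptable as long as the hash matches.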
Ok this rant has gone on long enough.
To get back to your question: can lots of little files over a DHT ever be faster than directly streaming them from one source? Of course! But you have to understand why, and how we’ll get there. You have to leverage immutability, cryptographic protocols, true transport agnosticism, and routing protocols to let the content be stored and moved in “the best” way possible, which varies dramatically depending on what the use cases, the devices, and the networks look like. We don’t start at the max today, of course; we start with something much slower (a regular Kademlia DHT), but we liberate the data model from the network protocols and allow improvement to happen. We introduce a layer in between (the merkle dag and the IPFS data model) to create an interface that applications can rely on, independent of the network protocols, and then we let computers assemble themselves into whatever networks they want and have everything work exactly the same. The protocols are thus freed to improve to match their use cases.
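The “layer in between” can be sketched as a narrow interface: application code asks for blocks by hash only, and any transport that can answer `fetch(digest)` plugs in underneath. Everything here (`BlockTransport`, `DictTransport`, `FirstOf`, `store_node`, the JSON node encoding) is a hypothetical illustration of the hourglass idea, not the actual IPFS or libp2p API; in-memory dicts stand in for real transports such as TCP or QUIC.

```python
from __future__ import annotations

import hashlib
import json
from typing import Protocol


class BlockTransport(Protocol):
    """Hypothetical narrow waist: anything that can return a block for a hash."""

    def fetch(self, digest: str) -> bytes | None: ...


class DictTransport:
    """Toy transport backed by a dict; stands in for TCP, QUIC, a USB drive..."""

    def __init__(self, blocks: dict[str, bytes]) -> None:
        self.blocks = blocks

    def fetch(self, digest: str) -> bytes | None:
        return self.blocks.get(digest)


class FirstOf:
    """Tries several transports in order; readers never learn which one answered."""

    def __init__(self, transports: list[BlockTransport]) -> None:
        self.transports = transports

    def fetch(self, digest: str) -> bytes | None:
        for transport in self.transports:
            block = transport.fetch(digest)
            if block is not None:
                return block
        return None


def store_node(blocks: dict[str, bytes], data: bytes, links: list[str]) -> str:
    """Encode a node (illustrative JSON format), store it, return its hash."""
    block = json.dumps({"data": data.hex(), "links": links}, sort_keys=True).encode()
    digest = hashlib.sha256(block).hexdigest()
    blocks[digest] = block
    return digest


def read_all(root: str, transport: BlockTransport) -> bytes:
    """Application-level read: works purely in terms of hashes."""
    block = transport.fetch(root)
    if block is None or hashlib.sha256(block).hexdigest() != root:
        raise KeyError(f"missing or corrupt block {root}")
    node = json.loads(block)
    out = bytes.fromhex(node["data"])
    for child in node["links"]:
        out += read_all(child, transport)
    return out


# Blocks of one file spread over two different "networks".
lan: dict[str, bytes] = {}
drive: dict[str, bytes] = {}
part1 = store_node(lan, b"hello ", [])
part2 = store_node(drive, b"world", [])
root = store_node(lan, b"", [part1, part2])

mixed = read_all(root, FirstOf([DictTransport(lan), DictTransport(drive)]))
single = read_all(root, DictTransport({**lan, **drive}))
```

Because every block is checked against the hash that named it, `read_all` returns identical bytes whether the blocks arrive over one transport or are spread across several, which is the property that lets the transports underneath change freely.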
Sound familiar? Yes, it’s the IP hourglass story all over again. Over the last decades, we broke the end-to-end principle. But the good news is IPFS is here to fix it.