When I request some data, it starts to arrive but stops after a while, yet bandwidth is still used

From @jvsteiner on Wed Apr 13 2016 12:25:42 GMT+0000 (UTC)

I used ipfs get to try and grab some files that I found referenced in this list of possibly available content at https://ipfs.io/ipfs/QmU5XsVwvJfTcCwqkK1SmTqDmXWSQWaTa7ZcVLY2PDxNxG/ipfs_links.html

Some worked, some didn’t, which is expected if no one online actually has this content at the time. I noticed, however, that in many cases I was able to run ipfs object links or ipfs object stat on the higher-level hashes, but the same did not work on the sub-hashes (sorry if the terminology is wrong) that were referenced in the results of ipfs object links. In cases where I attempted to get objects that were not actually available, or even objects for which not all sub-objects were available, I ended up in the following perplexing situation: huge amounts of bandwidth were used even though the download was stalled. The command line showed the number of Mb downloaded, but the count stopped increasing. Still, there was a lot of traffic, both upstream and downstream. My questions are:

  1. Why should there be downstream traffic if the download is stalled because some parts are not available?
  2. What provision is there to determine that content is no longer, or not presently, available? What is the expected behavior when attempting to download an object that is partially, but not fully, available (i.e. some linked objects are not hosted by any online node)?
  3. What is the expected background bandwidth usage for a running ipfs daemon, and is there a way to control that as a configuration parameter?

    Copied from original issue: https://github.com/ipfs/faq/issues/108

From @noffle on Wed Apr 13 2016 23:04:07 GMT+0000 (UTC)

Hi @jvsteiner.

Code of Conduct

First and foremost: the linked list very clearly contains a variety of copyrighted materials. Please read our Code of Conduct, and, specifically, the section on Copyright Violations. We’re here to help, but we take this subject very seriously. Please respect the Code of Conduct.

Bandwidth issues

Are there any specific (non-copyrighted) hashes that you can reproduce this on? Data on IPFS is broken up into chunks (much like bittorrent), so a provider may only have some subset of those chunks.

  1. Did you mean upstream traffic? Downstream traffic seems expected: you asked for content via ipfs get.
  2. There is no mechanism for knowing for sure what content is definitely / definitely not available. This is a consequence of complex network topology possibilities (the content you want is out there, but you’re connected to a subset of peers that doesn’t have it, or maybe you / they are behind a NAT or firewall, etc).
  3. Background usage should be relatively low, but some up- and down-stream usage is expected for maintaining the DHT (a mechanism for discovering content). Reducing this is on the roadmap for Q2 2016, as are bandwidth controls.
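
To make the chunking point concrete, here is a toy model of a download where the top-level block is reachable but only some of the blocks it links to are. It is not how IPFS is implemented; the block names, the links table, and the walk_available function are all made up for illustration. It only shows why ipfs object links can succeed on a top-level hash while the get still stalls partway.

```python
# Toy model only: block ids, the DAG layout, and all names here are
# hypothetical and do not reflect IPFS internals.
from typing import Dict, List, Set, Tuple


def walk_available(
    root: str, links: Dict[str, List[str]], reachable: Set[str]
) -> Tuple[List[str], List[str]]:
    """Walk the DAG from root and return (fetched, missing) block ids.

    A block's links are only followed if the block itself could be
    fetched, because the links are stored inside the block.
    """
    fetched: List[str] = []
    missing: List[str] = []
    seen: Set[str] = set()
    stack = [root]
    while stack:
        block = stack.pop()
        if block in seen:
            continue
        seen.add(block)
        if block not in reachable:
            # No connected peer has this block; the download stalls here.
            missing.append(block)
            continue
        fetched.append(block)
        # Push children in reverse so they are visited in listed order.
        stack.extend(reversed(links.get(block, [])))
    return fetched, missing


# The root and one chunk are held by reachable peers; two chunks are not.
links = {"root": ["chunk-a", "chunk-b", "chunk-c"]}
reachable = {"root", "chunk-a"}

fetched, missing = walk_available("root", links, reachable)
print("fetched:", fetched)
print("missing:", missing)
```

In this model the request makes some progress and then waits on the missing blocks indefinitely, since nothing can distinguish "nobody has it" from "nobody I am connected to has it".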

From @jvsteiner on Thu Apr 14 2016 06:13:36 GMT+0000 (UTC)

Hi - Understood on the Code of Conduct - my goal was to test out using some user-provided content, as opposed to content hosted by the project team members, which is more likely to be available.

re: (1) I meant downstream traffic - if I request an object that no one has, I would expect my downstream bandwidth for that request to be close to background rates - since no one has it, they can’t be sending it to me. That didn’t seem to match the observed behavior, and I thought that was odd. I confirmed using du -sh ~/.ipfs that this folder was not increasing in size, yet the total received data size exceeded 200Mb (the size of the original file requested was under 20).
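
For anyone wanting to repeat that check, a rough sketch of the same comparison follows. The helper names dir_size_bytes and stored_fraction are hypothetical, and the received byte count is assumed to come from whatever traffic monitor you use. Note that it sums file sizes, so the numbers will differ slightly from du, which reports disk blocks.

```python
import os


def dir_size_bytes(path: str) -> int:
    """Sum the sizes of all regular files under path (0 if path is absent)."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(path):
        for name in filenames:
            full = os.path.join(dirpath, name)
            # Skip symlinks and anything that vanished mid-walk.
            if os.path.isfile(full) and not os.path.islink(full):
                total += os.path.getsize(full)
    return total


def stored_fraction(before: int, after: int, received: int) -> float:
    """Fraction of received bytes that showed up as repo growth."""
    if received <= 0:
        return 0.0
    return max(after - before, 0) / received


if __name__ == "__main__":
    repo = os.path.expanduser("~/.ipfs")  # default repo location, as in the post
    print("repo size in bytes:", dir_size_bytes(repo))
```

Taking one measurement before the get and one after, a stored_fraction near zero alongside a large received count would match what I saw.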

re: 2&3 - understood

From @noffle on Thu Apr 14 2016 15:32:43 GMT+0000 (UTC)

Hey @jvsteiner, thanks for the additional info!

The extra 200mb of downstream traffic sounds very strange. Do you have a hash that you’re able to reproduce this effect for?

cc @whyrusleeping, who might know of other debug information that could be pertinent.