OK, a few things to address:
Nodes: Your personal node isn't really different from the node running at ipfs.io, except, of course, that talking to your own node on the same machine is vastly more efficient than talking to the ipfs.io node across the internet. Also, you share the ipfs.io node with many other users, but you are the sole user of your own node.
Caches: once content has been downloaded, it stays in the node's cache for a while (possibly a long while if you don't download much and the cache is large; the default is 10GB). This is why the second run is so much faster: you are receiving the content from the local cache on your own machine, not from the internet. The first run on ipfs.io is faster than the first run on your own node because the ipfs.io node already has part or all of the content in its own cache and simply returns it to you (this, of course, is only true if someone else requested that content before you and it hasn't been flushed from the cache yet).
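To make the first-run/second-run difference concrete, here is a toy model of a node with a local block cache. It is a sketch under simplifying assumptions, not kubo's actual code: `ToyNode`, `slow_network_fetch` and the `block-N` identifiers are hypothetical, and eviction (what happens when the cache fills up) is deliberately left out.

```python
class ToyNode:
    """Toy node with a local block cache (hypothetical; not kubo's implementation)."""

    def __init__(self, network_fetch):
        self.cache = {}                  # block id -> block bytes, kept after the first fetch
        self.network_fetch = network_fetch
        self.network_fetches = 0         # how many times we had to go out to the network

    def get(self, cid):
        if cid in self.cache:            # cache hit: served locally, no network round trip
            return self.cache[cid]
        self.network_fetches += 1        # cache miss: must retrieve the block from a peer
        block = self.network_fetch(cid)
        self.cache[cid] = block
        return block


def slow_network_fetch(cid):
    # Hypothetical stand-in for downloading a block from a remote peer.
    return f"data for {cid}".encode()


node = ToyNode(slow_network_fetch)
cids = [f"block-{i}" for i in range(5)]  # hypothetical block identifiers
for _ in range(2):                       # first run: all misses; second run: all hits
    for cid in cids:
        node.get(cid)
print(node.network_fetches)              # 5: only the first run touched the network
```

The same reasoning applies to ipfs.io: its cache is shared by many users, so your "first run" there may already be somebody else's second run.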
AcceleratedDHTClient: the normal DHT client has to walk the DHT to find which node has the block you want, open a connection to that node, and download that block. This happens for every block. What takes a long time is the walk: your node has to connect to a set of DHT servers and ask the question, each of them returns a new set of DHT servers, and your node then has to contact those and ask again, until one or more of the DHT servers return the addresses of the nodes that have your content. Your node then opens a connection to one or more of them and downloads the content from the fastest one. It works this way because your node only knows about a small fraction of the DHT servers on the network. By contrast, the AcceleratedDHTClient scans the network every hour and keeps a list of all the DHT servers on the network (usually between 10,000 and 15,000), so when it needs to walk the DHT to find your block, it can usually do it in one hit, vastly speeding up the process.
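The difference between the two clients can be illustrated with a simplified Kademlia-style simulation. This is a sketch, not the real DHT: the ID space is only 16 bits (the real one uses 256-bit keys), each node remembers just `K = 3` peers per bucket (the real bucket size is 20), and there is no parallelism or timeout handling. All function names are hypothetical. The normal client starts from its own small routing table and learns closer peers round by round; the accelerated client already knows every server and goes straight to the closest ones.

```python
import random

ID_BITS = 16  # toy ID space (assumption; the real DHT uses 256-bit keys)
K = 3         # peers kept per bucket and returned per query (assumption; real value is 20)


def closest(peers, target, k=K):
    """The k peers nearest to target by XOR distance."""
    return sorted(peers, key=lambda p: p ^ target)[:k]


def build_network(n, seed=1):
    """Give each node a routing table with at most K peers per bucket."""
    rng = random.Random(seed)
    ids = rng.sample(range(2 ** ID_BITS), n)
    tables = {}
    for me in ids:
        buckets = {}
        for other in ids:
            if other != me:
                b = (me ^ other).bit_length() - 1  # bucket = highest differing bit
                buckets.setdefault(b, []).append(other)
        # Each node only knows a small, random slice of the network.
        tables[me] = [p for peers in buckets.values()
                      for p in rng.sample(peers, min(K, len(peers)))]
    return ids, tables


def iterative_lookup(start, target, tables):
    """Normal client: query the closest known peers, learn closer ones, repeat."""
    shortlist = closest(tables[start], target)
    queried = {start}
    rounds = 0
    while True:
        to_query = [p for p in shortlist if p not in queried]
        if not to_query:          # nobody closer left to ask: the walk is over
            return shortlist, rounds
        rounds += 1               # one more round of network requests
        learned = set(shortlist)
        for p in to_query:
            queried.add(p)
            learned.update(closest(tables[p], target))
        shortlist = closest(learned, target)


def accelerated_lookup(target, all_ids):
    """Accelerated client: it already knows every server, so one round suffices here."""
    return closest(all_ids, target), 1


ids, tables = build_network(500)
rng = random.Random(7)
normal_rounds = []
for _ in range(50):
    start, target = rng.choice(ids), rng.randrange(2 ** ID_BITS)
    _, rounds = iterative_lookup(start, target, tables)
    normal_rounds.append(rounds)
print("average rounds, normal client:", sum(normal_rounds) / len(normal_rounds))
print("rounds, accelerated client:", accelerated_lookup(target, ids)[1])
```

The trade-off is the one described above: the accelerated client pays for its one-hop lookups by periodically crawling the whole network to keep its full list of servers up to date.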
Availability: your node can only find out where a block lives if the node that has it has published it to the DHT. This normally happens every 12 hours, and a DHT record survives in a DHT server for 24 hours. In theory, all available blocks should be discoverable. In practice, it's a mess. While every node tries to "reprovide" (that's what it's called) every block every 12 hours, a node with many blocks can take vastly longer than 12 hours to complete one run. This causes some blocks to disappear from the DHT for long periods, making them extremely difficult to find. That's another reason to run the AcceleratedDHTClient. I have around 30,000 blocks pinned in my node. Using the normal DHT client, a reprovide run would take over 100 days, leaving my blocks undiscoverable for 99 of those 100 days in each cycle. Running the AcceleratedDHTClient, my node completes a reprovide run in 12 minutes. I suspect many servers serving a lot of blocks are still using the default DHT client.
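The 99-out-of-100-days figure follows from simple arithmetic, sketched below. The model is an assumption made for illustration: each block is announced once per run, its record lives for 24 hours, and a new run cannot start before both the 12-hour interval has passed and the previous run has finished. `reprovide_stats` is a hypothetical helper, not part of any IPFS tool.

```python
HOUR = 3600
DAY = 24 * HOUR


def reprovide_stats(num_blocks, run_seconds, interval=12 * HOUR, record_ttl=24 * HOUR):
    """Back-of-the-envelope reprovide model (simplifying assumptions, see lead-in)."""
    cycle = max(interval, run_seconds)           # time between announcements of the same block
    per_block = run_seconds / num_blocks         # average time spent announcing one block
    discoverable = min(1.0, record_ttl / cycle)  # fraction of each cycle the record is live
    undiscoverable_days = cycle / DAY * (1 - discoverable)
    return per_block, discoverable, undiscoverable_days


normal = reprovide_stats(30_000, 100 * DAY)      # default DHT client: over 100 days per run
accel = reprovide_stats(30_000, 12 * 60)         # AcceleratedDHTClient: 12 minutes per run
print(f"normal: {normal[0]:.0f} s per block, dark {normal[2]:.1f} days per cycle")
print(f"accelerated: {accel[0]:.3f} s per block, discoverable fraction {accel[1]:.1f}")
```

Under these assumptions the normal client spends about 288 seconds per block (100 days / 30,000 blocks), so each record is live for only 1 day out of every 100, while the accelerated run finishes long before the 12-hour interval and the records never expire.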
So, how fast your node can find the blocks you are looking for depends on how easy they are to find. If only one node has a block and its reprovide cycle sucks, it's going to be really hard to find. However, once you have downloaded it, it is in your cache and your node has immediately done a "provide" on it, so the next person looking for it has 2 options. If they download it, the next one has 3 options. The more popular your content is, the easier (faster) it is to find and retrieve. If it's obscure, the node hosting it had better be running the AcceleratedDHTClient, or it'll take you a while to find it. However, once a connection is established to that node, if it also has all the other blocks you are looking for, your node will download the whole set over that connection without using the DHT. In fact, if a block can't seem to be found and you let your node search for long enough, it randomly walks the network and may run into a node that has it by chance; if that node has everything else, you get the whole thing. I sometimes let my node look for something for a whole day, and it often finds it after many hours of searching. Of course, if the owner had done a proper job of reproviding, it could have been found instantly.
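Both effects in this paragraph can be sketched in a few lines. The first function assumes, purely for illustration, that each provider's DHT record is live independently with the same probability, so every extra provider raises the chance of finding at least one. The second is a toy retrieval loop, loosely modelled on a node asking already-connected peers before searching the DHT; it is not Bitswap's actual algorithm, and `fetch_dag`, `peers` and `dht_lookup` are hypothetical names.

```python
def find_probability(providers, p_single):
    """Chance that at least one provider is discoverable, assuming each provider's
    record is live independently with probability p_single (simplifying assumption)."""
    return 1 - (1 - p_single) ** providers


def fetch_dag(block_ids, peers, dht_lookup):
    """Toy retrieval: try connected peers first; walk the DHT only when none has the block.
    `peers` maps a (hypothetical) peer name to the set of blocks it holds."""
    connected = set()
    dht_lookups = 0
    for block in block_ids:
        if any(block in peers[p] for p in connected):
            continue                  # served over an existing connection, no DHT walk
        dht_lookups += 1              # slow path: find a provider through the DHT
        connected.add(dht_lookup(block))
    return dht_lookups


# 1, 2 and 3 providers, each live 1% of the time (the figure from the reprovide example).
for n in (1, 2, 3):
    print(n, "providers:", round(find_probability(n, 0.01), 4))

# One owner holds all 100 blocks: only the first block needs a DHT lookup.
peers = {"owner": set(range(100))}
print("DHT lookups:", fetch_dag(range(100), peers, lambda block: "owner"))
```

In the second example, the 99 remaining blocks come over the connection opened for the first one, which is why a single successful lookup can be enough to retrieve the whole set.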
Anyway, I still think that using your own node to retrieve the content you want is the best way to solve your problem, but you are at the mercy of how well the owner reprovides and how popular the content is. It could be fast, or it could take forever.