To my current knowledge, there is no way to see how many times a unique file has been downloaded from your node. This information would be very interesting to have. Are there any plans to implement this?
This seems like something that could be very expensive. Did you have a use case in mind?
Files are chunked when added to IPFS, so how many times should we say a file has been downloaded if the first half of its chunks has been downloaded 1,000 times and the second half hasn’t been downloaded at all? For example, a video that people only stream the first half of before switching to something else.
It’s also possible for a chunk of one file to be part of another file. Which file do we count the stats for that block against when someone requests that shared chunk?
On its face it seems like a nice-to-have kind of thing (not sure what I’d do with it, though), but what I’m getting at is that it seems like we’d need to keep track of statistics for every single block on the node. One of my nodes has 1.6 million objects in it – that’s a lot of blocks to track persistent stats for.
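To make both problems concrete, here is a toy Go sketch (the CIDs, counts, and the `minFetches` helper are all hypothetical illustrations, not go-ipfs code): streaming half a video inflates some chunk counts but not others, and a deduplicated chunk shared by two files carries a single count with no obvious owner.

```go
package main

import "fmt"

func main() {
	// Hypothetical per-block fetch counts on a node (CIDs shortened for readability).
	blockFetches := map[string]uint64{
		"QmVideoChunk1": 1000, // first half of the video, streamed often
		"QmVideoChunk2": 0,    // second half, never requested
		"QmSharedChunk": 500,  // a deduplicated chunk belonging to two files
	}

	// Two files that happen to share a chunk after deduplication.
	fileA := []string{"QmVideoChunk1", "QmVideoChunk2", "QmSharedChunk"}
	fileB := []string{"QmSharedChunk"}

	// There is no single right way to turn block counts into a file count:
	// minimum? average? and which file "owns" the shared chunk's 500 fetches?
	fmt.Println("file A, min over its chunks:", minFetches(fileA, blockFetches)) // 0
	fmt.Println("file B, min over its chunks:", minFetches(fileB, blockFetches)) // 500
}

// minFetches takes the minimum fetch count over a file's chunks,
// one of several equally arbitrary ways to define "file downloads".
func minFetches(chunks []string, fetches map[string]uint64) uint64 {
	min := fetches[chunks[0]]
	for _, c := range chunks[1:] {
		if fetches[c] < min {
			min = fetches[c]
		}
	}
	return min
}
```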
Okay, I completely understand the argument that each file is made up of many blocks. But keeping just a simple integer for each block’s download count shouldn’t be too difficult, should it?
If you kept a 64-bit integer for each of 1.6 million objects, that would be 12.8 MB of storage, which I think is very acceptable. At the very least we should have the option of enabling it; right now the total upload-bandwidth figure doesn’t tell you very much.
Unfortunately, we’d need to store a mapping from block name to count, and we’d need to make updating that mapping efficient. Given those two constraints, we’d likely end up with quite a bit of overhead (the actual counts would be tiny compared to the overhead around them).
One way to avoid some overhead would be to inline the counts into the blocks when we store them in the datastore. However, this breaks abstractions and would turn every block read into a block write (unacceptable performance hit).
If we store it in a separate data structure, we’d have to store the CID (~40 bytes) plus the 8-byte count plus the overhead of the data structure itself. We’d probably have to write a custom on-disk KV store optimized for size and write speed (a lot of work).
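As a back-of-the-envelope illustration of why the keys and structure dominate the counters (assuming ~40-byte CIDs and 8-byte counters as above; the figures are estimates, not measurements):

```go
package main

import "fmt"

func main() {
	const (
		numBlocks   = 1_600_000 // objects on the node from the example above
		cidBytes    = 40        // ~40 bytes per CID
		counterSize = 8         // one uint64 counter per block
	)
	// Raw payload: keys plus counters, before any data-structure overhead.
	payload := numBlocks * (cidBytes + counterSize)
	fmt.Printf("raw keys+counters: %.1f MB\n", float64(payload)/1e6) // ~76.8 MB
	// Any real structure (hash-map buckets, B-tree pages, a write-ahead log)
	// adds per-entry overhead on top of this, dwarfing the 8-byte counters.
}
```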
That’s something I didn’t think of… Wouldn’t it be possible to store the counts in RAM together with the list of local blocks? That way it wouldn’t use much memory, and saving it to disk could be done every few minutes. It wouldn’t need many write operations, though some extra storage would still be required. I think for smaller nodes with a couple of thousand objects, a few extra MB to see these statistics would be well worth it. Do you think this implementation would work?
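A minimal sketch of what that proposal could look like, assuming a plain in-memory map snapshotted to a JSON file on a timer (the `BlockCounter` type, file name, and flush interval are illustrative, not existing go-ipfs APIs):

```go
package main

import (
	"encoding/json"
	"os"
	"sync"
	"time"
)

// BlockCounter keeps per-block download counts in RAM and
// snapshots them to disk every few minutes.
type BlockCounter struct {
	mu     sync.Mutex
	counts map[string]uint64 // CID -> times served
}

func NewBlockCounter() *BlockCounter {
	return &BlockCounter{counts: make(map[string]uint64)}
}

// Inc records one download of a block: a map write, no disk I/O.
func (b *BlockCounter) Inc(cid string) {
	b.mu.Lock()
	b.counts[cid]++
	b.mu.Unlock()
}

// FlushLoop persists a snapshot on a timer; losing the last few
// minutes of counts on a crash is the accepted trade-off.
func (b *BlockCounter) FlushLoop(path string, interval time.Duration) {
	for range time.Tick(interval) {
		b.mu.Lock()
		data, err := json.Marshal(b.counts)
		b.mu.Unlock()
		if err == nil {
			_ = os.WriteFile(path, data, 0o644)
		}
	}
}

func main() {
	bc := NewBlockCounter()
	go bc.FlushLoop("block-counts.json", 5*time.Minute)
	bc.Inc("QmExampleCid") // would be called wherever the node serves a block
}
```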
That could get very large; we try not to store the entire list of blocks in memory (except when we GC, but we need to fix that).
So how does the client know if it has a certain block? Does it check the disk every time?
Yes, modulo a read cache (I think? I’ve never worked on that part of the code).
Note: this is totally doable (though not the storing-in-memory part), just non-trivial and low-priority compared to some of our other issues. Of course, if you’re interested in working on this, we always take patches (but we’d need to do some design work together on https://github.com/ipfs/go-ipfs/issues first).
I would gladly try to make something.
However, I’m quite new to Go and the project. Is there any thread on how to get to know the actual code, or should I just walk through it and build a rough understanding of it?
The first step would actually be to open an issue to discuss if/how to do this. However, this may not be the best feature to get your feet wet with if you have little Go experience.
We could also just log block accesses, in the same spirit as the Apache HTTP access log. That way, external tools could read the log and do most of the work.
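A rough sketch of what such a log could look like, loosely modeled on the Apache access-log idea (the line format, peer ID, CID, and file name here are assumptions for illustration, not anything go-ipfs defines):

```go
package main

import (
	"log"
	"os"
	"time"
)

func main() {
	// Append-only access log; rotation and aggregation are left to external tools.
	f, err := os.OpenFile("block-access.log", os.O_APPEND|os.O_CREATE|os.O_WRONLY, 0o644)
	if err != nil {
		log.Fatal(err)
	}
	defer f.Close()
	accessLog := log.New(f, "", 0)

	// One line per block served: timestamp, requesting peer, CID, bytes sent.
	logBlockAccess(accessLog, "12D3KooWExamplePeer", "QmExampleCid", 262144)
}

func logBlockAccess(l *log.Logger, peer, cid string, size int) {
	l.Printf("%s %s GET %s %d", time.Now().Format(time.RFC3339), peer, cid, size)
}
```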
Assuming we truncate the log, yes. That would give us a nice metric for tracking recently popular blocks (really, it would give us a rate and a ratio). Actually, I really like that tactic: nice and simple, and it gives us the information we actually care about.