Request for Guidance/Advice on Implementing Predictive Chunk Prefetching for IPFS-Video Streaming

Dear IPFS Community,

I’m a master’s student currently working on my thesis, which focuses on optimizing
video streaming performance over IPFS. My objective is to design and
prototype a predictive chunk prefetching mechanism that selects which
peer to retrieve each video segment from based on performance metrics
such as throughput or latency.

As I explore the architecture of the IPFS stack, I’m trying to
determine the most appropriate integration point for this logic.
Specifically, I would like to ask for your insight on the following:

Would such a prefetching and scheduling mechanism be best implemented
as a Kubo plugin (e.g., via libp2p hooks or connection gater)?

Would using a dedicated go-graphsync client for DAG traversal and
per-peer fetching provide a cleaner architecture than trying to modify
Bitswap?

Are there any existing efforts or modules in the IPFS ecosystem that
already address chunk-level peer control or streaming use cases?

My goal is to work within the IPFS protocol as realistically as
possible while allowing my application logic to schedule block
retrievals based on urgency and peer capabilities — for instance,
retrieving near-future segments from fast peers and preloading later
ones from slower nodes.
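To make the idea concrete, here is a minimal sketch of the kind of scheduling policy I have in mind. All types and names (`Peer`, `Segment`, `assignPeers`) are hypothetical illustrations, not existing IPFS APIs; a real implementation would derive peer throughput from observed Bitswap or gateway performance:

```go
package main

import (
	"fmt"
	"sort"
)

// Peer and Segment are hypothetical types for illustration only.
type Peer struct {
	ID         string
	Throughput float64 // measured bytes/sec for this peer
}

type Segment struct {
	Index int    // position in the playlist; lower = more urgent
	CID   string // content identifier of the video chunk
}

// assignPeers maps the most urgent (near-future) segments to the
// fastest peers and later segments to slower peers. Note: it sorts
// both input slices in place.
func assignPeers(segments []Segment, peers []Peer) map[string]string {
	sort.Slice(peers, func(i, j int) bool { return peers[i].Throughput > peers[j].Throughput })
	sort.Slice(segments, func(i, j int) bool { return segments[i].Index < segments[j].Index })

	assignment := make(map[string]string) // segment CID -> peer ID
	for i, seg := range segments {
		// Round-robin over peers sorted fastest-first, so the most
		// urgent segments land on the fastest peers.
		assignment[seg.CID] = peers[i%len(peers)].ID
	}
	return assignment
}

func main() {
	peers := []Peer{{"slow-peer", 1e5}, {"fast-peer", 1e7}}
	segments := []Segment{{0, "bafy-seg0"}, {1, "bafy-seg1"}}
	for cid, pid := range assignPeers(segments, peers) {
		fmt.Printf("%s <- %s\n", cid, pid)
	}
}
```

In practice the assignment would be recomputed as the playback position and the measured peer metrics change, but this captures the basic "urgent chunks to fast peers, far-future chunks to slow peers" split.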

Any guidance on where such a system would fit best into the IPFS stack,
and on how to avoid unnecessary reinvention, would be incredibly
valuable at this early stage.

Thank you very much for your time and work on this amazing project.
I’d be grateful for any pointers or suggestions.

Best regards,

Andre

Responding to this and to a related comment you made on Filecoin Slack (copied below):

I was planning on using HLS for streaming the video. My question was more about whether there are ideas that have yet to be explored for improving video streaming on IPFS.

For example, I was wondering whether it is possible to fetch video chunks that are needed right away (e.g. at the start of the video or when seeking) from the fastest known peer, and preload chunks that are further away in the video from slower peers.

There are a number of possible ways to optimize based on what you’re trying to do and where you’re seeing bottlenecks. My comments below roughly fall into “I would have written a shorter letter, but did not have the time” but I hope they are helpful nonetheless :upside_down_face:

  • If you’re using a scheme like HLS to divide up the data, then keeping each chunk within a single block may help you (you can experiment with chunk sizes)
    • If you’re looking to cut down the time-to-first-byte, then making a CAR request via the trustless gateway API (to get to the first byte quickly before switching back to block-by-block requests) should help, although if this isn’t your need the complexity might not be worthwhile
  • If you are using a single flat file for your data then you can experiment with different ways of encoding the data as UnixFS. For example, the trickle DAG feature in kubo might help with linear streaming, although the overall utility is likely worse than the balanced DAG approach. Similarly, as with IPFS Custom File Chunking for WARC and WACZ, you could decide on something more bespoke (e.g. making initial blocks or metadata smaller, chunking the data along internal container format boundaries, etc.)
  • If you want different / more bespoke block pre-fetching logic or general downloading logic that’s specific to your use case, then you can consider either: using boxo (or potentially a kubo fork / plugin if that’s easier) to see what that downloading experience looks like in Go, or using helia so that a user can just go to an equivalent of https://ipfs.video/ to download your video in a better way, or down the road perhaps even integrating into https://github.com/ipfs/service-worker-gateway to optimize when the downloaded content ends up being a video.
    • With Go you’ll likely have an easier time running an HTTP gateway that could give you interop with tools like VLC. With JS you’ll likely have an easier time showing people your changes in a browser (i.e. no special binary will need to be downloaded).
    • There are lots of options here depending on what you’re doing (e.g. are you trying to front-run the prefetching your video client will do, and so need to know about video formats? If 10 peers might have the content you’re looking for, will you race multiple peers while the buffer is small and be more selective about how your peers download data as the buffer grows, …)
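As a rough illustration of the trustless-gateway CAR request mentioned above: the request is a plain HTTP GET for the CID's path with an `Accept: application/vnd.ipld.car` header. The gateway URL and CID below are placeholders, and this sketch only builds the request rather than streaming the response:

```go
package main

import (
	"fmt"
	"net/http"
)

// buildCARRequest constructs a trustless-gateway request asking for
// the DAG behind a CID as a CAR stream. The Accept header value is
// the CAR media type used by the trustless gateway API.
func buildCARRequest(gateway, cid string) (*http.Request, error) {
	req, err := http.NewRequest(http.MethodGet, gateway+"/ipfs/"+cid, nil)
	if err != nil {
		return nil, err
	}
	req.Header.Set("Accept", "application/vnd.ipld.car")
	return req, nil
}

func main() {
	// Placeholder local gateway and CID, for illustration only.
	req, err := buildCARRequest("http://127.0.0.1:8080", "bafybeigdyrexamplecid")
	if err != nil {
		panic(err)
	}
	fmt.Println(req.URL.String()) // http://127.0.0.1:8080/ipfs/bafybeigdyrexamplecid
	fmt.Println(req.Header.Get("Accept"))
}
```

A client could issue one such request to fill the initial buffer quickly, then hand off to block-by-block fetching once playback has started.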

Some other notes on going down the retrieval pathway optimizations in the Go / JS stacks:

  • Go
    • You can try modifying the bitswap package in boxo, but if you’re looking for higher-level knowledge / optimizations, then plumbing through your needs might end up requiring a separate system for doing the downloading. If building a separate system, I would almost certainly try a separate CLI first before trying to integrate into kubo, rainbow, etc.
  • JS
    • My (perhaps out-of-date) recollection is that there are far fewer optimizations in place here regarding which peers to download from, and the API boundary is a little cleaner, which might make this a welcome area of exploration within helia. If you’re interested in this approach, I’d definitely open an issue in the repo to explore it.

Hopefully this helps you think through some possibilities for your thesis, I’m interested to see how it goes!

Hello @adin,

Thank you very much for your response!

Before committing to one rabbit hole to explore, I think it would be useful to run a variety of benchmarks. Are there any existing benchmarking tools?

I found this one: GitHub - Netflix/p2plab (performance benchmark infrastructure for IPLD DAGs), but as it is rather old, I am not sure whether there are newer utilities/tools that do the job. If not, I would write my own benchmarker.
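If I do end up writing my own, I imagine the core would be something as simple as timing fetches per peer or per retrieval path. A minimal sketch (the `fetch` function is a placeholder for whatever path is being benchmarked, e.g. a Bitswap session or a gateway request):

```go
package main

import (
	"fmt"
	"time"
)

// measureThroughput times an arbitrary fetch function and returns
// its effective throughput in bytes/second.
func measureThroughput(fetch func() ([]byte, error)) (float64, error) {
	start := time.Now()
	data, err := fetch()
	if err != nil {
		return 0, err
	}
	elapsed := time.Since(start).Seconds()
	if elapsed == 0 {
		elapsed = 1e-9 // guard against division by zero on instant fetches
	}
	return float64(len(data)) / elapsed, nil
}

func main() {
	// Fake fetch that "downloads" 1 MiB after a simulated delay,
	// standing in for a real retrieval.
	fake := func() ([]byte, error) {
		time.Sleep(10 * time.Millisecond)
		return make([]byte, 1<<20), nil
	}
	bps, err := measureThroughput(fake)
	if err != nil {
		panic(err)
	}
	fmt.Printf("throughput: %.0f bytes/sec\n", bps)
}
```

Running this against each candidate retrieval path (Bitswap, gateway CAR requests, etc.) over the same content would give the comparison numbers I need before picking a direction.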