For one of our platform deliverables, we are preparing a private cluster for transient distributed transcoding. It will break video files into one-second segments of raw video (no audio). Each segment will have:
- a lockfile (so that nodes don’t take work from other nodes)
- a progressfile (like a distributed FIFO)
- daughter segments in a variety of formats, qualities and dimensions
- visual checksums (as predecessors to an array of Hamming distances)
- a thumbnail image
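
To make the "array of Hamming distances" concrete, here is a minimal sketch. It assumes each visual checksum is a fixed-width bit string (for example a 64-bit perceptual hash) stored as an integer; how the checksums are actually computed is not decided here, and the function names and sample values are hypothetical.

```python
def hamming_distance(a: int, b: int) -> int:
    """Count the bit positions in which two equal-width checksums differ."""
    return bin(a ^ b).count("1")


def distance_array(reference: int, candidates: list[int]) -> list[int]:
    """Hamming distance from one reference checksum to each candidate."""
    return [hamming_distance(reference, c) for c in candidates]


# Hypothetical 64-bit checksums: a source segment and two daughter segments.
source = 0xF0F0F0F0F0F0F0F0
daughters = [0xF0F0F0F0F0F0F0F1, 0x0F0F0F0F0F0F0F0F]

print(distance_array(source, daughters))  # [1, 64]
```

A small distance suggests the daughter segment still looks like its source; a large one would flag a bad transcode for a second look.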
This process will be managed with IPLD DAG-CBOR using js-ipfs and RabbitMQ, with a front end built on the Quasar framework (and OrbitDB) as well as a custom CLI for troubleshooting.
There will be one “primary” cluster node that all clients bootstrap from and that manages the transcoding job (including fragmentation of the ingested file, metadata management, audio processing and recombination of the fragments at “job’s end”). This node will manage one render job at a time (from the top of the queue) and will also use its own available resources (if any) to assist in the transcoding. The entire pinset will be created after a successful rsync from physical storage to a RAM drive (to help saturate the uplink and conserve drive lifetime).
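
The fragmentation step on the primary node could be sketched roughly as below, using ffmpeg's segment muxer. This is only an assumption about how the split might be done: the function name, output naming pattern and Matroska container are hypothetical, and the command is built but not executed here.

```python
import shlex
from pathlib import Path


def build_split_command(source: Path, out_dir: Path, seconds: int = 1) -> list[str]:
    """Build an ffmpeg command that cuts the video stream into fixed-length segments."""
    return [
        "ffmpeg",
        "-i", str(source),
        "-map", "0:v:0",            # first video stream only
        "-an",                      # drop audio; it is processed separately
        "-c:v", "copy",             # no re-encode (assumption, see note below)
        "-f", "segment",            # ffmpeg's segment muxer
        "-segment_time", str(seconds),
        "-reset_timestamps", "1",   # each segment starts at timestamp zero
        str(out_dir / "segment_%05d.mkv"),
    ]


command = build_split_command(Path("ingest/input.mp4"), Path("ramdrive/job"))
print(shlex.join(command))
# To actually run it: subprocess.run(command, check=True)
```

One gotcha with this variant: with stream copy the segment muxer can only cut at keyframes, so segments come out at exactly one second only if the source has a keyframe at least once per second. Otherwise the split needs a re-encode with forced keyframes.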
We will be using ngrok to punch through NATs for appliance discovery / registration and as command and control (C&C) to manage the transcoders, each of which is one of:
- CoreOS with ipfs-cluster, ffmpeg, sox
- a custom Electron client (made with Quasar)
- a hand-built, self-managed server instance with high memory and CPU at a datacenter
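
Whichever kind of transcoder picks up a segment, it has to claim that segment's lockfile atomically so that two nodes never work on the same piece. A minimal sketch of the claim step follows. It assumes the lockfile lives on a filesystem that all nodes can see and that honors exclusive creation; if the lock ends up in IPLD or is replaced by RabbitMQ acknowledgements, the mechanism would be different. The function names and the node IDs are hypothetical.

```python
import os
import tempfile


def try_claim(lock_path: str, node_id: str) -> bool:
    """Atomically create the lockfile; return False if another node already holds it."""
    try:
        # O_CREAT | O_EXCL fails if the file exists, so only one caller can win.
        fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY, 0o644)
    except FileExistsError:
        return False
    with os.fdopen(fd, "w") as handle:
        handle.write(node_id)  # record the owner for troubleshooting
    return True


def release(lock_path: str) -> None:
    """Give the segment back, e.g. after finishing or abandoning the work."""
    os.remove(lock_path)


with tempfile.TemporaryDirectory() as work_dir:
    lock = os.path.join(work_dir, "segment_00001.lock")
    print(try_claim(lock, "node-a"))  # True: first claim wins
    print(try_claim(lock, "node-b"))  # False: segment already taken
    release(lock)
    print(try_claim(lock, "node-b"))  # True: free again after release
```

A crashed node would leave a stale lockfile behind, so a real version probably also needs a timestamp or heartbeat in the file so that the primary node can expire abandoned claims.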
Depending on the results here, we may graduate to a combination of openfaas/faas and ngrok.
Afterwards, we will serve these video files via QUIC, using IPFS as the filestore. There is a chance that we will use HDF5 as the final resting place for all of the fragments; in that case we will probably try to add HDF5 to multiformats, depending on the results. At any rate, I believe the multiformat stubs for video files need to be extended…
Does anyone have gotchas that I should be careful of with this approach? Are there any repos anywhere that might have useful patterns for this work? I am keeping a list of interesting approaches - you can see them below. Many of them, however, are just inspiration…
Because this project is open source, I am disclosing it here. Until we get to anything like a release, however, we will be keeping the development on a private GitLab server. If you are interested in helping out in any capacity, please feel free to PM me.
I will keep this post updated as we progress.