Summary: given a web resource with an enormous amount of data, how would you organize community-driven archiving of its contents to IPFS?
A wonderful short-video hosting service, coub.com, is closing soon. Many users and content creators are grieving over this, and I am thinking about archiving its content in IPFS.
Writing a standalone program that downloads video files and uploads them to IPFS seems like a doable job to me. The problem is that there is an enormous amount of content there, so the archiving has to be performed not by a single person but by a collective of enthusiasts, each running the archiving software. All those processes must somehow collaborate in order to incrementally grow a single public tree of data (a rough sketch of what I imagine a single process doing follows the list below). By collaboration I mean that:
- A newly joined process must somehow obtain the list of already downloaded videos (to avoid redundant work)
- After a process downloads a video, the video must somehow be included in the common tree.
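To make this concrete, here is a minimal sketch of what I imagine a single archiving process doing, written in Python and shelling out to the `ipfs` CLI. The IPNS name of the shared index, the use of yt-dlp as the downloader, and the `announce` step are all my assumptions, not a settled design:

```python
import json
import subprocess

# Hypothetical IPNS name under which the collective publishes its index of
# already-archived videos (video_id -> CID); the group would have to agree on it.
INDEX_IPNS = "/ipns/k51q...example-index-key"

def ipfs(*args: str) -> str:
    """Run an `ipfs` CLI command and return its stdout, stripped."""
    return subprocess.run(
        ["ipfs", *args], check=True, capture_output=True, text=True
    ).stdout.strip()

def fetch_index() -> dict:
    """Resolve the shared IPNS index and load it as {video_id: cid}."""
    resolved = ipfs("name", "resolve", INDEX_IPNS)   # -> /ipfs/<cid>
    return json.loads(ipfs("cat", resolved))

def announce(video_id: str, cid: str) -> None:
    # Placeholder: report the new (video_id, cid) pair so it can be merged
    # into the common tree -- this is exactly the coordination step I don't
    # know how to design (a shared server? pubsub? some CRDT-based store?).
    print(f"{video_id} -> {cid}")

def archive_one(video_id: str, video_url: str, index: dict) -> None:
    if video_id in index:
        return  # someone has already archived it, skip redundant work
    local_file = f"{video_id}.mp4"
    # Assuming a downloader such as yt-dlp can fetch the clip.
    subprocess.run(["yt-dlp", "-o", local_file, video_url], check=True)
    cid = ipfs("add", "-q", local_file)              # add to the local node, get the CID
    announce(video_id, cid)

if __name__ == "__main__":
    index = fetch_index()
    # The list of videos would come from coub's API or a crawl; hardcoded here.
    for vid, url in [("1a2b3c", "https://coub.com/view/1a2b3c")]:
        archive_one(vid, url, index)
```

The open question is what `announce` should actually do and how the index gets updated; that coordination part is the core of my question.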
I have little idea how to implement such a network of processes.
How would you design it?
Are there any sources of relevant information (existing projects, articles, etc.) that might be useful for the job?
Notes:
- In practice we would somehow have to deal with corrupted data (whether caused by accidental errors or by someone's bad intentions). To keep the topic as simple as possible, let's assume this problem does not exist.
- The desired end result is a nice user interface (a website like en.wikipedia-on-ipfs.org/wiki or a mobile app), but let's focus on just moving the data into IPFS in the simplest form possible (a sketch of what that might look like follows these notes).
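To illustrate what I mean by "the simplest form", here is another hedged sketch: some node (a single maintainer, or each participant for their own subset) links already-added videos into an MFS directory and republishes its root under an IPNS key. The directory name `/coub-archive` and the idea of one publishing node are just assumptions for the example:

```python
import subprocess

def ipfs(*args: str) -> str:
    """Run an `ipfs` CLI command and return its stdout, stripped."""
    return subprocess.run(
        ["ipfs", *args], check=True, capture_output=True, text=True
    ).stdout.strip()

def merge_into_tree(video_id: str, cid: str) -> None:
    """Link an already-added video into the shared MFS directory."""
    ipfs("files", "mkdir", "-p", "/coub-archive")
    # Fails if the entry already exists; a real tool would check first.
    ipfs("files", "cp", f"/ipfs/{cid}", f"/coub-archive/{video_id}.mp4")

def publish_tree() -> str:
    """Publish the current root of the archive under this node's IPNS key."""
    root = ipfs("files", "stat", "--hash", "/coub-archive")
    ipfs("name", "publish", f"/ipfs/{root}")
    return root
```

Whether such a tree should live in one node's MFS or be rebuilt independently by every participant from a shared index is part of what I'm asking.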