Community-driven website archive on IPFS

Summary: given a web resource with an enormous amount of data, how would you organize community-driven archiving of its contents to IPFS?

A wonderful short-video hosting site, coub.com, is closing soon. Many users and content creators are grieving over this event. I am thinking about archiving its content in IPFS.

Writing a standalone program that downloads video files and uploads them to IPFS seems like a doable job to me. The problem is that there is an enormous amount of content there, so the archiving should be performed not by a single person but by a collective of enthusiasts, each running the archiving software. All those processes must somehow collaborate in order to incrementally grow a single public tree of data. By collaboration I mean that:

  1. A newly joined process must somehow obtain the list of already downloaded videos (in order to not perform redundant work)
  2. After a process downloads a video, that video must somehow get included in the common tree.

I have little idea about how to implement such a network of processes.

How would you design it?
Are there any sources of relevant information (existing projects, articles, etc.) that might be useful for the job?

Notes:

  • In practice, we would have to deal somehow with corrupted data (whether the result of an accidental error or of someone’s bad intentions). For the sake of keeping the topic as simple as possible, let’s suppose there is no such problem
  • The desired end result is to create a nice user interface (a website like en.wikipedia-on-ipfs.org/wiki or a mobile app), but let’s focus on just moving the data into IPFS in the simplest form possible

Given the current situation, where IPFS tooling is far from complete, I would create a GitHub repository and coordinate the tasks through it. Tasks are distributed there, CIDs are gathered and added to an IPFS folder by the moderators, and the result is published at the end.

Remember that the same video file will give you the same CID (with default settings), so in theory you don’t have to split the work. With a common tool (video → IPFS → CID), coordination would not be needed.
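A minimal sketch of what such a common tool could look like, assuming the `ipfs` CLI (kubo) is installed and a daemon is running; the download URL and file name are placeholders, not coub-specific:

```python
#!/usr/bin/env python3
"""Sketch of a shared "video -> IPFS -> CID" tool.

Assumptions: the `ipfs` binary (kubo) is on PATH with a running daemon,
and the caller supplies the video URL and a local file name.
"""
import subprocess
import sys
import urllib.request

def archive(video_url: str, filename: str) -> str:
    # Download the video file locally first.
    urllib.request.urlretrieve(video_url, filename)
    # Add it with default settings: identical bytes produce identical CIDs,
    # so any participant adding the same file gets the same result.
    result = subprocess.run(
        ["ipfs", "add", "-Q", filename],  # -Q prints only the final CID
        check=True, capture_output=True, text=True,
    )
    return result.stdout.strip()

if __name__ == "__main__":
    # Hypothetical usage: archive.py <video-url> <output-file>
    print(archive(sys.argv[1], sys.argv[2]))
```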

The only coordination needed would be creating an index file for the entire catalogue.
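The index format would be up to the community; one possibility (purely an assumption, not an agreed format) is a line-per-video JSON file kept in the coordination repository and merged via pull requests:

```python
# Hypothetical catalogue format: one JSON object per line (catalogue.jsonl),
# mapping a video's identifier and title to the CID produced by the common tool.
import json

def index_entry(video_id: str, title: str, cid: str) -> str:
    return json.dumps({"id": video_id, "title": title, "cid": cid}, sort_keys=True)

# Example line appended to catalogue.jsonl:
# {"cid": "bafy...", "id": "abc123", "title": "some coub"}
```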

edit: with this scheme, redundancy would happen naturally for popular content, and videos that no one cares about would not be archived.

Different nodes might end up downloading the same video again and again.

Yes! And that’s a good thing.

My approach would be to let everyone add their favourite videos to IPFS with a tool guaranteeing that the same video yields the same CID.

This way the most popular videos would be the most available on IPFS.


Then you should ask users to pin it, though according to tradition IPFS doesn’t expect users to do the heavy lifting :sweat_smile:, which I don’t quite agree with.
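For completeness, pinning a video that someone else has already archived is a single call; a minimal sketch, assuming a local kubo daemon and a CID taken from the catalogue (the CID shown is a placeholder):

```python
# Sketch: pin an already-archived video locally so this node keeps a copy.
import subprocess

subprocess.run(["ipfs", "pin", "add", "bafy..."], check=True)  # placeholder CID
```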

Maybe use a hash of the video metadata to create a unique index? This way it is even less likely to end up with duplicates.

For the metadata I would use IPLD, then link all of it together to form some kind of indexing system.
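As an illustration only (the metadata fields and key scheme are assumptions, not an agreed format), storing each video’s metadata as an IPLD node that links to the video’s CID could look like this with kubo’s `ipfs dag put`, which reads dag-json from stdin and prints the CID of the stored node:

```python
"""Sketch: one IPLD node per video, keyed by a hash of its metadata.

Assumptions: a local kubo daemon, and metadata already fetched as a dict.
"""
import hashlib
import json
import subprocess

def store_metadata(video_cid: str, metadata: dict) -> tuple[str, str]:
    # Deterministic index key: hash of the canonicalised metadata, so two
    # participants indexing the same video independently derive the same key.
    canonical = json.dumps(metadata, sort_keys=True, separators=(",", ":"))
    key = hashlib.sha256(canonical.encode()).hexdigest()

    # IPLD node in dag-json form; {"/": <cid>} is how dag-json encodes a link.
    node = dict(metadata, video={"/": video_cid})
    result = subprocess.run(
        ["ipfs", "dag", "put"],
        input=json.dumps(node),
        check=True, capture_output=True, text=True,
    )
    return key, result.stdout.strip()

# key, meta_cid = store_metadata("bafy...", {"id": "abc123", "title": "some coub"})
```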


I think what you want is a collaborative cluster:
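Joining a collaborative cluster as a follower is essentially one command; a minimal sketch, where the cluster name and configuration template URL are hypothetical placeholders that the coordinators would announce:

```python
# Sketch: follow a (hypothetical) collaborative cluster and start pinning its
# content. Requires the ipfs-cluster-follow binary and a local ipfs daemon.
import subprocess

subprocess.run(
    ["ipfs-cluster-follow", "coub-archive", "run",
     "--init", "https://example.org/coub-archive.json"],  # placeholder name and URL
    check=True,
)
```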