Background: While I love git, I run into some friction because I’m trying to use it for something it was not designed for.
I’m trying to use git to version-control individual documents (XML files) rather than codebases or applications, where individual files don’t make much sense on their own.
I currently have about 20,000 such documents, and ideally I would like to version and tag each document individually.
Further, I’d like text editors to be able to “pull” or “clone” (both terms are a bit obsolete in an IPFS world; perhaps I should just say “pin” or “access”) only the individual documents of interest, without cloning or pinning the entire collection.
While I’m now managing 20,000 such documents, at the current rate we will soon be over 100,000, with many more to come.
Managing each file as a separate repo starts to feel crazy.
But creating one big repository also feels crazy. To work on a single file, a user would have to clone the entire repo, or at least an entire subsection of it.
Commits would apply to the whole repository and be meaningless for the many documents they leave unchanged, and version tags would be fairly useless, because they would version the state of the entire corpus rather than that of individual documents.
From what I’ve learned about IPFS and IPNS so far (mostly from this discussion board), I wonder whether IPFS, and its freedom from location-based files, could offer a new kind of management that would be perfect for my use case.
What I’ve been thinking about is something like the following:
For any file and/or directory, a user could run `ipfsvc init` (vc for version control).
The `ipfsvc init` command would do the following:
- it would generate a bare index file
- it would run `ipfs add` for the index file, taking note of its hash
- it would generate a new key with `ipfs key gen` and, using IPNS (`ipfs name publish --key=<result-of-key-gen> <hash-of-index-file>`), assign the hash of the index file to the newly generated key.
The IPNS key would function a little like “origin”, except that it wouldn’t be remote, since IPFS would allow us to leave the distinction between local and remote behind.
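To make the `ipfsvc init` steps concrete, here is a minimal Python sketch against a toy in-memory model rather than a real IPFS node. The dictionaries `blocks` and `names` and the helpers `ipfs_add`, `key_gen` and `name_publish` are hypothetical stand-ins for `ipfs add`, `ipfs key gen` and `ipfs name publish`; SHA-256 hex digests stand in for real IPFS CIDs, and the JSON layout of the index is likewise an assumption.

```python
import hashlib
import json
import secrets

# Hypothetical in-memory stand-ins: `blocks` models IPFS content addressing
# and `names` models IPNS pointers. A real tool would call `ipfs add`,
# `ipfs key gen` and `ipfs name publish` instead.
blocks: dict[str, bytes] = {}  # content hash -> bytes
names: dict[str, str] = {}     # key id -> content hash of the current index


def ipfs_add(data: bytes) -> str:
    """Store data under its SHA-256 hex digest and return that digest."""
    digest = hashlib.sha256(data).hexdigest()
    blocks[digest] = data
    return digest


def key_gen() -> str:
    """Return a fresh random key id (stand-in for `ipfs key gen`)."""
    return secrets.token_hex(16)


def name_publish(key: str, content_hash: str) -> None:
    """Point a key at a content hash (stand-in for `ipfs name publish`)."""
    names[key] = content_hash


def ipfsvc_init() -> str:
    """Hypothetical `ipfsvc init`: publish a bare index under a new key."""
    index = {"commits": []}  # bare index: no commits yet
    index_hash = ipfs_add(json.dumps(index).encode())
    key = key_gen()
    name_publish(key, index_hash)
    return key  # plays the role of "origin"


key = ipfsvc_init()
print(json.loads(blocks[names[key]]))  # {'commits': []}
```

One consequence of content addressing shows up here: two calls to `ipfsvc_init` produce different keys that initially point at the same index hash, because the bare index content is identical. The per-document key, not the index hash, is what keeps each document’s history separate.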
Now I could commit a file to the index file with `ipfsvc commit -am "first commit" --target …`.
The commit command would do the following:
- it would `ipfs add` the file and take note of the new content hash
- it would create a commit object containing the hash of the newly committed file and information about the committer, author, time, tags, etc. (typical git commit stuff)
- it would `ipfs add` this commit object, taking note of its content hash
- it would use the value supplied to the `--target` option to locate the index file
- it would prepend the hash of the commit object to the index file, forming the “tip” of an array or list of commit object hashes
- it would `ipfs add` the new index file, taking note of its hash
- it would point the key-hash to the new hash of the updated index file (via `ipfs name publish`).
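The commit steps can be sketched in the same toy in-memory model. Here `blocks`, `names`, `ipfs_add` and `ipfsvc_commit` are hypothetical, the commit object’s JSON fields are an assumption, and the IPNS key is passed directly as the stand-in for the `--target` value, whose exact form the command above leaves open.

```python
import hashlib
import json
import time
from typing import Optional

# Hypothetical in-memory stand-ins for IPFS blocks and IPNS pointers,
# not the real ipfs CLI or HTTP API.
blocks: dict[str, bytes] = {}  # content hash -> bytes
names: dict[str, str] = {}     # key id -> content hash of the current index


def ipfs_add(data: bytes) -> str:
    """Store data under its SHA-256 hex digest and return that digest."""
    digest = hashlib.sha256(data).hexdigest()
    blocks[digest] = data
    return digest


def ipfsvc_commit(key: str, content: bytes, message: str, author: str,
                  tag: Optional[str] = None) -> str:
    """Hypothetical `ipfsvc commit`; `key` stands in for the --target value."""
    file_hash = ipfs_add(content)                # add the file itself
    commit = {"file": file_hash, "author": author, "message": message,
              "time": time.time(), "tag": tag}   # git-like commit object
    commit_hash = ipfs_add(json.dumps(commit, sort_keys=True).encode())
    index = json.loads(blocks[names[key]])       # locate the current index
    index["commits"].insert(0, commit_hash)      # prepend: new tip
    names[key] = ipfs_add(json.dumps(index).encode())  # add index, repoint key
    return commit_hash


# Usage: publish a bare index under a made-up key, then commit twice.
key = "example-key"  # hypothetical key id
names[key] = ipfs_add(json.dumps({"commits": []}).encode())
first = ipfsvc_commit(key, b"<doc>v1</doc>", "first commit", "alice", tag="v1")
second = ipfsvc_commit(key, b"<doc>v2</doc>", "second commit", "alice")
print(json.loads(blocks[names[key]])["commits"] == [second, first])  # True
```

Every commit yields a new index hash, while earlier indexes stay addressable by their old hashes. Storing the whole list in the index means it grows with each commit; storing a parent hash in each commit object instead, as git does, would be one alternative.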
Now, if I have the key-hash (just as I need the GitHub repo before I can do much), I have access to the entire history of the individual file. With the key in hand, I could supply a tag number and be taken straight to the commit hash for that tag, and from there to the file referenced in that commit.
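The lookup chain (key, then index, then commit, then file) can be sketched as follows, again in a toy in-memory model. The names `blocks`, `names`, `ipfs_add`, `add_json` and `checkout_tag` are hypothetical, and commit objects are assumed to be JSON with `file` and `tag` fields.

```python
import hashlib
import json
from typing import Optional

# Hypothetical in-memory stand-ins for IPFS blocks and IPNS pointers.
blocks: dict[str, bytes] = {}  # content hash -> bytes
names: dict[str, str] = {}     # key id -> content hash of the current index


def ipfs_add(data: bytes) -> str:
    """Store data under its SHA-256 hex digest and return that digest."""
    digest = hashlib.sha256(data).hexdigest()
    blocks[digest] = data
    return digest


def add_json(obj: dict) -> str:
    """Serialize an object deterministically and add it."""
    return ipfs_add(json.dumps(obj, sort_keys=True).encode())


def checkout_tag(key: str, tag: str) -> Optional[bytes]:
    """Follow key -> index -> commit -> file for the newest commit with tag."""
    index = json.loads(blocks[names[key]])
    for commit_hash in index["commits"]:  # newest first, starting at the tip
        commit = json.loads(blocks[commit_hash])
        if commit.get("tag") == tag:
            return blocks[commit["file"]]
    return None  # no commit carries this tag


# Usage: two hand-built commits, newest first in the index.
v1 = add_json({"file": ipfs_add(b"<doc>v1</doc>"), "tag": "v1"})
v2 = add_json({"file": ipfs_add(b"<doc>v2</doc>"), "tag": "v2"})
key = "example-key"  # hypothetical key id
names[key] = add_json({"commits": [v2, v1]})
print(checkout_tag(key, "v1"))  # b'<doc>v1</doc>'
```

This walks the history linearly from the tip. With long histories, a tool might also keep a tag-to-commit map in the index so a tag resolves in one step, at the cost of updating that map on every tagged commit.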
But the beauty of this approach, I think, is that a file can be located anywhere and doesn’t depend on its location relative to a .git directory to find its history.
The file could sit in several other directories that are all versioned the same way, each with their own tags, with none of the git hassle of submodules, etc.
Now I could clone the file, or any ancestor directory containing it, depending on my needs.
I imagine other people have already thought about this a lot more than I have, but thinking through different problems myself helps me get more familiar with IPFS.
From my vantage point, this seems like a fairly simple solution to the use case above.
Any thoughts? Anywhere I’m going way wrong? Is there a subcommittee or special interest group working on IPFS and version control?