Git-like version control without the location-based dependencies (e.g. the .git directory)

Background: While I love git, there is some friction for me because I’m trying to use it for purposes it was not intended for.

I’m trying to use git to version control individual documents (xml files) rather than code bases or applications where individual files by themselves don’t make much sense.

I currently have about 20,000 such documents and I would ideally like to be able to version and tag each document individually.

Further, I’d like text editors to be able to “pull” or “clone” only the individual documents of interest, without cloning/pinning the entire collection. (Both terms are a bit obsolete in an ipfs world; perhaps I should just say “pin” or “access”.)

While I’m now managing 20,000 such documents, at the current rate we will soon be over 100,000 with many more to come.

Managing each file as a separate repo starts to feel crazy.

But creating one big repository also feels crazy. A user would have to clone the entire repo, or an entire subsection of it, just to work on a single file.

Every commit would apply to the entire repository, even though it is meaningless for the majority of documents, which remain unchanged. Version tags would be fairly useless too, because they would version the state of the entire corpus rather than the individual documents.

From what I’ve learned about ipfs and ipns thus far (mostly from this discussion board), I wonder if ipfs and the freedom from location-based files would offer a new kind of management that would be perfect for my use case.

What I’ve been thinking about is something like the following:

For any file and/or directory a user could run ipfsvc init (vc for version control).

The ipfsvc init command would do the following:

  1. it would generate a bare index file
  2. it would run ipfs add for the index file, taking note of its hash
  3. it would generate a new key with ipfs key gen and then, using ipns, assign the hash of the index file to the newly generated key with ipfs name publish --key=<result-of-key-gen> <hash-of-index-file>.
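
To make the three init steps concrete, here is a minimal sketch in Python. It does not talk to a real ipfs node: the `blocks` and `names` dictionaries, the `add` and `init` helpers, and the key name `doc-0001` are all hypothetical stand-ins for `ipfs add`, `ipfs key gen` and `ipfs name publish`, and plain SHA-256 hex digests stand in for real ipfs content hashes. The bare index is assumed to be an empty JSON list.

```python
import hashlib
import json

blocks = {}  # stand-in for content-addressed storage (what `ipfs add` writes to)
names = {}   # stand-in for ipns records: key name -> hash it currently points to


def add(data: bytes) -> str:
    """Store bytes under their SHA-256 digest, mimicking `ipfs add`."""
    digest = hashlib.sha256(data).hexdigest()
    blocks[digest] = data
    return digest


def init(key_name: str) -> str:
    """Create a bare index, store it, and point a fresh key at it."""
    bare_index = json.dumps([]).encode()  # step 1: an empty list of commit hashes
    index_hash = add(bare_index)          # step 2: note the hash of the index
    names[key_name] = index_hash          # step 3: mimics `ipfs name publish`
    return index_hash


index_hash = init("doc-0001")
print(json.loads(blocks[names["doc-0001"]]))
```

Resolving the key and loading the block it points to yields the empty index, which is the state a later commit would build on.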

The ipns key would function a little bit like “origin”, except it wouldn’t be remote, since ipfs would allow us to leave this idea of local and remote behind.

Now I could commit a file to the index file with ipfsvc commit -am "first commit" --target `

The commit command would do the following.

  1. it would ipfs add this file and take note of the new content hash
  2. it would create a commit object containing information about the hash of the newly committed file, info about the committer, author, time, tags, etc. Typical git commit stuff.
  3. it would ipfs add this commit file, taking note of the new content hash of the commit object
  4. it would use the (supplied as the value of the --target option) to locate the index file
  5. it would prepend the hash of the commit object to the index file, forming the “tip” of an array or list of commit object hashes.
  6. it would ipfs add the new index file, taking note of the hash.
  7. it would point the key-hash to the new hash of the updated index file.
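
The seven commit steps can be sketched the same way. This is only an illustration under stated assumptions: an in-memory dictionary replaces the ipfs node, SHA-256 hex digests replace real content hashes, and the `commit` function, its parameters, and the commit object's field names are made up for the example rather than taken from any existing tool.

```python
import hashlib
import json
import time

blocks = {}  # stand-in for content-addressed storage
names = {}   # stand-in for ipns: key name -> hash of the current index file


def add(data: bytes) -> str:
    """Store bytes under their SHA-256 digest, mimicking `ipfs add`."""
    digest = hashlib.sha256(data).hexdigest()
    blocks[digest] = data
    return digest


def commit(key_name, content, message, author, tags=()):
    """Record a new version of one document under the given key."""
    file_hash = add(content)                                 # step 1
    commit_obj = {                                           # step 2
        "file": file_hash,
        "message": message,
        "author": author,
        "time": time.time(),
        "tags": list(tags),
    }
    commit_hash = add(json.dumps(commit_obj, sort_keys=True).encode())  # step 3
    index = json.loads(blocks[names[key_name]])              # step 4: locate the index
    index.insert(0, commit_hash)                             # step 5: new tip goes first
    names[key_name] = add(json.dumps(index).encode())        # steps 6 and 7
    return commit_hash


# Start from a bare index, as the init command would leave it.
names["doc-0001"] = add(json.dumps([]).encode())

first = commit("doc-0001", b"<doc>v1</doc>", "first commit", "alice", tags=["1.0"])
second = commit("doc-0001", b"<doc>v2</doc>", "second commit", "alice")

index = json.loads(blocks[names["doc-0001"]])
print(index[0] == second, index[1] == first)
```

Note that every commit produces a new index file with a new hash, so only the key stays stable; older index files remain addressable by their own hashes.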

Now, if I have the key-hash (just as I need the GitHub repo before I can do much), I have access to the entire history of the individual file. With the key in hand, I could supply a tag and be taken to the commit hash for that tag, and from there to the file referenced in that commit.
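
A tag lookup along those lines could look roughly like the following. Again this is a sketch, not a real ipfs client: the in-memory `blocks` and `names` dictionaries, the `resolve_tag` helper, the commit object layout, and the placeholder file hashes are all assumptions made for illustration.

```python
import hashlib
import json

blocks = {}  # stand-in for content-addressed storage
names = {}   # stand-in for ipns: key name -> hash of the current index file


def add(obj) -> str:
    """Serialize an object to JSON and store it under its SHA-256 digest."""
    data = json.dumps(obj, sort_keys=True).encode()
    digest = hashlib.sha256(data).hexdigest()
    blocks[digest] = data
    return digest


def resolve_tag(key_name, tag):
    """Walk the index from newest to oldest; return the file hash for the tag."""
    index = json.loads(blocks[names[key_name]])
    for commit_hash in index:
        commit_obj = json.loads(blocks[commit_hash])
        if tag in commit_obj["tags"]:
            return commit_obj["file"]
    return None  # no commit carries this tag


# Two hand-built commits; the "file" values are placeholders, not real hashes.
v1 = add({"file": "hash-of-v1", "tags": ["1.0"]})
v2 = add({"file": "hash-of-v2", "tags": ["2.0"]})
names["doc-0001"] = add([v2, v1])  # newest commit first

print(resolve_tag("doc-0001", "1.0"))
```

A linear scan is fine for a single document's history; an index that mapped tags to commit hashes directly would avoid fetching every commit object.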

But the beauty of this approach – I think – is that a file can be located anywhere and doesn’t depend on relative location of the .git directory to find its history.

The file could be located in several other directories that are all being versioned in the same way with their own tags, with none of the git hassle of submodules, etc.

Now, I could clone the file, or any ancestor directory containing the file, depending on my needs.

I imagine other people have already thought about this a lot more than I have, but it helps me get more familiar with IPFS to think through different problems myself.

From my vantage point, this seems like a fairly simple solution to the above use case.

Any thoughts? Any places I’m going way wrong? Is there a subcommittee or special interest group working on ipfs and version control?

If all you want is version control with the ability to check out individual documents, it might be worth taking a look at svn and its support for sparse checkouts. If you want to tag individual files, put the tag id in the commit and then just check the log for the tag id to get a revision id. Not to dissuade you from building on IPFS, but svn is a much more mature ecosystem with well-defined and stable structures for doing version control of large sets of files. I’ve used it for document stores before and still do so. One thing to keep in mind is that you can use git-svn to do the checkout; it then looks a lot like git on the client side, and you can use git tools, etc.

Good luck!

This is a popular topic here on discourse. Here are two of the existing threads, which have lots of useful info that you could explore.

Implementing a git-inspired version graph directly in ipfs:

Adding git repositories to ipfs and using IPLD to resolve that content on IPFS using the git commit hashes:

At this point you’ve probably gone for a different solution. But for reference, I’ve started implementing something similar. Here is the post about it; I would love to hear your feedback. There are some differences from your suggested approach, such as relying heavily on MFS rather than creating commit files. Partial checkouts are also something I’d like to support; they’re on the TODO list.