In my personal experience, most of the time when “reproducible CIDs” are mentioned, users mean CIDs of UnixFS data that was chunked in an opinionated way by Kubo (go-ipfs).
The default Kubo (go-ipfs) ipfs add parameters have remained the same for nearly a decade, which made people assume there is a “set in stone standard”: that importing the same data will always produce the same CID, no matter what software is used (when the only actual guarantee is that the same CID always produces the same data).
Switching Kubo’s ipfs add defaults (kubo#4143) will have a big educational impact, nudging people to learn that the same UnixFS CIDs are produced only when the same settings are used, and to set an explicit “CID profile” if their use case depends on that.
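To make the “same settings” point concrete, here is a toy Go sketch. It is deliberately not UnixFS: the hypothetical toyRoot function just splits bytes into fixed-size chunks, hashes each chunk with SHA-256, and hashes the concatenated chunk digests into a “root”. Even this simplified model shows that identical input bytes yield a different root when only the chunk size changes, which is the same basic reason two importers with different chunker settings produce different UnixFS CIDs.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// toyRoot is a hypothetical, heavily simplified stand-in for an importer.
// It is NOT the UnixFS algorithm: real UnixFS builds a dag-pb DAG with its
// own layout rules. It only illustrates that the chunking parameter feeds
// into the final root identifier.
func toyRoot(data []byte, chunkSize int) string {
	root := sha256.New()
	for start := 0; start < len(data); start += chunkSize {
		end := start + chunkSize
		if end > len(data) {
			end = len(data)
		}
		// Hash each chunk ("leaf"), then feed the leaf digest into the root hash.
		leaf := sha256.Sum256(data[start:end])
		root.Write(leaf[:])
	}
	return hex.EncodeToString(root.Sum(nil))
}

func main() {
	data := make([]byte, 3*1024*1024) // 3 MiB of zero bytes
	a := toyRoot(data, 256*1024)      // 256 KiB chunks
	b := toyRoot(data, 1024*1024)     // 1 MiB chunks
	fmt.Println(a == b)               // same bytes, different chunk size: false
}
```

Real importers layer more variables on top of chunk size (CID version, raw leaves, hash function, DAG layout), and each of them can change the root in the same way.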
FYSA, we’ve made some progress toward letting users customize the “profile” of settings that affect the resulting CID when files and directories are turned into a UnixFS DAG.
Kubo v0.29 introduced:
A Kubo 1.x release (ETA TBD, hopefully sooner rather than later) will switch the ipfs add “CID profile” from legacy CIDv0 to CIDv1 (finally closing kubo#4143).
There is also the aspect of content-type-aware chunkers for UnixFS (videos, images, archives), which I won’t go into here (see, for example, WARC file chunking), but which we should keep in mind, because content-type-aware chunkers will increase the number of possible “CID profiles”.
Q: Is there any actionable thing we could do today to make “profiles” a thing?
For the purpose of this discussion, the test-cid-v1 and legacy-cid-v0 presets from Kubo could be used as examples of the “CID profiles” @robin hinted at, but those settings are hard to discover for someone new to the ecosystem.
Would it be useful to have a “CID Conventions” section at https://specs.ipfs.tech/ as a way of disseminating information about the settings involved to implementers who care about 1:1 reproducible CIDs? We seem to have enough “profiles” in the wild to make it worthwhile: “Kubo CIDs”, “Iroh CIDs”, “LUCIDs”, etc.