Thanks. I agree that it should be a developer concern, but the problem is that it’s not. Everything kinda works out OK as long as you say, “F it, just use the defaults unless you know what you’re doing,” but even then I think it has problems. Let’s just go through a scenario. I have a file and I’d like to add it to IPFS. I can go the “just use the defaults” route. But I’m pretty sure there’s some duplication in there, and I’d like to save some space, or I’d like to stream it and use the trickle-dag, so I go and use a different hasher. Now I have to add it twice, once with each hasher. Not a problem for small files, but if they’re large it’s a non-trivial amount of compute, time and storage. Assuming I remember not to pin it, I can later reclaim the space for the one I don’t use. But I have to remember to juggle the pins so I don’t accidentally pin them both. If I mess that up in any way, I’ve doubled my storage to hold the exact same file twice, completely negating any possible savings from using a different hasher.
Now I somehow need to compare the actual blocks to see if it even did any good. This entire process is based on my intuition that there might be some savings. It’s possible that there are no significant savings and it’s all a waste of time and resources. And even if there is little to no space saved today, I have to guess whether there might be further savings when more files are added in the future, which I can’t possibly know.
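
To make the “compare the actual blocks” step concrete, here’s a rough sketch of what I mean. It is a toy, not how IPFS does it: `fixed_chunks`, `content_chunks` and `stored_bytes` are names I made up, the rolling value is a stand-in for a real content-defined chunker, and the test data is random bytes rather than a real file. It just counts how many bytes of unique blocks you’d end up storing for two nearly identical inputs under each strategy.

```python
import hashlib
import random
from typing import List


def fixed_chunks(data: bytes, size: int = 256) -> List[bytes]:
    """Split into fixed-size blocks (the last block may be shorter)."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def content_chunks(data: bytes, mask: int = 0xFF, min_size: int = 32) -> List[bytes]:
    """Toy content-defined chunker (an assumption, not the IPFS algorithm).

    Cuts wherever the low bits of a rolling value are all ones, so cut
    points follow the bytes themselves rather than absolute offsets.
    """
    chunks, start, rolling = [], 0, 0
    for i, byte in enumerate(data):
        # Low bits of `rolling` depend only on the most recent bytes.
        rolling = ((rolling << 1) + byte) & 0xFFFFFFFF
        if i + 1 - start >= min_size and (rolling & mask) == mask:
            chunks.append(data[start:i + 1])
            start = i + 1
    if start < len(data):
        chunks.append(data[start:])
    return chunks


def stored_bytes(blocks: List[bytes]) -> int:
    """Bytes needed if identical blocks are stored only once."""
    unique = {hashlib.sha256(b).digest(): len(b) for b in blocks}
    return sum(unique.values())


# Two "files" that share almost all of their content: the second one
# just has a single byte inserted at the front.
rng = random.Random(0)
base = bytes(rng.getrandbits(8) for _ in range(64 * 1024))
edited = b"x" + base

fixed_total = stored_bytes(fixed_chunks(base) + fixed_chunks(edited))
cdc_total = stored_bytes(content_chunks(base) + content_chunks(edited))

print("fixed-size chunking stores:     ", fixed_total, "bytes")
print("content-defined chunking stores:", cdc_total, "bytes")
```

The point is that I only find out which one wins after chunking the data both ways, which is exactly the extra work I’m complaining about.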
Say the non-default hasher performs pretty well, so I decide to use it. Now suppose someone else adds the same file but with the defaults. Now there are two exact copies of the file on the network. OK, that’s not really my problem until I go to use something that references that other CID and I end up pinning it. Now it is my problem, and I’m storing two copies, again negating any possible space savings.
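
Here’s a small sketch of why the two adds end up as unrelated identifiers. This is not a real CID computation: `toy_root` is a made-up helper that hashes each block and then hashes the list of block hashes, as a stand-in for a DAG root. The block sizes are arbitrary.

```python
import hashlib
from typing import List


def chunk(data: bytes, size: int) -> List[bytes]:
    """Split data into fixed-size blocks."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def toy_root(data: bytes, size: int) -> str:
    """Stand-in for a DAG root: hash of the concatenated block hashes.

    NOT a real CID; it only shows that the root depends on the layout.
    """
    leaf_hashes = [hashlib.sha256(block).digest() for block in chunk(data, size)]
    return hashlib.sha256(b"".join(leaf_hashes)).hexdigest()


data = b"same bytes, different layout " * 1000

root_a = toy_root(data, 256)    # "my" add, with one set of options
root_b = toy_root(data, 1024)   # someone else's add, with another

print("whole-file sha256:", hashlib.sha256(data).hexdigest())
print("root A:", root_a)
print("root B:", root_b)
print("roots match:", root_a == root_b)
```

Same bytes in, two different roots out, and nothing in either root tells you the other one exists.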
So I have files on the network with identical content, where it’s difficult or impossible to even communicate that the content is the same, and which have different properties and will perform differently. I’m not even sure there is a way to communicate the different layouts in advance, so that even if I did know they were the same content I could choose one or the other depending on how I’d like to use them, i.e. CID1 and CID2 are the same content but one used a trickle-dag, so I’ll go with CID2.
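
Just to illustrate the kind of information I’m wishing for, here’s a purely hypothetical sketch. As far as I know nothing like this index exists; `Variant`, `index` and `pick` are invented for the example, and `CID1`/`CID2` are placeholder strings, not real CIDs.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Variant:
    cid: str      # placeholder string, not a real CID
    layout: str   # e.g. "balanced" or "trickle"
    chunker: str  # how the file was split when it was added


# Hypothetical index: whole-file hash -> every known way it was added.
index: Dict[str, List[Variant]] = {
    "<sha256 of the whole file>": [
        Variant(cid="CID1", layout="balanced", chunker="size-262144"),
        Variant(cid="CID2", layout="trickle", chunker="rabin"),
    ],
}


def pick(content_hash: str, want_layout: str) -> Optional[Variant]:
    """Return the first known variant with the layout I want, if any."""
    for variant in index.get(content_hash, []):
        if variant.layout == want_layout:
            return variant
    return None


# I want to stream, so I'd go with whichever one is a trickle-dag.
choice = pick("<sha256 of the whole file>", "trickle")
print(choice.cid if choice else "no matching variant")
```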