Matt from Pinata here. You are spot on about the specific use case and behavior you described in the article. While we're proud to say that we prevent users from being double-charged for direct duplicate file uploads, accounting for duplicated content across multiple directories is something we haven't been able to achieve yet.
You're also right about the reason behind this behavior. While an IPFS repo itself benefits from the deduplication you describe, it's really hard to perform accurate accounting for it reliably without massive technical overhead that slows the system down.
So far we've chosen to prioritize performance because this didn't seem like a common use case among our users. However, we appreciate you bringing it to our attention! If you ever want to chat further, we'd love to talk with you in more depth about what you're trying to achieve. Even if you don't end up using our service, I'm sure there's a lot we could learn from each other.
Also, thanks for opening an issue about this on the go-ipfs GitHub repo. We'll be following it closely to see how it progresses. Hopefully there are some improvements to Pinata we can make once it's implemented!
Related GitHub issue: https://github.com/ipfs/go-ipfs/issues/5910