How does the identity hash interact with Tsize?

I have 2 related questions:

  1. In UnixFS, may/should the Tsize of a link with the identity hash be omitted (because the block is right there)?
  2. What is the Tsize of a UnixFS block that contains an identity link? Since Tsize is not supposed to count duplicate blocks, it makes sense that the block size of the identity linked block not count toward its parent’s Tsize twice.

Very important: we are talking here about Tsize, also known as dagsize, which is NOT used to calculate offsets into the file (blocksize is used for that). (@NCGThompson, I think you understand that; this is just for other readers.)

Tsize is currently not used by any piece of code that I know of.
I don’t know what Tsize was originally meant for, but the only use I can think of is optimising downloads: some protocols, like graphsync, are more efficient at downloading one BIG DAG than lots of small ones, so knowing the Tsize of each link could let a client split multiple graphsync requests across different nodes more efficiently.

Given that you don’t have to download an inline block, I would omit it for inline links. (This is allowed because Tsize is a SHOULD, not a MUST.)

Since Tsize is not supposed to count duplicate blocks, it makes sense that the block size of the identity linked block not count toward its parent’s Tsize twice.

No. Accurately counting duplicate blocks is an extremely expensive operation that requires a full traversal of the data, so Tsize actually ignores duplication: a block that shows up multiple times is counted as many times as it shows up.
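To make the difference concrete, here is a minimal sketch in Python. It uses a toy model, not any real IPFS library: `Node`, `cumulative_size`, and `deduplicated_size` are hypothetical names, and `size` stands in for the serialized block size. It assumes Tsize is the block's own size plus the Tsize of each of its links.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Toy stand-in for a block (hypothetical, not a real IPFS type)."""
    name: str   # stands in for the CID
    size: int   # serialized block size in bytes
    children: list["Node"] = field(default_factory=list)


def cumulative_size(node: Node) -> int:
    """Tsize-style total: own size plus each link's total.

    A block linked twice is added twice. In a real builder the per-link
    totals are already known when the parent is created, so no extra
    traversal is needed; the recursion here only exists for the toy model.
    """
    return node.size + sum(cumulative_size(c) for c in node.children)


def deduplicated_size(root: Node) -> int:
    """Exact unique-byte total: needs a full traversal and a seen-set."""
    seen: dict[str, int] = {}
    stack = [root]
    while stack:
        n = stack.pop()
        if n.name in seen:
            continue
        seen[n.name] = n.size
        stack.extend(n.children)
    return sum(seen.values())


leaf = Node("leaf", 100)
root = Node("root", 10, [leaf, leaf])  # the same leaf is linked twice

print(cumulative_size(root))    # duplicates counted each time they appear
print(deduplicated_size(root))  # each unique block counted once
```

In this model the first number is larger than the second whenever a block is linked more than once, which is the behaviour described above.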

What is the Tsize of a UnixFS block that contains an identity link?

Whatever the size of the block is, exactly as if the inlined block were any other block. (So the inline block’s size is counted “twice”.)

One last thing: the Tsize of an inline link is the size of its multihash.
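For readers unfamiliar with identity links, here is a rough sketch of why the inline bytes end up inside the parent block. An identity multihash is the hash-function code `0x00`, then the digest length as a varint, then the data itself as the "digest". The helper names below are hypothetical and the code uses no real multihash library.

```python
def encode_varint(n: int) -> bytes:
    """Unsigned LEB128 varint, as used by the multiformats specs."""
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def identity_multihash(data: bytes) -> bytes:
    """Identity multihash: code 0x00, length varint, then the raw data."""
    IDENTITY_CODE = 0x00
    return encode_varint(IDENTITY_CODE) + encode_varint(len(data)) + data


inline_data = b"hello"
mh = identity_multihash(inline_data)
print(mh)       # the data is embedded verbatim in the multihash
print(len(mh))  # 2 prefix bytes + len(inline_data) for short payloads
```

Because the link's CID carries these bytes, the parent block's own size already includes the inline data, and there is no separate block to fetch.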
