Exactly what gets hashed?

Consider this command:

 ipfs object get $(echo -n 'hello world' | ipfs add | awk '{print $2}')

So we’re just adding ‘hello world’ to the Merkle DAG and then getting the associated object. The output defaults to JSON encoding, so the command returns this:

{"Links":[],"Data":"\u0008\u0002\u0012\u000bhello world\u0018\u000b"}

We can use the --encoding protobuf option to disable JSON and get the result in a binary format, which produces several characters on my terminal that this forum cannot display.

I assume that if I hash this binary value via sha2-256 and then encode that into base58, I ought to get the same string that ipfs add wrote to stdout (Qmf412jQZiuVUtdgnB36FXFX7xg5V6KEbSJ4dpQuhkLyfD).
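
To make that procedure concrete, here is a minimal Python sketch of it. It rests on some assumptions that are not confirmed anywhere in this thread: that the printed string is a base58 (Bitcoin alphabet) encoding of a multihash, meaning the sha2-256 digest prefixed with the bytes 0x12 0x20, and that the binary object is a dag-pb node carrying the "Data" bytes shown above as its field 1. The function names are my own, not part of any ipfs library.

```python
import hashlib

# Bitcoin-style base58 alphabet (no 0, O, I or l).
B58_ALPHABET = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"

def base58_encode(data: bytes) -> str:
    """Encode bytes as base58, keeping leading zero bytes as '1'."""
    n = int.from_bytes(data, "big")
    out = ""
    while n > 0:
        n, rem = divmod(n, 58)
        out = B58_ALPHABET[rem] + out
    pad = len(data) - len(data.lstrip(b"\x00"))
    return "1" * pad + out

def multihash_sha256_b58(block: bytes) -> str:
    """sha2-256 the raw bytes, add the multihash prefix, base58 the result."""
    digest = hashlib.sha256(block).digest()
    # 0x12 is the multihash code for sha2-256, 0x20 is the digest length (32).
    multihash = bytes([0x12, 0x20]) + digest
    return base58_encode(multihash)

# The "Data" value from the JSON output above, written as raw bytes.
unixfs_data = b"\x08\x02\x12\x0bhello world\x18\x0b"

# Assumption: the protobuf object wraps that payload as field 1,
# i.e. tag byte 0x0a, then a one-byte length, then the payload.
block = b"\x0a" + bytes([len(unixfs_data)]) + unixfs_data

print(multihash_sha256_b58(block))
```

If those assumptions hold, the printed value should be comparable to what ipfs add wrote to stdout. To hash the real CLI output instead, the bytes could be read unmodified from sys.stdin.buffer, so that no newline is added or stripped along the way.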

But I have tried this (https://gist.github.com/MatrixManAtYrService/fa384d0d524c68d6c903e2be281eb672) and I get a different hash (QmXLUHrka5FQ1ac7ooZAg8MyVcLnzj1GEY5rW92tQyGguq).

I can think of three explanations:

  1. something other than the ipfs object is being hashed
  2. ipfs object get is writing more than just the object (a newline perhaps) and I’m failing to strip it
  3. there’s some third encoding (not json or protobuf) that the hash is based on

Can somebody tell me which of the above is most likely? Thanks.

Very strange indeed. ipfs object stat OBJECT and ipfs object get --enc protobuf OBJECT | wc -c give me the same size, so this rules out 2 and 3.


Some kind of sorting going on?