How to decipher root node content?

To better understand how mid and root nodes are encoded in IPFS, I decided to see if I can decode them. I’m trying to find the children’s hashes from the raw data of the root node. My root node CID is QmRvnF6ybLDfAHzuKFZmPcQKMyaWXCga4GtTVVE3nheQdh.

I ran:

ipfs block get QmRvnF6ybLDfAHzuKFZmPcQKMyaWXCga4GtTVVE3nheQdh > root.bin

protoc --decode_raw < root.bin 

protoc is the Protocol Buffers compiler, and Protocol Buffers is apparently the format v0 nodes are encoded in. Here is the output:

2 {
  1 {
    2: "\363\327\366?\027v\335\374\301\035\036\266\266#7\nb\207\242\336\337~\316\272\334\325!\260E\226\271\006"
  }
  2: ""
  3: 262158
}
2 {
  1 {
    2: "\372\303K\362\255E\275\272e\300\225p\314\004z\320\305\336\377\321\n\207\0106k\032\205\217\324\246\213\310"
  }
  2: ""
  3: 262158
}
2 {
  1 {
    2: "\032\336(\032\363@\331\2060&\332\225\351\272~\316\035\024\272>\nH,\252\0001\356{\031\215\021\365"
  }
  2: ""
  3: 262158
}
2 {
  1 {
    2: "\364$\204\317\214\340\332;\336\026\370\361\177\345\177C<\366~\274\031\026l\220\345\350\'U|\255g/"
  }
  2: ""
  3: 262158
}
2 {
  1 {
    2: "\031_\030\322\343\277\005\370;tX\236-|\315\223\002x\251\031F\236\252;PmD\0069\244\322`"
  }
  2: ""
  3: 262158
}
2 {
  1 {
    2: "\360\353055`\360D\003M\214W\010g7\376;\367\206\362\r\007\210\331,\027\344\242\231X\255\321"
  }
  2: ""
  3: 262158
}
2 {
  1 {
    2: "\013\303\031\202^\343\203\261\022?\030g\313\023\370\355\351\275\275\261\343\245\325\243pHB\333=\370\301="
  }
  2: ""
  3: 262158
}
2 {
  1 {
    2: "\367L\235h\213\345\207j\354\"\273\366\244\363L\303\377?,\031\037\231\t\021\304.u\202\250\343R\377"
  }
  2: ""
  3: 262158
}
2 {
  1 {
    2: "MI0\327+\304\362\357-_\214\335\016\214?\033}\n\245\340\257_H\3303o\344\256I\273aY"
  }
  2: ""
  3: 262158
}
2 {
  1 {
    2: "sp\210\323\247\275s\2216\206\030\254\370/U\331\004\300\375\356\257(\366p\353\267\256\355_*H\205"
  }
  2: ""
  3: 262158
}
2 {
  1 {
    2: "\353\261fq\302\336\3015\337\020\310 =\023\023\014\226=\032f\222\336q)\220P\030\266\311\246\017\216"
  }
  2: ""
  3: 262158
}
2 {
  1 {
    2: "\233\253\021uM\3158\315?$\376\260.\331\242{Z\333\260<\274\357=bL\340!-\225`F\341"
  }
  2: ""
  3: 262158
}
2 {
  1 {
    2: "J\005\336|\336\316i\206^\033N\314\202\220}\3627\233\351Xvt\243s>\365!\006iU\217\330"
  }
  2: ""
  3: 262158
}
2 {
  1 {
    2: "\247\022Gz\003gp\224.\037r;9J\0146q\346\2778\010\322\2000\376c\323_}\251\001\250"
  }
  2: ""
  3: 262158
}
2 {
  1 {
    2: "\201\311\276]6\n\275=yh\216\"\000\237\367\307}\275RH:\315\370b\002\3667;\331d\306\004"
  }
  2: ""
  3: 262158
}
2 {
  1 {
    2: "\363\'\333Y\t\262\365\347ZW\353c\357\306\"\364a#\207F\206.8d<\340\004<\013o\3639"
  }
  2: ""
  3: 262158
}
2 {
  1 {
    2: "\302;L\315hu\351\370\223\342\314\212NkT\204\330\213\333$*\354\035\016\031\203s\214o\210_\365"
  }
  2: ""
  3: 262158
}
2 {
  1 {
    2: "\307e\204\343\325,\243~\207v\220F1\336\242\350\303\014;{p\262\022v+e:\020b\326K\355"
  }
  2: ""
  3: 262158
}
2 {
  1 {
    2: "\337p\354\271\351C\376\021\016\236#\227/\225\341W7Qj(\371\317j\002G\325\213\345\352\255\036\242"
  }
  2: ""
  3: 262158
}
2 {
  1 {
    2: "5\277\177*0Y\261\274\351oU\237~\014\325R\037N\225\270\323\225C\223\357\002\006\370n\254\225\266"
  }
  2: ""
  3: 262158
}
2 {
  1 {
    2: "\311}Y\341\023\354gM\311*pH\020\245\237\231\342\211\273/scg\270Q\307V\352\274\335\013\327"
  }
  2: ""
  3: 262158
}
2 {
  1 {
    2: "\353{+\222\213\364q\224R\001\226\030\241\262\361\026\334\206\333q\272!^\037\327G\331\300\341\332g\221"
  }
  2: ""
  3: 262158
}
2 {
  1 {
    2: "\273pJ\334D`\033399\274_7\323\371\002\360\'\213\311\327\000\335\031\367\237x>\002E\027\250"
  }
  2: ""
  3: 262158
}
2 {
  1 {
    2: "O\363\216\234\245\267:\000\322^\213/\021\300\276;\377\321\301},{\342e\220\2701\347\223\264w\033"
  }
  2: ""
  3: 262158
}
2 {
  1 {
    2: "\246J\350\353k\374J\356\222\3119\233\273\351\033o\325nU\204@\345\355\256\235\360\273T4\240n\202"
  }
  2: ""
  3: 197224
}
1 {
  1: 2
  3: 6488666
  4: 262144
  4: 262144
  4: 262144
  4: 262144
  4: 262144
  4: 262144
  4: 262144
  4: 262144
  4: 262144
  4: 262144
  4: 262144
  4: 262144
  4: 262144
  4: 262144
  4: 262144
  4: 262144
  4: 262144
  4: 262144
  4: 262144
  4: 262144
  4: 262144
  4: 262144
  4: 262144
  4: 262144
  4: 197210
}

I know this has to be somewhat meaningful, since the bottom numbers preceded by “4:” match the sizes of the leaf nodes:

ipfs ls QmRvnF6ybLDfAHzuKFZmPcQKMyaWXCga4GtTVVE3nheQdh

QmekWn55wAVow5kZhkLN8GhnSMBKK7FiFKEsXYLyW8sMZP 262144 
QmfDXMZdebJKb8BdjfDoFWRxxY1MphWJncaNVj76QwuR6f 262144 
QmQ9XnYtK9D8YGZw2ci8Ead1U2dyZbYwFbA9eaf8fqua8U 262144 
QmemgV6TGgXRQ5eD4p5tuZjJSWJMc1cbTp8S2N2TVhE67Q 262144 
QmQ3h1KBhxLKRRg6PbkACs2nvdphmSHSfV6ifFeTUdqZrf 262144 
QmeZ6ZMqiHpTiED5pfepGTK5dnoMvPcm5vwEuBodSuw7xt 262144 
QmP8Zjx5nTTg1MuekuCT2kNgs5QRYtNVivLCSrPKknHPik 262144 
Qmez1AnMHP9n5o7ERqJcWr26r7LpvdFrz3TNozV9N8BSBQ 262144 
QmTYLqF174Yt7qomu4vUoRAqC94eoHgdCP5E7E6p72CYKa 262144 
QmW7H7xmt6EtgB1BYiLyJYP3va5dtah86ZtFFyezkBEKWc 262144 
QmeChQre44VEedqCMFHR8iXjm5nVS1GvHadsuWKphZtEv1 262144 
QmYpKCLWRLgkrNECuRvnTA9P9WNUozopNMtFftPgtZWJUx 262144 
QmTKc519rpAbCHwLPtABfCCg32MtrcrRS7TrcUPzSRrUzT 262144 
QmZapxi25Ys3kqjpwtVaaz6MwLvb6cPfnyMKa12VgknV1Z 262144 
QmX5Hj3Es5PkL84qv8pPTEW5W3RKgb28MFPwc3SipZQVFq 262144 
Qmehq2g5PyeBsr1RaWwm6E11pRSFd1Lv2oXyxr7YXyk7HA 262144 
QmbQrFLyF3Mgd5AmjhUzJH4VJeiBxWyHFfb351b3kSNTqv 262144 
Qmbm1dE6kGvmoyryaZ7j42gAif7m2BfJTzFafVQAyWHrm2 262144 
QmdNsVoFwpoJJq1LVQzoN66JRWADR836eNZxD2Njkvds8u 262144 
QmRxTgHjW3pAP3rKw3URpck4NTn4xumKxovMgXadRzULDB 262144 
QmbuBWzZbFZHDH2LyKoLohB6jpUZs6zp8TV2i4hJjphB4J 262144 
QmeBsT6xszZ8LTjn6KnTaKJxr8PB6hD4fPvS6VFA6beoSg 262144 
QmaxLG6HQ5w1hEqUeNmrGpbt2KqPtgrLDC76z33DGWEr3R 262144 
QmTikKZLjuoLQjCnndPaMMfuoBTWVPoNApNFat5JuACZaS 262144 
QmZXndxX8zUpg9Pk99ERB8anFn15v1wKirNLwUMkbdE4Ho 197210 

I’m trying to figure out how to extract the children’s hashes from the root node. Can anyone help me figure out how exactly I can do this? Thanks!

You are looking at a merkledag-pb type of node.

In field 1 it contains a protobuf for a unixfs node.

In the fields numbered “2” it contains the links: CID bytes, name and size. To obtain the CID from the bytes you need to encode them. I suspect they are CIDv0/multihashes, so encoding them in base58 probably gives you the Qm... strings you are looking for.
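As a rough sketch of that (not what go-ipfs itself does, just a minimal Python reading of the wire format): walk the block field by field, treat every field 2 as a link, and base58-encode the link's Hash bytes (field 1). One thing to watch for: protoc --decode_raw printed each Hash as a nested message, 1 { 2: "..." }, because a sha2-256 multihash starts with the bytes 0x12 0x20, which also parse as a protobuf header for field 2 with length 32. So the string shown there is only the 32-byte digest, and those two prefix bytes have to be included before encoding. The function names below are made up for illustration, and only the two wire types that appear in this output are handled.

```python
from typing import Iterator, List, Tuple

B58_ALPHABET = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"


def b58encode(data: bytes) -> str:
    """Base58btc-encode bytes (the text form used by CIDv0)."""
    n = int.from_bytes(data, "big")
    out = ""
    while n:
        n, rem = divmod(n, 58)
        out = B58_ALPHABET[rem] + out
    # Each leading zero byte is written as a leading '1'.
    zeros = len(data) - len(data.lstrip(b"\x00"))
    return "1" * zeros + out


def read_varint(buf: bytes, pos: int) -> Tuple[int, int]:
    """Read one protobuf varint at pos; return (value, next position)."""
    value = shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        value |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return value, pos
        shift += 7


def fields(buf: bytes) -> Iterator[Tuple[int, object]]:
    """Yield (field number, value); supports varint and length-delimited only."""
    pos = 0
    while pos < len(buf):
        key, pos = read_varint(buf, pos)
        number, wire_type = key >> 3, key & 7
        if wire_type == 0:      # varint
            value, pos = read_varint(buf, pos)
        elif wire_type == 2:    # length-delimited (bytes, string, sub-message)
            length, pos = read_varint(buf, pos)
            value = buf[pos:pos + length]
            pos += length
        else:
            raise ValueError(f"unsupported wire type {wire_type}")
        yield number, value


def links(node: bytes) -> List[Tuple[str, str, int]]:
    """Return (cid, name, tsize) for every link (field 2) of a dag-pb node."""
    result = []
    for number, value in fields(node):
        if number != 2:
            continue
        link = dict(fields(value))  # 1: Hash, 2: Name, 3: Tsize
        result.append((b58encode(link[1]), link.get(2, b"").decode(), link.get(3, 0)))
    return result
```

Reading the file saved earlier and calling links(open("root.bin", "rb").read()) should then give one (CID, name, Tsize) tuple per child, which can be compared against the ipfs ls listing.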

In this case you are showing a File (unixfs type 2), which has a total size of 6488666 bytes and has been chunked, so it carries the file size reachable through each of the child links: 262144 bytes (256 KiB) each, except the last, which is smaller. Each value should be the cumulative size of the Data fields in the unixfs protobufs of all the children reached through that specific link (in your case only 1 child).
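A quick arithmetic check of those numbers, with the values copied from the protoc dump in the question (the variable names are mine):

```python
# Field 4 of the unixfs message: 24 full chunks plus one shorter final chunk.
blocksizes = [262144] * 24 + [197210]

# Field 3 of the unixfs message: the total file size.
filesize = 6488666

# The per-link sizes should add up to the total file size.
total = sum(blocksizes)
print(len(blocksizes), total, total == filesize)
```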

What is interesting is that the 262144-byte values that you are seeing in ipfs ls as link sizes do not come from this protobuf. Instead, each of the links is fetched and deserialized, and the filesize field (field 3) of the unixfs object embedded in the child is used. Using ipfs ls --size=false disables this, which makes ls faster but does not show sizes.

Why is ipfs ls not using the blocksizes fields though?

I think in this case it could, because ipfs ls is operating on a unixfs File with links to chunks, and the File carries the cumulative data sizes. However, this would not work with Folders, since they do not have these blocksizes, and ipfs ls is usually used on folders.

The main use of blocksizes is seeking within a file, that is, being able to follow the right link when we want to read the byte at position N. That is why they are only present in Files, not Folders.
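To illustrate the seeking idea, here is a small sketch (my own helper, not the actual IPFS reader code) that maps a byte offset in the file to the index of the link to follow and the offset inside that child, using only the blocksizes list:

```python
from bisect import bisect_right
from itertools import accumulate
from typing import List, Tuple


def locate(blocksizes: List[int], offset: int) -> Tuple[int, int]:
    """Return (link index, offset within that child) for a file byte offset."""
    ends = list(accumulate(blocksizes))  # end position (exclusive) of each child
    if not 0 <= offset < ends[-1]:
        raise IndexError("offset is outside the file")
    index = bisect_right(ends, offset)   # first child whose end is past offset
    start = ends[index] - blocksizes[index]
    return index, offset - start


# Values from the root node in the question.
blocksizes = [262144] * 24 + [197210]
print(locate(blocksizes, 300000))
```

Only the one child that contains the requested byte needs to be fetched; the other links can be skipped without looking at them.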

What about the Tsize fields (3) in the Links, which are defined as the “cumulative size of target object”?

These are slightly larger than the blocksizes because they no longer measure the unixfs-pb Data fields of the children, but the total raw size of the linked block (including the merkledag-pb/unixfs metadata etc.). Therefore, if they were used, the result would not be the real file size that the user expects to see. However, they can give information about the total size of a DAG as stored in IPFS.
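Using the values from the dump in the question, the per-link difference can be worked out like this (a back-of-the-envelope sketch with my own variable names; the stored total below does not count the root block itself):

```python
# Tsize (field 3 of each link) and blocksizes (field 4 of the unixfs message),
# copied from the protoc output in the question.
tsizes = [262158] * 24 + [197224]
blocksizes = [262144] * 24 + [197210]

# Bytes of merkledag-pb/unixfs wrapping around each chunk's file data.
overhead = [t - b for t, b in zip(tsizes, blocksizes)]

stored_in_children = sum(tsizes)   # raw size of all child blocks
file_bytes = sum(blocksizes)       # what the user thinks of as the file size
print(set(overhead), stored_in_children - file_bytes)
```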

This all could be done better I guess. ipfs ls used to not fetch children, but it gave sizes which were not expected. The UnixFS-pb/Merkledag-pb combo is suboptimal for some things and there’s always a question of which layer is the right one to carry certain information, and what information should that be. There was some discussion about UnixFSv2 but I have lost track.


Wow! It is so kind of you to answer not only my original question, but also to anticipate many interesting offshoot questions!

Which encoding scheme do you all plan to use by default with v1? Is it dag-cbor? Thanks a ton again!

UnixFSv2, you mean? Probably dag-cbor, but I am not sure.
