Client-side DAG CID calculation (javascript)

With the js-ipfs implementation, I can store a DAG object with:

const obj = { simple: 'object' }

ipfs.dag.put(obj, { format: 'dag-cbor', hashAlg: 'sha3-512' }, (err, cid) => {
  console.log(cid.toBaseEncodedString())
  // zdpuAzE1oAAMpsfdoexcJv6PmL9UhE8nddUYGU32R98tzV5fv
})

I believe this sends a message from my client node to the IPFS service, where the CID is generated as the data is stored, and I get the CID back in the callback function. I would like to see what this looks like using pre-computed hashes, generating those CIDs locally and bypassing the network until I actually want to store an object.

I’ve got the SHA3 npm library, but I don’t think it’s as simple as running a JSON.stringify(obj) through SHA3Hash(512) then into bs58.

const bs58 = require('bs58')
const SHA3 = require('sha3');
const obj = { simple: 'object' }
const dig = new SHA3.SHA3Hash(512)
  .update(JSON.stringify(obj))
  .digest('hex')
console.log(bs58.encode(Buffer.from(dig, 'hex')))
// 42iHiPUmns7ZhEovnNFy8wpN8S9c6XUQoj9oB56sqRJqfyQuqUnibcL2XMoKxaCq2U5hAbXsiogdBfXXgDJ8KLT8

I was looking at the multihashes module, but this wasn’t quite right:

const bs58 = require('bs58')
const multihash = require('multihashes')
const obj = { simple: 'object' }
const buf = Buffer.from(JSON.stringify(obj))
const encoded = multihash.encode(buf, 'sha3-512')
console.log(bs58.encode(encoded))
// 2Ebtp43hDzyUmL5pb4Pe3n2dhRrEg

Also, I was looking at the dagCBOR / dagPB modules… but this doesn’t look quite right either.

const bs58 = require('bs58')
const dagCBOR = require('ipld-dag-cbor')
const dagPB = require('ipld-dag-pb')
const obj = { simple: 'object' }

dagCBOR.util.serialize(obj, (err, dagCbor) => {
  dagPB.DAGNode.create(dagCbor, [], 'sha3-512', (err, res) => {
    if (err) throw err
    console.log(bs58.encode(res.multihash))
    // 8tVWpxRmq6DbntaNX9GDtE3zv1ySh3TwZ9JvEqcvTr2GrJW2dimdeHSQ9jkiYLWtLXapBrwPtRJFnixyAtkBh8Cx6P
  })
})

Which library might I be able to use to generate this same hash (zdpuAzE1oAAMpsfdoexcJv6PmL9UhE8nddUYGU32R98tzV5fv) that ipfs.dag.put provides for an offline object?

I couldn’t quite make out from the documentation whether a DAG object is essentially a JSON object with a user-defined structure (e.g. { simple: 'object', random: 'zdpuAzE1oAAMpsfdoexcJv6PmL9UhE8nddUYGU32R98tzV5fv' }), or whether it’s some sort of fixed structure with a Data buffer, Links array (name, size, multihash), Size integer, and b58-encoded Multihash. If I wanted that random key to link to another object, would that usage suffice, or is there some specific DAG linking method?

Totally agree that this should be a function:

We aren’t encoding DAGs, but for raw bytes, in our transport library, we use
multihashes.toB58String(multihashes.encode(KeyPair.sha256(data), 'sha2-256'));

Where sha256(data) is defined as
crypto.createHash('sha256').update((data instanceof Buffer) ? data : Buffer.from(data)).digest()

(we have previously required crypto and multihashes)
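For anyone curious what that one-liner amounts to without the multihashes dependency, here is a minimal sketch using only Node's built-in crypto module. It assumes a sha2-256 multihash is the byte 0x12 (function code), then 0x20 (digest length), then the digest itself, and that the base58 step uses the Bitcoin alphabet. The helper names base58btc and sha256Multihash are made up for illustration and are not part of any library.

```javascript
const crypto = require('crypto')

// Bitcoin-style base58 alphabet (no 0, O, I or l).
const ALPHABET = '123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz'

// Hypothetical helper: base58-encode a Buffer by treating it as one big integer.
function base58btc (bytes) {
  let n = BigInt('0x' + (bytes.toString('hex') || '0'))
  let out = ''
  while (n > 0n) {
    out = ALPHABET[Number(n % 58n)] + out
    n /= 58n
  }
  // Each leading zero byte is represented by a leading '1'.
  for (const b of bytes) {
    if (b !== 0) break
    out = '1' + out
  }
  return out
}

// Hypothetical helper: hash the data, then prefix the digest with
// <function code 0x12 = sha2-256><digest length 0x20 = 32 bytes>.
function sha256Multihash (data) {
  const digest = crypto.createHash('sha256').update(data).digest()
  return Buffer.concat([Buffer.from([0x12, 0x20]), digest])
}

const mh = sha256Multihash('test')
const encoded = base58btc(mh)
console.log(encoded)
```

The result should be directly comparable with what the multihashes version prints for the same input; elsewhere in this thread that value is reported as QmZ5NmGeStdit7tV6gdak1F8FyZhPsfA843YS9f2ywKH6w for 'test'.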

Hmm, experimenting with hashing a string, I’m seeing:

echo 'test' | ipfs add -q
// QmeomffUNfmQy76CQGy9NdmqEnnHU9soCexBnGU3ezPHVH

While in javascript…

const hash = crypto.createHash('sha256').update('test').digest()
const encoded = multihash.encode(hash, 'sha2-256')
const enc58 = multihash.toB58String(encoded)
console.log(enc58)
// QmZ5NmGeStdit7tV6gdak1F8FyZhPsfA843YS9f2ywKH6w

Tricky.

I’m not quite sure how ipfs dag put stores a JSON object inside a data buffer – I’m guessing this is what dagCBOR.util.serialize might be used for rather than something like Buffer.from(JSON.stringify(obj))… although I’m not sure if dagPB.DAGNode.create() is necessary? Experimenting further, I’ve tried:

dagCBOR.util.serialize(obj, (err, dataBuf) => {
  if (err) throw err

  const hash = new SHA3.SHA3Hash(512)
    .update(dataBuf)
    .digest('hex')

  const encoded = multihash.encode(Buffer.from(hash, 'hex'), 'sha3-512')
  const enc58 = multihash.toB58String(encoded)
  console.log(enc58)
  // 8tYHdyGsFY1HYZTogmhSsW4pHbZ7JtZLyjWhYiszc9vpcqMszpELPLV4yjC47xvFHDxjTLSa7imdYsQWaKYEaWePAd
})

And…

dagCBOR.util.serialize(obj, (err, dataBuf) => {
  if (err) throw err
  const hash = crypto.createHash('sha256').update(dataBuf).digest()
  const encoded = multihash.encode(Buffer.from(hash, 'hex'), 'sha3-512')
  const enc58 = multihash.toB58String(encoded)
  console.log(enc58)
  // TPhhPnUgWmjAu5UivQtR9ju7PDvMv3SxGE4u6oyF833YnL
})

I’m missing some key detail on how these two hashes (zdpuAzE1oAAMpsfdoexcJv6PmL9UhE8nddUYGU32R98tzV5fv & QmeomffUNfmQy76CQGy9NdmqEnnHU9soCexBnGU3ezPHVH) are computed.

Try
echo 'test' | od -c
and you’ll see it has a trailing \n. So your two examples aren’t the same.
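The same check can be done from Node. This is just a small sketch using the built-in crypto module to show that 'test' and 'test\n' are different inputs and so cannot hash to the same value; the sha256hex helper is a name invented for the example.

```javascript
const crypto = require('crypto')

// Hypothetical helper: hex-encoded SHA-256 of a Buffer.
const sha256hex = (data) => crypto.createHash('sha256').update(data).digest('hex')

const withoutNewline = Buffer.from('test')   // what the JavaScript snippet hashed
const withNewline = Buffer.from('test\n')    // what `echo 'test'` writes to the pipe

console.log(withoutNewline.length, withNewline.length)            // 4 5
console.log(sha256hex(withoutNewline) === sha256hex(withNewline)) // false
```

Note that this only shows the inputs differ. By default `ipfs add` also wraps the content in a UnixFS node before hashing, so matching the bytes alone may still not reproduce the hash that `ipfs add` prints.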

I’m not following the rest of your code, partly because I’ve not used dagCBOR, but your last two examples use different hashes: the first uses SHA3-512, while the second uses a mix of sha256 and sha3-512, which may just be a typo?

Aha! I didn’t notice that \n. \o/

const buf = Buffer.from('test')
ipfs.block.put(buf, (err, block) => {
  if (err) { throw err }
  console.log(block.cid.toBaseEncodedString())
  // QmZ5NmGeStdit7tV6gdak1F8FyZhPsfA843YS9f2ywKH6w
})

/* Client-side hash calc for data blob */
const hash = crypto.createHash('sha256').update('test').digest()
const encoded = multihash.encode(hash, 'sha2-256')
const enc58 = multihash.toB58String(encoded)
console.log(enc58)
// QmZ5NmGeStdit7tV6gdak1F8FyZhPsfA843YS9f2ywKH6w

ipfs.dag.put(obj, { format: 'dag-cbor', hashAlg: 'sha2-256' }, (err, cid) => {
  console.log(cid.toBaseEncodedString())
  // zdpuAzE1oAAMpsfdoexcJv6PmL9UhE8nddUYGU32R98tzV5fv
})

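/* Client-side DAG CID calc for object */
const CID = require('cids')

dagCBOR.util.serialize(obj, (err, buf) => {
  if (err) { throw err }
  // Hash the serialized CBOR bytes, wrap the digest as a multihash,
  // then build a version 1 CID tagged with the dag-cbor codec.
  const hash = crypto.createHash('sha256').update(buf).digest()
  const encoded = multihash.encode(hash, 'sha2-256')
  const cid = new CID(1, 'dag-cbor', encoded)
  console.log(cid.toBaseEncodedString())
  // zdpuAzE1oAAMpsfdoexcJv6PmL9UhE8nddUYGU32R98tzV5fv
})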
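To make it clearer what goes into that CID, here is a rough dependency-free sketch of the byte layout as I understand it: version byte 0x01, codec byte 0x71 for dag-cbor, then the sha2-256 multihash, all base58-encoded with a 'z' multibase prefix. The CBOR bytes for { simple: 'object' } are written out by hand, which assumes a dag-cbor serializer emits exactly this encoding for a one-key map, and base58btc is a helper invented for the example.

```javascript
const crypto = require('crypto')

const ALPHABET = '123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz'

// Hypothetical helper: base58-encode a Buffer (Bitcoin alphabet).
function base58btc (bytes) {
  let n = BigInt('0x' + (bytes.toString('hex') || '0'))
  let out = ''
  while (n > 0n) {
    out = ALPHABET[Number(n % 58n)] + out
    n /= 58n
  }
  for (const b of bytes) {
    if (b !== 0) break
    out = '1' + out
  }
  return out
}

// Hand-written CBOR for { simple: 'object' } (assumed to match dag-cbor output).
const cbor = Buffer.concat([
  Buffer.from([0xa1]),                        // map with one entry
  Buffer.from([0x66]), Buffer.from('simple'), // 6-byte text string (key)
  Buffer.from([0x66]), Buffer.from('object')  // 6-byte text string (value)
])

const digest = crypto.createHash('sha256').update(cbor).digest()
const multihashBytes = Buffer.concat([Buffer.from([0x12, 0x20]), digest]) // sha2-256, 32 bytes
const cidBytes = Buffer.concat([Buffer.from([0x01, 0x71]), multihashBytes]) // CIDv1, dag-cbor
const cidString = 'z' + base58btc(cidBytes) // 'z' marks base58btc in multibase

console.log(cidString)
```

If the hand-written CBOR really matches what dagCBOR.util.serialize produces, the printed string should agree with the CID from the snippet above; I have not verified that byte for byte, so treat it as an illustration of the structure rather than a replacement for the cids module.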

I noticed that whether I use hashAlg: 'sha3-512' or hashAlg: 'sha2-256', I get the same hash for both. I’m wondering if the hashAlg specifier, or the sha3-512 option, hasn’t been implemented yet (even though it’s used in the example from the documentation)?
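One way to sanity-check this: a sha3-512 multihash should be visibly longer than a sha2-256 one, so two CIDs of the same length can hardly come from different algorithms. The sketch below assumes the multihash function codes 0x12 (sha2-256) and 0x14 (sha3-512), and that the local Node build exposes 'sha3-512' through crypto.createHash (recent versions do). multihashOf is a made-up helper name.

```javascript
const crypto = require('crypto')

// Assumed multihash function codes, and the matching Node hash names.
const CODES = { 'sha2-256': 0x12, 'sha3-512': 0x14 }
const NODE_NAMES = { 'sha2-256': 'sha256', 'sha3-512': 'sha3-512' }

// Hypothetical helper: <function code><digest length><digest>.
function multihashOf (data, alg) {
  const digest = crypto.createHash(NODE_NAMES[alg]).update(data).digest()
  return Buffer.concat([Buffer.from([CODES[alg], digest.length]), digest])
}

const data = Buffer.from('any serialized block')
const sha2 = multihashOf(data, 'sha2-256')
const sha3 = multihashOf(data, 'sha3-512')

console.log(sha2.length, sha3.length) // 34 66
```

Since the CID embeds the whole multihash, a real sha3-512 CID would be roughly twice as long as the zdpu... string above. Getting an identical 49-character CID for both settings is at least consistent with the hashAlg option being ignored, though this sketch alone doesn't prove that.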