Meaning of Kubo data store configuration with 2 mounts

My ~/.ipfs/datastore_spec file looks as follows:

{
  "mounts": [
    {
      "mountpoint": "/blocks",
      "path": "blocks",
      "shardFunc": "/repo/flatfs/shard/v1/next-to-last/2",
      "type": "flatfs"
    },
    {
      "mountpoint": "/",
      "path": "datastore",
      "type": "levelds"
    }
  ],
  "type": "mount"
}

What is the meaning of the two mounts?

What is the relationship between levelds and flatfs?

Is this documented anywhere?

The datastore docs (kubo/datastores.md at master, in ipfs/kubo on GitHub) mostly cover the individual stores, not how things work with multiple mounts.

Kubo has two internal DB types:

  1. Blockstore: this stores blocks; it is what flatfs implements.
  2. Datastore: this is a normal KV store; it is what levelds implements.

We have a wrapper that implements a blockstore on top of a datastore. It is used when you run badger: badger implements the datastore interface, and we use the wrapper to store blocks through badger’s datastore.
In theory you could use leveldb the same way, but since nobody does, I guess the performance isn’t that great.

Note: we can’t just rely on a blockstore because we need something else to store various metadata (what you have pinned, the MFS root, …)
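
To make the two mounts from the question concrete, here is a minimal sketch of how a mount-style datastore could route a key to a backend by longest matching prefix. The `mount` type, the `resolve` function and the `/blocks/EXAMPLEKEY` key are made up for illustration; this is not Kubo's actual code, only the general idea as I understand it.

```go
package main

import (
	"fmt"
	"strings"
)

// mount pairs a key prefix with the name of the backend serving it.
// Hypothetical type, for illustration only.
type mount struct {
	prefix  string
	backend string
}

// resolve returns the backend whose prefix is the longest match for key,
// or "" when no mount matches.
func resolve(mounts []mount, key string) string {
	bestLen := -1
	backend := ""
	for _, m := range mounts {
		p := strings.TrimSuffix(m.prefix, "/")
		// Match on whole path segments so "/blocksfoo" is not under "/blocks".
		if key == p || strings.HasPrefix(key, p+"/") {
			if len(p) > bestLen {
				bestLen = len(p)
				backend = m.backend
			}
		}
	}
	return backend
}

func main() {
	// The two mounts from the datastore_spec in the question.
	mounts := []mount{
		{prefix: "/blocks", backend: "flatfs"},
		{prefix: "/", backend: "levelds"},
	}
	for _, k := range []string{"/blocks/EXAMPLEKEY", "/local/filesroot"} {
		fmt.Printf("%s -> %s\n", k, resolve(mounts, k))
	}
}
```

Under that assumption, block keys land in flatfs and every other key falls through to the levelds mount at `/`.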

So if I understand it correctly:

  1. Blockstore stores raw blocks
  2. Datastore is used to store metadata (what you have pinned, MFS root)

What is the main difference between them? Does the datastore store things in a structured way that allows fast querying?


Since Badger, by default, implements the Datastore interface, you use this wrapper so you can use it as a blockstore.

This begs the question: when using Badger, is it used as both a blockstore and datastore?

Datastore is used to store metadata (what you have pinned, MFS root)

The datastore is not specifically for metadata; it stores anything that fits a KV model and that we might need it for.

What is the main difference between them? Does the datastore store things in a structured way that allows fast querying?

No, not really. We could, but we don’t.

This is mainly a type difference, and it helps make the code clearer. It looks roughly like this (pseudocode):

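import (
  "bytes"
  "errors"

  mh "github.com/multiformats/go-multihash"
)

// Blockstore maps a multihash to the bytes of a block.
type Blockstore interface {
  Get(m mh.Multihash) ([]byte, error)
  Put(m mh.Multihash, b []byte) error
}

// Datastore is a plain key-value store.
type Datastore interface {
  Get(k string) ([]byte, error)
  Put(k string, b []byte) error
}

// DatastoreToBlockstoreWrapper exposes a Datastore as a Blockstore.
type DatastoreToBlockstoreWrapper struct {
  d          Datastore
  prefix     string
  hashOnRead bool
}

func (w DatastoreToBlockstoreWrapper) Get(m mh.Multihash) ([]byte, error) {
  data, err := w.d.Get(w.mhToKey(m))
  if err != nil {
    return nil, err
  }
  if w.hashOnRead {
    // Re-hash the data with the same function and digest length as m.
    dec, err := mh.Decode(m)
    if err != nil {
      return nil, err
    }
    sum, err := mh.Sum(data, dec.Code, dec.Length)
    if err != nil {
      return nil, err
    }
    if !bytes.Equal(sum, m) {
      return nil, errors.New("read data doesn't match expected hash")
    }
  }
  return data, nil
}

func (w DatastoreToBlockstoreWrapper) Put(m mh.Multihash, b []byte) error {
  return w.d.Put(w.mhToKey(m), b)
}

// mhToKey turns a multihash into a datastore key under the wrapper's prefix.
func (w DatastoreToBlockstoreWrapper) mhToKey(m mh.Multihash) string {
  return "/blockstore/" + w.prefix + "/" + string(m)
}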

One of them is a bytes-to-bytes mapping (datastore); the other is a multihash-to-bytes mapping (blockstore).

This begs the question: when using Badger, is it used as both a blockstore and datastore?

Yes, but not really.
In practice it’s true; however, badger does not know that. Badger is just mapping keys to bytes “dumbly”.
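
A toy illustration of that point, using only the standard library: one "dumb" key-value map ends up holding both blocks and other entries, and only the wrapper knows which keys are blocks. `kvStore`, `blockView` and the `/blocks/` prefix are hypothetical names, and a hex SHA-256 digest stands in for a real multihash; this is a sketch, not how Kubo or badger is actually wired up.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"errors"
	"fmt"
)

// kvStore is an in-memory stand-in for a key-value datastore such as badger.
type kvStore map[string][]byte

func (s kvStore) Get(k string) ([]byte, error) {
	v, ok := s[k]
	if !ok {
		return nil, errors.New("key not found: " + k)
	}
	return v, nil
}

func (s kvStore) Put(k string, v []byte) error {
	s[k] = v
	return nil
}

// blockView stores blocks in the same kvStore under a "/blocks/" prefix.
// The hex SHA-256 digest is a simplification of a multihash (assumption).
type blockView struct{ kv kvStore }

// Put hashes data, stores it under the derived key and returns the block id.
func (b blockView) Put(data []byte) (string, error) {
	sum := sha256.Sum256(data)
	id := hex.EncodeToString(sum[:])
	return id, b.kv.Put("/blocks/"+id, data)
}

// Get looks a block up by id through the shared key-value store.
func (b blockView) Get(id string) ([]byte, error) {
	return b.kv.Get("/blocks/" + id)
}

func main() {
	kv := kvStore{}
	blocks := blockView{kv: kv}

	id, _ := blocks.Put([]byte("hello"))
	// A non-block entry in the very same store (placeholder value).
	_ = kv.Put("/local/filesroot", []byte("placeholder-root-cid"))

	for k := range kv {
		fmt.Println(k) // the store itself sees only opaque keys
	}
	data, _ := blocks.Get(id)
	fmt.Printf("%s\n", data)
}
```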

Here are the actual interfaces:

What is the main difference between them? Does the datastore store things in
a structured way that allows fast querying?

The block store is just a Unix directory tree. It’s sharded, so you
don’t have a million files in one single directory or whatever.

The data store is a key-value store. The keys are structured. Think
on-disk dictionary. For instance, the CID of your local MFS root is
stored in the datastore under the key /local/filesroot. It’s fast.

This is an overgeneralisation: it assumes that all blockstores are flatfs.
That is how flatfs works, but not badgerdb.

Or whatever future sparse based blockstore I want to write.
