Is there a way to determine if a string is a valid IPFS multihash?

Let’s say I receive a hash Qm... that’s supposedly an IPFS multihash. How can I verify that it really is one, short of running ipfs get or ipfs cat and waiting for an error? Does ipfs have some form of multihash verification routine? By that I mean a check that (a) there’s an actual IPFS object “waiting” behind that hash, or (b) at least that the multihash has been correctly calculated and is not just a random string meant to mimic an IPFS multihash.

EDIT: added (b)

You probably want to use one of these libraries:

Good to know. So there’s currently no way to do it with the default go-ipfs ipfs program, correct?

Yeah, I think ipfs ls or ipfs get is your best bet right now.

So, without actually fetching the file, that’s not possible: you can check whether the CID/multihash is well-formed, but not whether a corresponding file exists (existence isn’t a property of any of the hash functions we use except for the identity hash function). One could use SNARKs instead of hashes to guarantee that the author of the link knew the corresponding data, but that would be prohibitively expensive and almost certainly not worth the effort.
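
To make the "well-formed" check concrete, here is a minimal Python sketch. It assumes the string is a CIDv0, i.e. a base58btc-encoded sha2-256 multihash: 34 bytes, starting with the multihash prefix 0x12 0x20. CIDv1 strings and other hash functions are not covered. The helper names b58decode and is_wellformed_cidv0 are made up for this illustration and are not part of any IPFS library. As noted above, passing this check says nothing about whether the data exists.

```python
B58_ALPHABET = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"


def b58decode(text):
    """Decode a base58btc string to bytes (hypothetical helper).

    Raises ValueError if a character is outside the base58 alphabet.
    """
    num = 0
    for ch in text:
        idx = B58_ALPHABET.find(ch)
        if idx < 0:
            raise ValueError("invalid base58 character: %r" % ch)
        num = num * 58 + idx
    body = num.to_bytes((num.bit_length() + 7) // 8, "big")
    # Each leading '1' in the text encodes one leading zero byte.
    pad = len(text) - len(text.lstrip("1"))
    return b"\x00" * pad + body


def is_wellformed_cidv0(text):
    """Return True if text decodes to a 34-byte sha2-256 multihash.

    This only checks the shape of the string (assumption: CIDv0 only).
    It cannot tell whether any object exists behind the hash.
    """
    try:
        raw = b58decode(text)
    except ValueError:
        return False
    # 0x12 = sha2-256 function code, 0x20 = digest length (32 bytes).
    return len(raw) == 34 and raw[0] == 0x12 and raw[1] == 0x20
```

For example, is_wellformed_cidv0("QmUkPucZ1WUxwGqR979YAKj2UfUsqpSze6MPDcmhtbzmst") passes the check, while a string containing characters outside the base58 alphabet, or one of the wrong length, does not.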

Right, but a “well-formedness check” isn’t possible with the default go-ipfs binary, correct?

Ah. Not that I know of. You can use something like elcid or write a small Go program (depending on your use case).

elcid example (returns a non-zero status on error):

valid_cid() {
  # The function's status is elcid's exit status: non-zero if decoding fails.
  echo "$1" | elcid d QBU3 >/dev/null 2>&1
}

I use ipfs object stat. Here is Python code that calls a local go-ipfs install with a running daemon:

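import shlex
import subprocess

def ipfs_stat(ipfs_hash):
    '''
    Returns a dictionary of key, integer value pairs for an ipfs object, or False if the stat
    fails.
    '''
    # Quote the argument so a malformed "hash" cannot inject shell commands.
    # getoutput() runs the command in a shell and captures stdout and stderr together.
    stat = subprocess.getoutput('ipfs --api /ip4/127.0.0.1/tcp/5001 object stat ' + shlex.quote(ipfs_hash))
    if 'Error:' in stat:
        return False
    dstat = {}
    for line in stat.split('\n'):
        # Each output line has the form "Key: 123".
        key, value = line.split(':', 1)
        dstat[key] = int(value)
    return dstat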

Thus, “if ipfs_stat(potential_hash)” can be used as an availability check / validation, and the return dictionary gives useful information like cumulative size which is nice to know. Using ‘ls’ will fail on a file, and ‘get’ without knowing the size of what you’re about to get is generally a bad idea.

I had looked at object stat, too, but the command only returns output immediately if the object is actually on your node. For a remote object, e.g. QmUkPucZ1WUxwGqR979YAKj2UfUsqpSze6MPDcmhtbzmst, it can take very long, and if you’re unlucky it takes forever, even though it’s a valid IPFS object. With an invalid IPFS object it also takes very long. (Or even forever?) Would a timeout command suffice? I doubt it, because the timeout could kill stat on a valid IPFS object, and then you’d have a false negative.
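
One way to live with the false-negative problem is to treat a timeout as "unknown" rather than "invalid". The Python sketch below illustrates that idea. The function names probe and ipfs_availability and the 30-second default are made up for this example, and it assumes that the ipfs command exits with a non-zero status when it reports an error.

```python
import subprocess


def probe(cmd, timeout_s):
    """Run cmd and classify the outcome (hypothetical helper).

    Returns 'ok' on exit status 0, 'error' on a non-zero exit status,
    and 'unknown' if the command did not finish within timeout_s seconds.
    """
    try:
        result = subprocess.run(cmd, capture_output=True, text=True,
                                timeout=timeout_s)
    except subprocess.TimeoutExpired:
        # The process was killed: the object may still be valid but slow to find.
        return "unknown"
    return "ok" if result.returncode == 0 else "error"


def ipfs_availability(ipfs_hash, timeout_s=30):
    """Three-state availability check via 'ipfs object stat'.

    Assumption: ipfs signals errors with a non-zero exit status.
    """
    return probe(["ipfs", "object", "stat", ipfs_hash], timeout_s)
```

A caller can then retry or defer on "unknown" instead of rejecting the hash, so a slow lookup is never reported as an invalid object.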

There is now a standalone utility, cid-fmt, in go-cid (https://github.com/ipfs/go-cid) that will format a CID in various ways. A simple way to verify a CID using the utility would be:

  cid-fmt prefix QmUkPucZ1WUxwGqR979YAKj2UfUsqpSze6MPDcmhtbzmst

which will print an error if the string is not a valid Multihash or CID.

This seems to work. Thank you. It will output either “input isn’t valid multihash” or “selected coding not supported” on errors.

By the way, wasn’t there already a program called el-cid?

Yes, but I didn’t know about it when I merged cid-fmt. Also, cid-fmt is quite a bit more powerful.
