How to know what type of file a CID represents?

Hello ipfs frens!

I am looking to render IPFS CIDs in my client-side app. However, how do I know whether I am dealing with an image, a video, or any other sort of file or directory that could be stored behind a CID?

Q: Is there any way to see what sort of data a CID represents from the CID or bytes data?


I’ll take a stab at answering this with an example.

Let’s say you have the following CID of an image: bafybeibml5uieyxa5tufngvg7fgwbkwvlsuntwbxgtskoqynbt7wlchmfm

If you load the CID in the CID inspector, you can see that the multicodec is dag-pb, the MerkleDAG protobuf format. The codec only tells you how to decode the block that this CID points to; it won’t tell you much about the file itself.
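To make that concrete, here is a minimal sketch of what the CID inspector does with the string. It assumes a base32 CIDv1 (the "b" prefix) and only knows three codec codes; the function names are mine, not from any IPFS library.

```typescript
// Base32 alphabet behind the "b" multibase prefix (RFC 4648, lowercase, no padding).
const BASE32_ALPHABET = "abcdefghijklmnopqrstuvwxyz234567";

// A small, incomplete excerpt of the multicodec table.
const CODEC_NAMES: Record<number, string> = {
  0x55: "raw",
  0x70: "dag-pb",
  0x71: "dag-cbor",
};

function base32Decode(text: string): Uint8Array {
  const out: number[] = [];
  let buffer = 0;
  let bits = 0;
  for (let i = 0; i < text.length; i++) {
    const value = BASE32_ALPHABET.indexOf(text.charAt(i));
    if (value < 0) throw new Error("invalid base32 character: " + text.charAt(i));
    buffer = (buffer << 5) | value;
    bits += 5;
    if (bits >= 8) {
      bits -= 8;
      out.push((buffer >> bits) & 0xff);
      buffer &= (1 << bits) - 1; // keep only the bits not yet emitted
    }
  }
  return new Uint8Array(out);
}

// Reads an unsigned varint; returns [value, offset of the next byte].
function readVarint(bytes: Uint8Array, offset: number): [number, number] {
  let value = 0;
  let scale = 1;
  for (let pos = offset; pos < bytes.length; pos++) {
    const byte = bytes[pos];
    value += (byte & 0x7f) * scale;
    if ((byte & 0x80) === 0) return [value, pos + 1];
    scale *= 128;
  }
  throw new Error("truncated varint");
}

// A binary CIDv1 is: <version varint><codec varint><multihash>.
function inspectCid(cid: string): { version: number; codec: string } {
  if (cid.charAt(0) !== "b") {
    throw new Error("this sketch only handles base32 CIDv1 strings");
  }
  const bytes = base32Decode(cid.slice(1));
  const [version, afterVersion] = readVarint(bytes, 0);
  const [code] = readVarint(bytes, afterVersion);
  return { version, codec: CODEC_NAMES[code] || "0x" + code.toString(16) };
}
```

Calling `inspectCid` on the CID above reports version 1 and dag-pb, and that is all the CID gives you: a version, a codec, and a hash.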

If the CID doesn’t contain the information, maybe the root block does?

The answer is no. For more information, check out the UnixFS Data Format.

Below is a screenshot from the IPLD explorer for this CID

The root block only records whether it’s a file or a folder, plus links to the CIDs of the blocks that contain the image’s bytes.
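If you want to read that file-or-folder flag yourself, here is a rough sketch. It assumes you already have the raw bytes of a dag-pb block and hand-parses just enough protobuf to reach the UnixFS type field; the helper names are hypothetical, and a real app would more likely use an existing dag-pb/UnixFS library.

```typescript
// UnixFS Data.Type enum values, indexed by their numeric value.
const UNIXFS_TYPES = ["raw", "directory", "file", "metadata", "symlink", "hamt-shard"];

// Reads an unsigned varint; returns [value, offset of the next byte].
function readVarint(bytes: Uint8Array, offset: number): [number, number] {
  let value = 0;
  let scale = 1;
  for (let pos = offset; pos < bytes.length; pos++) {
    const byte = bytes[pos];
    value += (byte & 0x7f) * scale;
    if ((byte & 0x80) === 0) return [value, pos + 1];
    scale *= 128;
  }
  throw new Error("truncated varint");
}

// Returns the first occurrence of a protobuf field: a number for varint
// fields (wire type 0), a byte slice for length-delimited fields (wire type 2).
function findField(bytes: Uint8Array, wanted: number): number | Uint8Array | undefined {
  let pos = 0;
  while (pos < bytes.length) {
    const [tag, afterTag] = readVarint(bytes, pos);
    const field = tag >> 3;
    const wireType = tag & 7;
    if (wireType === 0) {
      const [value, next] = readVarint(bytes, afterTag);
      if (field === wanted) return value;
      pos = next;
    } else if (wireType === 2) {
      const [length, start] = readVarint(bytes, afterTag);
      if (field === wanted) return bytes.subarray(start, start + length);
      pos = start + length;
    } else {
      throw new Error("unsupported wire type " + wireType);
    }
  }
  return undefined;
}

function unixfsType(block: Uint8Array): string {
  const data = findField(block, 1); // PBNode.Data (links are field 2)
  if (!(data instanceof Uint8Array)) throw new Error("block has no Data field");
  const type = findField(data, 1); // UnixFS Data.Type
  if (typeof type !== "number") throw new Error("no UnixFS type found");
  return UNIXFS_TYPES[type] || "unknown";
}
```

Note what comes back: "file" or "directory", never "JPEG" or "MP4". The format of the content is simply not recorded at this level.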

What about looking at the file extension?

That’s a nice idea. The first problem is that a file name, and therefore an extension, only exists for files that are wrapped in a directory in UnixFS (for more info, see “Getting the file names of files I pinned locally - #2 by danieln”).

The second problem is that it’s not always reliable and not all files have an extension.

Ok, so how can I find out what kind of file this is?

The most reliable way is to peek at the beginning of the file and read the file signature, also known as “magic bytes”.

In fact, this is what Kubo does on a gateway request: it inspects the content so that it can set the content-type: image/jpeg header when you load the image.

Can I see an example?

This uses the magic-bytes library (GitHub: LarsKoelpin/magic-bytes, “A library for detecting file types”) to determine the type of the file.
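If you’d rather see the mechanism without a dependency, here is a hand-rolled sketch of the same idea. To be clear, this is not the magic-bytes API; the signature table is tiny and the names are made up for illustration. A library covers far more formats and edge cases.

```typescript
interface Signature {
  mime: string;
  offset: number; // where in the file the signature starts
  bytes: number[]; // the expected bytes at that offset
}

// A deliberately small table; real detectors know hundreds of signatures.
const SIGNATURES: Signature[] = [
  { mime: "image/jpeg", offset: 0, bytes: [0xff, 0xd8, 0xff] },
  { mime: "image/png", offset: 0, bytes: [0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a] },
  { mime: "image/gif", offset: 0, bytes: [0x47, 0x49, 0x46, 0x38] }, // "GIF8"
  { mime: "application/pdf", offset: 0, bytes: [0x25, 0x50, 0x44, 0x46] }, // "%PDF"
  // "ftyp" at offset 4 marks an ISO base media file. Treating it as MP4 is a
  // simplification: MOV, HEIC and AVIF share the same marker.
  { mime: "video/mp4", offset: 4, bytes: [0x66, 0x74, 0x79, 0x70] },
];

// Returns a MIME type guess for the first bytes of a file, or undefined
// when no known signature matches.
function sniffMime(head: Uint8Array): string | undefined {
  for (const sig of SIGNATURES) {
    if (head.length < sig.offset + sig.bytes.length) continue;
    if (sig.bytes.every((b, i) => head[sig.offset + i] === b)) return sig.mime;
  }
  return undefined;
}
```

Only the first few bytes are needed, so in a client-side app you might request just the start of the file from a gateway (for example with an HTTP Range header) instead of downloading the whole thing. An `undefined` result means "unknown", so have a fallback for that case.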