I’ll take a stab at answering this with an example.
Let’s say you have the following CID of an image: bafybeibml5uieyxa5tufngvg7fgwbkwvlsuntwbxgtskoqynbt7wlchmfm
If you load the CID in the CID inspector, you can see that the multicodec is dag-pb, which is a MerkleDAG protobuf. This just helps decode the block that this CID points to, but won’t tell you much about the file.
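To make this concrete, here is a rough Python sketch of what a CID inspector does with the string. It is only an illustration: it assumes a base32 CIDv1 (multibase prefix `b`), the names `inspect_cid` and `read_varint` are made up for this post, and the lookup tables hold just a few well-known multicodec codes.

```python
import base64

# Tiny excerpts of the multicodec table; only a few well-known codes are listed.
CODECS = {0x55: "raw", 0x70: "dag-pb", 0x71: "dag-cbor"}
HASHES = {0x12: "sha2-256"}


def read_varint(buf: bytes, pos: int) -> tuple[int, int]:
    """Read one unsigned varint from buf starting at pos; return (value, next_pos)."""
    value = 0
    shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        value |= (byte & 0x7F) << shift
        if byte < 0x80:  # high bit clear: last byte of this varint
            return value, pos
        shift += 7


def inspect_cid(cid: str) -> dict:
    """Decode the self-describing prefix of a base32 CIDv1 string (hypothetical helper)."""
    if not cid.startswith("b"):
        raise ValueError("this sketch only handles base32 CIDv1 strings (prefix 'b')")
    body = cid[1:].upper()
    body += "=" * (-len(body) % 8)  # multibase base32 omits padding; b32decode needs it
    raw = base64.b32decode(body)

    version, pos = read_varint(raw, 0)
    codec, pos = read_varint(raw, pos)
    hash_fn, pos = read_varint(raw, pos)
    digest_len, pos = read_varint(raw, pos)
    return {
        "version": version,
        "codec": CODECS.get(codec, hex(codec)),
        "hash": HASHES.get(hash_fn, hex(hash_fn)),
        "digest_len": digest_len,
    }


info = inspect_cid("bafybeibml5uieyxa5tufngvg7fgwbkwvlsuntwbxgtskoqynbt7wlchmfm")
print(info["codec"])  # dag-pb
```

Everything the CID carries is the version, the codec used to decode the block, and the hash of that block. There is no field for the file type.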
If the CID doesn’t contain the information, maybe the root block does?
The answer is no. For more information, check out the UnixFS Data Format.
Below is a screenshot from the IPLD explorer for this CID
The root block only records whether it’s a file or a folder, plus links to the CIDs of the blocks that contain the image’s bytes.
What about looking at the file extension?
That’s a nice idea. The first problem is that a file name, and therefore an extension, is only available for files that are wrapped in a directory in UnixFS (for more info, see Getting the file names of files I pinned locally - #2 by danieln).
The second problem is that extensions are not always reliable, and not all files have one.
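As a small illustration of that second problem, here is a sketch using Python's standard `mimetypes` module. The helper name `guess_from_name` is made up for this post. The point is that the guess comes from the name alone, so a missing or wrong extension gives a missing or wrong answer.

```python
import mimetypes


def guess_from_name(filename: str):
    """Guess a MIME type from the file name alone; the file's bytes are never read."""
    mime, _encoding = mimetypes.guess_type(filename)
    return mime


print(guess_from_name("cat.jpg"))  # image/jpeg, whatever the file actually contains
print(guess_from_name("cat"))      # None, because there is no extension to go on
```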
Ok, so how can I find out what kind of file this is?
The most reliable way is to peek at the beginning of the file and decipher the file signature, also known as “magic bytes”.
In fact, this is what Kubo does when you make a gateway request, in order to set the content-type: image/jpeg header when you load the image.
Can I see an example?
This uses magic-bytes (GitHub: LarsKoelpin/magic-bytes), a library for detecting file types, to determine the type of the file.
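If you just want to see the idea without any dependency, here is a minimal Python sketch of magic-byte sniffing. It is not the magic-bytes library's API and not how Kubo implements it; the function name `sniff_mime` and the signature table are my own, and the table is a tiny hand-picked subset of well-known signatures.

```python
from typing import Optional

# (signature at the start of the file, MIME type); a small illustrative subset.
SIGNATURES = [
    (b"\xff\xd8\xff", "image/jpeg"),
    (b"\x89PNG\r\n\x1a\n", "image/png"),
    (b"GIF87a", "image/gif"),
    (b"GIF89a", "image/gif"),
    (b"%PDF-", "application/pdf"),
]


def sniff_mime(head: bytes) -> Optional[str]:
    """Return a MIME type based on the first bytes of a file, or None if unknown."""
    for magic, mime in SIGNATURES:
        if head.startswith(magic):
            return mime
    # WebP is a RIFF container: "RIFF", a 4-byte size, then "WEBP".
    if head[:4] == b"RIFF" and head[8:12] == b"WEBP":
        return "image/webp"
    return None


# The first bytes of a typical JPEG (JFIF) file.
print(sniff_mime(b"\xff\xd8\xff\xe0\x00\x10JFIF\x00"))  # image/jpeg
```

In practice, `head` would be the first few bytes of the file behind the CID, for example read from the start of the content you fetched. A dozen or so bytes is enough for the signatures in this table.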