How to know what type of file a CID represents?

DeveloperAlly · February 20, 2023, 1:43pm

Hello ipfs frens!

I am looking to render IPFS CIDs in my client-side app. However, how do I know whether I am dealing with an image, a video, or any other sort of file or directory that could be stored as a CID?

Q: Is there any way to tell what sort of data a CID represents from the CID itself or from the underlying bytes?

danieln · February 20, 2023, 2:09pm

I’ll take a stab at answering this with an example.

Let’s say you have the following CID of an image: bafybeibml5uieyxa5tufngvg7fgwbkwvlsuntwbxgtskoqynbt7wlchmfm

If you load the CID in the CID inspector, you can see that the multicodec is dag-pb, which is a MerkleDAG protobuf. The codec only tells you how to decode the block that this CID points to; it won’t tell you much about the file.
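
To see what the CID itself encodes without the CID inspector, the prefix can be decoded by hand. The sketch below is a minimal illustration, not the inspector's implementation: it only handles base32 CIDv1 strings (the ones starting with `b`), it knows just a few codec codes, and the helper names (`decodeBase32`, `readVarint`, `inspectCid`) are made up for this example. In a real app a library such as `multiformats` would be the usual choice.

```typescript
// Multibase base32 (lowercase, no padding) alphabet, signalled by the leading "b".
const BASE32_ALPHABET = "abcdefghijklmnopqrstuvwxyz234567";

// A few well-known multicodec codes; the full table is much larger.
const CODEC_NAMES: Record<number, string> = {
  0x55: "raw",
  0x70: "dag-pb",
  0x71: "dag-cbor",
  0x0129: "dag-json",
};

function decodeBase32(text: string): Uint8Array {
  const out: number[] = [];
  let buffer = 0;
  let bits = 0;
  for (let i = 0; i < text.length; i++) {
    const value = BASE32_ALPHABET.indexOf(text.charAt(i));
    if (value < 0) throw new Error(`invalid base32 character: ${text.charAt(i)}`);
    buffer = (buffer << 5) | value;
    bits += 5;
    if (bits >= 8) {
      bits -= 8;
      out.push((buffer >> bits) & 0xff);
      buffer &= (1 << bits) - 1; // keep only the bits not yet emitted
    }
  }
  return Uint8Array.from(out);
}

// Reads one unsigned varint and returns [value, next offset].
function readVarint(bytes: Uint8Array, offset: number): [number, number] {
  let value = 0;
  let shift = 0;
  for (let pos = offset; pos < bytes.length; pos++) {
    value += (bytes[pos] & 0x7f) * Math.pow(2, shift);
    if ((bytes[pos] & 0x80) === 0) return [value, pos + 1];
    shift += 7;
  }
  throw new Error("truncated varint");
}

// CIDv1 layout: <version varint><codec varint><hash function varint><digest length varint><digest>
function inspectCid(cid: string) {
  if (!cid.startsWith("b")) throw new Error("only base32 CIDv1 is handled in this sketch");
  const bytes = decodeBase32(cid.slice(1));
  const [version, afterVersion] = readVarint(bytes, 0);
  const [codec, afterCodec] = readVarint(bytes, afterVersion);
  const [hashCode, afterHash] = readVarint(bytes, afterCodec);
  const [digestLength] = readVarint(bytes, afterHash);
  return { version, codec, codecName: CODEC_NAMES[codec] ?? "unknown", hashCode, digestLength };
}

const cidInfo = inspectCid("bafybeibml5uieyxa5tufngvg7fgwbkwvlsuntwbxgtskoqynbt7wlchmfm");
console.log(cidInfo.codecName); // "dag-pb"
```

Nothing in that output says "image": the version, codec, and hash describe how to fetch and decode the block, not what the file is.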

If the CID doesn’t contain the information, maybe the root block does?

The answer is no. For more information, check out the UnixFS Data Format.

[Screenshot from the IPLD explorer for this CID]

The root block just has information on whether it’s a file or folder, in addition to links to the CIDs of the blocks that contain the image’s bytes.
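
As a rough illustration of how little the root block says, the sketch below reads only the UnixFS type out of a dag-pb block. It assumes you already have the raw block bytes, hand-parses just the two protobuf fields it needs (the `Data` field of the dag-pb node and the `Type` field inside it), and runs on a tiny hand-built block rather than the real root block of the CID above. The function names are hypothetical.

```typescript
// UnixFS DataType enum values, indexed by their numeric value.
const UNIXFS_TYPES = ["raw", "directory", "file", "metadata", "symlink", "hamt-sharded-directory"];

// Reads one unsigned protobuf varint and returns [value, next offset].
function readUvarint(bytes: Uint8Array, offset: number): [number, number] {
  let value = 0;
  let shift = 0;
  for (let pos = offset; pos < bytes.length; pos++) {
    value += (bytes[pos] & 0x7f) * Math.pow(2, shift);
    if ((bytes[pos] & 0x80) === 0) return [value, pos + 1];
    shift += 7;
  }
  throw new Error("truncated varint");
}

// Returns the payload of the first field with the given number, or undefined.
// Varint fields yield a number; length-delimited fields yield their bytes.
function findField(message: Uint8Array, wanted: number): number | Uint8Array | undefined {
  let pos = 0;
  while (pos < message.length) {
    const [key, afterKey] = readUvarint(message, pos);
    const fieldNumber = Math.floor(key / 8);
    const wireType = key % 8;
    if (wireType === 0) {
      const [value, next] = readUvarint(message, afterKey);
      if (fieldNumber === wanted) return value;
      pos = next;
    } else if (wireType === 2) {
      const [length, start] = readUvarint(message, afterKey);
      if (fieldNumber === wanted) return message.subarray(start, start + length);
      pos = start + length;
    } else {
      throw new Error(`unsupported wire type ${wireType}`);
    }
  }
  return undefined;
}

function unixfsTypeOf(block: Uint8Array): string {
  const data = findField(block, 1); // PBNode.Data (field 1)
  if (!(data instanceof Uint8Array)) throw new Error("block has no UnixFS Data field");
  const type = findField(data, 1); // UnixFS Data.Type (field 1)
  if (typeof type !== "number") throw new Error("UnixFS Data has no Type field");
  return UNIXFS_TYPES[type] ?? "unknown";
}

// Hand-built example block: PBNode { Data: UnixFS { Type: File, filesize: 5 } }
const exampleBlock = Uint8Array.from([0x0a, 0x04, 0x08, 0x02, 0x18, 0x05]);
console.log(unixfsTypeOf(exampleBlock)); // "file"
```

The result distinguishes a file from a directory, but "file" is as specific as it gets; the media type is not recorded anywhere in the block.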

What about looking at the file extension?

That’s a nice idea. The first problem is that it’s only relevant for files that are wrapped in a directory in UnixFS (for more info, see “Getting the file names of files I pinned locally”, reply #2 by danieln).

The second problem is that it’s not always reliable and not all files have an extension.

Ok, so how can I find out what kind of file this is?

The most reliable way is to peek at the beginning of the file and decipher the file signature also known as “magic bytes”.
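
To make the idea concrete, here is a minimal hand-rolled sketch of signature matching. It knows only a handful of well-known signatures, and the `SIGNATURES` table and `sniffMime` function are hypothetical names for this example; a maintained library covers far more formats and edge cases.

```typescript
interface Signature {
  mime: string;
  offset: number; // where in the file the signature starts
  bytes: number[]; // expected byte values at that offset
}

const SIGNATURES: Signature[] = [
  { mime: "image/png", offset: 0, bytes: [0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a] },
  { mime: "image/jpeg", offset: 0, bytes: [0xff, 0xd8, 0xff] },
  { mime: "image/gif", offset: 0, bytes: [0x47, 0x49, 0x46, 0x38] }, // "GIF8"
  { mime: "application/pdf", offset: 0, bytes: [0x25, 0x50, 0x44, 0x46] }, // "%PDF"
  // EBML header: also used by Matroska, so "video/webm" is only a best guess.
  { mime: "video/webm", offset: 0, bytes: [0x1a, 0x45, 0xdf, 0xa3] },
  // "ftyp" box at offset 4: shared by other ISO media formats, so also a best guess.
  { mime: "video/mp4", offset: 4, bytes: [0x66, 0x74, 0x79, 0x70] },
];

// Returns the MIME type of the first matching signature, or undefined if none match.
function sniffMime(head: Uint8Array): string | undefined {
  for (const sig of SIGNATURES) {
    // Reading past the end of `head` yields undefined, which never equals a byte value.
    const matches = sig.bytes.every((b, i) => head[sig.offset + i] === b);
    if (matches) return sig.mime;
  }
  return undefined;
}

// Only the first few bytes of the file are needed, not the whole content.
console.log(sniffMime(Uint8Array.from([0xff, 0xd8, 0xff, 0xe0]))); // "image/jpeg"
```

Because every signature in this table sits within the first 8 bytes, the first chunk of the file is enough to make a guess.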

In fact, this kind of content sniffing is what Kubo does when you make a gateway request, in order to set the response header (content-type: image/jpeg when you load this image).
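
If a gateway is reachable from your app, one way to lean on that detection is to request only the headers and read `Content-Type`. The sketch below is an illustration under assumptions: it takes the fetch function as a parameter so that it does not depend on a particular runtime, the gateway base URL `https://ipfs.io` is just an example, and `contentTypeOf` is a hypothetical helper name.

```typescript
// Minimal shape of the parts of the Fetch API this sketch relies on.
interface HeadResponse {
  ok: boolean;
  status: number;
  headers: { get(name: string): string | null };
}
type FetchLike = (url: string, init: { method: string }) => Promise<HeadResponse>;

// Sends a HEAD request for the CID and returns the gateway's Content-Type header, if any.
async function contentTypeOf(
  cid: string,
  fetchFn: FetchLike,
  gateway = "https://ipfs.io" // example gateway; swap in the one your app uses
): Promise<string | null> {
  const response = await fetchFn(`${gateway}/ipfs/${cid}`, { method: "HEAD" });
  if (!response.ok) throw new Error(`gateway responded with status ${response.status}`);
  return response.headers.get("content-type");
}

// Usage in a browser or a recent Node.js, passing the global fetch:
// contentTypeOf("bafybeibml5uieyxa5tufngvg7fgwbkwvlsuntwbxgtskoqynbt7wlchmfm", fetch).then(console.log)
```

The trade-off is that this trusts the gateway's answer, whereas sniffing the bytes yourself works on content you have already retrieved and verified.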

Can I see an example?

github.com/2color/verified-ipfs-retrieval/blob/main/src/components/VerifiedImage.tsx#L45

            const file = files[0]
            const bytes = new Uint8Array(file.size)
            let offset = 0
          
            for await (const chunk of file.content()) {
              bytes.set(chunk, offset)
              offset += chunk.length
            }
          
            // Detect the file type from the magic bytes; the result is an array of candidate matches
            const fileType = filetype(bytes)
          
            // Optional chaining guards against an empty result (no known signature matched)
            fileType[0]?.extension && setFileExt(fileType[0].extension)
            fileType[0]?.mime && setFileMime(fileType[0].mime)
          
            const fileBlob = new Blob([bytes], { type: fileType[0]?.mime })
            const objectURL = URL.createObjectURL(fileBlob)
            setDataURL(objectURL)
          } else {
            // console.error('only single file CARs are supported')
          }

This uses the magic-bytes library (GitHub: LarsKoelpin/magic-bytes, a library for detecting file types) to determine the type of the file.
