[SOLVED] File size comparison for file synchronization

I’m currently working on a program that synchronizes local files to IPFS, and I’m having trouble getting the same file size from IPFS as from the local files.

In the examples below I add increasingly large files to IPFS, and each time the size difference (local size vs IPFS size) grows.

# Raw data file size of 0b
$ ipfs object stat $(dd if=/dev/random count=0 bs=1 status=none | ipfs add --pin=false -Q)

NumLinks: 0
BlockSize: 6
LinksSize: 2
DataSize: 4
CumulativeSize: 6
# Diff size => 6b

# Raw data file size of 10b
$ ipfs object stat $(dd if=/dev/random count=10 bs=1 status=none | ipfs add --pin=false -Q)

NumLinks: 0
BlockSize: 18
LinksSize: 2
DataSize: 16
CumulativeSize: 18
# Diff size => 8b

# Raw data file size of 1kb
$ ipfs object stat $(dd if=/dev/random count=1000 bs=1 status=none | ipfs add --pin=false -Q)

NumLinks: 0
BlockSize: 1011
LinksSize: 3
DataSize: 1008
CumulativeSize: 1011
# Diff size => 11b

# Raw data file size of 100kb
$ ipfs object stat $(dd if=/dev/random count=100000 bs=1 status=none | ipfs add --pin=false -Q)

NumLinks: 0
BlockSize: 100014
LinksSize: 4
DataSize: 100010
CumulativeSize: 100014
# Diff size => 14b

# Raw data file size of 1mb
$ ipfs object stat $(dd if=/dev/random count=1000000 bs=1 status=none | ipfs add --pin=false -Q)

NumLinks: 4
BlockSize: 200
LinksSize: 178
DataSize: 22
CumulativeSize: 1000256
# Diff size => 256b

I understand that in the last example the file chunker comes into play and further increases the size difference, but I don’t see where the difference comes from in the other examples.

Is there a formula I can use to convert the CumulativeSize of an IPFS object back to the original local file size?
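For what it’s worth, the gap in the small examples looks like plain protobuf framing: the block is a dag-pb node whose Data field wraps a unixfs message carrying the file type, the raw bytes and a filesize field, so the overhead is field tags plus varint length prefixes. That’s my own reading of the format rather than anything documented, but the sketch below reproduces every BlockSize shown above:

package main

import "fmt"

// varintLen returns how many bytes protobuf needs to encode n as a varint.
func varintLen(n int64) int64 {
	l := int64(1)
	for n >= 0x80 {
		n >>= 7
		l++
	}
	return l
}

// expectedBlockSize models the dag-pb block of a single-chunk unixfs file.
func expectedBlockSize(fileSize int64) int64 {
	unixfs := int64(2) // field 1: Type = File (tag + value)
	if fileSize > 0 {
		unixfs += 1 + varintLen(fileSize) + fileSize // field 2: Data (tag + length + payload)
	}
	unixfs += 1 + varintLen(fileSize) // field 3: filesize (tag + varint)
	return 1 + varintLen(unixfs) + unixfs // outer dag-pb Data field wrapping the unixfs message
}

func main() {
	for _, f := range []int64{0, 10, 1000, 100000} {
		b := expectedBlockSize(f)
		fmt.Printf("file=%6d block=%6d diff=%d\n", f, b, b-f)
	}
}

This prints diffs of 6, 8, 11 and 14, matching the stat output above. The jumps happen where the length varints grow (at 128 and 16384 bytes), which is exactly where the offsets in the formula below change.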

OK, I’ve coded a formula to convert the IPFS object size to the actual file size. So far it works pretty well.

Guess I’ll have to wait for https://github.com/ipfs/unixfs-v2 for the actual file size to be available from the IPFS API.

Here is my formula if anyone needs it:

const IpfsMaxChunkSize = int64(262144)

// Convert an IPFS object's cumulative size to the actual file size.
// Only valid for single-block files (CumulativeSize < 262267).
// The offsets below are protobuf framing overhead: field tags plus
// varint length prefixes, which grow by one byte at 128 and 16384.
func convertSmallFileSize(ipfsCumulativeSize int64) int64 {
	switch {
	case ipfsCumulativeSize == 0:
		return 0
	case ipfsCumulativeSize < 9:
		// Empty file: 6 bytes of bare envelope
		return ipfsCumulativeSize - 6
	case ipfsCumulativeSize < 131:
		return ipfsCumulativeSize - 8
	case ipfsCumulativeSize < 139:
		// Outer length varint grows to 2 bytes
		return ipfsCumulativeSize - 9
	case ipfsCumulativeSize < 16388:
		// Inner data-length and filesize varints grow to 2 bytes each
		return ipfsCumulativeSize - 11
	case ipfsCumulativeSize < 16398:
		// Outer length varint grows to 3 bytes
		return ipfsCumulativeSize - 12
	default:
		// Inner varints grow to 3 bytes each
		return ipfsCumulativeSize - 14
	}
}

// Convert an IPFS object's stat to the actual file size.
func convertToFileSize(objectStat api.ObjectStat) int64 {
	var fileSize int64
	cumulativeSize := objectStat.CumulativeSize
	if cumulativeSize < (IpfsMaxChunkSize + 123) {
		// Single-block file
		fileSize = convertSmallFileSize(cumulativeSize)
	} else {
		// Multi-block file: strip the root node, leaving the
		// children's cumulative sizes
		i := cumulativeSize - objectStat.BlockSize
		// Each full 262144-byte chunk carries 14 bytes of overhead
		maxSizeChunks := i / (IpfsMaxChunkSize + 14)
		remainingSizeChunk := i % (IpfsMaxChunkSize + 14)
		fileSize = i - (maxSizeChunks * 14) - (remainingSizeChunk - convertSmallFileSize(remainingSizeChunk))
	}
	return fileSize
}
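
Here is a throwaway check of the small-file branch against the stat output from my examples (it goes in the same package as the functions above):

package main

import "fmt"

func main() {
	// CumulativeSize -> expected local file size, taken from the examples above
	cases := []struct{ cumulative, want int64 }{
		{6, 0},           // empty file
		{18, 10},         // 10b
		{1011, 1000},     // 1kb
		{100014, 100000}, // 100kb
	}
	for _, c := range cases {
		if got := convertSmallFileSize(c.cumulative); got != c.want {
			fmt.Printf("cumulative=%d: got %d, want %d\n", c.cumulative, got, c.want)
		}
	}
}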

Alternatively, you could read the X-Content-Length HTTP header from /api/v0/cat.
I’m not sure whether your node needs to fetch the entire object for this to work, though.

func ipfsSize(hash string) int64 {
	res, err := http.Get(fmt.Sprintf(ipfsAPI+"cat?arg=%s", hash))
	// Handle error
	defer res.Body.Close()

	// The header is available without reading the body
	sizeStr := res.Header.Get("X-Content-Length")

	size, err := strconv.ParseInt(sizeStr, 10, 64)
	// Handle error

	return size
}
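
Hypothetical usage (ipfsAPI isn’t defined in the snippet above; the default API address of a local node is shown here, adjust to your setup):

// Assumed endpoint of the local node's HTTP API (default port 5001)
const ipfsAPI = "http://127.0.0.1:5001/api/v0/"

func main() {
	// Pass a CID as the first command-line argument
	fmt.Println(ipfsSize(os.Args[1]))
}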

It looks very depressing