Adding a directory with a lot of files

ligi · July 11, 2021, 2:07pm

I added some content - a directory via ipfs add -r <path> and it worked:

...
added Qmcwa4FAW74p3M5AsRLC41ifo5ytEouam7EL1Ad3MasNog output
 8.26 MiB / 8.26 MiB [==============================================================================================================================] 100.00%

It took about 30 minutes.

Then I tried to pin it on my dappnode (I had added it on my laptop, which is not online 100% of the time), and the pin was just hanging.

And as I do not get any progress there I tried:

ipfs pin add QmZvCJBNKdKMohHE5u18vNgK6pA3RS5CkWu82M7HWZ84pA --progress

even on the same machine where I did the successful ipfs add to eliminate connection problems.
And it is stuck on Fetched/Processed 0 nodes for days.
The problem is that one subdirectory contains a lot of (small) files (272745)
Is there anything I can do or is it just not possible with IPFS?

If you want to reproduce it: I was trying to pin the directory output that is created in the build step of ethereum-lists/website on GitHub (the source for the site lists.eth).
But I guess a simpler repo might be ethereum-lists/4bytes on GitHub (a list of 4byte identifiers for EVM smart contract functions), used directly, as I am pretty sure this is the culprit (I cannot test now, as I am on 3G with limited data).

adin · July 11, 2021, 5:58pm

Try ipfs block stat <cid>. If the reported size is larger than 1MB, you won’t be able to transfer the block over the network (the actual limit is a bit higher than 1MB, but 1MB is all that’s guaranteed to work at the moment).

The way to get around this is by using UnixFS sharded directories. Unfortunately, they’re not enabled by default at the moment and have some tradeoffs; see experimental-features.md in the ipfs/go-ipfs repository on GitHub.

Work on automatic sharding is being tracked in "Tracking issue for UnixFS automatic sharding" (Issue #8106 in ipfs/go-ipfs on GitHub) and is currently slated for go-ipfs v0.10.0.
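
To see why a flat directory with this many entries runs into the block size limit, here is a rough back-of-the-envelope sketch in Python. It is only an estimate: the per-link overhead, the average file name length of 10 bytes, and reading "1MB" as 1 MiB are all assumptions made for illustration, not values taken from go-ipfs.

```python
# Rough estimate of the size of a single (non-sharded) UnixFS directory block.
# A flat directory stores one link per entry in one block, so the block grows
# linearly with the number of files.

CIDV0_BYTES = 34           # sha2-256 multihash: 2 prefix bytes + 32 digest bytes
PER_LINK_OVERHEAD = 9      # ASSUMPTION: approximate protobuf framing + size field
BLOCK_LIMIT_BYTES = 1024 * 1024  # ASSUMPTION: "1MB" interpreted as 1 MiB


def estimate_directory_block_size(entry_count: int, avg_name_bytes: int) -> int:
    """Return an approximate byte size for a flat directory block."""
    per_link = CIDV0_BYTES + PER_LINK_OVERHEAD + avg_name_bytes
    return entry_count * per_link


def fits_in_one_block(size_bytes: int, limit: int = BLOCK_LIMIT_BYTES) -> bool:
    """True if a block of this size stays within the assumed transfer limit."""
    return size_bytes <= limit


if __name__ == "__main__":
    # 272745 is the file count from the original post; 10 is a guessed name length.
    size = estimate_directory_block_size(272745, avg_name_bytes=10)
    print(f"estimated block size: {size} bytes")
    print(f"fits in one block: {fits_in_one_block(size)}")
```

Even if the assumed overhead is off by a fair margin, the estimate lands well above the limit, which is the situation sharded directories are meant to address.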

ligi · July 11, 2021, 7:58pm

Thanks @adin - the hint with the sharding feature is most helpful - will try this out.
But not sure if the result is larger than 1MB - not sure about the unit here:


igi@komputing:~$ ipfs block stat Qmcwa4FAW74p3M5AsRLC41ifo5ytEouam7EL1Ad3MasNg
Key: Qmcwa4FAW74p3M5AsRLC41ifo5ytEouam7EL1Ad3MasNog
Size: 208

adin · July 11, 2021, 11:00pm

The units are bytes (as described in ipfs block stat --help), but you did ipfs block stat Qmcwa4FAW74p3M5AsRLC41ifo5ytEouam7EL1Ad3MasNg while asking about QmZvCJBNKdKMohHE5u18vNgK6pA3RS5CkWu82M7HWZ84pA.
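
Since the size is reported in bytes, the check can be scripted. The following is a minimal Python sketch that parses the two-line `Key:` / `Size:` output shown above and compares it to the limit. The function names are made up for this example, and the 1 MiB threshold is an assumption standing in for "1MB".

```python
# Parse the text printed by `ipfs block stat <cid>` and compare the size
# (in bytes) against an assumed 1 MiB limit.

ASSUMED_LIMIT_BYTES = 1024 * 1024  # ASSUMPTION: "1MB" interpreted as 1 MiB


def parse_block_stat(output: str) -> dict:
    """Turn 'Key: <cid>' / 'Size: <n>' lines into a small dict."""
    fields = {}
    for line in output.splitlines():
        key, sep, value = line.partition(":")
        if sep:  # skip lines without a colon
            fields[key.strip()] = value.strip()
    return {"key": fields["Key"], "size_bytes": int(fields["Size"])}


def is_transferable(size_bytes: int, limit: int = ASSUMED_LIMIT_BYTES) -> bool:
    """True if the block is within the assumed transfer limit."""
    return size_bytes <= limit


if __name__ == "__main__":
    sample = "Key: Qmcwa4FAW74p3M5AsRLC41ifo5ytEouam7EL1Ad3MasNog\nSize: 208\n"
    stat = parse_block_stat(sample)
    print(stat["key"], stat["size_bytes"], is_transferable(stat["size_bytes"]))
```

The sample text here is the output quoted earlier in the thread; to check the CID that actually hangs, the same parsing would be applied to the output for QmZvCJBNKdKMohHE5u18vNgK6pA3RS5CkWu82M7HWZ84pA.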

wclayf · July 11, 2021, 11:12pm

To be honest, any time you’re attempting a solution that involves hundreds of thousands of files in a single folder, you should expect it to fail (or be unusably slow) on most file systems that exist, whether on Linux, Windows, etc. It’s just a massive anti-pattern, i.e. trying to use a ‘folder’ as a “blob database”. It never works at scale.

If you need that much data stored, just add each blob/file as a separate thing and worry about storing all their CIDs (the index of them) as a completely separate task. That’s just my advice; people may disagree.
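
A minimal Python sketch of that idea, assuming the `ipfs` CLI is on the PATH: each file is added on its own, and the resulting name-to-CID mapping is kept as a separate JSON index. The helper names (`ipfs_add_file`, `build_index`, `index_to_json`) are hypothetical and exist only for this example; `ipfs add -Q` is used because it prints only the final hash.

```python
import json
import subprocess
from typing import Callable, Dict, Iterable


def ipfs_add_file(path: str) -> str:
    """Add one file with the ipfs CLI and return the CID it prints."""
    result = subprocess.run(
        ["ipfs", "add", "-Q", path],  # -Q: write only the final hash
        check=True,
        capture_output=True,
        text=True,
    )
    return result.stdout.strip()


def build_index(
    paths: Iterable[str],
    add_file: Callable[[str], str] = ipfs_add_file,
) -> Dict[str, str]:
    """Add every file separately and return a {path: cid} mapping."""
    return {path: add_file(path) for path in paths}


def index_to_json(index: Dict[str, str]) -> str:
    """Serialize the index so it can be stored (or added to IPFS) on its own."""
    return json.dumps(index, sort_keys=True, indent=2)
```

The `add_file` parameter is there so the indexing logic can be tried with a stand-in function before pointing it at a real node. How the index itself is split up once it grows large is left open here.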
