How can I (forcibly) overwrite a damaged block in the repo with a good copy?

2021-12-16T03:52:27.590Z ERROR engine decision/blockstoremanager.go:114 blockstore.Get(QmxxxxxxxxxxxpMYKZu) error: read /data/ipfs/blocks/65/CIxxxxxxxxxxxxKWC2A3XKGTS3IGQAP6AE4RZ65A.data: input/output error

Does anybody have an idea how I can forcibly overwrite that dead block?

I've done a dag export on a node with a good copy and a dag import, but I still get the same error:

ipfs block rm QmxxxxxpMYKZu
cannot remove QmxxxxpMYKZu: pin check failed: failed to get block for QmxxxxxxxxxxQJb3WiX: read /data/ipfs/blocks/GT/CxxxxxxxxxxxxxxGTA.data: input/output error
Error: some blocks not removed

I really don't want to have to re-pin 750+ GB to a fresh repo if I can avoid it.

Sounds like your hard drive or OS has issues.
Try running fsck on it; if that doesn't fix it, I guess that's a bug we should investigate.

Also, can you look at your dmesg after a failing access? (More detail about that error often hides there.)

dmesg reports lots of sector errors, so I'll try the fsck and then see whether that unsticks ipfs afterwards.

[12715979.660182] print_req_error: I/O error, dev sdb, sector 3955818496
[12715981.619375] sd 0:0:0:2: [sdb] tag#0 FAILED Result: hostbyte=DID_ERROR driverbyte=DRIVER_OK
[12715981.619400] sd 0:0:0:2: [sdb] tag#0 CDB: Read(16) 88 00 00 00 00 00 eb c9 00 18 00 00 00 08 00 00
[12715981.619416] print_req_error: I/O error, dev sdb, sector 3955818520
[12715983.530080] sd 0:0:0:2: [sdb] tag#0 FAILED Result: hostbyte=DID_ERROR driverbyte=DRIVER_OK
[12715983.530082] sd 0:0:0:2: [sdb] tag#0 CDB: Read(16) 88 00 00 00 00 00 eb c9 00 18 00 00 00 08 00 00
[12715983.530084] print_req_error: I/O error, dev sdb, sector 3955818520
[12716134.309203] sd 0:0:0:2: [sdb] tag#1 FAILED Result: hostbyte=DID_ERROR driverbyte=DRIVER_OK
[12716134.309206] sd 0:0:0:2: [sdb] tag#1 CDB: Read(16) 88 00 00 00 00 00 eb c9 00 18 00 00 00 08 00 00
[12716134.309207] print_req_error: I/O error, dev sdb, sector 3955818520
[12716137.043444] sd 0:0:0:2: [sdb] tag#0 FAILED Result: hostbyte=DID_ERROR driverbyte=DRIVER_OK
[12716137.043447] sd 0:0:0:2: [sdb] tag#0 CDB: Read(16) 88 00 00 00 00 00 eb c9 00 18 00 00 00 08 00 00
[12716137.043449] print_req_error: I/O error, dev sdb, sector 3955818520
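
As a cross-check on the log above, here is a minimal Python sketch that decodes the logged Read(16) CDB. It assumes the standard READ(16) layout (opcode 0x88, an 8-byte big-endian LBA in bytes 2-9, a 4-byte transfer length in bytes 10-13); the function name `decode_read16` is made up for illustration.

```python
def decode_read16(cdb_hex):
    """Return (lba, block_count) from a READ(16) CDB given as hex bytes."""
    cdb = bytes.fromhex(cdb_hex)  # fromhex() skips the spaces between bytes
    if len(cdb) != 16 or cdb[0] != 0x88:
        raise ValueError("not a 16-byte READ(16) CDB")
    lba = int.from_bytes(cdb[2:10], "big")      # starting logical block address
    count = int.from_bytes(cdb[10:14], "big")   # number of blocks requested
    return lba, count


# CDB copied from the dmesg lines above
print(decode_read16("88 00 00 00 00 00 eb c9 00 18 00 00 00 08 00 00"))
# (3955818520, 8)
```

The decoded LBA matches the sector number in the `print_req_error` lines, so the same 8-block read keeps failing at the same place on sdb rather than at scattered locations.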

fsck doesn't seem to find any issues despite the sector errors in dmesg.

phill@atlas:~$ blkid
…
/dev/sdb: UUID="30528b33-9391-4302-b3f6-06cac6924b46" TYPE="ext4"
phill@atlas:~$ sudo fsck /dev/sdb
fsck from util-linux 2.31.1
e2fsck 1.44.1 (24-Mar-2018)
/dev/sdb: clean, 6155039/134217728 files, 269229448/536870912 blocks

edit: On a second pass with fsck -f it definitely does have some checksum errors

There are two possibilities:

  • Either you hard shut down your computer and your FS is corrupted (the fix is basically to remove the partition and format a new one, losing all files in the process). That is rare: ext4 has a good journal, so the most you should expect is a partial or total rollback of some modifications.
  • Or your drive is dying.

You can check which one it is using SMART: Analyzing a Faulty Hard Disk using Smartctl - Thomas-Krenn-Wiki

There are also other back-end storage options for ipfs gateways, such as s3 or nextcloud; if you store the data with a storage provider they will do backups, so this would never happen.

This is one of several nodes with copies of this data.
I also have another node attached to a wasabi s3 blockstore.
There's no risk of me losing the data.

The node in question is just an inexpensive replica to provide a point of presence to ensure the data is easily reachable to the network.

All I wanted to know was if there was a way to force ipfs to mark those blocks as invalid and replace them. A patch job on a leaking pipe if you will.

It seems like the underlying filesystem is too fubar'ed to repair anything, so I'll either have to live with it or format the drive.

Maybe just delete that block in the file system. The path in the error is likely the block's path in ipfs' local repository, where it stores all the blocks. On Windows it's inside the repo folder in the user's folder.

After deleting that block, you can get it back from other nodes.
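
To find the file for a given block, here is a rough Python sketch that maps a CIDv0 to its expected path. It rests on assumptions: the repo uses the flatfs datastore with the `next-to-last/2` shard function, and the file name is the unpadded upper-case base32 of the multihash. The paths in the errors above (`blocks/65/…RZ65A.data`, `blocks/GT/…GTA.data`) are consistent with that layout. The helper names and the default `blocks_dir` (taken from the error messages) are just for illustration.

```python
import base64

B58_ALPHABET = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"


def b58decode(text):
    """Decode a base58 (Bitcoin alphabet) string to bytes."""
    number = 0
    for char in text:
        number = number * 58 + B58_ALPHABET.index(char)
    body = number.to_bytes((number.bit_length() + 7) // 8, "big")
    leading_zeros = len(text) - len(text.lstrip("1"))
    return b"\x00" * leading_zeros + body


def flatfs_path(cid_v0, blocks_dir="/data/ipfs/blocks"):
    """Guess the flatfs file path for a CIDv0 (assumes next-to-last/2 sharding)."""
    multihash = b58decode(cid_v0)
    # A CIDv0 is a bare sha2-256 multihash: 0x12 0x20 followed by 32 hash bytes.
    if len(multihash) != 34 or multihash[:2] != b"\x12\x20":
        raise ValueError("not a CIDv0")
    key = base64.b32encode(multihash).decode("ascii").rstrip("=")
    shard = key[-3:-1]  # the two characters before the last one
    return f"{blocks_dir}/{shard}/{key}.data"


# Example with the CID of the empty UnixFS directory
print(flatfs_path("QmUNLLsPACCz1vLxQVkXqqLX5R1X345qqfHbsf67hvA3Nn"))
```

Check the result against a path from your own logs before deleting anything, and stop the daemon first.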


If you want ipfs to remove this block itself you can unpin all roots leading to that block and run ipfs repo gc.
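
A minimal Python sketch of that sequence, assuming you already know which pinned roots reference the block (`<root-cid>` is a placeholder, and the helper names are made up). It only wraps `ipfs pin rm` and `ipfs repo gc`, and by default it just prints the commands instead of running them.

```python
import subprocess


def removal_plan(pin_roots):
    """Build the commands: unpin each root, then garbage-collect the repo."""
    plan = [["ipfs", "pin", "rm", root] for root in pin_roots]
    plan.append(["ipfs", "repo", "gc"])
    return plan


def run_plan(plan, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for command in plan:
        print(" ".join(command))
        if not dry_run:
            subprocess.run(command, check=True)


# Dry run: shows what would be executed for one placeholder root
run_plan(removal_plan(["<root-cid>"]))
```

If the disk cannot read the blocks involved, these commands may still fail with the same input/output error.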

A more block targeted feature would be nice.

I was thinking that block rm could be modified to include an arg to just force remove the block from the db.
I tried to unpin the block, but if you check my first post, it encounters errors and fails on another dead block higher in the merkle tree.

I've tried following the advice to manually delete those blocks from the filesystem, so all that remains are the invalid entries in the datastore db, which I have no tools to remove. So any attempt to replace the bad blocks results in the daemon refusing to write new copies because the db claims they still exist. :frowning_face_with_open_mouth:

A low-level CLI util to manually edit the db would be nice, even if it's not packaged with ipfs itself, as a sort of extra 'I know what I'm doing. No really. This crap is already fubar'ed and it can't get worse so don't even bother warning me!'

Or perhaps the blockstore manager could write a field in the db to mark a block as invalid and needing replacement, regardless of its pin status, when it encounters errors?
Maybe always forcibly (re)fetch blocks marked that way via p2p rather than from the local blockstore, so you can at least continue ops, e.g. unpin and gc?

Or manually via a command? ipfs block mark invalid?

Hello, this looks like a hardware failure. ipfs block rm is your way of forcefully removing a block. But if the disk is broken, you cannot write to or read from it, and IPFS can do little about it.

What you call "db" is literally everything in the blocks/ folder, which are just files. This is not solvable by anything that IPFS can do because the disk is broken.

I am not sure what "db" you want to manually edit, because other than the pinset, which has the pin roots, there is no such db. IPFS cannot "replace blocks" if the disk refuses to read or write things. If you manually remove the file and restart the ipfs daemon, then any operation on that block (ipfs block stat) will re-fetch it (and if you use --offline it will complain that it is not found). And for all I know, IPFS may fail to write it again because the disk is borked.

The solution is to fix your disk.


So, on a slightly different but related tack, I've now managed to get a bad block in my s3 api backed blockstore.
I've tried to find the exact object key of the underlying dead object so I can replace it.
Unlike with a flat file blockstore on local disk, I can't seem to figure out what debug options to enable to monitor the calls to the s3 api.
I've enabled debug logging for pin, blockstore & blockservice, but none of those show anything more than the v0 cid of the dead block, which I already know.
Is there any way to work out what the object name is in s3 from the cid?