My server has crashed for the Nth time in a couple of months, so I decided to run a badblocks test. I used fsck to detect and mark bad blocks, and it did indeed find some. If I am correct, this means the filesystem will no longer use those blocks to store data. But what happens to the data that was already there? Has it been moved? It was probably corrupted to begin with, so the files that were using those blocks are probably broken. Now I have several open questions:
- can I detect which files have been affected?
- how can I check whether those files are corrupted after fsck?
- is there any way to tell my distribution (Ubuntu 14.04) to “reinstall all packages, as they are cached in the system”? (That is, no upgrades, just re-installation of the current versions, without overwriting any configuration files.)
Note: for completeness, here is the output of fsck:

    root@rescue:~# fsck -vcck /dev/sda2
    fsck from util-linux 2.20.1
    e2fsck 1.42.5 (29-Jul-2012)
    Checking for bad blocks (non-destructive read-write test)
    Testing with random pattern: done
    /dev/sda2: Updating bad block inode.
    Pass 1: Checking inodes, blocks, and sizes

    Running additional passes to resolve blocks claimed by more than one inode...
    Pass 1B: Rescanning for multiply-claimed blocks
    Multiply-claimed block(s) in inode 8: 119060233 119060234 119060592 119060615 119060616 119060617 119060618 119060619 119060620 119060621 119060623 119060624 119060625 119060626 119060632 119060633 119060635 119060636 119060637 119060638 119060639 119061755
    Pass 1C: Scanning directories for inodes with multiply-claimed blocks
    Pass 1D: Reconciling multiply-claimed blocks
    (There are 0 inodes containing multiply-claimed blocks.)

    File <The journal inode> (inode #8, mod time Mon May 5 14:17:18 2014)
      has 22 multiply-claimed block(s), shared with 1 file(s):
        <The bad blocks inode> (inode #1, mod time Thu Aug 7 19:11:37 2014)
    Clone multiply-claimed blocks<y>? yes

    Error reading block 119060233 (Attempt to read block from filesystem resulted in short read). Ignore error<y>? yes
    Force rewrite<y>? yes
    Error reading block 119060234 (Attempt to read block from filesystem resulted in short read). Ignore error<y>? yes
    Force rewrite<y>? yes
    Error reading block 119060592 (Attempt to read block from filesystem resulted in short read). Ignore error<y>? yes
    Force rewrite<y>? yes
    Error reading block 119060615 (Attempt to read block from filesystem resulted in short read). Ignore error<y>? yes
    Force rewrite<y>? yes
    Error reading block 119060616 (Attempt to read block from filesystem resulted in short read). Ignore error<y>? yes
    Force rewrite<y>? yes
    Error reading block 119060617 (Attempt to read block from filesystem resulted in short read). Ignore error<y>? yes
    Force rewrite<y>? yes
    Error reading block 119060618 (Attempt to read block from filesystem resulted in short read). Ignore error<y>? yes
    Force rewrite<y>? yes
    Error reading block 119060619 (Attempt to read block from filesystem resulted in short read). Ignore error<y>? yes
    Force rewrite<y>? yes
    Error reading block 119060620 (Attempt to read block from filesystem resulted in short read). Ignore error<y>? yes
    Force rewrite<y>? yes
    Error reading block 119060621 (Attempt to read block from filesystem resulted in short read). Ignore error<y>? yes
    Force rewrite<y>? yes
    Error reading block 119060623 (Attempt to read block from filesystem resulted in short read). Ignore error<y>? yes
    Force rewrite<y>? yes
    Error reading block 119060624 (Attempt to read block from filesystem resulted in short read). Ignore error<y>? yes
    Force rewrite<y>? yes
    Error reading block 119060625 (Attempt to read block from filesystem resulted in short read). Ignore error<y>? yes
    Force rewrite<y>? yes
    Error reading block 119060626 (Attempt to read block from filesystem resulted in short read). Ignore error<y>? yes
    Force rewrite<y>? yes
    Error reading block 119060632 (Attempt to read block from filesystem resulted in short read). Ignore error<y>? yes
    Force rewrite<y>? yes
    Error reading block 119060633 (Attempt to read block from filesystem resulted in short read). Ignore error<y>? yes
    Force rewrite<y>? yes
    Error reading block 119060635 (Attempt to read block from filesystem resulted in short read). Ignore error<y>? yes
    Force rewrite<y>? yes
    Error reading block 119060636 (Attempt to read block from filesystem resulted in short read). Ignore error<y>? yes
    Force rewrite<y>? yes
    Error reading block 119060637 (Attempt to read block from filesystem resulted in short read). Ignore error<y>? yes
    Force rewrite<y>? yes
    Error reading block 119060638 (Attempt to read block from filesystem resulted in short read). Ignore error<y>? yes
    Force rewrite<y>? yes
    Error reading block 119060639 (Attempt to read block from filesystem resulted in short read). Ignore error<y>? yes
    Force rewrite<y>? yes
    Error reading block 119061755 (Attempt to read block from filesystem resulted in short read). Ignore error<y>? yes
    Force rewrite<y>? yes

    Pass 2: Checking directory structure
    Pass 3: Checking directory connectivity
    Pass 4: Checking reference counts
    Pass 5: Checking group summary information
    Free blocks count wrong for group #0 (23499, counted=23477).
    Fix<y>? yes
    Free blocks count wrong for group #2016 (23956, counted=23961).
    Fix<y>? yes
    Free blocks count wrong for group #3633 (65514, counted=0).
    Fix<y>? yes
    Free blocks count wrong (231534163, counted=231534168).
    Fix<y>? yes

    /dev/sda2: ***** FILE SYSTEM WAS MODIFIED *****

          154609 inodes used (0.26%, out of 59736064)
              47 non-contiguous files (0.0%)
               9 non-contiguous directories (0.0%)
                 # of inodes with ind/dind/tind blocks: 0/0/0
                 Extent depth histogram: 154209/10
         7404456 blocks used (3.10%, out of 238938624)
              99 bad blocks
               2 large files

          126167 regular files
           27996 directories
               0 character device files
               0 block device files
               0 fifos
               0 links
             437 symbolic links (382 fast symbolic links)
               0 sockets
    ------------
          154600 files

Answer
First, take a look at the Bad Block HOWTO for smartmontools:
https://www.smartmontools.org/wiki/BadBlockHowto
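
As a rough illustration of the approach that HOWTO describes for ext2/3/4, the sketch below pulls the unreadable block numbers out of a saved e2fsck log and prints a debugfs icheck command that would map them to inodes. It is only a sketch under assumptions: the function names are made up for this example, the log is assumed to contain "Error reading block N" lines like the output in the question, and the device is assumed to be /dev/sda2 as in the question. It prints the command rather than running it, so nothing touches the disk.

```shell
#!/bin/sh
# Hypothetical helper: read an e2fsck log on stdin and print each unique
# block number that appeared in an "Error reading block N" message.
# grep -o is used so that several messages on one line are all picked up.
bad_blocks_from_log() {
    grep -o 'Error reading block [0-9][0-9]*' | awk '{ print $4 }' | sort -n -u
}

# Hypothetical helper: print (do not run) the debugfs command that asks
# which inode owns each of the given filesystem blocks.
# Usage: print_icheck_cmd DEVICE BLOCK [BLOCK ...]
print_icheck_cmd() {
    dev=$1
    shift
    printf 'debugfs -R "icheck %s" %s\n' "$*" "$dev"
}
```

With the log saved to a file (say fsck.log, a name assumed here), `print_icheck_cmd /dev/sda2 $(bad_blocks_from_log < fsck.log)` prints the icheck command. Running it, ideally from the rescue system with the filesystem unmounted, lists block/inode pairs, and `debugfs -R "ncheck INODE ..." /dev/sda2` then turns inode numbers into path names. The full list of blocks currently marked bad can be printed with `dumpe2fs -b /dev/sda2`. Note that in the output above, the 22 unreadable blocks were claimed by inode 8, the journal, which suggests they may not belong to any regular file.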
Second, if you don’t already have one, now is the time to implement a working backup strategy.
If your server needs a certain level of availability, you might also want to consider RAID-1 (mirroring).
Either way, it is time to replace the old hard disk drive. It has already failed you enough times, and it is very unlikely to get better.
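
One way to back up that judgment with numbers is to look at the drive's SMART attributes using smartctl from smartmontools. The sketch below is a small, hypothetical helper that reads the attribute table printed by `smartctl -A` and extracts the raw value of one attribute. It assumes the usual ten-column table layout with RAW_VALUE in the last column, and that the drive reports attributes under the common names Reallocated_Sector_Ct and Current_Pending_Sector, which is vendor-dependent.

```shell
#!/bin/sh
# Hypothetical helper: read `smartctl -A` output on stdin and print the
# raw value (10th column) of the attribute whose name is given as $1.
# Prints nothing if the drive does not report that attribute.
smart_raw_value() {
    awk -v name="$1" '$2 == name { print $10 }'
}
```

For example, `smartctl -A /dev/sda | smart_raw_value Reallocated_Sector_Ct` (the device name is assumed from /dev/sda2 in the question). Non-zero and growing counts of reallocated or pending sectors are generally taken as a sign that the drive is wearing out, and `smartctl -H /dev/sda` prints the drive's own overall health assessment.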
Attribution
Source : Link , Question Author : blueFast , Answer Author : Andy Forceno