LVM Bugs

The Linux kernel shat all over my server this morning. I went to reboot it to clear a write-lock on the block device for the /usr filesystem, and when it came back up on kernel version 2.6.18-194.11.1.el5, I got a nice error that said “Bad superblock on rootfs, could not read /etc/lvm/lvm.conf”, followed by a lot of lines about LVM not finding any volume groups on /dev/md0. Then the nerkel panicked. Yes, I mean kernel, but the typo is too damn funny to delete. I tried to boot an older nerkel, but this time it just kept dying when LVM tried to read the volume group metadata on /dev/md0. So I popped in the Knoppix 6.2 CD and tried to boot off of that, only to find that the DVD+RW drive in the server was dead. I haven’t used it in like 4 years, so anything’s possible. So I hooked up another CD-ROM drive and booted Knoppix.

After about an hour of fiddling (including hand-assembling and starting the MD RAID-5 array), I finally got LVM to recognize and repair the VG metadata on the array. From there it was simply a matter of running lvscan, vgck, and fsck to get the filesystems back to a consistent state. I rebooted the box into an older kernel (2.6.18-194.3.1.el5), and after it re-ran fsck (the e2fsck in RHEL 5.5 is older than the one in Knoppix 6.2, so it had to remove the extra dir_hash data the newer fsck had added), the box came up fine in single-user mode. I edited /boot/grub/grub.conf to remove the non-working kernel version from the list, did a yum update, and shut down. I put the box back together and powered it on, and it booted fine into runlevel 3 with no assistance from the operator.
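The post doesn't list the exact commands used, so here is a minimal dry-run sketch of the recovery order described above. It only builds the command strings and runs nothing. The device names, the volume group name VolGroup00, and the logical volume name LogVol00 are hypothetical placeholders, not taken from this server. The step that actually repaired the VG metadata is left out because the post doesn't say how it was done. vgck is LVM2's volume-group consistency checker.

```python
import shlex


def recovery_plan(md_device, member_partitions, vg_name, lv_names):
    """Return, in order, the shell commands for a dry-run recovery plan.

    All arguments are hypothetical placeholders supplied by the caller;
    nothing here touches real disks.
    """
    plan = []
    # Hand-assemble the MD RAID-5 array from its member partitions.
    plan.append(["mdadm", "--assemble", md_device, *member_partitions])
    # Have LVM rescan block devices for physical volumes and volume groups.
    plan.append(["vgscan"])
    # Check the volume group metadata for consistency.
    plan.append(["vgck", vg_name])
    # Activate the logical volumes in the volume group.
    plan.append(["vgchange", "-ay", vg_name])
    # List the logical volumes LVM can now see.
    plan.append(["lvscan"])
    # Force a full filesystem check on each logical volume.
    for lv in lv_names:
        plan.append(["fsck", "-f", f"/dev/{vg_name}/{lv}"])
    # Quote each command so it could be pasted into a shell.
    return [shlex.join(cmd) for cmd in plan]


if __name__ == "__main__":
    for line in recovery_plan(
        "/dev/md0",
        ["/dev/sda2", "/dev/sdb2", "/dev/sdc2"],  # hypothetical members
        "VolGroup00",                              # hypothetical VG name
        ["LogVol00"],                              # hypothetical LV name
    ):
        print(line)
```

Printing the plan rather than executing it gives you a chance to double-check device names before running anything by hand from a rescue CD like Knoppix.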


