The Linux kernel shat all over my server this morning. I went to reboot it to clear a write-lock on the block device for the /usr filesystem, and when it booted kernel version 2.6.18-194.11.1.el5, I got a nice error that said “Bad superblock on rootfs, could not read /etc/lvm/lvm.conf” followed by a lot of lines about LVM not finding any volume groups on /dev/md0. Then the nerkel panicked. Yes, I mean kernel, but the typo is too damn funny to delete. I tried to boot an older nerkel, but that one just kept dying when LVM tried to read the volume group metadata on /dev/md0. So I popped in the Knoppix 6.2 CD and tried to boot off that, only to find that the DVD+RW drive in the server is dead. I haven’t used it in about four years, so anything’s possible. So I hooked up another CD-ROM drive and booted Knoppix.
After about an hour of fiddling (including hand-assembling and starting the MD RAID-5 array), I finally got LVM to recognize and repair the VG metadata on the array. From there it was simply a matter of lvscan, lvck, and fsck to get the filesystems back to a consistent state. I rebooted the box into an older kernel (2.6.18-194.3.1.el5), and it re-ran fsck: the e2fsck in RHEL 5.5 is older than the e2fsck in Knoppix 6.2, so it needed to remove the extra dir_hash data that the newer fsck had added. After that, the box came up fine in single-user mode. I edited /boot/grub/grub.conf to remove the non-working kernel version from the list, did a yum update, and shut the box down. I put the box back together and started it, and it booted fine into runlevel 3 with no assistance from the operator.
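For anyone facing the same mess, here is a rough sketch of what a recovery pass like this can look like from a rescue CD. It is not a transcript of the exact commands used above. The member partitions, the volume group name (vg0), and the logical volume name (usr) are made-up placeholders; only /dev/md0 comes from the story. The script defaults to a dry run that just prints each command, and it deliberately leaves out the metadata repair step, since that depends on what is actually damaged.

```shell
#!/bin/sh
# Dry-run sketch of an MD RAID + LVM recovery pass from a rescue CD.
# Commands are only printed unless the script is started with RUN=1.
set -e

RUN="${RUN:-0}"

# Hypothetical placeholders: substitute the real array members and LVM names.
MEMBERS="/dev/sda1 /dev/sdb1 /dev/sdc1"
VG="vg0"
LV="usr"

# Print the command; execute it only when RUN=1.
run() {
    echo "+ $*"
    if [ "$RUN" = "1" ]; then
        "$@"
    fi
}

run mdadm --assemble /dev/md0 $MEMBERS   # hand-assemble and start the array
run pvscan                               # look for physical volumes on it
run vgscan                               # look for volume groups
run vgck "$VG"                           # check VG metadata consistency
run vgchange -ay "$VG"                   # activate the logical volumes
run lvscan                               # list the logical volumes
run e2fsck -f "/dev/$VG/$LV"             # force a check of one filesystem
```

Run it once as-is to review the printed commands, then with RUN=1 as root once the names match the real system. The e2fsck line would need repeating for each logical volume that holds a filesystem.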