To be honest ... for "some" reason (which was ok at that time ...) we decided to use SuSE Linux, although I would have preferred Gentoo ... well, stick to your decisions ... (damn !)
Never touch a running system, but in fact it was really necessary to patch up the machine.
Actually I also demoed this upgrade successfully in a VMware guest beforehand (you do not want to fuck up your server, which you've never seen, except via ssh ...), so move on ...
Back up your configs, add and refresh the installation source, run the system update ... reboot ... dead !
Well, take another sip and cool down ... what has gone wrong ... no ping ... probably no boot at all ? Bootloader ?
Hmm ... what have we got ... SW Raid ... boot-partition on a md device ... lilo and no grub ... SuSE and no Gentoo ... sounds like problems ...
Ok, fire up the DHCP rescue interface and chroot into the system. To shorten the story ... server is up and running. Why ?
Well ... for sure, YaST fucked up the bootloader. I'm not sure if this was the only cause, as fsck also found some filesystem errors on my md devices. Maybe a combination of both.
IMHO there are distributions with smarter package management capabilities ... I have to "emerge" 'bout that ... ;-)
I found a good link for installing and repairing systems with SW raid via rescue interface at http://www2.werk21.de/howto_hetzner_minimalimage_raid.html (ok, it's German ...).
In fact it's tailored to our server host's default images (http://www.hetzner.de), but it's simple and straightforward and you get the idea, so actually I like it.
My short summary of chrooting into a wrecked server via a rescue interface to reinstall the bootloader, in this case SuSE 10.2 x86_64 with four SW RAIDs and lilo as bootloader.
# Log in to the rescue system
# copy mdadm.conf to the rescue system (in my case I had backed up the file elsewhere, otherwise it's a little bit more tricky ...)
rescue:~# cp mdadm.conf /etc/mdadm/
# here's my mdadm.conf
DEVICE /dev/sda* /dev/sdb*
ARRAY /dev/md3 level=raid1 num-devices=2 UUID=21317e34:0f7eb150:940ee838:c8199c4d
ARRAY /dev/md2 level=raid1 num-devices=2 UUID=f0218eae:ac84eec5:4bff45c9:59e1a906
ARRAY /dev/md1 level=raid1 num-devices=2 UUID=34c710eb:efd952b1:24fda03a:8fc6aa4a
ARRAY /dev/md0 level=raid1 num-devices=2 UUID=f3177bb9:a10c1ee5:72a2f46e:a37685aa
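If no backup of mdadm.conf is at hand, the ARRAY lines can usually be regenerated from the RAID superblocks with `mdadm --examine --scan`. A minimal sketch, assuming the rescue system's mdadm reads /etc/mdadm/mdadm.conf as above; the function name `rebuild_mdadm_conf` and the `MDADM` override variable are made up for illustration and are not part of mdadm.

```shell
#!/bin/sh
# Hypothetical helper: rebuild mdadm.conf from the on-disk RAID superblocks.
# MDADM may be overridden (e.g. for a dry run); it defaults to plain "mdadm".
rebuild_mdadm_conf() {
    out=${1:-/etc/mdadm/mdadm.conf}
    tmp=$(mktemp) || return 1
    # "DEVICE partitions" lets mdadm consider everything in /proc/partitions;
    # "--examine --scan" prints one ARRAY line per array it finds.
    if { echo 'DEVICE partitions'; "${MDADM:-mdadm}" --examine --scan; } > "$tmp"; then
        mv "$tmp" "$out"
    else
        # Leave any existing config untouched if the scan failed.
        rm -f "$tmp"
        return 1
    fi
}

# Usage on the rescue system (as root):
#   rebuild_mdadm_conf /etc/mdadm/mdadm.conf && mdadm --assemble --scan
```

The result is written to a temporary file first, so a failed scan does not clobber a config that is already there.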
# start raid devices
rescue:~# /etc/init.d/mdadm start
rescue:~# mdadm --assemble --scan
mdadm: /dev/md3 has been started with 2 drives.
mdadm: /dev/md2 has been started with 2 drives.
mdadm: /dev/md1 has been started with 2 drives.
mdadm: /dev/md0 has been started with 2 drives.
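Here all four arrays came up with both drives. If one had started with a single drive, I would want to know before mounting anything. A small sketch that scans /proc/mdstat for arrays with a missing member; the function name `degraded_arrays` is hypothetical, and it assumes the usual mdstat layout with a status field like [UU] or [U_] at the end of the "blocks" line.

```shell
#!/bin/sh
# Hypothetical helper: print the name of every md array whose status
# field (e.g. [U_]) shows a missing member. Reads /proc/mdstat by default.
degraded_arrays() {
    awk '
        /^md[0-9]+ :/ { name = $1 }                      # remember current array
        /\[[U_]+\]/   { if ($NF ~ /_/) print name }      # "_" marks a missing disk
    ' "${1:-/proc/mdstat}"
}

# Usage:
#   degraded_arrays            # no output means every array is complete
```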
# now it could be a good idea to fsck the filesystems, feel free
# now mount your system below /mnt, in my case
rescue:~# mount /dev/md3 /mnt/
rescue:~# mount /dev/md0 /mnt/boot/
rescue:~# swapon /dev/md1
rescue:~# mount --bind /dev /mnt/dev/
rescue:~# mount --bind /proc /mnt/proc/
rescue:~# mount --bind /sys /mnt/sys/
# enter your system
rescue:~# chroot /mnt/ /bin/bash
rescue:~# mount -a
rescue:~# umount /proc
rescue:~# mount -tproc none /proc
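The order of the mounts matters (root first, then /boot below it, then the bind mounts), so it can help to print the commands and read them once before running anything. A sketch under the assumption that the layout is the same as mine; the function name `chroot_plan` and its arguments are made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: print (not run) the mount commands that prepare
# a chroot. Arguments: root device, boot device, mount point (default /mnt).
chroot_plan() {
    root_dev=$1
    boot_dev=$2
    target=${3:-/mnt}
    echo "mount $root_dev $target"
    # /boot has to land below the mount point, not on the rescue system's /boot
    echo "mount $boot_dev $target/boot"
    for fs in dev proc sys; do
        echo "mount --bind /$fs $target/$fs"
    done
}

# Usage: review the output, then run it and enter the chroot by hand:
#   chroot_plan /dev/md3 /dev/md0 /mnt
#   chroot_plan /dev/md3 /dev/md0 /mnt | sh
#   chroot /mnt /bin/bash
```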
# now do what you need to do ...
# if you lost some kernel configs you may want to
rescue:~# vi /etc/sysconfig/kernel
# I recreated the initrd with mkinitrd
rescue:~# mkinitrd
Root device: /dev/md3 (mounted on / as ext3)
Module list: raid1 raid0 3w-xxxx via82cxxx 3w_xxxx processor thermal fan reiserfs 3w_9xxx ext3 xfs ext2 sata_sil sata_via sata_nv amd74xx raid5 linear (xennet xenblk)
Kernel image: /boot/vmlinuz-2.6.18.2-34-default
Initrd image: /boot/initrd-2.6.18.2-34-default
Shared libs: lib64/ld-2.5.so lib64/libacl.so.1.1.0 lib64/libattr.so.1.1.0 lib64/libblkid.so.1.0 lib64/libc-2.5.so lib64/libcom_err.so.2.1 lib64/libdl-2.5.so lib64/libext2fs.so.2.4 lib64/libhistory.so.5.1 lib64/libncurses.so.5.5 lib64/libpthread-2.5.so lib64/libreadline.so.5.1 lib64/librt-2.5.so lib64/libutil-2.5.so lib64/libuuid.so.1.2 lib64/libvolume_id.so.0.73.0 lib64/libnss_files-2.5.so lib64/libnss_files.so.2 lib64/libgcc_s.so.1
Driver modules: ide-core ide-disk scsi_mod sd_mod raid1 raid0 3w-xxxx via82cxxx processor thermal fan 3w-9xxx libata sata_sil sata_via sata_nv amd74xx xor raid456 linear
Filesystem modules: reiserfs mbcache jbd ext3 xfs ext2
Including: initramfs md(mdconf) mdadm fsck.ext3
Run lilo now to update the boot loader configuration.
# now check, what's wrong with the bootloader ...
rescue:~# vi /etc/lilo.conf
# here's my (repaired) lilo.conf
# Modified by YaST2. Last modification on Tue Jan 30 19:28:26 CET 2007
boot = /dev/md0
raid-extra-boot = /dev/sda,/dev/sdb
root = /dev/md3
delay = 3
vga = normal
default = Linux
image = /boot/vmlinuz
label = Linux
initrd = /boot/initrd
vga = 0x314
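One cheap check before running lilo: do the image and initrd paths named in lilo.conf actually exist? After a botched update the /boot/vmlinuz and /boot/initrd symlinks may point nowhere. A sketch, with the function name `missing_boot_files` being hypothetical; it only looks at `image =` and `initrd =` lines and assumes absolute paths as in my config.

```shell
#!/bin/sh
# Hypothetical helper: list image/initrd paths from a lilo.conf that do
# not exist below the given root directory (default: the current root).
missing_boot_files() {
    conf=$1
    root=${2:-}
    sed -n \
        -e 's/^[[:space:]]*image[[:space:]]*=[[:space:]]*//p' \
        -e 's/^[[:space:]]*initrd[[:space:]]*=[[:space:]]*//p' "$conf" |
    while read -r path; do
        # -e follows symlinks, so a dangling /boot/vmlinuz is reported too
        [ -e "$root$path" ] || echo "$path"
    done
}

# Usage inside the chroot:
#   missing_boot_files /etc/lilo.conf
# Usage from the rescue system, before chrooting:
#   missing_boot_files /mnt/etc/lilo.conf /mnt
```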
# reinstall lilo
rescue:~# lilo
Added Linux *
The boot record of /dev/md0 has been updated.
The boot record of /dev/sda has been updated.
Warning: /dev/sdb is not on the first disk
The boot record of /dev/sdb has been updated.
# reboot, take a last sip ...