
Restore Failed Systems

System failure is common for servers in a large-scale cluster. When the disk holding the OS fails, there is usually no choice but to reinstall the system. After replacing the failed disk and reinstalling the system, some configuration needs to be completed:

1. Network configuration:
a) configure the IP address, netmask and gateway by editing
# vi /etc/sysconfig/network-scripts/ifcfg-eth*
b) restart the network
# service network restart
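On a cluster with many nodes, writing the interface file from a script can be less error-prone than editing it by hand in vi. Here is a minimal sketch, assuming a RHEL/CentOS 6-style layout with a static address. The function name `write_ifcfg` and the optional ROOT prefix (useful for a dry run into a scratch directory) are hypothetical, not part of the original steps.

```shell
# Hypothetical helper: write a static ifcfg file for one interface.
# Usage: write_ifcfg DEV IPADDR NETMASK GATEWAY [ROOT]
# ROOT is an optional prefix for dry runs; left empty, the file is
# written under the real /etc.
write_ifcfg() {
    dev=$1 ip=$2 mask=$3 gw=$4 root=${5:-}
    dir="$root/etc/sysconfig/network-scripts"
    mkdir -p "$dir"
    # BOOTPROTO=static and ONBOOT=yes bring the interface up at boot
    # with the fixed address below.
    cat > "$dir/ifcfg-$dev" <<EOF
DEVICE=$dev
BOOTPROTO=static
ONBOOT=yes
IPADDR=$ip
NETMASK=$mask
GATEWAY=$gw
EOF
}
```

For example, `write_ifcfg eth0 192.168.1.10 255.255.255.0 192.168.1.1` followed by `service network restart` (the addresses are placeholders).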

2. Routing table configuration:
a) add a route for each network reached through a specific interface
# route add -net 'net-ip' netmask 'netmask' dev 'eth-dev'
b) enable IP forwarding (e.g., if the server acts as a gateway)
# sysctl -w net.ipv4.ip_forward=1
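Note that `route add` and `sysctl -w` only change the running kernel, so both settings are lost at the next reboot. On RHEL/CentOS, the network scripts read `/etc/sysconfig/network-scripts/route-DEV` when an interface comes up, and `sysctl -p` loads `/etc/sysctl.conf`. A minimal sketch of persisting both, assuming the `ip route` style of route file (one route per line, in CIDR form, so a 255.255.255.0 netmask becomes /24). The helper names and the ROOT prefix are hypothetical.

```shell
# Hypothetical helpers: persist a device route and IP forwarding.
# Usage: persist_route NET/PREFIX DEV [ROOT]
persist_route() {
    net=$1 dev=$2 root=${3:-}
    dir="$root/etc/sysconfig/network-scripts"
    mkdir -p "$dir"
    line="$net dev $dev"
    file="$dir/route-$dev"
    # One route per line in "ip route add" syntax; skip exact duplicates.
    grep -qxF "$line" "$file" 2>/dev/null || echo "$line" >> "$file"
}

# Usage: persist_forwarding [ROOT]
persist_forwarding() {
    conf="${1:-}/etc/sysctl.conf"
    mkdir -p "$(dirname "$conf")"
    touch "$conf"
    # Drop any existing ip_forward setting, then append the enabled one.
    sed -i '/^net\.ipv4\.ip_forward/d' "$conf"
    echo 'net.ipv4.ip_forward = 1' >> "$conf"
}
```

For example, `persist_route 10.0.0.0/24 eth1`, then `persist_forwarding` and `sysctl -p` to load the file.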

3. iptables configuration:
a) copy the saved rule file back to the path the iptables service loads
# cp 'iptables-old' /etc/sysconfig/iptables
b) restart the iptables service
# service iptables restart
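The iptables service loads its rules from `/etc/sysconfig/iptables`, so the saved file has to end up under exactly that name. A minimal sketch that also keeps the default rules from the fresh install, in case the old rules turn out to be wrong; the name `restore_iptables` and the ROOT prefix are hypothetical.

```shell
# Hypothetical helper: restore a saved rule file to the path the
# iptables service loads, keeping the new install's default as .orig.
# Usage: restore_iptables SAVED_FILE [ROOT]
restore_iptables() {
    saved=$1 dest="${2:-}/etc/sysconfig/iptables"
    mkdir -p "$(dirname "$dest")"
    # Keep only the first default copy; later runs do not overwrite it.
    if [ -f "$dest" ] && [ ! -f "$dest.orig" ]; then
        cp -p "$dest" "$dest.orig"
    fi
    cp -p "$saved" "$dest"
}
```

Before restarting the service, `iptables-restore --test < /etc/sysconfig/iptables` can check that the file parses without applying it.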

4. System configuration:
a) configure the host name by editing
# vi /etc/sysconfig/network
b) enable or disable services at boot
# chkconfig network on
# chkconfig sshd on
# chkconfig NetworkManager off
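Both sub-steps can be scripted so that every reinstalled node ends up identical. A minimal sketch, assuming the RHEL/CentOS 6 convention of a `HOSTNAME=` line in `/etc/sysconfig/network`; the helper names, the service lists and the ROOT prefix are hypothetical and should be adjusted to your own servers.

```shell
# Hypothetical helper: set HOSTNAME= in /etc/sysconfig/network.
# Usage: set_hostname NAME [ROOT]
set_hostname() {
    name=$1 conf="${2:-}/etc/sysconfig/network"
    mkdir -p "$(dirname "$conf")"
    touch "$conf"
    # Replace an existing HOSTNAME= line, or append one if missing.
    if grep -q '^HOSTNAME=' "$conf"; then
        sed -i "s/^HOSTNAME=.*/HOSTNAME=$name/" "$conf"
    else
        echo "HOSTNAME=$name" >> "$conf"
    fi
}

# Services to switch on or off at boot (same as the commands above).
SERVICES_ON="network sshd"
SERVICES_OFF="NetworkManager"

apply_services() {
    for s in $SERVICES_ON; do chkconfig "$s" on; done
    for s in $SERVICES_OFF; do chkconfig "$s" off; done
}
```

The file is only read at boot; running `hostname NAME` as well changes the name of the running system right away.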

5. File system configuration:
a) update /etc/fstab and re-create the mount-point directories
# mkdir -p 'mount-dir'
b) mount the old storage
# mount -a
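`mount -a` fails for any entry whose mount point does not exist on the new root file system. Rather than running `mkdir` by hand for each one, the directories can be read straight from the restored fstab. A minimal sketch; the name `make_mount_points` and the ROOT prefix are hypothetical, and entries with escaped spaces (`\040`) are not handled.

```shell
# Hypothetical helper: re-create every mount point listed in an fstab.
# Usage: make_mount_points [FSTAB] [ROOT]
make_mount_points() {
    fstab=${1:-/etc/fstab} root=${2:-}
    # Field 2 is the mount point. Comments and blank lines are skipped,
    # and so are swap entries, whose field 2 is not an absolute path.
    awk '$1 !~ /^#/ && NF >= 2 && $2 ~ /^\// { print $2 }' "$fstab" |
    while read -r dir; do
        mkdir -p "$root$dir"
    done
}
```

Run `make_mount_points` after editing /etc/fstab, then `mount -a`.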

6. Update the software:
# yum update

7. SSH key restoration:
# cp 'key-file' ~/.ssh/
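A copied key may not work until its permissions are fixed: ssh refuses to use a private key that other users can read, and sshd in strict mode ignores a `~/.ssh` directory that is group- or world-writable. A minimal sketch of the copy with the usual modes; the name `restore_key` is hypothetical.

```shell
# Hypothetical helper: copy a saved key into ~/.ssh with the usual
# permissions (700 on the directory, 600 on the key).
# Usage: restore_key KEY_FILE [SSH_DIR]
restore_key() {
    key=$1 sshdir=${2:-$HOME/.ssh}
    mkdir -p "$sshdir"
    chmod 700 "$sshdir"
    cp "$key" "$sshdir/"
    chmod 600 "$sshdir/$(basename "$key")"
}
```

On systems with SELinux enabled, `restorecon -R ~/.ssh` afterwards resets the file contexts of the copied files.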

To recap, frequent backups of the data and configuration files on your servers are essential to prevent data loss from system failures.
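As a starting point, the configuration files used in the steps above can be collected into one archive. A minimal sketch, assuming root's keys live in /root/.ssh; the name `backup_config`, the archive naming scheme and the ROOT prefix are hypothetical, and paths that do not exist are skipped.

```shell
# Hypothetical helper: archive the configuration files touched above.
# Usage: backup_config DEST_DIR [ROOT]   (prints the archive path)
backup_config() {
    dest=$1 root=${2:-/}
    mkdir -p "$dest"
    archive="$dest/config-$(uname -n)-$(date +%Y%m%d).tar.gz"
    # Collect the paths (relative to ROOT) that actually exist.
    set --
    for p in etc/sysconfig/network-scripts etc/sysconfig/iptables \
             etc/sysconfig/network etc/sysctl.conf etc/fstab root/.ssh; do
        [ -e "$root/$p" ] && set -- "$@" "$p"
    done
    tar -czf "$archive" -C "$root" "$@"
    echo "$archive"
}
```

Copy the archive to another machine, since a backup kept on the failed disk is lost along with it.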

Ke Hong