#!/bin/bash
#
# mdadm-event-handler :: announce RAID events verbally
#
# Set up the phrasing for the optional element name
if [ "$3" ]
then
    E=", element $3"
fi
# Separate words (RebuildStarted -> Rebuild Started)
T=$(echo $1|sed "s/\([A-Z]\)/ \1/g")
# Make the voice announcement and then repeat it
echo "Attention! RAID event: $T on $2 $E"|festival --tts
sleep 2
echo "Repeat: $T on $2 $E"|festival --tts

When a drive fails, this script will announce something like "Attention! RAID event: Failed on /dev/md0, element /dev/sdc1" using the Festival speech synthesizer. It will also announce the start and completion of array rebuilds and other important milestones (make sure you keep the volume turned up).
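The script receives the event name, array device, and (optional) element device as $1, $2, and $3. You can check the word-splitting logic by itself, without installing Festival, by running the sed expression directly:

```shell
# Split a CamelCase mdadm event name into separate words,
# exactly as the handler script does before speaking it
echo "RebuildStarted" | sed "s/\([A-Z]\)/ \1/g"
# prints " Rebuild Started" (the leading space is harmless when spoken)
```

To have mdadm invoke the handler, point monitor mode at it with the --program option, for example mdadm --monitor --scan --program=/usr/local/bin/mdadm-event-handler (the installation path here is just an example).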

6.2.1.6. Setting up a hot spare

When a system with RAID 1 or higher experiences a disk failure, the data on the failed drive will be recalculated from the remaining drives. However, data access will be slower than usual, and if any other drives fail, the array will not be able to recover. Therefore, it's important to replace a failed disk drive as soon as possible.

When a server is heavily used or is in an inaccessible location, such as an Internet colocation facility, it makes sense to equip it with a hot spare. The hot spare is installed but unused until another drive fails, at which point the RAID system automatically uses it to replace the failed drive.

To create a hot spare when a RAID array is initially created, use the -x argument to indicate the number of spare devices:

# mdadm --create -l raid1 -n 2 -x 1 /dev/md0 /dev/sdb1 /dev/sdc1 /dev/sdf1
mdadm: array /dev/md0 started.
$ cat /proc/mdstat
Personalities : [raid1] [raid5] [raid4]
md0 : active raid1 sdf1[2](S) sdc1[1] sdb1[0]
      62464 blocks [2/2] [UU]

unused devices: <none>

Notice that /dev/sdf1 is marked with the symbol (S) indicating that it is the hot spare.

If an active element in the array fails, the hot spare will take over automatically:

$ cat /proc/mdstat
Personalities : [raid1] [raid5] [raid4]
md0 : active raid1 sdf1[2] sdc1[3](F) sdb1[0]
      62464 blocks [2/1] [U_]
      [=>...................] recovery = 6.4% (4224/62464) finish=1.5min speed=603K/sec

unused devices: <none>
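Before relying on this failover in production, you can rehearse it on a scratch array: mdadm's --fail option marks an element as faulty, which triggers the spare takeover shown above. A minimal sketch (the device names are examples; run as root, and only on a test array):

```shell
# Mark an element of a test array as faulty; the hot spare
# should immediately begin rebuilding
mdadm /dev/md0 --fail /dev/sdc1
# Watch the rebuild progress
cat /proc/mdstat
```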

When you remove, replace, and re-add the failed drive, it will become the hot spare:

# mdadm --remove /dev/md0 /dev/sdc1
mdadm: hot removed /dev/sdc1
...(Physically replace the failed drive)...
# mdadm --add /dev/md0 /dev/sdc1
mdadm: re-added /dev/sdc1
# cat /proc/mdstat
Personalities : [raid1] [raid5] [raid4]
md0 : active raid1 sdc1[2](S) sdf1[1] sdb1[0]
      62464 blocks [2/2] [UU]

unused devices: <none>

Likewise, to add a hot spare to an existing array, simply add an extra drive:

# mdadm --add /dev/md0 /dev/sdh1
mdadm: added /dev/sdh1

Since hot spares are not used until another drive fails, it's a good idea to spin them down (stop the motors) to prolong their life. This command will program all of your drives to stop spinning after 15 minutes of inactivity (on most systems, only the hot spares will ever be idle for that length of time):

# hdparm -S 180 /dev/[sh]d[a-z]
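Note that the -S value is not a number of minutes: per the hdparm man page, values from 1 to 240 specify multiples of 5 seconds, so 180 means 180 x 5 = 900 seconds. A quick sanity check of the arithmetic:

```shell
# hdparm -S values in the range 1-240 count in 5-second units
echo "$(( 180 * 5 ))"        # timeout in seconds
echo "$(( 180 * 5 / 60 ))"   # timeout in minutes
# prints 900, then 15
```

You can confirm that a spare has actually spun down with hdparm -C /dev/sdf, which reports the drive's current power state.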

Add this command to the end of the file /etc/rc.d/rc.local to ensure that it is executed every time the system is booted:

#!/bin/sh
#
# This script will be executed *after* all the other init scripts.
# You can put your own initialization stuff in here if you don't
# want to do the full Sys V style init stuff.

touch /var/lock/subsys/local
hdparm -S 180 /dev/[sh]d[a-z]

6.2.1.7. Monitoring drive health

Self-Monitoring, Analysis, and Reporting Technology (SMART) is built into most modern disk drives. It provides access to drive diagnostic and error information and failure prediction.

Fedora provides smartd for SMART disk monitoring. The configuration file /etc/smartd.conf is set up by the Anaconda installer to monitor each drive present in the system and to report only imminent (within 24 hours) drive failure to the root email address:

/dev/hda -H -m root
/dev/hdb -H -m root
/dev/hdc -H -m root

(I've left out the many comment lines that are in this file.)

It is a good idea to change the email address to the same alias used for your RAID error reports:

/dev/hda -H -m raid-alert
/dev/hdb -H -m raid-alert
/dev/hdc -H -m raid-alert

If you add additional drives to the system, be sure to add additional entries to this file.
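For a one-off check outside of smartd, the smartctl command (from the same smartmontools package) queries a drive directly; -H reports the overall health assessment and -a prints the full attribute and error-log details. The device name here is an example; run as root:

```shell
# One-off SMART overall-health check
smartctl -H /dev/hda
# Full diagnostic details, including attributes and self-test logs
smartctl -a /dev/hda
```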

6.2.2. How Does It Work?

Fedora's RAID levels 4 and 5 use parity information to provide redundancy. Parity is calculated using the exclusive-OR function, as shown in Table 6-4.

Table 6-4. Parity calculation for two drives

Bit from drive A    Bit from drive B    Parity bit on drive C
       0                   0                     0
       0                   1                     1
       1                   0                     1
       1                   1                     0
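Because XOR is its own inverse, the parity bit allows any single missing value to be reconstructed from the other two. This sketch demonstrates the arithmetic on whole bytes, which is how the md driver applies it stripe by stripe (the byte values are arbitrary examples):

```shell
a=$(( 0xB7 ))          # byte from drive A
b=$(( 0x3C ))          # byte from drive B
p=$(( a ^ b ))         # parity byte written to drive C
# If drive A fails, XORing the survivors recovers its byte
recovered=$(( p ^ b ))
printf 'parity=0x%02X recovered=0x%02X\n' "$p" "$recovered"
# prints parity=0x8B recovered=0xB7
```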