Replace a failed drive in a software RAID

View the kernel log to detect a possibly failing hard drive

root@ubuntu:~# dmesg
[ 886.492585] sdb: Current: sense key: Recovered Error
[ 886.497903] Additional sense: Recovered data with retries
[ 886.504060] Info fld=0xdf82e1
[ 919.421181] sdb: Current: sense key: Recovered Error
[ 919.426474] Additional sense: Recovered data without ECC - recommend rewrite
[ 919.434375] Info fld=0xd66a9a
[ 1728.424643] sdb: Current: sense key: Recovered Error
[ 1728.429945] Additional sense: Recovered data without ECC - data auto-reallocated
[ 1728.438197] Info fld=0xccc0fe
[ 1731.086946] sdb: Current: sense key: Recovered Error
[ 1731.092252] Additional sense: Recovered data without ECC - data auto-reallocated
[ 1731.100514] Info fld=0xccb675
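On a busy system these messages are easy to miss in the full log. As a rough sketch, assuming the log line format shown above, the helper below counts "sense key" messages per device. The function name count_sense_errors is made up for this example.

```shell
# Count "sense key" messages per device in kernel log text read from stdin.
# Assumes lines shaped like "[ 886.492585] sdb: Current: sense key: ...".
count_sense_errors() {
    grep 'sense key' |
        sed -E 's/^\[[^]]*\] *([a-z0-9]+): .*/\1/' |
        sort | uniq -c | awk '{ print $2, $1 }'
}

# Typical use on a live system:
#   dmesg | count_sense_errors
```

A device that keeps climbing in this count is a candidate for the SMART test below.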

Perform SMART test on drive

Install SMART tools

root@ubuntu:~# aptitude install smartmontools

Run SMART tests

root@ubuntu:~# smartctl --test=long /dev/sdb
root@ubuntu:~# smartctl -a /dev/sdb
smartctl version 5.34 [x86_64-unknown-linux-gnu] Copyright (C) 2002-5 Bruce Allen
Home page is http://smartmontools.sourceforge.net/
Device: FUJITSU MAV2073RCSUN72G Version: 0301
Serial number: 000535S00AUB
Device type: disk
Transport protocol: SAS
Local Time is: Sat Jan 29 14:22:13 2011 CST
Device supports SMART and is Enabled
Temperature Warning Disabled or Not Supported
SMART Health Status: OK
Current Drive Temperature: 27 C
Drive Trip Temperature: 65 C
Manufactured in week 35 of year 2005
Current start stop count: 43 times
Recommended maximum start stop count: 10000 times
Elements in grown defect list: 355
Error counter log:
           Errors Corrected by           Total   Correction     Gigabytes    Total
               ECC          rereads/    errors   algorithm      processed    uncorrected
           fast | delayed   rewrites  corrected  invocations   [10^9 bytes]  errors
read:          0   530114      1342      1342          0      78930.620           0
write:         0        2         0         0          0      38013.435           0
Non-medium error count: 44
SMART Self-test log
Num  Test              Status                 segment  LifeTime  LBA_first_err [SK ASC ASQ]
     Description                              number   (hours)
# 1  Background long   Failed in segment -->       9     42754          13399317 [0x3 0x11 0x1]
# 2  Background long   Failed in segment -->       9     42635          13399317 [0x3 0x11 0x1]
# 3  Background short  Completed                   -     42635                 - [-   -    -]
# 4  Background long   Failed in segment -->       9     42634          13398730 [0x3 0x11 0x1]
Long (extended) Self Test duration: 2233 seconds [37.2 minutes]
root@ubuntu:~# fdisk -l
Disk /dev/sda: 73.4 GB, 73407865856 bytes
255 heads, 63 sectors/track, 8924 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
   Device Boot      Start         End      Blocks   Id  System
/dev/sda1   *           1          12       96358+  fd  Linux raid autodetect
/dev/sda2              13        8924    71585640   fd  Linux raid autodetect
Disk /dev/sdb: 73.4 GB, 73407865856 bytes
255 heads, 63 sectors/track, 8924 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
   Device Boot      Start         End      Blocks   Id  System
/dev/sdb1   *           1          12       96358+  fd  Linux raid autodetect
/dev/sdb2              13        8924    71585640   fd  Linux raid autodetect
Disk /dev/sdc: 73.4 GB, 73407865856 bytes
255 heads, 63 sectors/track, 8924 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
   Device Boot      Start         End      Blocks   Id  System
/dev/sdc1               1        8924    71681998+  83  Linux
Disk /dev/sdd: 73.4 GB, 73407865856 bytes
255 heads, 63 sectors/track, 8924 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
   Device Boot      Start         End      Blocks   Id  System
/dev/sdd1               1        8924    71681998+  83  Linux
Disk /dev/md0: 98 MB, 98566144 bytes
2 heads, 4 sectors/track, 24064 cylinders
Units = cylinders of 8 * 512 = 4096 bytes
Disk /dev/md0 doesn't contain a valid partition table
Disk /dev/md1: 73.3 GB, 73303588864 bytes
2 heads, 4 sectors/track, 17896384 cylinders
Units = cylinders of 8 * 512 = 4096 bytes
Disk /dev/md1 doesn't contain a valid partition table
Disk /dev/md2: 73.4 GB, 73402286080 bytes
2 heads, 4 sectors/track, 17920480 cylinders
Units = cylinders of 8 * 512 = 4096 bytes
Disk /dev/md2 doesn't contain a valid partition table
root@ubuntu:~# cat /proc/mdstat
Personalities : [raid1]
md2 : active raid1 sdc1[0] sdd1[1]
      71681920 blocks [2/2] [UU]
md1 : active raid1 sda2[0] sdb2[1]
      71585536 blocks [2/2] [UU]
md0 : active raid1 sda1[0] sdb1[1]
      96256 blocks [2/2] [UU]
unused devices: <none>
root@ubuntu:~# mdadm --query --detail /dev/md0
/dev/md0:
        Version : 00.90.03
  Creation Time : Wed Feb  8 17:29:05 2006
     Raid Level : raid1
     Array Size : 96256 (94.02 MiB 98.57 MB)
    Device Size : 96256 (94.02 MiB 98.57 MB)
   Raid Devices : 2
  Total Devices : 2
Preferred Minor : 0
    Persistence : Superblock is persistent
    Update Time : Mon Jan 31 06:26:13 2011
          State : clean
 Active Devices : 2
Working Devices : 2
 Failed Devices : 0
  Spare Devices : 0
           UUID : 96c88b09:82b06262:679309e4:bbe2fe4f
         Events : 0.20160
    Number   Major   Minor   RaidDevice State
       0       8        1        0      active sync   /dev/sda1
       1       8       17        1      active sync   /dev/sdb1
root@ubuntu:~# mdadm --query --detail /dev/md1
/dev/md1:
        Version : 00.90.03
  Creation Time : Wed Feb  8 17:29:25 2006
     Raid Level : raid1
     Array Size : 71585536 (68.27 GiB 73.30 GB)
    Device Size : 71585536 (68.27 GiB 73.30 GB)
   Raid Devices : 2
  Total Devices : 2
Preferred Minor : 1
    Persistence : Superblock is persistent
    Update Time : Mon Jan 31 17:42:26 2011
          State : active
 Active Devices : 2
Working Devices : 2
 Failed Devices : 0
  Spare Devices : 0
           UUID : 6154cd5a:edf5f628:28d7a268:ad434b95
         Events : 0.59383068
    Number   Major   Minor   RaidDevice State
       0       8        2        0      active sync   /dev/sda2
       1       8       18        1      active sync   /dev/sdb2
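The two warning signs in the smartctl output earlier in this section are the grown defect list and the failed long self-tests. A minimal sketch for pulling both out of saved smartctl output, assuming the SCSI/SAS report wording shown above (ATA drives word these fields differently). The name smart_summary is hypothetical.

```shell
# Summarise `smartctl -a` output read from stdin: grown defect count and
# number of self-test log entries marked "Failed in segment".
smart_summary() {
    awk '
        /^Elements in grown defect list:/ { defects = $NF }
        /Failed in segment/               { failed++ }
        END {
            printf "grown_defects=%s failed_selftests=%d\n",
                   (defects == "" ? "unknown" : defects), failed
        }
    '
}

# Typical use:
#   smartctl -a /dev/sdb | smart_summary
```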

Remove the Failed Drive

root@ubuntu:~# mdadm --manage /dev/md0 --fail /dev/sdb1
mdadm: set /dev/sdb1 faulty in /dev/md0
root@ubuntu:~# cat /proc/mdstat
Personalities : [raid1]
md2 : active raid1 sdc1[0] sdd1[1]
      71681920 blocks [2/2] [UU]
md1 : active raid1 sda2[0] sdb2[1]
      71585536 blocks [2/2] [UU]
md0 : active raid1 sda1[0] sdb1[2](F)
      96256 blocks [2/1] [U_]
unused devices: <none>
root@ubuntu:~# mdadm --manage /dev/md0 --remove /dev/sdb1
mdadm: hot removed /dev/sdb1
root@ubuntu:~# cat /proc/mdstat
Personalities : [raid1]
md2 : active raid1 sdc1[0] sdd1[1]
      71681920 blocks [2/2] [UU]
md1 : active raid1 sda2[0] sdb2[1]
      71585536 blocks [2/2] [UU]
md0 : active raid1 sda1[0]
      96256 blocks [2/1] [U_]
unused devices: <none>
root@ubuntu:~# mdadm --manage /dev/md1 --fail /dev/sdb2
mdadm: set /dev/sdb2 faulty in /dev/md1
root@ubuntu:~# cat /proc/mdstat
Personalities : [raid1]
md2 : active raid1 sdc1[0] sdd1[1]
      71681920 blocks [2/2] [UU]
md1 : active raid1 sda2[0] sdb2[2](F)
      71585536 blocks [2/1] [U_]
md0 : active raid1 sda1[0]
      96256 blocks [2/1] [U_]
unused devices: <none>
root@ubuntu:~# mdadm --manage /dev/md1 --remove /dev/sdb2
mdadm: hot removed /dev/sdb2
root@ubuntu:~# cat /proc/mdstat
Personalities : [raid1]
md2 : active raid1 sdc1[0] sdd1[1]
      71681920 blocks [2/2] [UU]
md1 : active raid1 sda2[0]
      71585536 blocks [2/1] [U_]
md0 : active raid1 sda1[0]
      96256 blocks [2/1] [U_]
unused devices: <none>
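The fail/remove sequence above is the same for every array the drive belongs to, so it can be sketched as a loop. This is only an illustration: the function name fail_and_remove and the RUN switch are invented here, and by default it prints the commands instead of running them.

```shell
# For each array:member pair, mark the member faulty and then remove it.
# Prints the mdadm commands unless RUN=1 is set in the environment.
fail_and_remove() {
    for pair in "$@"; do
        array=${pair%%:*}     # text before the first colon, e.g. /dev/md0
        member=${pair#*:}     # text after the first colon, e.g. /dev/sdb1
        for action in --fail --remove; do
            if [ "${RUN:-0}" = 1 ]; then
                mdadm --manage "$array" "$action" "$member" || return 1
            else
                echo "mdadm --manage $array $action $member"
            fi
        done
    done
}

# Dry run for the arrays in this post:
#   fail_and_remove /dev/md0:/dev/sdb1 /dev/md1:/dev/sdb2
```

Check /proc/mdstat after a real run, as in the transcript, before pulling the drive.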

Replace Drive

Power down the server and replace the failed physical drive.

Add new Drive to RAID

Verify current partition information

root@ubuntu:~# sfdisk -d /dev/sda
# partition table of /dev/sda
unit: sectors
/dev/sda1 : start=       63, size=   192717, Id=fd, bootable
/dev/sda2 : start=   192780, size=143171280, Id=fd
/dev/sda3 : start=        0, size=        0, Id= 0
/dev/sda4 : start=        0, size=        0, Id= 0

Copy the partition information over

root@ubuntu:~# sfdisk -d /dev/sda | sfdisk /dev/sdb
Checking that no-one is using this disk right now ...
OK
Disk /dev/sdb: 8924 cylinders, 255 heads, 63 sectors/track
sfdisk: ERROR: sector 0 does not have an msdos signature
 /dev/sdb: unrecognized partition table type
Old situation:
No partitions found
New situation:
Units = sectors of 512 bytes, counting from 0
   Device Boot    Start       End   #sectors  Id  System
/dev/sdb1   *        63    192779     192717  fd  Linux raid autodetect
/dev/sdb2        192780 143364059  143171280  fd  Linux raid autodetect
/dev/sdb3             0         -          0   0  Empty
/dev/sdb4             0         -          0   0  Empty
Successfully wrote the new partition table
Re-reading the partition table ...
If you created or changed a DOS partition, /dev/foo7, say, then use dd(1)
to zero the first 512 bytes:  dd if=/dev/zero of=/dev/foo7 bs=512 count=1
(See fdisk(8).)

Verify partition information

root@ubuntu:~# fdisk -l /dev/sda /dev/sdb
Disk /dev/sda: 73.4 GB, 73407865856 bytes
255 heads, 63 sectors/track, 8924 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
   Device Boot      Start         End      Blocks   Id  System
/dev/sda1   *           1          12       96358+  fd  Linux raid autodetect
/dev/sda2              13        8924    71585640   fd  Linux raid autodetect
Disk /dev/sdb: 73.4 GB, 73407865856 bytes
255 heads, 63 sectors/track, 8924 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
   Device Boot      Start         End      Blocks   Id  System
/dev/sdb1   *           1          12       96358+  fd  Linux raid autodetect
/dev/sdb2              13        8924    71585640   fd  Linux raid autodetect
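Comparing the two fdisk listings by eye works for two partitions but gets tedious beyond that. A possible shortcut, assuming /dev/sdX style names and the older sfdisk dump format shown above: save a dump of each disk, strip the device names, and compare. The names normalize_dump and same_layout are made up for this sketch.

```shell
# Strip device names from an `sfdisk -d` dump (stdin) so that dumps of two
# different disks can be compared line for line.
normalize_dump() {
    grep '^/dev/' | sed -E 's#^/dev/[a-z]+([0-9]+) *:#part\1:#'
}

# Succeed if the two saved dump files describe the same partition layout.
same_layout() {
    [ "$(normalize_dump < "$1")" = "$(normalize_dump < "$2")" ]
}

# Typical use:
#   sfdisk -d /dev/sda > sda.dump
#   sfdisk -d /dev/sdb > sdb.dump
#   same_layout sda.dump sdb.dump && echo "layouts match"
```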

Add new drive partitions to software RAID

root@ubuntu:~# mdadm --manage /dev/md0 --add /dev/sdb1
mdadm: hot added /dev/sdb1
root@ubuntu:~# mdadm --manage /dev/md1 --add /dev/sdb2
mdadm: hot added /dev/sdb2
root@ubuntu:~# cat /proc/mdstat
Personalities : [raid1]
md2 : active raid1 sdc1[0] sdd1[1]
      71681920 blocks [2/2] [UU]
md1 : active raid1 sdb2[2] sda2[0]
      71585536 blocks [2/1] [U_]
      [>....................]  recovery =  0.1% (97408/71585536) finish=73.3min speed=16234K/sec
md0 : active raid1 sdb1[1] sda1[0]
      96256 blocks [2/2] [UU]
unused devices: <none>

Verify that the RAID build process eventually finishes successfully

root@ubuntu:~# cat /proc/mdstat
Personalities : [raid1]
md2 : active raid1 sdc1[0] sdd1[1]
      71681920 blocks [2/2] [UU]
md1 : active raid1 sdb2[1] sda2[0]
      71585536 blocks [2/2] [UU]
md0 : active raid1 sdb1[1] sda1[0]
      96256 blocks [2/2] [UU]
unused devices: <none>
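Rather than re-running cat by hand, the check can be scripted. The sketch below lists arrays whose member map still shows a missing device (an underscore, as in [U_]). It assumes the /proc/mdstat layout shown above; degraded_arrays is a name invented for this example.

```shell
# List md arrays whose member map (e.g. [U_]) shows a missing device.
# Reads mdstat-format text from the file in $1, default /proc/mdstat.
degraded_arrays() {
    awk '
        /^md[0-9]+ :/               { name = $1 }
        $NF ~ /^\[[U_]*_[U_]*\]$/   { print name }
    ' "${1:-/proc/mdstat}"
}

# Poll once a minute until no array is degraded:
#   while [ -n "$(degraded_arrays)" ]; do sleep 60; done
```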

Make Disks Bootable with Grub

If the drive you replaced contains the boot partition, you need to make it bootable by Grub once again.

/dev/sda

root@ubuntu:~# grub
Probing devices to guess BIOS drives. This may take a long time.
       [ Minimal BASH-like line editing is supported.   For
         the   first   word,  TAB  lists  possible  command
         completions.  Anywhere else TAB lists the possible
         completions of a device/filename. ]
grub> device (hd0) /dev/sda
grub> root (hd0,0)
grub> setup (hd0)
 Checking if "/boot/grub/stage1" exists... no
 Checking if "/grub/stage1" exists... yes
 Checking if "/grub/stage2" exists... yes
 Checking if "/grub/e2fs_stage1_5" exists... yes
 Running "embed /grub/e2fs_stage1_5 (hd0)"...  16 sectors are embedded.
succeeded
 Running "install /grub/stage1 (hd0) (hd0)1+16 p (hd0,0)/grub/stage2 /grub/menu.lst"... succeeded
Done.
grub> quit

/dev/sdb

root@ubuntu:~# grub
Probing devices to guess BIOS drives. This may take a long time.
       [ Minimal BASH-like line editing is supported.   For
         the   first   word,  TAB  lists  possible  command
         completions.  Anywhere else TAB lists the possible
         completions of a device/filename. ]
grub> device (hd1) /dev/sdb
grub> root (hd1,0)
grub> setup (hd1)
 Checking if "/boot/grub/stage1" exists... no
 Checking if "/grub/stage1" exists... yes
 Checking if "/grub/stage2" exists... yes
 Checking if "/grub/e2fs_stage1_5" exists... yes
 Running "embed /grub/e2fs_stage1_5 (hd1)"...  16 sectors are embedded.
succeeded
 Running "install /grub/stage1 (hd1) (hd1)1+16 p (hd1,0)/grub/stage2 /grub/menu.lst"... succeeded
Done.
grub> quit
root@ubuntu:~#

References

  • http://www.howtoforge.com/replacing_hard_disks_in_a_raid1_array

Force consistent hardware mappings across reboots

Partition mount points

/etc/fstab

# /etc/fstab: static file system information.
#
# Use 'blkid -o value -s UUID' to print the universally unique identifier
# for a device; this may be used with UUID= as a more robust way to name
# devices that works even if disks are added and removed. See fstab(5).
#
# <file system> <mount point>   <type>  <options>       <dump>  <pass>
proc /proc proc nodev,noexec,nosuid 0 0
# /dev/mapper/system-root /
UUID=1e6d957c-5f9f-484e-99cb-4c068ac16ba1 / ext4 noatime,errors=remount-ro 0 1
# /dev/md0 /boot
UUID=39b8423d-e831-40f8-8ab6-c16aff22a984 /boot ext4 noatime 0 2
# /dev/mapper/system-home /home
UUID=b0677542-d6ca-4d80-8dec-e89d02433b4c /home ext4 noatime 0 2
# /dev/mapper/system-tmp /tmp
UUID=95dd18be-815c-40e6-8713-a9b64daf3b0c /tmp ext4 noatime 0 2
# /dev/mapper/system-var /var
UUID=c6c23b39-b611-4b2c-b172-51cbb6d93696 /var ext4 noatime 0 2
# /dev/mapper/system-swap swap
UUID=9be44e9c-d7f6-424e-8d94-7757ce89509c none swap sw 0 0
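To find entries that still depend on kernel device naming, something like the following may help. It is a sketch that only looks for raw /dev/sdX and /dev/hdX names in the first field; unstable_fstab_entries is a hypothetical name.

```shell
# Print the device and mount point of fstab entries that use a raw
# /dev/sdX or /dev/hdX name instead of UUID= (comment lines are skipped).
unstable_fstab_entries() {
    awk '$1 !~ /^#/ && $1 ~ /^\/dev\/[sh]d[a-z]/ { print $1, $2 }' \
        "${1:-/etc/fstab}"
}

# Typical use:
#   unstable_fstab_entries
#   blkid -o value -s UUID /dev/sdc1    # look up the UUID to use instead
```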

Hard drives

Network interfaces

/etc/udev/rules.d/70-persistent-net.rules

# This file was automatically generated by the /lib/udev/write_net_rules
# program, run by the persistent-net-generator.rules rules file.
#
# You can modify it, as long as you keep each rule on a single
# line, and change only the value of the NAME= key.
# PCI device 0x10de:0x0057 (forcedeth)
SUBSYSTEM=="net", ACTION=="add", DRIVERS=="?*", ATTR{address}=="00:14:4f:49:f7:18", ATTR{dev_id}=="0x0", ATTR{type}=="1", KERNEL=="eth*", NAME="eth0"
# PCI device 0x10de:0x0057 (forcedeth)
SUBSYSTEM=="net", ACTION=="add", DRIVERS=="?*", ATTR{address}=="00:14:4f:49:f7:19", ATTR{dev_id}=="0x0", ATTR{type}=="1", KERNEL=="eth*", NAME="eth1"
# PCI device 0x8086:0x1010 (e1000)
SUBSYSTEM=="net", ACTION=="add", DRIVERS=="?*", ATTR{address}=="00:14:4f:49:f7:1a", ATTR{dev_id}=="0x0", ATTR{type}=="1", KERNEL=="eth*", NAME="eth2"
# PCI device 0x8086:0x1010 (e1000)
SUBSYSTEM=="net", ACTION=="add", DRIVERS=="?*", ATTR{address}=="00:14:4f:49:f7:1b", ATTR{dev_id}=="0x0", ATTR{type}=="1", KERNEL=="eth*", NAME="eth3"
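The rules are long, so a quick summary of which MAC address is pinned to which interface name can be handy. A small sketch, assuming rules written in the format above; net_name_map is a made-up name.

```shell
# Print "interface MAC" for every rule that pins a MAC address to a name.
net_name_map() {
    sed -nE 's/.*ATTR\{address\}=="([0-9a-f:]+)".*NAME="([^"]+)".*/\2 \1/p' \
        "${1:-/etc/udev/rules.d/70-persistent-net.rules}"
}
```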

RAID Devices

/etc/mdadm/mdadm.conf

# mdadm.conf
#
# Please refer to mdadm.conf(5) for information about this file.
#
# by default, scan all partitions (/proc/partitions) for MD superblocks.
# alternatively, specify devices to scan, using wildcards if desired.
DEVICE partitions
# auto-create devices with Debian standard permissions
CREATE owner=root group=disk mode=0660 auto=yes
# automatically tag new arrays as belonging to the local system
HOMEHOST <system>
# instruct the monitoring daemon where to send mail alerts
MAILADDR root
# definitions of existing MD arrays
ARRAY /dev/md0 level=raid1 num-devices=2 UUID=0b97a661:714c0c61:55ac34b1:8b37b7ca
ARRAY /dev/md1 level=raid1 num-devices=2 UUID=4433950a:a2b749b8:9600c122:bd466c99
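The ARRAY lines only help if their UUIDs match the arrays actually on disk. As a sketch, the helper below prints the device and UUID from each ARRAY line so they can be checked against the UUID field of mdadm --query --detail. The name conf_array_uuids is invented for this example.

```shell
# Print "device UUID" for each ARRAY line in an mdadm.conf-format file.
conf_array_uuids() {
    awk '$1 == "ARRAY" {
        for (i = 3; i <= NF; i++)
            if ($i ~ /^UUID=/) { sub(/^UUID=/, "", $i); print $2, $i }
    }' "${1:-/etc/mdadm/mdadm.conf}"
}

# Typical use:
#   conf_array_uuids
#   mdadm --query --detail /dev/md0 | grep UUID
```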

Delete a uid from a GPG Key

Once a UID has been published to a keyserver as part of your GPG key, you can no longer delete it (deluid): the keyserver still holds the UID and merges it back into your key on the next refresh. You can only revoke it.

Refresh your key from a keyserver. This restores the UID you thought
you could delete:
    gpg --keyserver pool.sks-keyservers.net --refresh-keys 0xdecafbad
Now use gpg to revoke the UID:
    gpg --edit-key 0xdecafbad
gpg displays a list of UIDs on the key. Enter the number of the UID you
wish to revoke. The list is redisplayed with an * next to the selected
one. Now use the gpg command revuid to revoke it:
    Command> revuid
    Really revoke this user ID? (y/N) y
    Please select the reason for the revocation:
      0 = No reason specified
      4 = User ID is no longer valid
      Q = Cancel
    (Probably you want to select 4 here)
    Your decision? 4
Answer the passphrase prompt and enter 'save' to update your keyring with the
modified key. Now send the key with the revoked UID to the keyservers:
    gpg --keyserver pool.sks-keyservers.net --send-keys 0xdecafbad

Generate a list of installed packages

Generate the list of installed packages, excluding packages that have been removed.

dpkg --get-selections | grep -v deinstall > installed-packages.txt
acpi-support                                    install
acpid                                           install
adduser                                         install
adium-theme-ubuntu                              install
aisleriot                                       install
akonadi-server                                  install
alacarte                                        install
alsa-base                                       install
alsa-utils                                      install
anacron                                         install
...

Use this list on another system to set what to install

sudo dpkg --set-selections < installed-packages.txt

Perform the installation. Type 'I' to let dselect install all of the packages in your list. When it’s finished, type 'Q' and press ENTER to exit dselect.

sudo dselect

If you just want a clean list of installed packages

dpkg --get-selections | grep -v deinstall | cut -f 1 > installed-packages.txt
acpi-support
acpid
adduser
adium-theme-ubuntu
aisleriot
akonadi-server
alacarte
alsa-base
alsa-utils
anacron
...
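With a clean list from each machine, it is easy to see what the new system still lacks. A minimal sketch, assuming both files hold one package name per line as produced by the cut command above; missing_packages is a hypothetical name.

```shell
# Print packages named in list $1 that are absent from list $2.
# comm needs sorted input, so both lists are sorted into temp files first.
missing_packages() {
    a=$(mktemp) && b=$(mktemp) || return 1
    sort -u "$1" > "$a"
    sort -u "$2" > "$b"
    comm -23 "$a" "$b"      # lines that appear only in the first list
    rm -f "$a" "$b"
}

# Typical use:
#   missing_packages old-system.txt new-system.txt
```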
