DISCLAIMER: Please note that the blog owner takes no responsibility of any kind for any data loss or damage caused by trying any of the commands or methods mentioned in this blog. You use the commands, methods and scripts at your own risk. If you find something useful, a comment would be appreciated to let other readers know that the solution or method worked for you.
Replacing Faulty Disk in ROOTVG
Analyzing Disk Fault
The first signs that a hard disk is going faulty are temporary error log messages in Error Reporter. If you see occasional temporary errors, you don't have an immediate problem, but if you start to see a cluster of temporary errors then the disk will need replacing. The worst-case scenario is a permanent error against a hard disk together with stale partitions.
Check how many errors have been logged, and whether they are permanent or temporary, by running:
errpt | more
1581762B 0727203502 T H hdisk0 DISK OPERATION ERROR
1581762B 0727203502 P H hdisk0 DISK OPERATION ERROR
The first error log message shows a temporary disk problem on hdisk0, whilst the second shows a permanent error, also on hdisk0. The procedures for replacing hdisk0 and hdisk1 (both part of rootvg) are slightly different. See the steps below.
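As a rough aid for reading a long error report, the sketch below tallies temporary (T) and permanent (P) entries for one disk. The function name count_disk_errors is hypothetical, and it assumes the errpt summary layout shown above, where the third field is the error type and the fifth is the resource name.

```shell
# Tally temporary (T) and permanent (P) errpt entries for one disk.
# Reads errpt summary output on stdin. count_disk_errors is a made-up
# helper name; field positions are assumed from the sample lines above.
count_disk_errors() {
    disk="$1"
    awk -v disk="$disk" '
        $5 == disk && $3 == "T" { temp++ }
        $5 == disk && $3 == "P" { perm++ }
        END { printf "%s temporary=%d permanent=%d\n", disk, temp, perm }
    '
}

# Demonstration using the two sample lines from the errpt output above.
count_disk_errors hdisk0 <<'EOF'
1581762B 0727203502 T H hdisk0 DISK OPERATION ERROR
1581762B 0727203502 P H hdisk0 DISK OPERATION ERROR
EOF
```

On a live system the same function could be fed directly, for example: errpt | count_disk_errors hdisk0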
To check for stale partitions, run the command: lsvg -l rootvg
rootvg:
LV NAME             TYPE       LPs   PPs   PVs  LV STATE      MOUNT POINT
hd5                 boot       1     2     2    closed/syncd  N/A
hd6                 paging     64    128   2    open/syncd    N/A
hd8                 jfslog     1     2     2    open/stale    N/A
hd4                 jfs        4     8     2    open/stale    /
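When a volume group holds many logical volumes, picking out the stale ones by eye is error-prone. The following is a minimal sketch, assuming the lsvg -l column layout shown above (the LV state is the sixth field); list_stale_lvs is a hypothetical helper name.

```shell
# Print the name of every logical volume whose state contains "stale".
# Reads "lsvg -l <vg>" output on stdin. list_stale_lvs is a made-up
# helper name; the state is assumed to be the sixth field.
list_stale_lvs() {
    awk '$6 ~ /stale/ { print $1 }'
}

# Demonstration using the sample output above.
list_stale_lvs <<'EOF'
rootvg:
LV NAME             TYPE       LPs   PPs   PVs  LV STATE      MOUNT POINT
hd5                 boot       1     2     2    closed/syncd  N/A
hd6                 paging     64    128   2    open/syncd    N/A
hd8                 jfslog     1     2     2    open/stale    N/A
hd4                 jfs        4     8     2    open/stale    /
EOF
```

On a live system: lsvg -l rootvg | list_stale_lvs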
Steps for replacing faulty disks in other volume groups are much simpler than for replacing disks in rootvg. I have written a procedure for this below also.
For procedures on replacing a faulty SSA disk, refer to the link
Replacing hdisk0 in rootvg
Change bootlist
bosboot -a -d hdisk1    Make sure hdisk1 has a boot image
bootlist -m normal hdisk1 hdisk0    Change the bootlist so the system will use hdisk1 before hdisk0
Removing Primary Dump Device
sysdumpdev -l    List the dump devices. In this example the primary dump device (pdumplv) is on hdisk0, so it needs to be changed
primary /dev/pdumplv
secondary /dev/sdumplv
copy directory /var/adm/dump
forced copy flag FALSE
always allow dump TRUE
dump compression ON
sysdumpdev -Pp /dev/hd6    Change the primary dump device
primary /dev/hd6
secondary /dev/sdumplv
copy directory /var/adm/dump
forced copy flag FALSE
always allow dump TRUE
dump compression ON
rmlv pdumplv    Remove the logical volume pdumplv, the former primary dump device
Un-Mirroring Hard Disk from VG
Now you need to un-mirror the volume group so the disk can be removed. There are two ways to do this: one works at the disk level and the other at the logical volume level. The outcome is the same with both, but the second gives you more control.
Method One
unmirrorvg rootvg hdisk0    Unmirrors the disk.
NB: Sometimes this is unstable, especially if you have stale partitions. I have also noticed that if pdumplv is mirrored (it shouldn't be by default), this command will fail. In this instance, unmirror that logical volume first and then run the unmirrorvg command, or alternatively follow the method below.
Method Two
lsvg -l rootvg    Lists all logical volumes in rootvg
rootvg:
LV NAME             TYPE       LPs   PPs   PVs  LV STATE      MOUNT POINT
hd5                 boot       1     2     2    closed/syncd  N/A
hd6                 paging     64    128   2    open/syncd    N/A
hd8                 jfslog     1     2     2    open/syncd    N/A
hd4                 jfs        4     8     2    open/syncd    /
rmlvcopy LVNAME 1 hdisk0    Run this command for each logical volume
e.g: rmlvcopy hd5 1 hdisk0
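Typing the command once per logical volume invites mistakes, so here is a minimal sketch that only prints the rmlvcopy commands for review rather than running them. The name gen_rmlvcopy is hypothetical, and the parsing assumes the lsvg -l layout shown above (LV name first, a numeric LPs count third).

```shell
# Print (do not run) one rmlvcopy command per logical volume listed in
# "lsvg -l <vg>" output on stdin. gen_rmlvcopy is a made-up helper name.
# Header lines are skipped because their third field is not a number.
gen_rmlvcopy() {
    disk="$1"
    awk -v disk="$disk" '
        $3 ~ /^[0-9]+$/ { printf "rmlvcopy %s 1 %s\n", $1, disk }
    '
}

# Demonstration using the sample output above.
gen_rmlvcopy hdisk0 <<'EOF'
rootvg:
LV NAME             TYPE       LPs   PPs   PVs  LV STATE      MOUNT POINT
hd5                 boot       1     2     2    closed/syncd  N/A
hd6                 paging     64    128   2    open/syncd    N/A
hd8                 jfslog     1     2     2    open/syncd    N/A
hd4                 jfs        4     8     2    open/syncd    /
EOF
```

Printing first and pasting afterwards keeps a human in the loop: on a live system you could run lsvg -l rootvg | gen_rmlvcopy hdisk0, check the list, and then execute the lines one at a time.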
Check the disk has been unmirrored by: lsvg -l rootvg. For each LV, the PVs column will show 1
rootvg:
LV NAME             TYPE       LPs   PPs   PVs  LV STATE      MOUNT POINT
hd5                 boot       1     2     1    closed/syncd  N/A
hd6                 paging     64    128   1    open/syncd    N/A
hd8                 jfslog     1     2     1    open/syncd    N/A
hd4                 jfs        4     8     1    open/syncd    /
Make a note of the SCSI id and serial number, which will make the CE's life easier when the disk has to be removed. In the example below the SCSI id is 8 and the serial number is 4DFJY156. The command you need to run is: lscfg -vl hdisk0
  DEVICE            LOCATION          DESCRIPTION
  hdisk0            10-88-00-8,0      16 Bit LVD SCSI Disk Drive (9100 MB)
        Manufacturer................IBM
        Machine Type and Model......DDYS-T09170M
        FRU Number..................00P1517
        ROS Level and ID............53394841
        Serial Number...............4DFJY156
        EC Level....................F79924
        Part Number.................07N3852
        Device Specific.(Z0)........000003029F00013A
        Device Specific.(Z1)........07N4925
        Device Specific.(Z2)........0933
        Device Specific.(Z3)........00315
        Device Specific.(Z4)........0001
        Device Specific.(Z5)........22
        Device Specific.(Z6)........F79924
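To capture those two values without copying them by hand, something like the sketch below could be used. It assumes that the SCSI id is the number before the comma in the last segment of the location code (8 in 10-88-00-8,0) and that the serial number sits on a single "Serial Number....value" line; disk_ident is a hypothetical helper name.

```shell
# Extract the SCSI id and serial number from "lscfg -vl <disk>" output
# on stdin. disk_ident is a made-up helper name. Assumptions: the SCSI
# id is the part before the comma in the last "-" separated segment of
# the location code, and the serial number follows a run of dots.
disk_ident() {
    disk="$1"
    awk -v disk="$disk" '
        $1 == disk {
            n = split($2, loc, "-")      # 10-88-00-8,0 -> last piece "8,0"
            split(loc[n], addr, ",")     # "8,0" -> SCSI id 8, LUN 0
            scsi = addr[1]
        }
        /Serial Number/ {
            serial = $0
            sub(/^.*Serial Number[.]*/, "", serial)   # drop label and dots
            gsub(/[ \t\r]+$/, "", serial)             # drop trailing blanks
        }
        END { printf "scsi_id=%s serial=%s\n", scsi, serial }
    '
}

# Demonstration using an excerpt of the sample output above.
disk_ident hdisk0 <<'EOF'
  DEVICE            LOCATION          DESCRIPTION
  hdisk0            10-88-00-8,0      16 Bit LVD SCSI Disk Drive (9100 MB)
        Manufacturer................IBM
        Serial Number...............4DFJY156
        Part Number.................07N3852
EOF
```

On a live system: lscfg -vl hdisk0 | disk_ident hdisk0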
Remove the Disk from VG
reducevg rootvg hdisk0    Remove hdisk0 from the volume group
rmdev -l hdisk0 -d    Remove the definition of hdisk0 from the system
lsvg rootvg    Ensure the disk is removed
lspv hdisk0    Ensure the disk is removed
Now remove the disk physically and add the new disk.
Add the New Disk to the System
cfgmgr    Run Configuration Manager to add the new disk to the system
diag    Then go into diagnostics to update the system log so the system is aware that hdisk0 has been replaced
Task Selection ->
Log Repair Action ->
hdisk0
Esc 0    To exit diagnostics after Log Repair Action has completed.
errpt | more    Check that the Log Repair Action has taken place. You should see an entry like:
2F3E09A4 0819110902 I H hdisk2 REPAIR ACTION
diag    Go back into diagnostics and certify the disk. This will indicate whether the new disk is OK
Task Selection ->
Certify the disk ->
hdisk0    Commit the changes and exit by pressing F3
Esc 0    To exit diagnostics after certifying the new disk
Add disk into the Volume Group
extendvg rootvg hdisk0    Add the disk into the volume group rootvg
Now you need to re-mirror the disk. Again, you can mirror at the disk level or at the logical volume level.
Re-Mirroring Hard Disk
Method One
mirrorvg rootvg hdisk0    Mirrors the disk
syncvg -v rootvg    Synchronizes the volume group and the data contained within it
NB: This method will also mirror the logical volume pdumplv. Unmirror that logical volume by:
rmlvcopy pdumplv 1 hdisk1
Method Two
lsvg -l rootvg    Lists all the logical volumes to re-mirror
mklvcopy -k LVNAME 2 hdisk0    Run this command for each logical volume. The -k flag also synchronizes the data
e.g: mklvcopy -k hd5 2 hdisk0
NB: Do not mirror the logical volume pdumplv
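The per-volume step, together with the exclusion of the dump devices, can be sketched as a generator that prints the mklvcopy commands for review instead of running them. The name gen_mklvcopy is hypothetical, the dump logical volumes are matched by the names used in this post (pdumplv and sdumplv), and the sdumplv row in the demonstration is an illustrative row that is not taken from the outputs above.

```shell
# Print (do not run) one "mklvcopy -k" command per logical volume listed
# in "lsvg -l <vg>" output on stdin, skipping the dump devices by name.
# gen_mklvcopy is a made-up helper name.
gen_mklvcopy() {
    disk="$1"
    awk -v disk="$disk" '
        $3 ~ /^[0-9]+$/ && $1 != "pdumplv" && $1 != "sdumplv" {
            printf "mklvcopy -k %s 2 %s\n", $1, disk
        }
    '
}

# Demonstration. The sdumplv row is illustrative only, added to show
# that dump devices are skipped.
gen_mklvcopy hdisk0 <<'EOF'
rootvg:
LV NAME             TYPE       LPs   PPs   PVs  LV STATE      MOUNT POINT
hd5                 boot       1     1     1    closed/syncd  N/A
hd4                 jfs        4     4     1    open/syncd    /
sdumplv             sysdump    40    40    1    open/syncd    N/A
EOF
```

On a live system: lsvg -l rootvg | gen_mklvcopy hdisk0, then check the list before executing each line.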
syncvg -v rootvg    Synchronizes the volume group and the data contained within it
lsvg -l rootvg    Check rootvg has been mirrored and every LV state shows syncd
Check the volume group has been completely re-mirrored by: lsvg -l rootvg. The PVs column should show 2 for each LV apart from pdumplv and sdumplv
rootvg:
LV NAME             TYPE       LPs   PPs   PVs  LV STATE      MOUNT POINT
hd5                 boot       1     2     2    closed/syncd  N/A
hd6                 paging     64    128   2    open/syncd    N/A
hd8                 jfslog     1     2     2    open/syncd    N/A
hd4                 jfs        4     8     2    open/syncd    /
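The same check can be scripted. This is a rough sketch, assuming the column layout above (PVs in the fifth field, LV state in the sixth) and skipping the dump devices by the names used in this post; check_mirrored is a hypothetical helper name.

```shell
# Report every logical volume (other than the dump devices) that does
# not show 2 in the PVs column or whose state does not end in "syncd".
# Reads "lsvg -l <vg>" output on stdin and returns non-zero if any LV
# fails the check. check_mirrored is a made-up helper name.
check_mirrored() {
    awk '
        $3 ~ /^[0-9]+$/ && $1 != "pdumplv" && $1 != "sdumplv" {
            if ($5 != 2 || $6 !~ /syncd$/) { print "NOT OK: " $1; bad = 1 }
        }
        END { exit bad + 0 }
    '
}

# Demonstration using the sample output above.
check_mirrored <<'EOF' && echo "all listed LVs are mirrored and syncd"
rootvg:
LV NAME             TYPE       LPs   PPs   PVs  LV STATE      MOUNT POINT
hd5                 boot       1     2     2    closed/syncd  N/A
hd6                 paging     64    128   2    open/syncd    N/A
hd8                 jfslog     1     2     2    open/syncd    N/A
hd4                 jfs        4     8     2    open/syncd    /
EOF
```

On a live system: lsvg -l rootvg | check_mirrored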
mklv -y 'pdumplv' rootvg 40 hdisk0    Re-create the logical volume for your primary dump device
sysdumpdev -Pp /dev/pdumplv    Re-allocate your primary dump device.
primary /dev/pdumplv
secondary /dev/sdumplv
copy directory /var/adm/dump
forced copy flag FALSE
always allow dump TRUE
dump compression ON
bosboot -a -d hdisk0    Update the boot image on hdisk0
bootlist -m normal hdisk0 hdisk1    Change your boot list back.