Cloud DevOps Admin Guide


Sendmail Configuration in AIX

Daemon : sendmail

To start the daemon :

# startsrc -s sendmail -a "-bd -q30m"
where
-bd - Runs sendmail as a background daemon that listens for incoming SMTP mail
-q30m - Sets the interval (here 30 minutes) at which the daemon processes the messages saved in the queue

To start the daemon automatically at system boot :

a. # vi /etc/rc.tcpip

b. Uncomment the following line :
start /usr/lib/sendmail "$src_running" "-bd -q${qpi}"

To display the status of the daemon :

# lssrc -s sendmail
# ps -ef | grep sendmail

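To stop the daemon :

# stopsrc -s sendmail

To make a daemon that was not started through SRC re-read its configuration (kill -1 sends SIGHUP, which restarts sendmail rather than stopping it) :

# kill -1 `cat /etc/sendmail.pid`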

Configuration File:

/etc/sendmail.cf - Where the hostname, relay server name and other settings are stored.

Alias File :

/etc/aliases - Where the group(alias) to member mapping is stored.

To add the hostname to the sendmail configuration :

a. # vi /etc/sendmail.cf

b. Change "#DwYourHostName" to "Dw{hostname of local server}"

c. # refresh -s sendmail

To add the mail (relay) server to the sendmail configuration :

a. # vi /etc/sendmail.cf

b. Change "#DSrelayhostname" to "DS{hostname of the Relay Server}"

c. # refresh -s sendmail

To send a test mail :

# echo "Test Message" | sendmail -v raja@server1.domain.com

If you add an alias to the /etc/aliases file, rebuild the alias database :

# sendmail -bi

This rebuilds the alias database from /etc/aliases so that sendmail picks up the new entries.

To display the list of messages in the mail queue :

# mailq (or) # sendmail -bp

Directory containing log files and temp files associated with messages in the mail queue :

/var/spool/mqueue

To delete the first 1000 messages in root's mailbox (the mailbox, not the mail queue) :

# mail -u root , then enter "d 1-1000"

Using find command

The find command searches a given directory tree for files that match an expression given on the command line. The matching files can then be acted on, for example by piping the output to xargs.

Some important options:

    -xdev              Stay on the same file system (dev in fstab).
    -exec cmd {} \;    Execute the command and replace {} with the full path
    -iname             Like -name but is case insensitive
    -ls                Display information about the file (like ls -la)
    -size n            n is +-n (k M G T P)
    -cmin n            File's status was last changed n minutes ago.

find . -type f ! -perm -444	Find files not readable by all
find . -type d ! -perm -111	Find dirs not accessible by all
find /home/user/ -cmin -10 -print	Files created or modified in the last 10 min.
find . -name '*.[ch]' | xargs grep -E 'expr'	Search 'expr' in this dir and below.
find / -name "*.core" | xargs rm	Find core dumps and delete them
find / -name "*.core" -print -exec rm {} \;	Other syntax
find . \( -iname "*.png" -o -iname "*.jpg" \) -print	-iname is not case sensitive
find . -type f -name "*.txt" ! -name README.txt -print	Exclude README.txt files
find /var/ -size +1M -exec ls -lh {} \;	Find in /var files above 1M and long-list them
find /var/ -size +1M -ls	Same, using the built-in -ls
find . -size +10M -size -50M -print	Files between 10M and 50M
find /usr/ports/ -name work -type d -print -exec rm -rf {} \;	Clean the ports

Files with the SUID bit set have to be kept secure; find can locate them with the -perm -4000 test.

Some more Examples:

1. To list all files in the file system with a given base file name, type:
find / -name .profile -print

This searches the entire file system and writes the complete path names of all files named .profile.
The / (slash) tells the find command to search the root directory and all of its subdirectories.
In order not to waste time, it is best to limit the search by specifying the directories where you think the
files might be.

2. To list files having a specific permission code in the current directory tree, type:
find . -perm 0600 -print

This lists the names of the files that have only owner-read and owner-write permission. The . (dot) tells the find command to search the current directory and its subdirectories. See the chmod command for an explanation of permission codes.

3. To search several directories for files with certain permission codes, type:
find manual clients proposals -perm -0600 -print

This lists the names of the files that have owner-read and owner-write permission and possibly other permissions. The manual, clients, and proposals directories and their subdirectories are searched. In the previous example, -perm 0600 selects only files with permission codes that match 0600 exactly.
In this example, -perm -0600 selects files with permission codes that allow the accesses indicated by 0600 and other accesses above the 0600 level. This also matches the permission codes 0622 and 2744.
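
The difference between the exact form and the "at least" form of -perm can be checked in a scratch directory. The following is a minimal sketch only; the temporary directory and the file names (exact, extra, special, other) are made up for the demonstration and are not part of the original examples.

```shell
# Build a throwaway directory with files of known permission codes.
demo=$(mktemp -d)
touch "$demo/exact" "$demo/extra" "$demo/special" "$demo/other"
chmod 0600 "$demo/exact"     # owner read and write only
chmod 0622 "$demo/extra"     # 0600 plus write for group and others
chmod 2744 "$demo/special"   # includes 0600, plus other bits
chmod 0444 "$demo/other"     # no owner write, so 0600 is not covered

# Exact match: only the file whose mode is exactly 0600.
exact_matches=$(cd "$demo" && find . -type f -perm 0600 -print | LC_ALL=C sort)

# "At least" match: every file whose mode includes all the bits of 0600.
mask_matches=$(cd "$demo" && find . -type f -perm -0600 -print | LC_ALL=C sort)

echo "exact (-perm 0600):"
echo "$exact_matches"
echo "at least (-perm -0600):"
echo "$mask_matches"

rm -rf "$demo"
```

The file with mode 0444 appears in neither list, because it lacks the owner-write bit that 0600 requires.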

4. To list all files in the current directory that have been changed during the current 24-hour period, type:
find . -ctime 0 -print

5. To search for regular files with multiple links, type:
find . -type f -links +1 -print

This lists the names of the ordinary files (-type f) that have more than one link (-links +1). Note: Every directory has at least two links: the entry in its parent directory and its own . (dot) entry. The ln command explains multiple file links.

6. To find all accessible files whose path name contains find, type:
find . -name '*find*' -print

7. To remove all files named a.out or *.o that have not been accessed for a week and that are not mounted using nfs, type:
find / \( -name a.out -o -name '*.o' \) -atime +7 ! -fstype nfs -exec rm {} \;

Note: The number used within the -atime expression is +7. This is the correct entry if you want the command to act on files not accessed for more than a week (seven 24-hour periods).

8. To print the path names of all files in or below the current directory, except the directories named SCCS or files in the SCCS directories, type:
find . -name SCCS -prune -o -print

To print the path names of all files in or below the current directory, including the names of SCCS directories, type:
find . -print -name SCCS -prune
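
The effect of the position of -print relative to -prune can be tried out safely in a scratch tree. This is a small sketch under assumed names: the src directory and the files main.c and s.main.c are hypothetical and exist only for the demonstration.

```shell
# Build a small tree: src/main.c and src/SCCS/s.main.c
demo=$(mktemp -d)
mkdir -p "$demo/src/SCCS"
touch "$demo/src/main.c" "$demo/src/SCCS/s.main.c"

# Form 1: SCCS directories are pruned before anything is printed,
# so neither the directory name nor its contents appear.
without_sccs=$(cd "$demo" && find . -name SCCS -prune -o -print | LC_ALL=C sort)

# Form 2: every path is printed first and pruned afterwards,
# so the SCCS directory name appears but its contents do not.
with_sccs_names=$(cd "$demo" && find . -print -name SCCS -prune | LC_ALL=C sort)

echo "-name SCCS -prune -o -print:"
echo "$without_sccs"
echo "-print -name SCCS -prune:"
echo "$with_sccs_names"

rm -rf "$demo"
```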

9. To search for all files that are exactly 414 bytes long, type:
find . -size 414c -print

10. To find and remove every file in your home directory with the .c suffix, type:
find /u/arnold -name "*.c" -exec rm {} \;

Every time the find command identifies a file with the .c suffix, the rm command deletes that file. The rm command is the only parameter specified for the -exec expression. The {} (braces) represent the current path name.

11. In this example, dirlink is a symbolic link to the directory dir. You can list the files in dir by referring to the symbolic link dirlink on the command line. To do this, type:
find -H dirlink -print

12. In this example, dirlink is a symbolic link to the directory dir. To list the files in dirlink, traversing the file hierarchy under dir including any symbolic links, type:
find -L dirlink -print

13. To determine whether the file dir1 referred to by the symbolic link dirlink is newer than dir2, type:
find -H dirlink -newer dir2
Note: Because the -H flag is used, time data is collected not from dirlink but instead from dir1, which is found by traversing the symbolic link.

14. To produce a listing of files in the current directory in ls format with expanded user and group name, type:
find . -ls -long

15. To list the files with ACL/EA set in current directory, type:
find . -ea

System dump devices - AIX

Traditionally the default dump device for system dumps was /dev/hd6 (paging space), and it still is on a lot of systems. If there is not enough space to copy the dump file after a crash, the copydumpmenu utility is invoked during start-up to give the system administrator the opportunity to copy the dump to removable media, such as a tape or DVD. This can be time consuming, and it is sometimes the case that you want to get your system back up quickly. I can sympathise with system administrators who ignore the prompt because of business pressure; the dump is then deleted, so one does not know why the system crashed in the first place. The copydumpmenu utility can also be called from the command line when the system is up.

The copy directory is /var/adm/ras by default, and the file name is vmcore.<X>.BZ, where X is a sequence number. The dump file is in BZ (BZIP) format, not the Z compressed format.

The snap command can be used to gather information about the dump file. Be sure to include the -D flag, which gathers the dump information from the primary dump device.

With systems now having more memory available, this has provided more flexibility as to where the primary dump device could be placed. Typically, for systems with over 4 GB of memory there is now a dedicated dump device, called: lg_dumplv

# lsvg -l rootvg |grep sysdump

lg_dumplv sysdump 8 8 open/syncd N/A

Using the sysdumpdev command, one can determine what devices are used for the system dumps.

The following output shows a system using AIX 7.1 having the lg_dumplv as its primary dump device:
# sysdumpdev -l

primary /dev/lg_dumplv

secondary /dev/sysdumpnull

copy directory /var/adm/ras

forced copy flag TRUE

always allow dump TRUE

dump compression ON

type of dump traditional

Looking more closely at the fields in the above output, notice that an extra field is present from AIX 6.1 onwards: type of dump. It is currently set to traditional; it can be set to fw-assisted (firmware-assisted) if your hardware supports it. The secondary field shows that there is no secondary dump device; this is denoted by the sysdumpnull device, and any system dump that goes to that device is lost. The copy directory is /var/adm/ras; this is where the system dump is copied to, either for further examination or to be sent to IBM support. Note that 'always allow dump' is set to TRUE; this must be the case if a dump is to be successfully initiated. Dump compression is on by default.
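
A script can pick these fields out of the listing before relying on a dump being captured. The sketch below parses text in the same format as the output above; on a live AIX system the sample text would be replaced by the output of sysdumpdev -l. The helper name dump_field is hypothetical and is not an AIX command.

```shell
# Sample text in the format shown above; on a live AIX system use:
#   sample_output=$(sysdumpdev -l)
sample_output='primary /dev/lg_dumplv
secondary /dev/sysdumpnull
copy directory /var/adm/ras
forced copy flag TRUE
always allow dump TRUE
dump compression ON
type of dump traditional'

# dump_field LABEL: print the last word of the line that starts with LABEL.
# (dump_field is a hypothetical helper, not part of AIX.)
dump_field() {
    printf '%s\n' "$sample_output" | awk -v label="$1" '
        index($0, label) == 1 { print $NF; exit }'
}

primary=$(dump_field "primary")
secondary=$(dump_field "secondary")
always=$(dump_field "always allow dump")

echo "primary dump device: $primary"
if [ "$secondary" = "/dev/sysdumpnull" ]; then
    echo "warning: no usable secondary dump device"
fi
if [ "$always" != "TRUE" ]; then
    echo "warning: 'always allow dump' is not TRUE"
fi
```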

Common settings using sysdumpdev are:
To change the primary device use: sysdumpdev -P -p <device_name>
To change the secondary device use: sysdumpdev -P -s <device_name>
To change the copy directory use: sysdumpdev -D <path_name>
To change the always dump condition use: sysdumpdev -k for false, sysdumpdev -K for true
To change the type of dump use: sysdumpdev -t <fw-assisted | traditional>

A few commands:

1. To view the current dump configuration :

# sysdumpdev -l

primary /dev/hd6
secondary /dev/sysdumpnull
copy directory /var/adm/ras
forced copy flag TRUE
always allow dump FALSE
dump compression OFF

2. To change the primary dump device temporarily :

# sysdumpdev -p /dev/dumplv

3. To change the primary dump device permanently :

# sysdumpdev -P -p /dev/dumplv

4. To change the secondary dump device temporarily :

# sysdumpdev -s /dev/dumplv

5. To change the secondary dump device permanently :

# sysdumpdev -P -s /dev/dumplv

6. To always allow a system dump (sets 'always allow dump' to TRUE) :

# sysdumpdev -K

7. To stop always allowing a system dump (sets 'always allow dump' to FALSE) :

# sysdumpdev -k

8. To estimate the dump size :

# sysdumpdev -e

9. To list the last dump information :

# sysdumpdev -L
Device name: /dev/lg_dumplv
Major device number: 12
Minor device number: 4
Size: 42123543 bytes
Date/Time: Wed Jan 01 12:03:00 CDT 2009
Dump status: 0
dump completed successfully
Dump copy filename: /var/adm/ras/vmcore.1

10. To copy the saved vmcore.n file to tape :

# snap -gfkD -o /dev/rmt0

11. To read the dump file (on AIX 5L and later the kdb command replaces crash) :

# crash dump unix
>

12. To change the dump copy directory so that, if the copy fails at boot time, the system prompts for external media to copy the dump to :

# sysdumpdev -D /opt/dumpfiles

13. To change the dump copy directory so that, if the copy fails at boot time, the system dump is ignored :

# sysdumpdev -d /opt/dumpfiles

14. To specify that dumps should not be compressed :

# sysdumpdev -c

15. To specify that dumps should always be compressed :

# sysdumpdev -C

16. To find out whether a new system dump is present :

# sysdumpdev -z

After a user-initiated dump, the compressed dump is on the LV lg_dumplv; it has not been copied across to the copy directory. To copy the most recent system dump from a system dump device to a directory, use the savecore command. For example, to copy the dump to the directory /var/adm/ras, I could use:

# savecore -d /var/adm/ras
vmcore.0.BZ

If you need to uncompress the file use the dmpuncompress utility. The format of the command is:

dmpuncompress <filename>

After uncompressing, the dump file is now ready for further investigation using kdb or for transfer to IBM support.

# dmpuncompress vmcore.0.BZ
replaced with vmcore.0

Alternatively, you can use the smit dump menu and select Copy a system dump. The following screen is displayed:

                              Copy dump image to:

Type or select values in entry fields.
Press Enter after making all desired changes.

                                                        [Entry Fields]
* Copy dump image from:                              [/dev/lg_dumplv]         /
* Copy dump image to:                                [/var/adm/ras/dump_fil>
* Input and output file blocksize for copy           [4096]                   #
  Size in bytes of dump image                         63894528
  Date of last dump                                   Thu Oct 27 18-02-28 B>

The fields are populated with the details of the current dump on the primary dump device. With these default settings, the dump file is present in /var/adm/ras after the copy:

# ls -l dump_file_copy.BZ
-rw-r--r--    1 root     system     63894528 Oct 27 18:15 dump_file_copy.BZ

After a dump has occurred there may well be a minidump generated as well. The errorlog output listed earlier in the article contained an entry for:

F48137AC   1027180411 U O minidump       COMPRESSED MINIMAL DUMP

The minidump is a small compressed dump that is present in /var/adm/ras. This file contains a snapshot of the system at the time it was dumped or crashed. It can be used for diagnosis if the main dump is not present, because the dump was removed or not captured.

AIX ML/TL Upgradation steps

1. Pre-installation checks

To check packages/file set consistency
# lppchk -v

If errors are found, get more information about the problem and resolve it before continuing with the installation.
# lppchk -v -m3

Check the currently installed ML/TL
# instfix -i|grep ML
# oslevel -s
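
The string printed by oslevel -s can be split into its parts when a script needs to compare technology levels or service packs. The sketch below assumes the usual four-part form (release, technology level, service pack, build date as YYWW); the sample value is an assumption for illustration and was not taken from a real server.

```shell
# Assumed sample value; on a live AIX system use: level=$(oslevel -s)
level="7100-03-04-1441"

# Split on the hyphens: release, technology level, service pack, build date.
IFS=- read -r release tl sp build <<EOF
$level
EOF

echo "release=$release TL=$tl SP=$sp build(YYWW)=$build"
```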

Check Rootvg

Commit all packages/filesets installed on the servers
# smit maintain_software

Check whether rootvg is mirrored and all LVs are mirrored correctly (excluding dump and boot volumes). If your rootvg is not mirrored, skip the unmirroring steps later in this document that prepare for alt_disk_install.
# lsvg -p rootvg
# lsvg rootvg
# lsvg -l rootvg

2. Preinstallation Task

Check for HACMP cluster

Check whether cluster software is installed and whether HACMP is running on the server.

# lslpp -l | grep -i cluster
Check if the cluster processes are active
# lssrc -g cluster

If HACMP is used, a current fix pack for HACMP should be installed when a new AIX Technology Level is installed. Currently available HACMP fix packs can be downloaded via http://www14.software.ibm.com/webapp/set2/sas/f/hacmp/home.html

3. Check for IBM C/C++ compiler

Compiler updates need to be installed along with the TL upgrade. They can be downloaded from the link below.
http://www-1.ibm.com/support/docview.wss?rs=2239&uid=swg21110831

4. Check for Java Version

If Java is used, current software updates for the Java version(s) should be installed when a new AIX Technology Level is installed. If Java is being used in conjunction with other software, consult the vendor of that software for recommended Java levels

The Java version(s) installed on AIX can be identified with the command
# lslpp -l | grep -i java

The default Java version can be identified with the command
# java -fullversion
Java fixes can be downloaded from the link below.
http://www14.software.ibm.com/webapp/set2/sas/f/hacmp/home.html

5. Check for recommended TL/SP for system

Get information on the latest TL/SP for the system using the Fix Level Recommendation Tool, available at the link below
http://www14.software.ibm.com/webapp/set2/flrt/home

Download the latest updates from the IBM Fix Central website and copy them to the NIM server.

Create the resources on the NIM server.

Run a mksysb backup of the servers to be on the safe side.

Check the compatibility of any running applications and confirm it with the application owner.

6. Free hdisk1 for alternate disk installation

Remove the secondary dump device, if present, from hdisk1. Then change the setting for the secondary dump device to /dev/sysdumpnull.
# sysdumpdev -P -s /dev/sysdumpnull

Unmirror rootvg
# unmirrorvg rootvg hdisk1

Migrate the logical volumes that are not mirrored from hdisk1 to hdisk0.
# migratepv hdisk1 hdisk0

Clear the boot record from hdisk1
# chpv -c hdisk1

Add a new boot image to the first PV to have a "fresh" boot record, just to be on the safe side
# bosboot -ad /dev/hdisk0

Set the bootlist to hdisk0
# bootlist -m normal hdisk0 hdisk1 (hdisk1 after installation will contain upgraded OS)

Remove the second PV from rootvg
# reducevg rootvg hdisk1

7. Alternate disk migration

Carry out the alternate disk installation via NIM on hdisk1. First run a preview install; if it succeeds, go ahead and install the TL/SP in applied mode.
# smit nimadm

Reboot system. It will be booted from hdisk1 which contains upgraded OS.
# shutdown -Fr

8. Recreate the mirror of rootvg

Do this after a few days of stable operation and some tests by the application users.

Remove the alternate disk installation definition
# alt_disk_install -X

Add disk hdisk0 to rootvg
# extendvg rootvg hdisk0

Check the estimated dump size
# sysdumpdev -e

Re-create the secondary dump device
# sysdumpdev -P -s <dump_device>

Mirror rootvg onto hdisk0 in the background (the system now runs from hdisk1, so hdisk0 receives the new copy).
# nohup mirrorvg -S rootvg hdisk0 &

Create a boot image on hdisk0
# bosboot -ad /dev/hdisk0

Add both disks to the bootlist
# bootlist -m normal hdisk0 hdisk1

Synchronize rootvg
# nohup syncvg -v rootvg &

Flavors of UNIX

The table below summarizes some of the common UNIX variants and clones. While the table lists about forty different variants, the UNIX world isn't nearly as diverse as it used to be. Some of them are defunct and are listed for historical purposes. Others are on their way out. In some cases, vendors have defected to Microsoft technology. In others, mergers and acquisitions have led to the consolidation of different UNIX implementations. A list of "dead" UNIX implementations would be substantial indeed, consisting of hundreds of variations on the letters "U," "I," and "X" (CLIX, CX/UX, MV/UX, SINIX, VENIX, etc.).

UNIX Variants and Clones
UNIX Variant	Company/Org.	For More Info
A/UX	Apple Computer, Inc.	defunct
AIX	IBM	http://www.rs6000.ibm.com/software/
AT&T System V	AT&T	defunct
BS2000/OSD-BC	Siemens AG	http://www.siemens.com/servers/bs2osd/
BSD/OS	Berkeley Software Design, Inc.	http://www.bsdi.com
CLIX	Intergraph Corp.	http://www.intergraph.com
Debian GNU/Hurd	Software in the Public Interest, Inc.	http://www.gnu.org/software/hurd/debian-gnu-hurd.html
Debian GNU/Linux	Software in the Public Interest, Inc.	http://www.debian.org
DG/UX	Data General Corp.	http://www.dg.com/products/html/dg_ux.html
Digital Unix	Compaq Computer Corporation	http://www.unix.digital.com/
DYNIX/ptx	Sequent Computer Systems, Inc.	http://www.sequent.com/products/software/operatingsys/dynix.html
Esix UNIX	Esix Systems	http://www.esix.com/
FreeBSD	FreeBSD group	http://www.freebsd.org
GNU Hurd	GNU organization	http://www.gnu.org
HAL SPARC64/OS	HAL Computer Systems, Inc.	http://www.hal.com
HP-UX	Hewlett-Packard Company	http://www.hp.com/unixwork/hpux/
Irix	Silicon Graphics, Inc.	http://www.sgi.com/software/irix6.5/
Linux	several	http://www.linux.org
LynxOS	Lynx Real-Time Systems, Inc.	http://www.lynx.com/products/lynxos.html
MachTen	Tenon Intersystems	http://www.tenon.com/products/machten/
MacOS X Server	Apple Computer, Inc.	http://www.apple.com/macosx/
Minix	none	http://www.cs.vu.nl/~ast/minix.html
MkLinux	Apple Computer, Inc.	http://www.mklinux.apple.com
NCR UNIX SVR4 MP-RAS	NCR Corporation	http://www3.ncr.com/product/integrated/software/p2.unix.html
NetBSD	NetBSD group	http://www.netbsd.org
NeXTSTEP	NeXT Computer Inc.	defunct, see http://www.apple.com/enterprise/
NonStop-UX	Compaq Computer Corporation	http://www.tandem.com
OpenBSD	OpenBSD group	http://www.openbsd.org
OpenLinux	Caldera Systems, Inc.	http://www.calderasystems.com
Openstep	Apple Computer, Inc.	http://www.apple.com/enterprise/
QNX Realtime OS	QNX Software Systems Ltd.	http://www.qnx.com/products/os/qnxrtos.html
Red Hat Linux	Red Hat Software, Inc.	http://www.redhat.com/
Reliant UNIX	Siemens AG	http://www.siemens.com/servers/rm/
Solaris	Sun Microsystems	http://www.sun.com/software/solaris/
SunOS	Sun Microsystems	defunct
SuSE	S.u.S.E., Inc.	http://www.suse.com
UNICOS	Silicon Graphics, Inc.	http://www.sgi.com/software/unicos/
UnixWare	SCO -- The Santa Cruz Operation Inc.	http://www.sco.com/unix/
UTS	Amdahl Corporation	http://www.amdahl.com/uts/

RAM disk in AIX

AIX provides the 'mkramdisk' command for creating a disk that resides in RAM, for very I/O-intensive applications such as databases.
Here is a simple set of commands to create a RAM disk and a filesystem on top of it:

1. Create a RAM disk, specifying the size

# mkramdisk 5G

The system assigns the next available RAM disk. Since this is the first one, it is called ramdisk0.

2. Check for the new disk

# ls -l /dev | grep -i ram

If there isn't sufficient available memory, the mkramdisk command warns about this during creation.

3. Create and mount a filesystem on top of the RAM disk (ramdiskx stands for the RAM disk created above; the mount point must exist before mounting)

# /sbin/helpers/jfs2/mkfs -V jfs2 /dev/ramdiskx

# mkdir /ramdiskx

# mount -V jfs2 -o log=NULL /dev/ramdiskx /ramdiskx

The new filesystem is now available like any other filesystem.

To remove a RAM disk, unmount and remove the filesystem, then use the 'rmramdisk' command to remove the RAM disk.

How to clear AIX NFS cache on a server

Do the following on a server that is having problems exporting NFS mounts
------------------------------------------------------------------------------------

1) Move the current exports file to another name
       mv /etc/exports /etc/exports.old

2) Create a new exports file
       touch /etc/exports

3) Unexport everything
       exportfs -ua

4) Stop NFS
       stopsrc -g nfs

5) Stop portmapper
       stopsrc -s portmap

6) Change directory to /etc and remove or rename the following files if they exist.
       rm -rf xtab state sm sm.bak rmtab

7) Change directory to /var/statmon and remove the status monitoring files.
       rm -rf state sm sm.bak

8) Start the portmapper
       startsrc -s portmap

9) Start NFS
       startsrc -g nfs

10) Re-export what is left in /etc/exports
       exportfs -va

11) Refresh the inetd daemon subsystem
       refresh -s inetd

12) Move the /etc/exports file that you backed up back in place.
       mv /etc/exports.old /etc/exports

13) export all directories in /etc/exports
       exportfs -a

Replace failed mirrored internal disk in AIX

The following procedure should be used to replace a failed internal (boot) disk on AIX 5 or higher, with software mirroring.
(Note: in these examples, hdisk0 and hdisk1 are doubly-mirrored internal disks and members of rootvg; hdisk1 has failed)

1. Identify the failed disk by analyzing the errpt logs. Confirm the failure using lspv by checking if "PV State" is "Missing".

2. Break the mirror and remove the device from AIX:

# unmirrorvg rootvg hdisk1
# reducevg rootvg hdisk1
# rmdev -l hdisk1 -d

3. Confirm that the device is no longer present using lspv.

4. Replace the disk drive, letting the new device take the same device name (hdisk1).

5. Add the new device into rootvg:

# extendvg rootvg hdisk1

6. Re-mirror the volume group. No additional arguments are required to doubly-mirror the two internal disks.

# mirrorvg rootvg

7. Re-add the boot image to the new internal disk:

# bosboot -ad hdisk1

8. Re-add the new disk to the bootlist and confirm it is present:

# bootlist -m normal hdisk0 hdisk1
# bootlist -m normal -o
hdisk0 blv=hd5
hdisk1 blv=hd5
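
The final check can also be scripted. The sketch below works on text in the same format as the output above; on a live system the sample text would be replaced by the output of bootlist -m normal -o. The helper name in_bootlist is hypothetical.

```shell
# Sample text in the format shown above; on a live AIX system use:
#   bootlist_output=$(bootlist -m normal -o)
bootlist_output='hdisk0 blv=hd5
hdisk1 blv=hd5'

# in_bootlist DISK: succeed if DISK is the first field of any line.
# (in_bootlist is a hypothetical helper, not an AIX command.)
in_bootlist() {
    printf '%s\n' "$bootlist_output" |
        awk -v d="$1" '$1 == d { found = 1 } END { exit !found }'
}

for disk in hdisk0 hdisk1; do
    if in_bootlist "$disk"; then
        echo "$disk is in the normal bootlist"
    else
        echo "$disk is MISSING from the normal bootlist"
    fi
done
```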

AIX Boot Process

The boot process has three phases:

1. ROS kernel init phase
2. Base device configuration
3. System boot phase

1. ROS kernel init phase (PHASE1)

A. POST (power-on self test)

The POST performs basic hardware checks.

B. The system then goes to NVRAM and checks the boot list for the boot device (hdisk0 or hdisk1).

C. It locates the BLV (hd5) on the boot device.

D. It checks the boot image.

E. The boot image is loaded into memory.

F. The kernel starts executing.

2. Base Device configuration (PHASE2)

A. Here cfgmgr will run for device configuration.

3. System Boot Phase (PHASE3)

A. The kernel continues executing.
B. The paging space (hd6) is activated.
C. The following file systems are mounted: /, /var, /usr, /home, /tmp
D. The kernel starts the init process, which reads the /etc/inittab file and executes the following entries.

/etc/rc.boot
srcmstr
/etc/rc.tcpip
/etc/rc.net

The network-related files above, /etc/rc.tcpip and /etc/rc.net, are used to configure the IP address and routing.

E. The system then comes up in the default run level, 2.

NOTE:

Run level 2: It contains all of the terminal processes and daemons that are run in the multi-user environment. This is the default run level.

Each /etc/inittab record contains four colon-separated fields, in this order: 1. Identifier, 2. Runlevel, 3. Action, 4. Command
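
An inittab record has the form Identifier:RunLevel:Action:Command, so it can be taken apart with the shell's own field splitting. The record below is a sample for illustration; the variable names are arbitrary.

```shell
# Sample record for illustration.
record='xcmd:2:respawn:find / -type f > /dev/null 2>&1'

# Split on colons. The last variable receives the rest of the line,
# so a command that itself contains colons stays in one piece.
IFS=: read -r ident runlevel action command <<EOF
$record
EOF

echo "identifier: $ident"
echo "run level:  $runlevel"
echo "action:     $action"
echo "command:    $command"
```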

Modifying /etc/inittab entries without using vi

These are the steps for modifying /etc/inittab without using the vi editor in AIX.

Before editing, copy the inittab file to a file named inittab.old

# cp -p /etc/inittab /etc/inittab.old

mkitab - Adds records to the /etc/inittab file.

# mkitab "xcmd:2:respawn:find / -type f > /dev/null 2>&1"

lsitab - Lists records in the /etc/inittab file.

# lsitab xcmd

chitab - Changes records in the /etc/inittab file.

# chitab "xcmd:2:once:find / -type f > /dev/null 2>&1"

rmitab - Removes records from the /etc/inittab file.

# rmitab xcmd

# lsitab xcmd

Replacing Faulty Disk in ROOTVG

Analyzing Disk Fault

The first signs that a hard disk is failing are temporary error log messages in the error report (errpt). If you see occasional temporary errors, you don't have an immediate problem, but if you start to see a bundle of temporary errors then the disk will need replacing. The worst case scenario is a permanent error against a hard disk and stale partitions.

Check how many errors have been logged and whether they are permanent or temporary with:

errpt |more

1581762B 0727203502 T H hdisk0 DISK OPERATION ERROR

1581762B 0727203502 P H hdisk0 DISK OPERATION ERROR

The first error log message shows that there is a temporary disk problem on hdisk0, whilst the second error log message shows a permanent error also on hdisk0. The procedures for replacing hdisk0 & hdisk1 <part of rootvg> are slightly different. See the steps below.
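
When the error report is long, the temporary and permanent hardware errors can be counted per disk instead of read by eye. This sketch works on the two sample lines above and assumes the usual errpt column order (identifier, timestamp, type, class, resource name, description); on a live system the output of errpt would be piped into the same awk program.

```shell
# The two sample lines from above; on a live AIX system pipe errpt
# into the awk program instead.
errpt_sample='1581762B 0727203502 T H hdisk0 DISK OPERATION ERROR
1581762B 0727203502 P H hdisk0 DISK OPERATION ERROR'

# Count hardware-class (H) errors per resource, split by type:
# T = temporary, P = permanent.
summary=$(printf '%s\n' "$errpt_sample" | awk '
    $4 == "H" && ($3 == "T" || $3 == "P") { count[$5 " " $3]++ }
    END { for (key in count) print key, count[key] }' | LC_ALL=C sort)

# Each output line reads: resource, type, number of errors.
echo "$summary"
```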

To check for stale partitions, run the command: lsvg -l rootvg

rootvg:

LV NAME TYPE LPs PPs PVs LV STATE MOUNT POINT

hd5 boot 1 2 2 closed/syncd N/A

hd6 paging 64 128 2 open/syncd N/A

hd8 jfslog 1 2 2 open/stale N/A

hd4 jfs 4 8 2 open/stale /
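
The stale logical volumes can be filtered out of this listing with awk. The sketch below uses the sample listing above (without the blank lines); on a live system the output of lsvg -l rootvg would be used instead.

```shell
# Sample listing from above; on a live AIX system use:
#   lsvg_sample=$(lsvg -l rootvg)
lsvg_sample='rootvg:
LV NAME TYPE LPs PPs PVs LV STATE MOUNT POINT
hd5 boot 1 2 2 closed/syncd N/A
hd6 paging 64 128 2 open/syncd N/A
hd8 jfslog 1 2 2 open/stale N/A
hd4 jfs 4 8 2 open/stale /'

# The LV state is the sixth field; print the name of every logical
# volume whose state ends in "/stale".
stale_lvs=$(printf '%s\n' "$lsvg_sample" | awk '$6 ~ /\/stale$/ { print $1 }')

echo "stale logical volumes:"
echo "$stale_lvs"
```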

Steps for replacing faulty disks in other volume groups are much simpler than replacing disks in rootvg. I have written a procedure for this below also.

For procedures on replacing faulty SSA disk, refer to the link

Replacing hdisk0 in rootvg

Change bootlist

bosboot -a -d hdisk1 Make sure hdisk1 has a boot image

bootlist -m normal hdisk1 hdisk0 Change the bootlist so the system will use hdisk1 before hdisk0

Removing Primary Dump Device

sysdumpdev -l The primary dump device will always be on hdisk0, this will need to be changed

primary /dev/pdumplv

secondary /dev/sdumplv

copy directory /var/adm/dump

forced copy flag FALSE

always allow dump TRUE

dump compression ON

sysdumpdev -Pp /dev/hd6 Changes primary dump device

primary /dev/hd6

secondary /dev/sdumplv

copy directory /var/adm/dump

forced copy flag FALSE

always allow dump TRUE

dump compression ON

rmlv pdumplv Remove the logical volume pdumplv, the primary dump device

Un-Mirroring Hard Disk from VG

Now you need to un-mirror the volume group so the disk can be removed. There are two ways you can do this, one is whereby you run it at a disk level and the other is at a logical partition level. The outcome will be the same with both commands but with the second you have more control.

Method One

unmirrorvg rootvg hdisk0 Unmirrors the disk.

NB: Sometimes this is unstable, especially if you have stale partitions. I have also noticed that if pdumplv is mirrored <shouldn't be by default>, this command will fail. In this instance, unmirror the logical volume and then run the unmirrorvg command, alternatively follow the method below.

Method Two

lsvg -l rootvg Lists all logical volumes in rootvg

rootvg:

LV NAME TYPE LPs PPs PVs LV STATE MOUNT POINT

hd5 boot 1 2 2 closed/syncd N/A

hd6 paging 64 128 2 open/syncd N/A

hd8 jfslog 1 2 2 open/syncd N/A

hd4 jfs 4 8 2 open/syncd /

rmlvcopy LVNAME 1 hdisk0 Run this command for each logical volume

e.g: rmlvcopy hd5 1 hdisk0

Check the disk has been unmirrored by: lsvg -l rootvg. For each LV, the PVs column will have 1

rootvg:

LV NAME TYPE LPs PPs PVs LV STATE MOUNT POINT

hd5 boot 1 2 1 closed/syncd N/A

hd6 paging 64 128 1 open/syncd N/A

hd8 jfslog 1 2 1 open/syncd N/A

hd4 jfs 4 8 1 open/syncd /

Make a note of the SCSI id and serial number, which will make the CE's life easier when he has to remove the disk. In the example below, the SCSI id is <8> and the serial number is <4DFJY156>. The command you need to run is: lscfg -vl hdisk0

DEVICE LOCATION DESCRIPTION

hdisk0 10-88-00-8,0 16 Bit LVD SCSI Disk Drive <9100 MB>

Manufacturer............................IBM

Machine Type and Model......DDYS-T09170M

FRU Number...........................00P1517

ROS Level and ID...................53394841

Serial Number.........................4DFJY156

EC Level...................................F79924

Part Number............................07N3852

Device Specific.<Z0>...............000003029F00013A

Device Specific.<Z1>...............07N4925

Device Specific.<Z2>...............0933

Device Specific.<Z3>...............00315

Device Specific.<Z4>...............0001

Device Specific.<Z5>...............22

Device Specific.<Z6>...............F79924

Remove the Disk from VG

reducevg rootvg hdisk0 Remove hdisk0 from the volume group

rmdev -l hdisk0 -d Remove the definition of hdisk0 from the system

lsvg rootvg Ensure disk is removed

lspv hdisk0 Ensure disk is removed

Now Remove the Disk physically and add the New Disk.

Add the New Disk to the System

cfgmgr Now run configuration Manager to add the new disk to the system

diag Then go into diagnostics to update the system log so the system is aware that hdisk0 has been replaced

Task Selection ->

Log Repair Action ->

hdisk0

Esc 0 To exit diagnostics after Log Repair Action has completed.

errpt | more Check Log Repair Action has taken place. You should see an entry like :-

2F3E09A4 0819110902 I H hdisk2 REPAIR ACTION

diag Go back into diagnostics and certify this disk. This will indicate whether the new disk is ok

Task Selection ->

Certify the disk ->

hdisk0 Commit the changes and exit by pressing F3

Esc 0 To exit diagnostics after Certifying the new disk

Add disk into the Volume Group

extendvg rootvg hdisk0 Add disk into the volume group rootvg

Now you need to re-mirror the disk. Again you can mirror at a disk level or at a logical level.

Re-Mirroring Hard Disk

Method One

mirrorvg rootvg hdisk0 Mirrors the disk

syncvg -v rootvg Synchronizes the volume group and the data contained within it

NB: This method will mirror the logical volume pdumplv. Unmirror the logical volume by:

rmlvcopy pdumplv 1 hdisk1

Method Two

lsvg -l rootvg Lists all the logical volumes to re-mirror

mklvcopy -k LVNAME 2 hdisk0 Run this command for each logical volume. This will also synchronize the data <-k>

e.g: mklvcopy -k hd5 2 hdisk0

NB: Do not mirror the logical volume pdumplv

syncvg -v rootvg Synchronizes the volume group and the data contained within it

lsvg -l rootvg Check rootvg has been mirrored and status is open/syncd

Check the volume group has been completely re-mirrored by: lsvg -l rootvg. The PVs column should have 2 for each LVNAME apart from pdumplv & sdumplv

rootvg:

LV NAME TYPE LPs PPs PVs LV STATE MOUNT POINT

hd5 boot 1 2 2 closed/syncd N/A

hd6 paging 64 128 2 open/syncd N/A

hd8 jfslog 1 2 2 open/syncd N/A

hd4 jfs 4 8 2 open/syncd /

mklv -y 'pdumplv' rootvg 40 hdisk0 Re-create the logical volume for your primary dump device

sysdumpdev -Pp /dev/pdumplv Re-allocate your primary dump device.

primary /dev/pdumplv

secondary /dev/sdumplv

copy directory /var/adm/dump

forced copy flag FALSE

always allow dump TRUE

dump compression ON

bosboot -a -d hdisk0 Update the boot image on hdisk0

bootlist -m normal hdisk0 hdisk1 Change your boot list back.
