DISCLAIMER: Please note that the blog owner takes no responsibility of any kind for any data loss or damage caused by trying any of the commands/methods mentioned in this blog. You use the commands/methods/scripts at your own risk. If you find something useful, a comment would be appreciated to let other viewers know that the solution/method worked for you.
Veritas Cluster Cheat Sheet
VCS is built on three components: LLT, GAB, and VCS itself. LLT handles kernel-to-kernel communication over the LAN heartbeat links, GAB handles cluster membership, messaging between cluster members, and heartbeat-disk communication, and VCS itself handles the management of services. Once cluster members can communicate via LLT and GAB, VCS is started.
In the VCS configuration, each cluster contains systems, Service Groups, and Resources. A Service Group contains a list of systems belonging to that group, a list of systems on which the group should be started, and Resources. A Resource is something controlled or monitored by VCS, such as network interfaces, logical IPs, mount points, physical/logical disks, processes, and files. Each resource corresponds to a VCS agent, which actually handles VCS control over the resource.
VCS configuration can be set statically through a configuration file, dynamically through the CLI, or both. LLT and GAB configurations are set primarily through configuration files.
Configuration
VCS configuration is fairly simple. The three configurations to worry about are LLT, GAB, and VCS resources.
LLT
LLT configuration requires two files: /etc/llttab and /etc/llthosts.
llttab contains information on the node ID, the cluster the node belongs to, and the heartbeat links. It should look similar to this:
# llttab -- low-latency transport configuration file
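A minimal two-node sketch of the rest of the file, where the node name, cluster number, and Solaris qfe devices are illustrative assumptions:
set-node server1
set-cluster 2
# two private heartbeat links
link qfe0 /dev/qfe:0 - ether - -
link qfe1 /dev/qfe:1 - ether - -
The companion /etc/llthosts file simply maps LLT node IDs to host names and must be identical on every node, for example:
0 server1
1 server2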
GAB
GAB requires only one configuration file, /etc/gabtab. This file lists the number of nodes in the cluster and, if there are any communication (heartbeat) disks in the system, their configuration. For example:
/sbin/gabconfig -c -n2
tells GAB to start with 2 hosts in the cluster.
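For reference, the /etc/gabtab for such a two-node cluster typically contains just that one line:
# cat /etc/gabtab
/sbin/gabconfig -c -n2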
LLT and GAB
VCS uses two components, LLT and GAB, to share data over the private networks among systems.
These components provide the performance and reliability required by VCS.
LLT | LLT (Low Latency Transport) provides fast kernel-to-kernel communication and monitors network connections. The system administrator configures LLT by creating a configuration file (llttab) that describes the systems in the cluster and the private network links between them. LLT runs at layer 2 of the network stack. |
GAB | GAB (Group Membership and Atomic Broadcast) provides the global message order required to maintain a synchronised state among the systems, and monitors disk communications such as those required by the VCS heartbeat utility. The system administrator configures the GAB driver by creating a configuration file (gabtab). |
LLT and GAB files
/etc/llthosts | The file is a database, containing one entry per system, that links the LLT system ID with the host name. The file is identical on each server in the cluster. |
/etc/llttab | The file contains information that is derived during installation and is used by the utility lltconfig. |
/etc/gabtab | The file contains the information needed to configure the GAB driver. This file is used by the gabconfig utility. |
/etc/VRTSvcs/conf/config/main.cf | The VCS configuration file. The file contains the information that defines the cluster and its systems. |
Gabtab Entries
/sbin/gabdiskconf -i /dev/dsk/c1t2d0s2 -s 16 -S 1123
/sbin/gabdiskconf -i /dev/dsk/c1t2d0s2 -s 144 -S 1124
/sbin/gabdiskhb -a /dev/dsk/c1t2d0s2 -s 16 -p a -S 1123
/sbin/gabdiskhb -a /dev/dsk/c1t2d0s2 -s 144 -p h -S 1124
/sbin/gabconfig -c -n2
gabdiskconf | -i Initialise the disk region   -s Start block   -S Signature |
gabdiskhb (heartbeat disks) | -a Add a GAB disk heartbeat resource   -s Start block   -p Port   -S Signature |
gabconfig | -c Configure the driver for use   -n Number of systems in the cluster |
LLT and GAB Commands
Verifying that links are active for LLT | lltstat -n |
verbose output of the lltstat command | lltstat -nvv | more |
open ports for LLT | lltstat -p |
display the values of LLT configuration directives | lltstat -c |
lists information about each configured LLT link | lltstat -l |
List all MAC addresses in the cluster | lltconfig -a list |
stop the LLT running | lltconfig -U |
start the LLT | lltconfig -c |
verify that GAB is operating | gabconfig -a Note: port a indicates that GAB is communicating, port h indicates that VCS is started |
stop GAB running | gabconfig -U |
start the GAB | gabconfig -c -n <number of nodes> |
override the seed values in the gabtab file | gabconfig -c -x |
GAB Port Membership
List Membership | gabconfig -a |
Unregister port f | /opt/VRTS/bin/fsclustadm cfsdeinit |
Port Function | a   GAB driver
b   I/O fencing (designed to guarantee data integrity)
d   ODM (Oracle Disk Manager)
f   CFS (Cluster File System)
h   VCS (VERITAS Cluster Server: high availability daemon)
o   VCSMM driver (kernel module needed for Oracle and VCS interface)
q   QuickLog daemon
v   CVM (Cluster Volume Manager)
w   vxconfigd (module for CVM) |
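To interpret the ports, the membership section of gabconfig -a output looks roughly like this on a healthy two-node cluster (generation numbers and the membership bitmap are illustrative):
GAB Port Memberships
===============================================================
Port a gen   a36e0003 membership 01
Port h gen   fd570002 membership 01
Ports a and h with membership 01 indicate that GAB and VCS are running on nodes 0 and 1.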
Cluster daemons
High Availability Daemon | had |
Companion Daemon | hashadow |
Resource Agent daemon | <resource>Agent |
Web console cluster management daemon | CmdServer |
Cluster Log Files
Log Directory | /var/VRTSvcs/log |
primary log file (engine log file) | /var/VRTSvcs/log/engine_A.log |
Starting and Stopping the cluster
"-stale" instructs the engine to treat the local config as stale "-force" instructs the engine to treat a stale config as a valid one |
hastart [-stale|-force] |
Bring the cluster into running mode from a stale state using the configuration file from a particular server | hasys -force <server_name> |
stop the cluster on the local server but leave the application/s running, do not failover the application/s | hastop -local |
stop cluster on local server but evacuate (failover) the application/s to another node within the cluster | hastop -local -evacuate |
stop the cluster on all nodes but leave the application/s running | hastop -all -force |
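As a hedged example of a common single-node maintenance flow built from the commands above (not an authoritative procedure):
hastop -local -evacuate      # fail the service groups over to another node and stop VCS locally
# ... perform the maintenance / reboot ...
hastart                      # restart VCS; the node rejoins and picks up the configuration from a running node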
Cluster Status
display cluster summary | hastatus -summary |
continually monitor cluster | hastatus |
verify the cluster is operating | hasys -display |
Cluster Details
information about a cluster | haclus -display |
value for a specific cluster attribute | haclus -value <attribute> |
modify a cluster attribute | haclus -modify <attribute name> <new> |
Enable LinkMonitoring | haclus -enable LinkMonitoring |
Disable LinkMonitoring | haclus -disable LinkMonitoring |
Users
add a user | hauser -add <username> |
modify a user | hauser -update <username> |
delete a user | hauser -delete <username> |
display all users | hauser -display |
System Operations
add a system to the cluster | hasys -add <sys> |
delete a system from the cluster | hasys -delete <sys> |
Modify a system's attributes | hasys -modify <sys> <modify options> |
list a system state | hasys -state |
Force a system to start | hasys -force |
Display a system's attributes | hasys -display [-sys] |
List all the systems in the cluster | hasys -list |
Change the load attribute of a system | hasys -load <system> <value> |
Display the value of a system's node ID (/etc/llthosts) | hasys -nodeid |
Freeze a system (no offlining of the system, no onlining of groups) | hasys -freeze [-persistent][-evacuate] Note: main.cf must be in write mode |
Unfreeze a system (re-enable groups and bring resources back online) | hasys -unfreeze [-persistent] Note: main.cf must be in write mode |
Dynamic Configuration
The VCS configuration must be in read/write mode in order to make changes. While the configuration is in read/write mode it is considered stale and a .stale file is created in $VCS_CONF/conf/config. When the configuration is put back into read-only mode the .stale file is removed.
Change configuration to read/write mode | haconf -makerw |
Change configuration to read-only mode | haconf -dump -makero |
Check what mode the cluster is running in | haclus -display | grep -i 'readonly' (0 = write mode, 1 = read-only mode) |
Check the configuration file | hacf -verify /etc/VRTSvcs/conf/config Note: you can point to any directory as long as it has main.cf and types.cf |
convert a main.cf file into cluster commands | hacf -cftocmd /etc/VRTSvcs/conf/config -dest /tmp |
convert a command file into a main.cf file | hacf -cmdtocf /tmp -dest /etc/VRTSvcs/conf/config |
Service Groups
add a service group | haconf -makerw
  hagrp -add groupw
  hagrp -modify groupw SystemList sun1 1 sun2 2
  hagrp -autoenable groupw -sys sun1
  haconf -dump -makero |
delete a service group | haconf -makerw
  hagrp -delete groupw
  haconf -dump -makero |
change a service group | haconf -makerw
  hagrp -modify groupw SystemList sun1 1 sun2 2 sun3 3
  haconf -dump -makero
  Note: use "hagrp -display <group>" to list attributes |
list the service groups | hagrp -list |
list the groups dependencies | hagrp -dep <group> |
list the parameters of a group | hagrp -display <group> |
display a service group's resource | hagrp -resources <group> |
display the current state of the service group | hagrp -state <group> |
clear faulted non-persistent resources in a specific group | hagrp -clear <group> [-sys <system>] |
Change the system list in a cluster | # remove the host
  hagrp -modify grp_zlnrssd SystemList -delete <hostname>
  # add the new host (don't forget to state its position)
  hagrp -modify grp_zlnrssd SystemList -add <hostname> 1
  # update the autostart list
  hagrp -modify grp_zlnrssd AutoStartList <host> <host> |
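Pulling the group commands together, a hedged end-to-end sketch of creating and starting a simple two-node failover group (the group name, system names, and priorities are illustrative):
haconf -makerw                                   # open the configuration for writing
hagrp -add webSG                                 # create the group
hagrp -modify webSG SystemList sun1 1 sun2 2     # systems it can run on, with priorities
hagrp -modify webSG AutoStartList sun1           # preferred startup system
haconf -dump -makero                             # save and close the configuration
hagrp -online webSG -sys sun1                    # bring the group online on sun1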
Service Group Operations
Start a service group and bring its resources online | hagrp -online <group> -sys <sys> |
Stop a service group and take its resources offline | hagrp -offline <group> -sys <sys> |
Switch a service group from one system to another | hagrp -switch <group> -to <sys> |
Enable all the resources in a group | hagrp -enableresources <group> |
Disable all the resources in a group | hagrp -disableresources <group> |
Freeze a service group (disable onlining and offlining) | hagrp -freeze <group> [-persistent] note: use the following to check "hagrp -display <group> | grep TFrozen" |
Unfreeze a service group (enable onlining and offlining) | hagrp -unfreeze <group> [-persistent] note: use the following to check "hagrp -display <group> | grep TFrozen" |
Enable a service group (only enabled groups can be brought online) | haconf -makerw
  hagrp -enable <group> [-sys]
  haconf -dump -makero
  Note: to check, run "hagrp -display <group> | grep Enabled" |
Disable a service group (stop it from being brought online) | haconf -makerw
  hagrp -disable <group> [-sys]
  haconf -dump -makero
  Note: to check, run "hagrp -display <group> | grep Enabled" |
Flush a service group and enable corrective action. | hagrp -flush <group> -sys <system> |
Resources
add a resource | haconf -makerw
  hares -add appDG DiskGroup groupw
  hares -modify appDG Enabled 1
  hares -modify appDG DiskGroup appdg
  hares -modify appDG StartVolumes 0
  haconf -dump -makero |
delete a resource | haconf -makerw
  hares -delete <resource>
  haconf -dump -makero |
change a resource | haconf -makerw
  hares -modify appDG Enabled 1
  haconf -dump -makero
  Note: use "hares -display <resource>" to list parameters |
change a resource attribute to be global (the same value cluster-wide) | hares -global <resource> <attribute> <value> |
change a resource attribute to be local (a per-system value) | hares -local <resource> <attribute> <value> |
list the parameters of a resource | hares -display <resource> |
list the resources | hares -list |
list the resource dependencies | hares -dep |
Resource Operations
Online a resource | hares -online <resource> [-sys] |
Offline a resource | hares -offline <resource> [-sys] |
display the state of a resource (online, offline, etc.) | hares -state |
display the parameters of a resource | hares -display <resource> |
Offline a resource and propagate the command to its children | hares -offprop <resource> -sys <sys> |
Cause a resource agent to immediately monitor the resource | hares -probe <resource> -sys <sys> |
Clearing a resource (automatically initiates the onlining) | hares -clear <resource> [-sys] |
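As a hedged sketch of populating a group with dependent resources (resource names, attribute values, and the webSG group are illustrative; hares -link, which defines a parent/child dependency, is a standard VCS command even though it is not listed in the tables above):
haconf -makerw
hares -add webNIC NIC webSG                      # a NIC resource in group webSG
hares -modify webNIC Device qfe0
hares -modify webNIC Enabled 1
hares -add webIP IP webSG                        # a virtual IP that rides on the NIC
hares -modify webIP Device qfe0
hares -modify webIP Address 192.168.10.50
hares -modify webIP Enabled 1
hares -link webIP webNIC                         # webIP (parent) depends on webNIC (child)
haconf -dump -makero
hagrp -online webSG -sys sun1                    # VCS onlines children before parents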
Resource Types
Add a resource type | hatype -add <type> |
Remove a resource type | hatype -delete <type> |
List all resource types | hatype -list |
Display a resource type | hatype -display <type> |
List the resources of a particular resource type | hatype -resources <type> |
Display the value of a particular resource type attribute | hatype -value <type> <attr> |
Resource Agents
add an agent | pkgadd -d . <agent package> |
remove an agent | pkgrm <agent package> |
change an agent | n/a |
list all ha agents | haagent -list |
Display an agent's run-time information, e.g. has it started, is it running? | haagent -display <agent_name> |
Display an agent's faults | haagent -display | grep Faults |
Resource Agent Operations
Start an agent | haagent -start <agent_name> [-sys <system>] |
Stop an agent | haagent -stop <agent_name> [-sys <system>] |
Show line numbers while monitoring log files with the tail -f command
You can combine tail -f with either the cat or awk command:
Method 1:
# tail -f syslog|cat -n
Method 2:
# tail -f syslog|awk '{print NR,$0}'
You should get the similar output as below:
1  Mar  4 15:21:07 oraserver local1:info Oracle Audit[1433636]:
2  Mar  4 15:21:07 oraserver local1:info Oracle Audit[4198698]:
3  Mar  4 15:21:07 oraserver local1:info Oracle Audit[5456076]:
4  Mar  4 15:21:07 oraserver local1:info Oracle Audit[6545472]:
5  Mar  4 15:21:09 oraserver local1:info Oracle Audit[5456078]:
6  Mar  4 15:21:09 oraserver local1:info Oracle Audit[1609878]:
7  Mar  4 15:21:09 oraserver local1:info Oracle Audit[5456078]:
8  Mar  4 15:21:17 oraserver auth|security:info sshd[6545478]:
9  Mar  4 15:21:17 oraserver auth|security:info sshd[5456086]:
10 Mar  4 15:21:46 oraserver daemon:info CCIRMTD[295062]:
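A third variant, assuming the standard nl utility is available, numbers every line (including blank ones):
# tail -f syslog | nl -ba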
This perfpmr package contains a number of performance tools and some instructions. Some of these tools are products available with AIX. Some of the tools are prototype internal tools (setpri, setsched, iomon, getevars, pmcount, lsc, fcstat2, memfill, getdate, perfstat_trigger) and are not generally available to customers.
All results generated by the Program are estimates and averages based on certain assumptions and conditions. Each environment has its own unique set of requirements that no tool can entirely account for. No representation is made that the results will be accurate or achieved in any given IBM installation environment. The result is based on specific configurations and run time environments. Customer results will vary. Any configuration recommended by the Program should be tested and verified. Any code provided is for illustrative purposes only.
AIX 7.1 PERFORMANCE DATA COLLECTION PROCESS
Note: The act of collecting performance data will add load on the system. HACMP users may want to extend the Dead Man Switch timeout or shutdown HACMP prior to collecting perfpmr data to avoid accidental failovers.
I. INTRODUCTION
This package contains a set of tools and instructions for collecting the data needed to analyze an AIX performance problem. This tool set runs on AIX V7.1.
II. HOW TO OBTAIN AND INSTALL THE TOOLS ON AN IBM RISC SYSTEM/6000.
A. OBTAINING THE PACKAGE
The package will be distributed as a compressed "tar" file available electronically.
From the internet:
==================
'ftp://ftp.software.ibm.com/aix/tools/perftools/perfpmr'
B. INSTALLING THE PACKAGE
The following assumes the tar file is in /tmp and named 'perf71.tar.Z'.
a. login as root or use the 'su' command to obtain root authority
b. create perf71 directory and move to that directory (this example assumes the directory built is under /tmp)
# mkdir /tmp/perf71
# cd /tmp/perf71
c. extract the shell scripts out of the compressed tar file:
# zcat /tmp/perf71.tar.Z | tar -xvf -
III. HOW TO COLLECT DATA FOR AN AIX PERFORMANCE PROBLEM
A. Purpose:
1. This section describes the set of steps that should be followed to collect performance data.
2. The goal is to collect a good base of information that can be used by AIX technical support specialists or development lab programmers to get started in analyzing and solving the performance problem. This process may need to be repeated after analysis of the initial set of data is completed and/or AIX personnel may want to dial-in to the customer's machine if appropriate for additional data collection/analysis.
B. Collection of the Performance Data on Your System
1. Detailed System Performance Data:
Detailed performance data is required to analyze and solve a performance problem. Follow these steps to invoke the supplied shell scripts:
NOTE: You must have root user authority when executing these shell scripts.
a. Create a data collection directory and 'cd' into this directory.
Allow at least 45 MB * (number of logical CPUs) of unused space in whichever file system is used.
*IMPORTANT* - DO NOT COLLECT DATA IN A REMOTELY MOUNTED FILESYSTEM SINCE IPTRACE MAY HANG
For example using /tmp filesystem:
# mkdir /tmp/perfdata
# cd /tmp/perfdata
b. HACMP users:
It is generally recommended that the HACMP deadman switch interval be lengthened while performance data is being collected.
c. Collect our 'standard' PERF71 data for 600 seconds (600 seconds = 10 minutes). Start the data collection while the problem is already occurring with the command:
/directory_where_perfpmrscripts_are_installed/perfpmr.sh 600
The perfpmr.sh shell provided will:
- immediately collect a 5 second trace (trace.sh 5)
- collect 600 seconds of general system performance data (monitor.sh 600).
- collect hardware and software configuration information (config.sh).
In addition, if it finds the following programs available in the current execution path, it will:
- collect 10 seconds of iptrace information (iptrace.sh 10)
- collect 10 seconds of filemon information (filemon.sh 10)
- collect 60 seconds of tprof information (tprof.sh 60)
NOTE: Since one performance problem may mask other problems, it is not uncommon to fix one issue and then collect more data to work on another issue.
d. Answer the questions in the text file called 'PROBLEM.INFO' in the data collection directory created above. This background information about your problem helps us better understand what is going wrong.
IV. HOW TO SEND THE DATA TO IBM.
A. Combine all the collected data into a single binary 'tar' file and compress it:
Put the completed PROBLEM.INFO in the same directory where the data was collected (ie. /tmp/perfdata in the following example). Change to the parent directory, and use the tar command as follows:
Either use: cd /tmp; perfpmr.sh -o perfdata -z pmr#.pax.gz
or
# cd /tmp/perfdata (or whatever directory was used to collect the data)
# cd ..
# pax -xpax -vw perfdata | gzip -c > pmr#.pax.gz
B. Submission of testcase to IBM:
Internet 'ftp' access:
----------------------
The quickest method to get the data analyzed is for the customer to ftp the data directly to IBM. Data placed on the server listed below cannot be accessed by unauthorized personnel. Please contact your IBM representative for the PMR#, BRANCH#, and COUNTRY#. IBM uses all 3 to uniquely associate your data with your problem tracking record.
'ftp testcase.software.ibm.com'
Userid: anonymous
password: your_internet_email_address
(ie. smith@austin.ibm.com)
'cd toibm/aix'
'bin'
'put PMR#.BRANCH#.COUNTRY#.pax.gz'
(i.e. '16443.060.000.pax.gz')
'quit'
If the transfer fails with an error, it's possible that a file already exists by the same name on the ftp server. In this case, add something to the name of the file to differentiate it from the file already on the ftp site (ex. 16443.060.000.july18.pax.gz).
Notify your IBM customer representative you have submitted the data. They will then update the defect report to indicate the data is available for analysis.
Cloning a rootvg using alternate disk installation
Using this scenario, you can clone AIX® running on rootvg to an alternate disk on the same system, install a user-defined software bundle, and run a user-defined script to customize the AIX image on the alternate disk.
The information in this how-to scenario was tested using specific versions of AIX. The results you obtain might vary significantly depending on your version and level of AIX.
Because the alternate disk installation process involves cloning an existing rootvg to a target alternate disk, the target alternate disk must not be already assigned to a volume group.
In this scenario you will do the following:
- Prepare for the alternate disk installation
- Perform the alternate disk installation and customization
- Boot off the alternate disk
- Verify the operation
Step 1. Prepare for the alternate disk installation
- Check the status of physical disks on your system. Type:
  # lspv
  Output similar to the following displays:
  hdisk0          0009710fa9c79877    rootvg    active
  hdisk1          0009710f0b90db93    None
  We can use hdisk1 as our alternate disk because no volume group is assigned to this physical disk.
- Check to see if the alt_disk_copy fileset has been installed by running the following:
  # lslpp -L bos.alt_disk_install.rte
  Output similar to the following displays if the alt_disk_copy fileset is not installed:
  lslpp: 0504-132 Fileset bos.alt_disk_install.rte not installed.
- Using volume 1 of the AIX installation media, install the alt_disk_copy fileset by running the following:
  # geninstall -d/dev/cd0 bos.alt_disk_install.rte
  Output similar to the following displays:
  +-----------------------------------------------------------------------------+
  Summaries:
  +-----------------------------------------------------------------------------+
  Installation Summary
  --------------------
  Name                        Level      Part   Event    Result
  -------------------------------------------------------------------------------
  bos.alt_disk_install.rte    5.3.0.0    USR    APPLY    SUCCESS
- Create a user-defined bundle called /usr/sys/inst.data/user_bundles/MyBundle.bnd that contains the following filesets:
  I:bos.content_list
  I:bos.games
- Create the /home/scripts directory:
mkdir /home/scripts
- Create a user-defined customization script called AddUsers.sh in the /home/scripts directory:
  touch /home/scripts/AddUsers.sh
  chmod 755 /home/scripts/AddUsers.sh
- Edit /home/scripts/AddUsers.sh to contain the following lines:
  mkuser johndoe
  touch /home/johndoe/abc.txt
  touch /home/johndoe/xyz.txt
Step 2. Perform the alternate disk installation and customization
- To clone the rootvg to an alternate disk, type the following at the command line to open the SMIT menu:
  # smit alt_clone
- Select hdisk1 in the Target Disk to Install field.
- Select the MyBundle bundle in the Bundle to Install field.
- Insert volume one of the installation media.
- Type /dev/cd0 in the Directory or Device with images field.
- Type /home/scripts/AddUsers.sh in the Customization script field.
- Press Enter to start the alternate disk installation.
- Check that the alternate disk was created by running the following:
  # lspv
  Output similar to the following displays:
  hdisk0          0009710fa9c79877    rootvg
  hdisk1          0009710f0b90db93    altinst_rootvg
Step 3. Boot from the alternate disk
- By default, the alternate-disk-installation process changes the boot list to the alternate disk. To check this, run the following:
  # bootlist -m normal -o
  Output similar to the following displays:
  hdisk1
- Reboot the system. Type:
# shutdown -r
The system boots from the boot image on the alternate disk (hdisk1).
Step 4. Verify the operation
- When the system reboots, it will be running off the alternate disk. To check this, type the following:
  # lspv
  Output similar to the following displays:
  hdisk0          0009710fa9c79877    old_rootvg
  hdisk1          0009710f0b90db93    rootvg
- Verify that the customization script ran correctly by typing the following:
  # find /home/johndoe -print
  Output similar to the following displays:
  /home/johndoe
  /home/johndoe/.profile
  /home/johndoe/abc.txt
  /home/johndoe/xyz.txt
- Verify that the contents of your software bundle were installed by typing the following:
  # lslpp -Lb MyBundle
  Output similar to the following displays:
  Fileset                     Level      State    Description
  ----------------------------------------------------------------------------
  bos.content_list            5.3.0.0    C        AIX Release Content List
  bos.games                   5.3.0.0    C        Games
alt_disk in AIX
alt_disk_copy:
Required filesets:
bos.alt_disk_install.boot_images
bos.alt_disk_install.rte
bos.msg.en_US.alt_disk_install.rte
alt_disk_copy -d <hdisk to clone rootvg>                clones the rootvg to the specified disk
alt_disk_copy -e /etc/exclude.rootvg -d <hdisk>         uses the exclude list during the cloning
alt_disk_copy -T -d <hdisk>                             converts jfs to jfs2 on the new target disk (from 6.1 TL4 only)
alt_rootvg_op -X <cloned rootvg to destroy>             destroys the cloned rootvg (e.g. alt_rootvg_op -X altinst_rootvg)
alt_rootvg_op -W -d <hdisk>                             wakes up a disk (cloned filesystems will be mounted with the prefix /alt_)
alt_rootvg_op -S -t <hdisk>                             puts a cloned rootvg to sleep (it does a bosboot first)
                                                        (-S: put an earlier "woken up" vg to sleep, -t: rebuilds the alternate boot image before sleep)
alt_rootvg_op -v <new cloned rootvg name> -d <hdisk>    renames the given cloned rootvg
                                                        (after wake-up and sleep the cloned vg name is changed, so this is useful then)
alt_disk_mksysb -m /mnt/aix1mksysb -d hdisk1 -k         restores the given mksysb (aix1mksysb) to hdisk1 (-k: keep device configuration)
/var/adm/ras/alt_disk_inst.log                          alt_disk log file
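As a hedged sketch of how wake-up and sleep are typically combined (the disk name is illustrative; the cloned filesystems appear under the /alt_ prefix, e.g. /alt_inst):
alt_rootvg_op -W -d hdisk1        <--wake up the cloned rootvg; its filesystems are mounted
ls /alt_inst/etc                  <--inspect or edit files in the clone as needed
alt_rootvg_op -S -t hdisk1        <--rebuild the clone's boot image and put it back to sleep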
----------------------------------
alt_disk_copy: (copy hdisk0 to hdisk1)
lv names can't be longer than 11 characters (because of the alt_ prefix)
do not remove the disk that was used during boot (otherwise there will be problems with bosboot)
-unmirrorvg rootvg hdisk1
-reducevg rootvg hdisk1
-bosboot -ad hdisk0
-bootlist -m normal hdisk0
-alt_disk_copy -d hdisk1
-bootlist -m normal hdisk0
after booting from hdisk1:
root@aix11: / # lspv
hdisk0 00cf5d8fe9c88a34 old_rootvg
hdisk1 00cf5d8fadcaa9a9 rootvg active
booting from the old disk:
root@aix11: / # lspv
hdisk0 00cf5d8fe9c88a34 rootvg active
hdisk1 00cf5d8fadcaa9a9 altinst_rootvg
removing the new image (keeping the old one):
-alt_rootvg_op -X altinst_rootvg <--removing the new image from hdisk1
-chpv -c hdisk1 <--clear the pv that contained the removed image
-extendvg -f rootvg hdisk1 <--extend the currently used rootvg with the cleared disk (hdisk1)
-mirrorvg -S rootvg hdisk1 <--mirror rootvg to hdisk1 (checking: lsvg rootvg | grep STALE) (-S: background sync)
-bosboot -ad hdisk0; bosboot -ad hdisk1 <--recreate the bootimage
-bootlist -m normal hdisk0 hdisk1 <--setup correct bootlist (checking: bootlist -m normal -o)
------------------------------------
Changing lv names (to avoid the 11-character limit):
1. # mkszfile <--creates image.data file of rootvg
2. # vi image.data <--edit image.data
3. # alt_disk_copy -d hdiskX -i /image.data -B <--pass the image.data file to alt_disk_copy
--------------------------------------
ONLINE UPDATE WITH ALT_DISK_INSTALL:
unmirrorvg rootvg hdisk1 <--removing mirror ( check: lsvg -p rootvg)
chpv -c hdisk1 <--clears boot record
reducevg rootvg hdisk1 <--free up hdisk1
bosboot -ad hdisk0 <--creates boot record
bootlist -m normal hdisk0 <--sets boot list (check: bootlist -m normal -o)
installp -s <--check if anything can be committed
copy the new bos.rte.install <--needed to check whether the update will succeed (cd to that directory)
install_all_updates -pYd . <--preview of new bos.rte.install
install_all_updates -Yd . <--installs new bos.rte.install
oslevel -sg 5300-09-01-0847 <--shows which filesets are newer than the current service pack (it will show bos.rte.install)
instfix -i | grep SP <--shows which service pack to update to (53-09-020849_SP)
oslevel -sl 53-09-020849 <--shows which filesets should be updated
cd /mnt/5300-09-SP2 <--go to servicepack dir
install_all_updates -pYd . <--preview check
alt_disk_copy -d hdisk1 -b update_all -l /mnt/5300-09-SP2 <--this will do the update
shutdown -Fr <--new OS will boot up
smitty commit <--if needed
alt_rootvg_op -X old_rootvg <--removes cloned old OS
chpv -c hdisk0 <--clears bootrecord
extendvg -f rootvg hdisk0 <--add hdisk0 to rootvg
mirrorvg -S rootvg hdisk0 <--mirror rootvg (-S: in background)
bosboot -a <--creates boot record
bootlist -m normal hdisk0 hdisk1 <--set bootlist