vxrelocd(1M) VxVM 3.5 vxrelocd(1M)
1 Jun 2002
NAME
vxrelocd - monitor VERITAS Volume Manager for failure events and
relocate failed subdisks
SYNOPSIS
/etc/vx/bin/vxrelocd [-o vxrecover_argument] [-O old_version] [-s
save_max] [mail_address...]
DESCRIPTION
The vxrelocd command monitors VERITAS Volume Manager (VxVM) by
analyzing the output of the vxnotify command, and waits for a failure.
When a failure occurs, vxrelocd sends mail via mailx to root (by
default) or to other specified users and relocates failed subdisks.
After completing the relocation, vxrelocd sends more mail indicating
the status of each subdisk replacement. The vxrecover utility is then
run on volumes with relocated subdisks to restore data. Mail is sent
after vxrecover executes.
OPTIONS
-o The -o option and its argument are passed directly to
vxrecover if vxrecover is called. This allows specifying -o
slow[=iodelay] to keep vxrecover from overloading a busy
system during recovery. The default value for the delay is
250 milliseconds.
-O Reverts to the behavior of an older version. Specifying -O
VxVM_version directs vxrelocd to use the relocation scheme
of that version.
-s Before vxrelocd attempts a relocation, a snapshot of the
current configuration is saved in /etc/vx/saveconfig.d.
This option specifies the maximum number of configurations
to keep for each disk group. The default is 32.
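As an illustration of how these options combine, the following sketch
assembles a vxrelocd command line without running it. The function
name build_vxrelocd_cmd and any values passed to it are assumptions
made for this example; they are not part of VxVM.

```shell
# Hypothetical helper: print (but do not run) a vxrelocd command line.
# usage: build_vxrelocd_cmd IODELAY_MS SAVE_MAX [MAIL_ADDRESS...]
build_vxrelocd_cmd() {
    iodelay=$1
    save_max=$2
    shift 2
    # -o is handed on to vxrecover; -s caps the number of saved
    # configurations kept for each disk group.
    cmd="/etc/vx/bin/vxrelocd -o slow=$iodelay -s $save_max"
    for addr in "$@"; do
        cmd="$cmd $addr"
    done
    echo "$cmd"
}
```

For example, build_vxrelocd_cmd 500 16 root user1 prints a command
line that asks vxrecover for a 500 millisecond delay, keeps at most 16
saved configurations, and mails root and user1.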
Mail Notification
By default, vxrelocd sends mail to root with information about a
detected failure and the status of any relocation and recovery
attempts. To send mail to other users, add the user login name to the
vxrelocd startup line in the startup script
/sbin/rc2.d/S095vxvm-recover and reboot the system. For example, if
the line appears as:
nohup vxrelocd root &
and you want mail also to be sent to user1 and user2, change the line
to read:
nohup vxrelocd root user1 user2 &
- 1 - Formatted: January 24, 2005
Alternatively, you can kill the vxrelocd process and restart it as
vxrelocd root mail_address, where mail_address is a user's login name.
Do not kill the vxrelocd process while a relocation attempt is in
progress.
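The startup-line edit described above can be sketched as a filter
that rewrites only the vxrelocd line and passes every other line
through unchanged. The function name add_mail_recipients is
hypothetical, and the sketch assumes the line consists of
"nohup vxrelocd root &" with optional surrounding white space. Review
the output before replacing the startup script with it.

```shell
# Hypothetical filter: append login names to the vxrelocd startup line.
# usage: add_mail_recipients LOGIN... < startup_script > edited_script
add_mail_recipients() {
    extra="$*"
    # \1 keeps the original "nohup vxrelocd root" text (and its indent);
    # \& inserts a literal ampersand in the replacement.
    sed "s/^\([[:space:]]*nohup vxrelocd root\)[[:space:]]*&[[:space:]]*\$/\1 $extra \&/"
}
```

Writing the result to a separate file and comparing it with the
original (for example with diff) keeps the edit easy to verify and to
undo.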
The mail notification that is sent when a failure is detected follows
this format:
Failures have been detected by the VERITAS Volume Manager:
failed disks:
medianame
...
failed plexes:
plexname
...
failed log plexes:
plexname
...
failing disks:
medianame
...
failed subdisks:
subdiskname
...
The Volume Manager will attempt to find spare disks,
relocate failed subdisks and then recover the data
in the failed plexes.
The medianame list under failed disks specifies disks that appear to
have completely failed; the medianame list under failing disks
indicates a partial disk failure or a disk that is in the process of
failing. When a disk has failed completely, the same medianame list
appears under both failed disks and failing disks. The plexname list
under failed plexes shows plexes that were detached because I/O to
subdisks they contain failed. The plexname list under failed log
plexes indicates RAID-5 or DRL (dirty region logging) log plexes that
have failed. The subdiskname list under failed subdisks specifies
subdisks in RAID-5 volumes that were detached due to I/O errors.
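Because the notification has a fixed layout, the names under one
heading can be pulled out with a short filter. The following is a
minimal sketch, not part of VxVM: the function name list_section is
hypothetical, and it assumes that each entry is a single word on its
own line, that headings end with a colon, and that any line containing
white space is prose that ends the current list.

```shell
# Hypothetical filter: print the entries listed under one heading of a
# failure notification read from standard input.
# usage: list_section "failed disks" < mail_body
list_section() {
    awk -v want="$1:" '
        {
            line = $0
            gsub(/^[ \t]+|[ \t]+$/, "", line)   # trim surrounding blanks
        }
        # A heading starts a new list; remember whether it is the wanted one.
        line ~ /:$/    { in_section = (line == want); next }
        # Prose (a line with embedded white space) ends any list.
        line ~ /[ \t]/ { in_section = 0; next }
        # Skip blank lines and the "..." placeholder, print real entries.
        in_section && line != "" && line != "..." { print line }
    '
}
```

Comparing the output for "failed disks" with the output for "failing
disks" distinguishes a complete failure (the name appears in both
lists) from a partial one (the name appears only under failing disks).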
Spare Space
A disk can be marked as ``spare.'' This makes the disk available as a
site for relocating failed subdisks. Disks that are marked as spares
are not used for normal allocations unless you explicitly specify
them. This ensures that there is a pool of spare space available for
relocating failed subdisks and that this space does not get consumed
by normal operations. Spare space is the first space used to relocate
failed subdisks. However, if no spare space is available, or the
available spare space is not suitable or sufficient, free space on
disks not marked with the nohotuse flag is also used. See the
vxedit(1M) and vxdiskadm(1M) manual pages for more information on
marking a disk as a spare or nohotuse.
Nohotuse Space
A disk can be marked as ``nohotuse.'' This excludes the disk from
being used by vxrelocd, but it is still available as free space. See
the vxedit(1M) and vxdiskadm(1M) manual pages for more information on
marking a disk as a spare or nohotuse.
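As a sketch of how the two flags might be set from a script, the
helper below wraps vxedit with a dry-run switch. The names run,
mark_disk and DRY_RUN are hypothetical, and the "vxedit -g diskgroup
set spare=on" and "set nohotuse=on" forms are taken from general VxVM
usage, not from this page; confirm them in vxedit(1M) before relying
on them.

```shell
# With DRY_RUN set, print each command instead of executing it.
run() {
    if [ -n "${DRY_RUN:-}" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Hypothetical helper: turn on the spare or nohotuse flag for disks.
# usage: mark_disk spare|nohotuse DISKGROUP DISK...
mark_disk() {
    flag=$1
    dg=$2
    shift 2
    case $flag in
        spare|nohotuse) ;;
        *) echo "mark_disk: unknown flag: $flag" >&2; return 1 ;;
    esac
    for disk in "$@"; do
        # Assumed vxedit syntax; see the note above.
        run vxedit -g "$dg" set "$flag=on" "$disk"
    done
}
```

Running the helper first with DRY_RUN=1 shows the exact commands that
would be issued, which is a cautious way to check disk and disk group
names before changing any flags.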
Replacement Procedure
After mail is sent, vxrelocd relocates failed subdisks (those listed
in the subdisks list). This requires finding appropriate spare or
free space in the same disk group as the failed subdisk. A disk is
eligible as replacement space if it is a valid VERITAS Volume Manager
disk (VM disk) and contains enough space to hold the data contained in
the failed subdisk. If no space is available on spare disks, the
relocation uses free space that is not marked nohotuse.
To determine which of the eligible disks to use, vxrelocd first tries
the disk that is closest to the failed disk. The value of
``closeness'' depends on the controller, target, and disk number of
the failed disk. A disk on the same controller as the failed disk is
closer than a disk on a different controller; a disk under the same
target as the failed disk is closer than one under a different target.
vxrelocd moves all subdisks from a failing drive to the same
destination disk if possible.
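The closeness rule above can be illustrated with a small ranking
function. This is only a sketch of the ordering described in the text,
not the code vxrelocd uses: the function names are hypothetical, the
numeric scores are arbitrary, and device names are assumed to have the
form cNtNdN (controller, target, disk).

```shell
# Hypothetical scoring: 0 = same controller and target,
# 1 = same controller only, 2 = different controller.
# usage: closeness FAILED_DEVICE CANDIDATE_DEVICE
closeness() {
    failed=$1
    cand=$2
    fc=${failed%%t*}; cc=${cand%%t*}   # controller part, e.g. c1
    ft=${failed%%d*}; ct=${cand%%d*}   # controller and target, e.g. c1t2
    if [ "$ft" = "$ct" ]; then
        echo 0
    elif [ "$fc" = "$cc" ]; then
        echo 1
    else
        echo 2
    fi
}

# Print the candidate with the lowest (closest) score.
# usage: closest_disk FAILED_DEVICE CANDIDATE_DEVICE...
closest_disk() {
    failed_dev=$1
    shift
    for c in "$@"; do
        echo "$(closeness "$failed_dev" "$c") $c"
    done | sort -n | head -n 1 | cut -d' ' -f2
}
```

With a failed disk c1t2d0, a candidate c1t2d1 scores 0, c1t5d0 scores
1, and c2t2d0 scores 2, so a disk under the same target is preferred
when one is eligible.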
If no spare or free space is found, mail is sent explaining the
disposition of volumes that had storage on the failed disk:
Hot-relocation was not successful for subdisks on disk
dm_name in volume v_name in disk group dg_name.
No replacement was made and the disk is still unusable.
The following volumes have storage on medianame:
volumename
...
These volumes are still usable, but the redundancy of
those volumes is reduced. Any RAID-5 volumes with storage
on the failed disk may become unusable in the face of
further failures.
If any non-RAID-5 volumes were made unusable due to the disk failure,
the following message is included:
The following volumes:
volumename
...
have data on medianame but have no other usable mirrors on
other disks. These volumes are now unusable and the data on
them is unavailable. These volumes must have their data restored.
If any RAID-5 volumes were made unavailable due to the disk failure,
the following message is included:
The following RAID-5 volumes:
volumename
...
had storage on medianame and have experienced
other failures. These RAID-5 volumes are now unusable
and data on them is unavailable. These RAID-5 volumes must
have their data restored.
If there is spare space available, a snapshot of the current
configuration is saved in
/etc/vx/saveconfig.d/dg_name.yymmdd_hhmmss.mpvsh before attempting a
subdisk relocation. Relocation requires setting up a subdisk on the
spare or free space not marked with nohotuse and using it to replace
the failed subdisk. If this is successful, the vxrecover command runs
in the background to recover the data in volumes that had storage on
the disk.
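The snapshot naming scheme lends itself to simple housekeeping
scripts. The sketch below builds a name in the
dg_name.yymmdd_hhmmss.mpvsh form and lists the snapshots of one disk
group that exceed a given limit. Both function names are hypothetical,
and the assumption that the oldest snapshots are the ones beyond the
limit is made for this example; this page does not state how vxrelocd
chooses which saved configurations to discard.

```shell
# Hypothetical helper: build a snapshot file name for disk group $1
# using the current time.
snapshot_name() {
    echo "$1.$(date +%y%m%d_%H%M%S).mpvsh"
}

# Hypothetical helper: list snapshots of disk group $2 in directory $1
# beyond the newest $3. The yymmdd_hhmmss stamp sorts chronologically
# as text (within one century), so a reverse sort puts newest first.
# usage: excess_snapshots DIRECTORY DISKGROUP MAX
excess_snapshots() {
    dir=$1
    dg=$2
    max=$3
    ls "$dir" |
        grep "^$dg\.[0-9]\{6\}_[0-9]\{6\}\.mpvsh\$" |
        sort -r |
        tail -n +"$((max + 1))"
}
```

The second function only prints names and removes nothing, so its
output can be inspected before any cleanup is attempted.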
If the relocation fails, the following message is sent:
Hot-relocation was not successful for subdisks
on disk dm_name in volume v_name in disk
group dg_name. No replacement was made
and the disk is still unusable.
If any volumes (RAID-5 or otherwise) become unusable due to the
failure, the following message is included:
The following volumes:
volumename
...
have data on dm_name but have no other usable mirrors on other
disks. These volumes are now unusable and the data on them is
unavailable. These volumes must have their data restored.
If the relocation procedure was successful and recovery has begun, the
following mail message is sent:
Volume v_name Subdisk sd_name relocated to
newsd_name, but not yet recovered.
After recovery completes, a mail message is sent relaying the result
of the recovery procedure. If the recovery is successful, the
following message is included in the mail:
Recovery complete for volume v_name in disk
group dg_name.
If the recovery was not successful, the following message is included
in the mail:
Failure recovering v_name in disk group dg_name.
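The fixed phrases in these messages make it possible to classify a
notification automatically, for example in a mail filter. The
following is a minimal sketch: the function name classify_mail and the
labels it prints are hypothetical, and it assumes the message text
matches the wording shown on this page apart from line breaks.

```shell
# Hypothetical filter: read one notification on standard input and
# print a one-word label describing the stage it reports.
classify_mail() {
    # Join lines and squeeze blanks so wrapped phrases still match.
    text=$(tr '\t\n' '  ' | tr -s ' ')
    case $text in
        *"Hot-relocation was not successful"*) echo relocation-failed ;;
        *"Failure recovering"*)                echo recovery-failed ;;
        *"Recovery complete for volume"*)      echo recovered ;;
        *"but not yet recovered"*)             echo relocated ;;
        *"Failures have been detected"*)       echo failure-detected ;;
        *)                                     echo unknown ;;
    esac
}
```

The patterns are tested from the most serious outcome down, so a
message that reports a failed relocation is labeled as such even if it
also contains other recognized phrases.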
Disabling vxrelocd
If you do not want automatic subdisk relocation, you can disable the
hot-relocation feature by killing the relocation daemon, vxrelocd, and
preventing it from restarting. However, do not kill the daemon while
a relocation is in progress. To kill the daemon, run the command:
ps -ef
from the command line and find the two entries for vxrelocd. Execute
the command:
kill -9 PID1 PID2
(substituting PID1 and PID2 with the process IDs for the two vxrelocd
processes). To prevent vxrelocd from being started again, you must
comment out the line that starts up vxrelocd in the startup script
/sbin/rc2.d/S095vxvm-recover.
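Finding the two process IDs can be sketched as a filter over ps -ef
output. The function name vxrelocd_pids is hypothetical. The sketch
assumes the usual ps -ef layout with the process ID in the second
column, and it reports any process that has a command word whose base
name is vxrelocd, so check the result before passing it to kill.

```shell
# Hypothetical filter: print the PID of every ps -ef line that has a
# command word whose base name is vxrelocd.
# usage: ps -ef | vxrelocd_pids
vxrelocd_pids() {
    awk '{
        # The start time may occupy one or two columns, so the command
        # begins at column 8 or 9; scanning from 8 covers both cases.
        for (i = 8; i <= NF; i++) {
            n = split($i, part, "/")
            if (part[n] == "vxrelocd") {
                print $2
                break
            }
        }
    }'
}
```

Once the listed processes are confirmed to be the vxrelocd daemon and
no relocation is in progress, the IDs can be supplied to kill as
described above.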
FILES
/sbin/rc2.d/S095vxvm-recover The startup file for vxrelocd.
/etc/vx/saveconfig.d/dg_name.yymmdd_hhmmss.mpvsh
File where vxrelocd saves a snapshot of
the current configuration before
performing a relocation.
SEE ALSO
kill(1), mailx(1), ps(1), vxdiskadm(1M), vxedit(1M), vxintro(1M),
vxnotify(1M), vxrecover(1M), vxsparecheck(1M), vxunreloc(1M)