failover(7M) failover(7M)
failover - disk device alternate path support
/etc/init.d/failover [init|start]
Failover creates an infrastructure for the definition and management of
multiple paths to a single disk device or lun. This failover
infrastructure is used by an SGI logical volume manager (XLV, XVM) to
select the path used for access to the logical volume(s) created on the
storage device(s). In the presence of I/O errors, the SGI logical volume
manager requests a new path from the failover infrastructure for access
to the erring logical volumes. This path failover requires the logical
volume manager's plexing software.
Failover is only possible for devices that use dksc(7m), SGI's SCSI
disk driver.
Failover is not a multi-path load balancing driver.
During system startup, failover automatically detects and configures
alternate paths (failover groups) to SGI Clariion RAID, SGI TP9100 RAID,
and SGI TP9400 RAID. To specify a primary path to an SGI RAID, or to
configure primary and alternate paths to other more generic devices,
failover also processes configuration directives contained within the
/etc/failover.conf configuration file which allow manual specification of
a failover group.
Failover uses /sbin/foconfig to parse the configuration file and direct
the creation of failover groups and the specification of primary paths
for SGI RAID. /sbin/foconfig should not be executed directly.
Alternate Path Configuration
Primary and alternate paths to devices are defined by two different
mechanisms: automatic detection, and manual configuration via a
configuration file.
Detection of paths to SGI RAID devices is automatic and happens at the
time of device discovery during the probing of the SCSI and Fibre Channel
buses. The detected paths to the SGI RAID together make up a failover
group. Any path within a failover group can be used for I/O requests
unless explicit primary path configuration is used (see "Using Manual
Configuration with SGI RAID" below).
Specification of a primary path to an SGI RAID or configuration of other
disk storage devices into failover groups is declared within the
/etc/failover.conf configuration file. This file is processed during
failover startup, and when the /etc/init.d/failover script is executed.
When /etc/init.d/failover is executed with the start parameter, it
automatically calls xlv_assemble(1m). When executed with the init
parameter, the execution of xlv_assemble is skipped.
An entry within /etc/failover.conf which defines a failover group
consists of a single line, or multiple lines, all except the last ending
in a \ (backslash). An entry consists of an arbitrary group name, a
primary path, and optionally up to thirty-one alternate paths. The group
name is an arbitrary string of up to 31 characters. Following the group
name are the /dev/scsi names associated with the primary and alternate
paths, the primary being the first path specified.
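The entry format described above can be illustrated with a short Python sketch. It is not the /sbin/foconfig implementation; the function names, the error handling, and the treatment of every line beginning with # as skippable are assumptions made for illustration. Only the backslash continuation rule, the field order, and the two limits of 31 are taken from the text.

```python
MAX_NAME_LEN = 31     # group name limit stated in the text
MAX_ALTERNATES = 31   # alternate path limit stated in the text


def join_continuations(text):
    """Merge each line ending in a backslash with the line that follows."""
    logical, pending = [], ""
    for raw in text.splitlines():
        line = raw.rstrip()
        if line.endswith("\\"):
            pending += line[:-1] + " "
        else:
            logical.append(pending + line)
            pending = ""
    if pending.strip():
        logical.append(pending)
    return logical


def parse_groups(text):
    """Return {group name: (primary path, [alternate paths])}."""
    groups = {}
    for line in join_continuations(text):
        fields = line.split()
        if not fields or fields[0].startswith("#"):
            continue  # blank line, comment, or directive (assumed skippable)
        name, paths = fields[0], fields[1:]
        if len(name) > MAX_NAME_LEN:
            raise ValueError(f"group name too long: {name}")
        if not paths:
            raise ValueError(f"group {name} has no primary path")
        if len(paths) - 1 > MAX_ALTERNATES:
            raise ValueError(f"group {name} has too many alternate paths")
        # The first path listed is the primary; the rest are alternates.
        groups[name] = (paths[0], paths[1:])
    return groups


SAMPLE = """\
A sc7d1l0 sc8d1l0
lun16 2000006016fe0cc0/lun16/c104p0 2000006016fe0cc0/lun16/c108p0 \\
      2000006016fe0cc0/lun16/c110p0
priA sc14d11l0
"""

for name, (primary, alternates) in parse_groups(SAMPLE).items():
    print(name, "primary:", primary, "alternates:", len(alternates))
```

An entry such as priA, which names only a primary path, is accepted here because the text allows an SGI RAID entry to specify the primary path alone.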
With manual configuration of failover groups, only the specified primary
path can be used for I/O requests. This is also the case if the
configuration file is used to explicitly specify a primary path to an SGI
RAID.
Using Manual Configuration with SGI RAID
SGI RAID devices can use the /etc/failover.conf configuration file to
explicitly specify primary paths, rather than letting a volume manager
pick one. This is useful, because if multiple controllers can each
access the same storage (in a SAN environment), volume managers will tend
to use a single controller to access all storage connected to a given
storage network, precluding using different host adapters to access
different devices on the storage network.
Specifying a primary path allows the administrator to choose different
host adapters to access different storage devices, because the volume
manager will not be able to access storage through the alternate paths.
This is particularly useful when striping. Only the primary path needs
to be specified in the /etc/failover.conf file with this option.
Alternate paths will be automatically detected.
Manual configuration is recommended with the SGI TP9100 RAID, because
performance to a lun is significantly reduced if both RAID controllers
are used to access the lun.
Configuration File Directives
Two configuration directives are available for use within the
/etc/failover.conf configuration file. These directives, #verbose and
#disable_target_lun_check, modify the behavior of the /sbin/foconfig
program used to parse the configuration file. They must be placed at the
beginning of a line within the configuration file and affect all lines
following the directive. Once enabled, these options cannot be disabled.
#verbose causes the program to emit debugging information.
#disable_target_lun_check permits the definition of a failover group
containing disks or luns with differing target and lun numbers.
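The way a directive applies to every line that follows it can be sketched in Python. This is a hypothetical model, not the behavior of /sbin/foconfig: the scan function, the scNdNlN pattern (read from the sample names as controller, target, and lun), and the choice to flag a mismatching group as rejected are all assumptions. The text states only that the directives are sticky and what each one permits.

```python
import re

# Assumed reading of names such as sc7d1l0: controller 7, target 1, lun 0.
SC_NAME = re.compile(r"^sc(\d+)d(\d+)l(\d+)$")


def target_lun(path):
    """Return (target, lun) for an scNdNlN name, or None for other names."""
    m = SC_NAME.match(path)
    return (int(m.group(2)), int(m.group(3))) if m else None


def scan(lines):
    """Yield (group, paths, verbose, accepted) for each group line."""
    verbose = False
    lun_check = True
    for line in lines:
        fields = line.split()
        if not fields:
            continue
        first = fields[0]
        at_line_start = line.startswith(first)  # directives are not indented
        if at_line_start and first == "#verbose":
            verbose = True        # stays enabled for all later lines
            continue
        if at_line_start and first == "#disable_target_lun_check":
            lun_check = False     # check stays disabled for all later lines
            continue
        if first.startswith("#"):
            continue              # ordinary comment
        group, paths = first, fields[1:]
        # With the check enabled, all scNdNlN paths must share target and lun.
        ids = {target_lun(p) for p in paths} - {None}
        accepted = (not lun_check) or len(ids) <= 1
        yield group, paths, verbose, accepted


CONF = [
    "A sc7d1l0 sc8d1l0",
    "bad sc16d10l0 sc17d11l0",
    "#verbose",
    "priA sc14d11l0",
    "#disable_target_lun_check",
    "raid1 sc16d10l0 sc17d11l0 sc18d12l0 sc19d13l0",
]

for group, paths, verbose, accepted in scan(CONF):
    print(group, "verbose" if verbose else "quiet",
          "accepted" if accepted else "rejected")
```

Because neither flag is ever reset inside the loop, the sketch mirrors the rule that an enabled option cannot be disabled later in the file.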
Sample Configuration Entries
The sample file shows failover groups, each consisting of a primary path
and one or more alternate paths.
#ident $Revision: 1.10 $
#
# This is the configuration file for table driven failover support.
#
# Please see the failover (7m) manual page for details on
# how to use this file.
#
A sc7d1l0 sc8d1l0
B sc7d1l1 sc8d1l1
C sc7d1l2 sc8d1l2
D sc7d1l3 sc8d1l3
E sc7d1l4 sc8d1l4
F sc7d1l5 sc8d1l5
G sc7d1l6 sc8d1l6
H sc7d1l7 sc8d1l7
I 2000002037003be2/lun0/c3p1 2000002037003be2/lun0/c5p2
J 2000002037003c6c/lun0/c5p2 2000002037003c6c/lun0/c3p1
lun16 2000006016fe0cc0/lun16/c104p0 2000006016fe0cc0/lun16/c108p0 \
2000006016fe0cc0/lun16/c110p0 2000006016fe0cc0/lun16/c109p0 \
2000006016fe0cc0/lun16/c107p0 2000006016fe0cc0/lun16/c106p0 \
2000006016fe0cc0/lun16/c105p0 2000006016fe0cc0/lun16/c103p0
# Cause program to emit debugging information for the following
# groups.
#verbose
# specify a primary path
priA sc14d11l0
priB sc15d11l1
# Cause program to ignore target and lun numbering for these raid luns.
#disable_target_lun_check
raid1 sc16d10l0 sc17d11l0 sc18d12l0 sc19d13l0
Switching to an Alternate Path
Failover to an alternate path is controlled by an SGI logical volume
manager (XLV, XVM) and its plexing software. When the logical volume
manager receives notification of an I/O error, it requests failover to
switch the erring device to an available alternate path. If the path
switch is successful, the SGI logical volume manager retries the failed
I/O using the new path.
The scsifo(1m) command permits the system administrator to manually
request a switch to an alternate path. Although the scsifo command
performs the switch, the SGI logical volume manager does not detect it
until it receives an I/O error on the current path because that path is
no longer available. The SGI logical volume manager then begins using
the new path.
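The error-driven sequence described in this section (I/O error, switch request, retry on the new path) can be modeled with a small Python sketch. It is a toy simulation under stated assumptions, not SGI code: the FailoverGroup class, the do_io function, and the policy of picking the first path not yet marked down are hypothetical. The path_works callback stands in for the hardware.

```python
class FailoverGroup:
    """Toy model of one failover group: a primary path plus alternates."""

    def __init__(self, primary, alternates):
        self.paths = [primary] + list(alternates)
        self.current = primary
        self.down = set()

    def switch(self):
        """Mark the current path down and move to another path, if any.

        Assumed policy: take the first listed path not yet marked down.
        """
        self.down.add(self.current)
        for path in self.paths:
            if path not in self.down:
                self.current = path
                return True
        return False


def do_io(group, path_works):
    """Play the volume manager's role: issue I/O, switch and retry on error.

    path_works maps a path name to True (I/O succeeds) or False (I/O error).
    Returns the path that completed the I/O.
    """
    while True:
        if path_works(group.current):
            return group.current
        if not group.switch():
            raise OSError("no usable path left in failover group")


group = FailoverGroup("sc7d1l5", ["sc8d1l5"])
# Simulate a failure of the primary path only.
print(do_io(group, lambda path: path != "sc7d1l5"))
```

In this model the caller learns about a path change only through an I/O error, which matches the statement that a manual switch goes unnoticed by the volume manager until an I/O on the old path fails.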
Inventory Display
The hinv(1m) command will display the path status of primary and
alternate paths configured in the /etc/failover.conf configuration file.
The following sample hinv output reflects the above sample configuration
file. Three of the devices have failed over to the alternate path,
perhaps via the scsifo command.
Integral SCSI controller 7: Version Fibre Channel AIC-1160, revision 1
Disk drive: unit 1 on SCSI controller 7 (primary path)
Disk drive: unit 1,lun 1, on SCSI controller 7 (primary path)
Disk drive: unit 1,lun 2, on SCSI controller 7 (primary path)
Disk drive: unit 1,lun 3, on SCSI controller 7 (primary path)
Disk drive: unit 1,lun 4, on SCSI controller 7 (primary path)
Disk drive: unit 1,lun 5, on SCSI controller 7 (alternate path) DOWN
Disk drive: unit 1,lun 6, on SCSI controller 7 (alternate path) DOWN
Disk drive: unit 1,lun 7, on SCSI controller 7 (alternate path) DOWN
Integral SCSI controller 8: Version Fibre Channel AIC-1160, revision 1
Disk drive: unit 1 on SCSI controller 8 (primary path)
Disk drive: unit 1,lun 1, on SCSI controller 8 (alternate path)
Disk drive: unit 1,lun 2, on SCSI controller 8 (alternate path)
Disk drive: unit 1,lun 3, on SCSI controller 8 (alternate path)
Disk drive: unit 1,lun 4, on SCSI controller 8 (alternate path)
Disk drive: unit 1,lun 5, on SCSI controller 8 (primary path)
Disk drive: unit 1,lun 6, on SCSI controller 8 (primary path)
Disk drive: unit 1,lun 7, on SCSI controller 8 (primary path)
Integral SCSI controller 3: Version Fibre Channel QL2200
Fabric Disk: node 2000002037003be2 port 1 lun 0 on SCSI controller 3 (primary path)
Fabric Disk: node 2000002037003c6c port 1 lun 0 on SCSI controller 3 (alternate path)
Integral SCSI controller 5: Version Fibre Channel QL2200
Fabric Disk: node 2000002037003be2 port 2 lun 0 on SCSI controller 5 (alternate path)
Fabric Disk: node 2000002037003c6c port 2 lun 0 on SCSI controller 5 (primary path)
Once a down device is responding on the bus again, the "DOWN" indicator
displayed by hinv can be cleared by using the scsiha(1m) command to
reprobe the bus to which the device is connected.
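To find the buses that need a reprobe, the "Disk drive" lines flagged DOWN can be picked out of saved hinv output. The Python sketch below is an illustration only. Its regular expression is derived from the sample output above and is an assumption, not a description of hinv's full output format; the down_devices name and the choice to report a missing lun as 0 are likewise hypothetical.

```python
import re

# Pattern based solely on the "Disk drive" lines of the sample output.
LINE = re.compile(
    r"Disk drive: unit (\d+)(?:,\s*lun (\d+),)? on SCSI controller (\d+) "
    r"\((primary|alternate) path\)( DOWN)?"
)


def down_devices(hinv_text):
    """Return (controller, unit, lun) for every line flagged DOWN."""
    found = []
    for line in hinv_text.splitlines():
        m = LINE.search(line)
        if m and m.group(5):
            # Assumption: a line without a lun field refers to lun 0.
            lun = m.group(2) or "0"
            found.append((int(m.group(3)), int(m.group(1)), int(lun)))
    return found


SAMPLE = """\
Disk drive: unit 1 on SCSI controller 7 (primary path)
Disk drive: unit 1,lun 4, on SCSI controller 7 (primary path)
Disk drive: unit 1,lun 5, on SCSI controller 7 (alternate path) DOWN
Disk drive: unit 1,lun 5, on SCSI controller 8 (primary path)
"""

for controller, unit, lun in down_devices(SAMPLE):
    print("controller", controller, "unit", unit, "lun", lun, "is DOWN")
```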
/etc/failover.conf
/etc/init.d/failover
/etc/init.d/xlv
/var/sysgen/master.d/failover
autoconfig(1m), dks(5m), ds(7m), hinv(1m), ioconfig(1m), scsifo(1m),
scsiha(1m), xlv_assemble(1m), and xlv(7m).
The group name specified within the /etc/failover.conf file has no
external visibility. It cannot be correlated to the group number
information displayed by the scsifo command.