volintro, lsm, LSM - Introduction to Logical Storage Manager
(LSM) terms and commands
The following LSM commands provide a shell-level interface
used by the system administrator and higher-level applications
and scripts to query and manipulate LSM objects:
volassist, volclonedg, vold, voldctl, voldg, voldisk,
voldiskadd, voldiskadm, voldisksetup, voledit, volencap,
volevac, volinfo, volinstall, voliod, vollogcnvt, volmake,
volmend, volmigrate, volmirror, volnotify, volplex, volprint,
volreattach, volreconfig, volrecover, volrestore,
volrootmir, volsave, volsd, volsetup, volstat, voltrace,
volume, volunmigrate, volunroot, volwatch
The following are LSM terms and definitions.

Volume
A virtual disk device that looks to applications and file systems like a physical disk partition device. Volumes present block and raw device interfaces that are compatible in their use with disk partition devices. However, a volume is a virtual device that can be mirrored, striped, or spanned across several disk drives, and moved to use different storage, by using administrative commands. The configuration of a volume can be changed with LSM commands without disrupting applications or file systems that are using the volume.

Plex
A copy of a volume's logical data address space; also known as a mirror. A volume can have up through 32 plexes associated with it. Each plex is, at least conceptually, a copy of the volume that is maintained consistently in the presence of volume I/O and changes to the LSM configuration. Plexes are the primary means of configuring storage for a volume. Plexes can have a striped, concatenated, or RAID 5 organization (layout).

Disk
Disks exist as two entities:
- A physical disk, on which all data is ultimately stored and which exhibits all the behaviors of the underlying technology.
- An LSM representation of the disk, which, while mapping one-to-one with the physical disk, is just a representation of storage devices from which allocations of storage are made.
The difference is that a physical disk presents the image of a device with a definable geometry (a definable number of cylinders, heads, and so on), while an LSM disk is simply a unit of allocation with a name and a size.
Disks used by LSM usually contain two special
regions: a private region and a public region. Typically,
each region is formed from a complete partition
of the disk, resulting in a sliced disk;
however, the private and public regions can be
allocated from the same partition, resulting in a
simple disk. A disk used by LSM can also be a
nopriv disk, which has only a public region and no
private region. LSM nopriv disks are created as the
result of encapsulating a disk or disk partition.

Subdisk
A region of storage allocated on a disk for use by a volume. Subdisks are associated with volumes through plexes. You organize one or more subdisks to form plexes based on the plex layout (concatenated, striped, or RAID 5). Subdisks are defined relative to disk media records.

Disk media record
A reference to a physical disk, or possibly a disk partition. This record can be thought of as a physical disk identifier for the disk or partition. Disk media records are configuration records that provide a name (known as the disk media name or DM name) that an administrator can use to reference a particular disk, independent of its location on the system's various disk controllers. Disk media records reference particular physical disks through a disk ID, which is a unique identifier that is assigned to a disk when it is initialized for use with the LSM software.
Operations are provided to set or remove the disk ID stored in a disk media record. Such operations have the effect of removing or replacing disks, along with any associated subdisks.

Disk access record
A configuration record that defines the path to a disk. Disk access records most often name a unit number. LSM uses the disk access records stored in a system to find all disks attached to the system. Disk access records do not identify particular physical disks. Disk access records are identified by their disk access names (also known as DA names).
Through the use of disk IDs, LSM allows you to move
disks between controllers or to different locations
on a controller. When you move a disk, a different
disk access record is used to access the disk,
although the disk media record will continue to
track the actual physical disk.
On some systems, LSM builds a list of disk access records automatically, based on the list of devices attached to the system. On these systems, it is not necessary to define disk access records explicitly. On other systems, you must define disk access records with the /sbin/voldisk define command. Specialty disks, such as RAM disks or floppy disks, are likely to require explicit /sbin/voldisk define commands.

Disk group
A group of disks that share a common configuration database. A configuration database is a set of records describing objects, including disks, volumes, plexes, and subdisks, that are associated with one particular disk group. Each disk group has an administrator-assigned name that is used to reference that disk group. Each disk group also has an internally defined, unique disk group ID, which differentiates two disk groups with the same administrator-assigned name.
Disk groups provide a method to partition the configuration
database, so that the database size is
not too large and so that database modifications do
not affect too many drives. They also allow LSM to
operate with groups of physical disk media that can
be moved between systems.
Disks and disk groups have a circular relationship: disk groups are formed from disks, and disk group configurations are stored on disks. All disks in a disk group are stamped with a disk group ID, which is a unique identifier for naming disk groups. Some or all disks in a disk group also store copies of the configuration database of the disk group.

Configuration database
A small database that contains all volume, plex, subdisk, and disk media records. These databases are replicated onto some or all disks in the disk group, with up to two copies on each disk. Because these databases pertain to disk groups, record associations cannot span disk groups. Thus, you cannot define a subdisk on a disk in one disk group and associate it with a volume in another disk group.

rootdg
LSM creates and requires one special disk group called rootdg, which is generally the default for most utilities. In addition to defining the regular disk group information, the configuration database for the root disk group contains local information that is specific to the system. Unlike other, administrator-created disk groups, the rootdg disk group cannot be moved to a different host.

Private and public regions
Most disks used by LSM contain two special regions: a private region and a public region. Usually, each region is formed from a complete partition of the disk; however, the private and public regions can be allocated from the same partition.
The private region of a disk contains on-disk structures that are used by LSM for internal purposes. Each private region is typically 4096 blocks and begins with a disk header that identifies the disk and its disk group. Private regions can also contain copies of a disk group's configuration database and copies of the disk group's kernel log.
The public region of a disk is the space reserved for allocating subdisks. Subdisks are defined with offsets that are relative to the beginning of the public region of a particular disk partition. A subdisk represents a contiguous region of the disk that lies within the public region. Only one contiguous region of disk can form the public region for a disk.

Kernel log
A log kept in the private region on the disk
that is written by the LSM kernel. The log contains records describing the state of volumes in the disk group. This log provides a mechanism for the kernel to persistently register state changes so that the vold daemon can detect the state changes even in the event of a system failure.

Disk header
A block stored in the private region of a disk that defines several properties of the disk, such as:
- The size of the private region
- The location and size of the public region
- The unique disk ID for the disk
- The disk group ID and disk group name (if the disk is currently associated with a disk group)
- The host ID for a host that has exclusive use of the disk

Disk ID
A 64-byte, universally unique identifier that is assigned to a physical disk when its private region is initialized with the /sbin/voldisk init command. The disk ID is recorded in the disk media record so that the physical disk can be related to the disk media record at system startup.

Disk group ID
A 64-byte, universally unique identifier that is assigned to a disk group when the disk group is created with the /sbin/voldg init command. This identifier is in addition to the disk group name, which you assign. The disk group ID differentiates between disk groups that have the same administrator-assigned name.

Host ID
A name, usually assigned by you, that identifies a particular host. Host IDs are used to assign ownership to particular physical disks. When a disk is part of a disk group that is in active use by a particular host, the disk is stamped with that host's host ID. If another system attempts to access the disk, it detects that the disk has a nonmatching host ID and disallows access until the host with ownership discontinues use of the disk. Use the /sbin/voldisk clearimport command to clear the host ID stored on a disk.
If a disk is a member of a disk group and has a host ID that matches a particular host, then that host imports the disk group as part of system startup.

Striped plex
A plex that scatters data evenly across each of its associated subdisks. A striped plex has a characteristic number of stripe columns (represented by the number of associated subdisks) and a characteristic stripe width. The stripe width defines how data with a particular address is allocated to one of the associated subdisks. Given a stripe width of 128 blocks and two stripe columns, the first group of 128 blocks is allocated to the first subdisk, the second group of 128 blocks is allocated to the second subdisk, the third group to the first subdisk again, and so on.

Concatenated plex
A plex that uses subdisks on one or more disks to create a virtual contiguous region of storage space that is accessed linearly. If LSM reaches the end of a subdisk while writing data, it continues to write data to the next subdisk, which can physically exist on the same disk or a different disk. This layout allows you to use space on several regions of the same disk, or regions of several disks, to create a single big pool of storage.

volboot file
The volboot file is a special file (usually stored as /etc/vol/volboot) that is used to bootstrap the root disk group and to define a system's host ID. In addition to a host ID, the volboot file might also contain a list of disk access records. On system startup, this list of disks is scanned to find a disk that is a member of the rootdg disk group and that is stamped with this system's host ID. The volboot file allows the configuration to be located on disks that are not detected by system initialization, or to be detected in cases where autoconfig is disabled. When such a disk is found, its configuration database is read and used to get a complete list of disk access records. These records serve as a second-stage bootstrap of the root disk group and are used to locate all other disk groups.

Plex consistency
If the plexes of a volume contain different data, then the plexes are said to be inconsistent. This is a problem only if LSM is unaware of the inconsistencies, because the volume can then return different results for consecutive reads of the same data.
Plex inconsistency is a serious compromise of data integrity. It is caused by write operations that start around the time of a system failure, if parts of the write complete on one plex but not on the other. Plexes are also inconsistent after a mirrored volume is created, unless they are first synchronized to contain the same data.
An important role of LSM is to ensure that consistent data is returned to any application that reads a volume. This might require that plex consistency of a volume be "recovered" by copying data between plexes so that they have the same contents. Alternatively, you can put the volume into a state in which reads from one plex are automatically written back to the other plexes, thus making the data consistent for that volume offset.
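As an illustration only, the following Python sketch models the read-writeback approach described above. Each plex is a hypothetical in-memory list of blocks (the Plex class and read_writeback function are not LSM interfaces). A read takes the block from the first readable plex and writes it back to every other writable plex, so that later reads of the same offset agree regardless of which plex serves them.

```python
from dataclasses import dataclass


@dataclass
class Plex:
    """Hypothetical in-memory stand-in for one plex of a mirrored volume."""
    name: str
    blocks: list
    readable: bool = True
    writable: bool = True


def read_writeback(plexes, offset):
    """Return the block at `offset` and copy it to all other writable plexes.

    After the call, every writable plex holds the same data at `offset`,
    so repeated reads agree no matter which plex serves them.
    """
    source = next((p for p in plexes if p.readable), None)
    if source is None:
        raise OSError("no readable plex")
    data = source.blocks[offset]
    for plex in plexes:
        if plex is not source and plex.writable:
            plex.blocks[offset] = data
    return data


a = Plex("vol01-01", ["new", "same"])
b = Plex("vol01-02", ["old", "same"])
data = read_writeback([a, b], 0)  # "new"; b.blocks[0] is now "new" too
```

The sketch picks the first readable plex arbitrarily. Its only goal is that all plexes agree afterward, not that a particular copy wins.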
The following conventions are available for LSM commands
to provide a finer degree of administration.
Command Syntax
Most LSM commands provide more than one operation, with operations grouped primarily by object type. Commands that provide multiple operations are typically invoked in the following form:
command [options] [keyword] [operands]
Here, command is the name of the command and keyword is a name that identifies the specific operation to perform. Any options introduced in the standard -letter form precede the operation keyword.
To aid normal use, each command provides an extended usage
message that lists the options and operation keywords it
supports. For commands that are keyword-based, the
extended usage message can be displayed by using the help
keyword. For commands that use operands for purposes other
than operation selection, the extended usage message can
be displayed by using the -H option. The extended usage
messages are reminders, not replacements for user documentation.
Standard Length Numbers
Many basic properties of objects managed by LSM require
specification of lengths, either as a pure object length
or as an offset relative to some other object. LSM supports
volume lengths up through 2,147,483,647 disk sectors
(one terabyte on most systems). Typing such large numbers,
or even much smaller numbers, can be annoying and subject
to error. LSM provides a uniform syntax for representing
such numbers, which uses suffixes to provide convenient
multipliers. Numbers can be specified in decimal, octal,
or hexadecimal values. Also, numbers can be specified as a
sum of several numbers.
A hexadecimal (base 16) number is introduced using a prefix
of 0x. For example, 0xfff is the same as decimal 4095.
An octal (base 8) number is introduced using a prefix of
0. For example, 0177777 is the same as decimal 65535.
A number can be followed by a suffix character to indicate a multiplier for the number. A length number with no suffix character represents a count of standard disk sectors. The length of a standard disk sector can vary between systems; it is commonly 512 bytes. On systems where disks can have different sector sizes, one of the sector sizes is chosen as the "standard" size. Supported suffix characters are:
b    Multiply the length by 512 bytes (blocks).
s    Multiply the length by the standard sector size (the default).
k    Multiply the length by 1024 bytes (kilobytes).
m    Multiply the length by 1,048,576 bytes (1024 KB; megabytes).
g    Multiply the length by 1,073,741,824 bytes (1024 MB; gigabytes).
t    Multiply the length by 1,099,511,627,776 bytes (1024 GB; terabytes).
Numbers are represented internally as an integer number of sectors. As a result, if the standard disk sector size is larger than 512 bytes, a length that is not a whole number of sectors is rounded down to a whole number of sectors. Rounding is always done to the next lower multiple of the sector size, never to the nearest multiple.
The letter b is a valid hexadecimal character. To use b to
indicate a length in blocks, leave a single space between
the length and the b suffix. Use of a blank character
within a number, when invoking commands from the shell,
usually requires enclosing the number in quotes. For example:
/sbin/volassist make vol01 "0x1000 b"
Numbers can be added or subtracted by separating two or
more numbers by a plus or minus sign, respectively. A plus
sign is optional. For example, on a system with a 512-byte sector size, the largest allowed number (2,147,483,647 sectors) can be entered as: 1023g+1023m+1023k+1
The number 2g-1 (2 GB minus one sector) can be used to represent the largest volume size that can be used with most file systems.
In output, LSM reports length numbers as a simple count of
sectors, with no suffix character.
Case is not important in length specification. Hexadecimal
numbers and suffix characters can be specified using any
reasonable combination of uppercase and lowercase letters.
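The rules above can be approximated in a short Python sketch. It assumes the suffix letters listed earlier, a standard sector size passed in by the caller (512 bytes by default), and that rounding down is applied once to the total rather than to each term. parse_length is a hypothetical helper, not an LSM interface.

```python
import re

# Bytes per unit for each suffix. "s" and no suffix both mean the standard
# sector size, which is passed in because it can vary between systems.
SUFFIX_BYTES = {"b": 512, "k": 1024, "m": 1024**2, "g": 1024**3, "t": 1024**4}

# One term: hexadecimal (0x...), octal (leading 0), or decimal digits, then
# optional whitespace and an optional suffix. The hex branch is greedy, so
# "0x1000b" reads b as a hex digit while "0x1000 b" reads it as blocks.
TERM = re.compile(r"(0x[0-9a-f]+|[0-9]+)\s*([bskmgt]?)")


def parse_length(text: str, sector_bytes: int = 512) -> int:
    """Convert an LSM-style length string to a whole number of sectors."""
    total_bytes = 0
    sign = 1
    for token in re.split(r"([+-])", text.lower()):  # case is not significant
        token = token.strip()
        if token in ("+", "-"):
            sign = -1 if token == "-" else 1
            continue
        if not token:
            continue  # empty piece before a leading sign
        match = TERM.fullmatch(token)
        if match is None:
            raise ValueError(f"bad length term: {token!r}")
        digits, suffix = match.groups()
        if digits.startswith("0x"):
            value = int(digits, 16)
        elif len(digits) > 1 and digits.startswith("0"):
            value = int(digits, 8)  # raises ValueError for digits 8 and 9
        else:
            value = int(digits)
        unit = SUFFIX_BYTES.get(suffix, sector_bytes)  # "" and "s" -> sectors
        total_bytes += sign * value * unit
        sign = 1
    if total_bytes < 0:
        raise ValueError("length must not be negative")
    # Lengths are kept as whole sectors: always round down, never to nearest.
    return total_bytes // sector_bytes


print(parse_length("1023g+1023m+1023k+1"))  # 2147483647
print(parse_length("0x1000 b"))             # 4096
print(parse_length("0x1000b"))              # 65547 (b read as a hex digit)
```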
Disk Group Selection
Most commands operate upon only one disk group. Each disk
group has a separate configuration from every other disk
group. It is possible for two disk groups to contain
objects that have the same name. This can happen if a disk
group is moved from one system to another. Moreover, most utilities make no attempt to ensure that names are unique across disk groups, so name collisions can occur even without such a move.
In general, you specify disk groups only when creating objects. You cannot use a single command that references objects in more than one disk group, but disk groups are selected automatically, based on the objects specified in the command.
The standard rules most commands use for selecting the disk group for a command are as follows:
1. Given a particular set of object names specified on the command line, look for the disk group of each object.
2. If all objects are in the same disk group, use that disk group.
3. If any named object is not unique across all disk groups, and if any named object is not in the rootdg disk group, then fail.
4. To force use of a particular disk group, use -g diskgroup to indicate the group. Duplicate names do not cause errors when a disk group is specified explicitly. The diskgroup specification is either a disk group ID or a disk group name.
Exception: Any set of objects in the rootdg disk
group can be specified without specifying -g
rootdg, even if the given object name is used in
another disk group.
If a set of object names is given on the command line, and
if some are unique but some are not unique, then the
command will fail according to the preceding rules.
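A minimal Python sketch of these selection rules follows. The groups mapping from disk group names to object names is a hypothetical stand-in for the configuration databases. Raising LookupError stands in for the command failing, and the sketch also fails for names that exist in no disk group.

```python
def select_disk_group(names, groups, explicit=None):
    """Return the disk group a command should use, or raise LookupError.

    names: object names given on the command line.
    groups: hypothetical mapping of disk group name to a set of object names.
    explicit: the disk group given with -g, if any.
    """
    if explicit is not None:
        return explicit  # with -g, duplicate names are not errors
    rootdg = groups.get("rootdg", set())
    if all(name in rootdg for name in names):
        return "rootdg"  # exception: rootdg objects never need -g rootdg
    chosen = set()
    for name in names:
        owners = {dg for dg, objects in groups.items() if name in objects}
        if not owners:
            raise LookupError(f"{name}: no such object")
        if len(owners) > 1:
            raise LookupError(f"{name}: not unique across disk groups")
        chosen |= owners
    if len(chosen) != 1:
        raise LookupError("objects are in more than one disk group")
    return chosen.pop()


groups = {"rootdg": {"vol01", "vol02"}, "dg1": {"vol01", "vol03"}}
print(select_disk_group(["vol01"], groups))  # rootdg (rootdg exception)
print(select_disk_group(["vol03"], groups))  # dg1
```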
Disk group configurations contain six types of records:
volume records, plex records, subdisk records, disk media
records, disk group records, and disk access records. Each
of these record types is described in the following sections.
Disk access records are specific to the root disk
group and are stored in configurations only because there
is no other convenient place to store them; otherwise,
they are logically separate from all disk groups. Since
they are specific and meaningful only to the local system,
the logical place for their storage is the rootdg because
that is the only disk group guaranteed to exist on the
system.
Disk Group Records
Disk group records define several different types of names for a disk group. The different types of names are:

Real name
The name of the disk group, as defined on disk. This name is stored in the disk group configuration and is also stored in the disk headers of disks in the disk group.

Alias name
The standard name that the system uses when referencing the disk group. References to the disk group name usually mean the alias name. Volume directories are structured into subdirectories based on the disk group alias name. Typically, the disk group's alias name and real name are identical. A local alias can be useful for gaining access to a disk group with a name that conflicts with other disk groups in the system or that conflicts with records in the rootdg disk group.

Disk group ID
A 64-byte identifier that represents the unique ID of the disk group. All disk groups on all systems should have a unique disk group ID, even if they have the same real name. This identifier is stored in the disk headers of disks in the disk group that have a private region. It is used to ensure that LSM does not confuse two disk groups that were created with the same name.
Volume Records
Volume records define the characteristics of volume
devices. The name of a volume record defines the node name
used for files in the /dev/vol and /dev/rvol directories.
The block device for a volume (which can be used as an
argument to the mount command) has the path:
/dev/vol/groupname/volume
where groupname is the name of the disk group containing
the volume. The raw device for a volume, typically used
for application I/O and for issuing I/O control operations,
has the path:
/dev/rvol/groupname/volume
For convenience, volumes assigned to the root disk group are accessible under the rootdg subdirectories of the /dev/vol and /dev/rvol directories, but are also accessible as /dev/vol/volume and /dev/rvol/volume.
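The path conventions above can be expressed as a small Python helper. volume_device_paths is hypothetical. It only builds path strings and does not check that the device nodes exist. Following the disk group records description, group should be the disk group's alias name.

```python
import posixpath


def volume_device_paths(group: str, volume: str) -> list:
    """Block and raw device paths for `volume` in disk group `group`."""
    paths = [
        posixpath.join("/dev/vol", group, volume),   # block device
        posixpath.join("/dev/rvol", group, volume),  # raw device
    ]
    if group == "rootdg":
        # Root disk group volumes are also reachable without the subdirectory.
        paths += [posixpath.join("/dev/vol", volume),
                  posixpath.join("/dev/rvol", volume)]
    return paths


print(volume_device_paths("dg1", "vol01"))
# ['/dev/vol/dg1/vol01', '/dev/rvol/dg1/vol01']
```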
Reads from a volume are directed to one of the read-write
or read-only plexes associated with the volume. Writes to
the volume are directed to the enabled read-write and
write-only plexes associated with the volume.
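The routing rule in the preceding paragraph can be sketched in Python as follows. The PlexState class and the mode strings "rw", "ro", and "wo" are hypothetical shorthand for the read-write, read-only, and write-only plex modes. A disabled or detached plex is modeled by enabled=False.

```python
from dataclasses import dataclass


@dataclass
class PlexState:
    """Hypothetical summary of one plex's kernel state and I/O mode."""
    name: str
    mode: str  # "rw" (read-write), "ro" (read-only), or "wo" (write-only)
    enabled: bool = True  # False models a disabled or detached plex


def read_candidates(plexes):
    """Plexes that may satisfy a volume read: enabled read-write or read-only."""
    return [p.name for p in plexes if p.enabled and p.mode in ("rw", "ro")]


def write_targets(plexes):
    """Plexes that receive each volume write: enabled read-write or write-only."""
    return [p.name for p in plexes if p.enabled and p.mode in ("rw", "wo")]


plexes = [
    PlexState("vol01-01", "rw"),
    PlexState("vol01-02", "wo"),                 # e.g. a plex being attached
    PlexState("vol01-03", "rw", enabled=False),  # detached after an I/O failure
]
print(read_candidates(plexes))  # ['vol01-01']
print(write_targets(plexes))    # ['vol01-01', 'vol01-02']
```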
During a write operation, two plexes of a volume can
become out of sync with each other, because writes
directed to two disks can complete at different times.
This is not normally a problem. However, if the system
were to crash or lose power during a write operation, the
two plexes could have different contents.
Most applications and file systems are not designed with
the presumption that two separate reads of a device can
return different contents without an intervening write
operation. Because plexes with different contents could
cause such a situation, LSM expends considerable effort to
guarantee that this does not happen.
Volumes have the following fundamental attributes:

Usage type
Defines a class of rules for operating on the volume, typically based on the expected content of the volume. Several utilities can apply extensions or limitations to volumes with a particular usage type. The following usage types are included with the base release of LSM:
- fsgen, for use with volumes that contain file systems
- gen, for use with volumes that are used as swap devices or for other applications that do not use the system buffer cache
- raid5, for use with volumes that have a RAID 5 plex layout, regardless of what the volume is used for
- root (special usage type), for use with the root file system volume on a single system
- cluroot (special usage type), for use with the cluster_root domain volume on a cluster
- swap (special usage type), for use with the primary swap device on a single system and with swap devices for cluster members

State
Usage types maintain a private state field for the volume that records operations that have been performed on the volume or failure conditions that have been encountered. This state field contains a string of up through 14 characters.

Length
Each volume has a length, which defines the limiting offset of read and write operations. The length is assigned by the administrator and might or might not match the lengths of the associated plexes.

Kernel state
Each volume is either enabled, disabled, or detached. When enabled, normal read and write operations are allowed on the volume, and any file system residing on the volume can be mounted or used in the usual way. When disabled, no access to the volume or any of its associated plexes is allowed. When detached, some ioctl calls can be used by commands to operate on the volume.

Plexes
Each volume has zero through 32 associated plexes.

Read policy
A configurable policy for switching between plexes for volume reads. When a volume has more than one enabled associated plex, LSM can distribute reads between the plexes to spread the I/O load and thus increase the total possible bandwidth of reads through the volume.
You can set and change the read policy to one of the following:
- Round-robin: each read operation is directed to a different plex from the previous read operation. Given three plexes, reads cycle through each of the three plexes, in order.
- Preferred plex: a specified plex is used to satisfy read requests. If a read request cannot be satisfied by the preferred plex, the volume changes to the round-robin read policy.
- Automatic selection (the default policy): LSM adjusts to use an appropriate read policy based on the set of plexes associated with the volume. If only one enabled read-write striped plex is associated with the volume, then that plex is chosen automatically as the preferred plex; otherwise, the round-robin policy is used. If a volume has one striped plex and one concatenated plex, preferring the striped plex often yields better throughput.

Start options
A string organized as a set of usage-type options to apply when starting (enabling) a volume. See volume(8) for details.

Logging type
An assignable policy to use for logging changes to the volume. The policies are:
- No logging: does not log any changes when writing to the volume. Writes the requested data to all read-write or write-only plexes.
- Dirty region logging (DRL): maintains a bitmap that represents different regions of a mirrored volume. When a write to a particular region occurs, the respective bit is set. When the system is restarted after a crash, this region bitmap is used to limit the amount of data copying required to recover plex consistency for the volume. The region changes are logged to a special log subdisk associated with the volume. Use of DRL can greatly speed recovery of a volume, but it might degrade performance of the volume under normal operation.
- RAID 5 logging: stores a copy of the data and parity for several full stripes of I/O. When a write to a RAID 5 volume occurs, the parity is calculated and the data and parity are first written to the RAID 5 log, then to the volume. When the system is restarted after a crash, all the writes in the RAID 5 log are written (or possibly rewritten) to the volume. The writes are logged to a special log subdisk associated with a separate log plex that is associated with the volume. Use of a RAID 5 log protects against data loss in the event of a system failure.

Read-writeback mode
A mode that applies to the volume during plex consistency recovery. When this mode is enabled, the data read from blocks of one plex region is written back to the corresponding region in all other writable plexes. This ensures that a future read operation covering the same range of blocks returns the same data.
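To show how the DRL region bitmap described under the logging type limits recovery, the following Python sketch marks regions on write and lists the block ranges that would be copied between plexes after a crash. The DirtyRegionLog class and the region size are hypothetical. The sketch never clears bits and does not model the on-disk log subdisk.

```python
class DirtyRegionLog:
    """Hypothetical in-memory model of a DRL region bitmap."""

    def __init__(self, volume_blocks: int, region_blocks: int):
        self.volume_blocks = volume_blocks
        self.region_blocks = region_blocks
        nregions = -(-volume_blocks // region_blocks)  # ceiling division
        self.dirty = [False] * nregions

    def mark_write(self, offset: int, length: int) -> None:
        """Set the bit for every region that a write of `length` blocks touches."""
        if length <= 0:
            return
        first = offset // self.region_blocks
        last = (offset + length - 1) // self.region_blocks
        for region in range(first, last + 1):
            self.dirty[region] = True

    def regions_to_recover(self) -> list:
        """(start, length) block ranges to resynchronize after a crash."""
        ranges = []
        for index, bit in enumerate(self.dirty):
            if bit:
                start = index * self.region_blocks
                ranges.append((start, min(self.region_blocks,
                                          self.volume_blocks - start)))
        return ranges


drl = DirtyRegionLog(volume_blocks=1000, region_blocks=128)
drl.mark_write(100, 50)   # blocks 100-149 touch regions 0 and 1
drl.mark_write(990, 10)   # last, shorter region (blocks 896-999)
print(drl.regions_to_recover())  # [(0, 128), (128, 128), (896, 104)]
```

Only three regions (360 of 1000 blocks) would be copied in this example, instead of the whole volume.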

Read-error recovery
This mode can be enabled or disabled using voledit. If this mode is enabled, a read failure for a plex causes data to be read from an alternate plex and then written back to the plex that had the read failure. This usually fixes the error. Only if the writeback fails is the plex detached for having an unrecoverable I/O failure. This mode is enabled by default.

Writecopy mode
This mode can be enabled or disabled using voledit; it takes effect only if the DRL feature is in effect. When the operating system passes a write request to the volume driver, the operating system might continue to change the memory being written to disk. LSM cannot detect that the memory is changing, so it can inadvertently leave plexes with inconsistent contents. This is not normally a problem, because the operating system ensures that any such modified memory is rewritten to the volume before the volume is closed (such as by a clean system shutdown). However, if the system crashes, plexes can be inconsistent. Because DRL recovers only the regions marked in its bitmap rather than the entire volume, it might not ensure that plexes are entirely consistent.
Turning on the writecopy mode (which is normally set by default) often causes LSM to copy the data for a write request to a new section of memory before writing it to disk. Because the write is done from the copied memory, the data cannot change, so the data written to each plex is guaranteed to be the same if the write completes.

Exception policy
Several modes can be set on the volume according to its usage type. These modes affect operation of a volume in the presence of I/O failures. Only one of these policies, called GEN_DET_SPARSE, is used. This policy tracks complete and incomplete plexes in a volume. (An incomplete plex does not have a backing subdisk for all blocks in the volume.) If an unrecoverable error occurs on an incomplete plex, the plex is detached (disabled from receiving regular volume I/O requests). If an unrecoverable error occurs on a complete plex, the plex is detached unless it is the last complete plex; in that case, any incomplete plexes that overlap with the error are detached, but the plex with the error remains attached.
This exception policy ensures that an I/O that fails on one plex is not directed to that plex again unless that plex is the last complete plex remaining attached to the volume. In that case, the policy ensures that the volume returns the error consistently, even in the presence of incomplete plexes.

Comment
An administrator-assigned string of up through 40 characters that can be set and changed using the voledit command. LSM does not interpret the comment field. The comment cannot contain newline characters.

Permissions
The user, group, and file permission modes used for the volume device nodes. The user and group are normally root and system. The mode usually grants read and write permission to the owner and no access to other users.
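The GEN_DET_SPARSE exception policy described above can be sketched as a Python function that decides which plexes to detach after an unrecoverable error. The AttachedPlex class and its backed list of [start, end) block ranges are hypothetical. "Overlap with the error" is interpreted here as an incomplete plex having a subdisk behind any block of the failed I/O.

```python
from dataclasses import dataclass


@dataclass
class AttachedPlex:
    """Hypothetical view of an attached plex for exception handling."""
    name: str
    complete: bool
    backed: list  # [start, end) block ranges backed by subdisks


def overlaps(plex, start, end):
    """True if any subdisk-backed range of `plex` overlaps blocks [start, end)."""
    return any(s < end and start < e for s, e in plex.backed)


def plexes_to_detach(plexes, failed, start, end):
    """Names of plexes to detach after an unrecoverable error on `failed`.

    [start, end) is the block range of the failed I/O.
    """
    if not failed.complete:
        return [failed.name]  # incomplete plex: always detach it
    if any(p.complete and p is not failed for p in plexes):
        return [failed.name]  # another complete plex still serves the volume
    # `failed` is the last complete plex: keep it attached, and detach the
    # incomplete plexes that could otherwise return data for the failed range.
    return [p.name for p in plexes if not p.complete and overlaps(p, start, end)]


full1 = AttachedPlex("vol01-01", True, [(0, 1000)])
full2 = AttachedPlex("vol01-02", True, [(0, 1000)])
sparse = AttachedPlex("vol01-03", False, [(0, 100)])
print(plexes_to_detach([full1, full2, sparse], full1, 50, 60))  # ['vol01-01']
print(plexes_to_detach([full1, sparse], full1, 50, 60))         # ['vol01-03']
```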
Plex Records
Plex records define the characteristics of a particular
plex of a volume. A plex can be in either an associated
state or a dissociated state. In the dissociated state,
the plex is not a part of a volume. A dissociated plex
cannot be accessed in any way. An associated plex can be
accessed through the volume.
Plexes have the following fundamental attributes:

Kernel state
Each plex is either enabled, disabled, or detached. When enabled, normal read and write operations from the volume can be directed to the plex. When disabled or detached, no I/O operations can be applied to the plex.
Failures encountered during normal volume I/O can change the plex state from enabled to detached. See the preceding description of the volume record exception policy for more information.

I/O mode
Each plex is in read-write, read-only, or write-only mode. The I/O mode affects read and write operations directed to the volume, if the plex is enabled. For read-write and read-only modes, volume read operations can be directed to the plex. For read-write and write-only modes, volume write operations are directed to the plex.
Plexes are normally in read-write mode. Write-only
mode is used to recover a plex that failed and
whose contents have become out of date with respect
to the volume. It is also used when attaching a new
plex to a volume. In read-write mode, writes to the
volume will update the plex, causing written
regions to be up to date. Typically, a set of special
copy operations is used to update the
remainder of the plex. The organization of associated
subdisks with respect to the plex address
space. The layout is striped, concatenated, or RAID
5. Each plex can have zero or more associated subdisks.
Subdisks are associated at offsets relative
to the beginning of the plex address space. Subdisks
for concatenated plexes might not cover the
entire length of the plex, in which case they leave
holes in the plex. A plex that is not as long as
the associated volume is considered to have a hole
extending from the end of the plex to the end of
the volume. A plex with a hole is considered incomplete
and is sometimes called sparse.
Log subdisk: Each plex can have one associated log subdisk.
A log subdisk is used with the dirty region logging (DRL)
feature to reduce the time required to recover the
consistency of a volume after a system failure. If a plex
is associated with a log subdisk, that plex is a log plex.
Length: The offset of the last subdisk in the plex plus
the length of that subdisk. In other words, the length of
the plex is defined by the last block in the plex address
space that is backed by a subdisk. This value might not
correspond to the length of the volume, depending on
whether the plex is completely and contiguously allocated.
Contiguous length: The offset of the first block in the
plex address space that is not backed by a subdisk. If the
plex has no holes, the contiguous length matches the plex
length. If the contiguous length is equal to or greater
than the length of the associated volume, the plex is
considered complete; otherwise, it is incomplete.
Usage-type state: The volume usage type maintains a
private state field for each plex, recording the
operations that have been performed on the plex or the
failure conditions that have been encountered. This state
field contains a string of up to 14 characters.
Flags: LSM sets and changes various condition flags on a
plex, independent of the volume usage type. The defined
flags are:
NODAREC: No physical disk could be found corresponding to
the disk ID in the disk media record for one of the
subdisks associated with the plex. The plex cannot be used
until the condition is fixed or the affected subdisk is
dissociated.
REMOVED: A disk media record was put into the removed
state through explicit administrative action. The plex
cannot be used until the disk is replaced or the affected
subdisk is dissociated.
RECOVER: A disk for a disk media record was replaced or
was reattached too late to prevent the plex from becoming
out of date with respect to the volume. The plex requires
complete recovery from another plex in the volume to
synchronize it with the correct contents of the volume.
IOFAIL: The plex was detached when an I/O failure was
detected during normal volume I/O. The plex is out of date
with respect to the volume and needs complete recovery.
This condition can also indicate that a disk in the
system should be replaced.
VOLATILE: A plex is considered to have volatile contents
if the disk for any of the plex's subdisks is considered
to be volatile. The contents of a volatile disk are not
presumed to survive a system reboot. The contents of a
volatile plex are always considered out of date after a
recovery and in need of complete recovery from another
plex.
Comment: An administrator-assigned string of up to 40
characters that can be set and changed using the voledit
command. LSM does not interpret the comment field. The
comment cannot contain newline characters.
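The length, contiguous-length, and completeness rules for plexes can be illustrated with a small model. What follows is a minimal Python sketch and is not part of LSM. It assumes a concatenated plex whose subdisks are given as hypothetical (plex offset, length) pairs measured in blocks. The function names are illustrative only.

```python
from typing import List, Tuple

# Hypothetical model of a concatenated plex: each subdisk is a
# (plex_offset, length) pair in blocks. Not an LSM interface.
Subdisk = Tuple[int, int]


def plex_length(subdisks: List[Subdisk]) -> int:
    """Offset of the last subdisk in the plex plus that subdisk's length."""
    if not subdisks:
        return 0
    last_offset, last_length = max(subdisks)  # subdisk with the highest plex offset
    return last_offset + last_length


def contiguous_length(subdisks: List[Subdisk]) -> int:
    """Offset of the first block in the plex address space not backed by a subdisk."""
    end = 0
    for offset, length in sorted(subdisks):
        if offset > end:
            break  # block `end` is a hole
        end = max(end, offset + length)
    return end


def is_complete(subdisks: List[Subdisk], volume_length: int) -> bool:
    """A plex is complete if its contiguous length covers the whole volume."""
    return contiguous_length(subdisks) >= volume_length


# A plex with a hole between blocks 1500 and 2000.
sparse = [(0, 1000), (1000, 500), (2000, 500)]
print(plex_length(sparse), contiguous_length(sparse), is_complete(sparse, 2500))
# prints: 2500 1500 False
```

The sketch looks only at offsets, so it also reports a plex shorter than its volume as incomplete. This matches the rule that such a plex has a hole extending to the end of the volume.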
Subdisk Records
Subdisk records define a region of disk, allocated from a
disk's public region. Subdisks have few states associated
with them, other than the configuration state that defines
which region of disk the subdisk occupies. Subdisks cannot
overlap each other, either in their associations with
plexes or in their arrangement on disk public regions.
Subdisks have the following fundamental attributes:
Disk media name: The name of the disk media record that
points to the physical disk.
Disk offset: The offset from the beginning of the disk's
public region to the start of the subdisk.
Plex offset: For associated subdisks, the offset (from the
beginning of the plex) of the subdisk association. For
subdisks associated with striped plexes, the plex offset
defines the relative ordering of subdisks in the plex,
rather than actual offsets within the plex address space.
Length: The length of the subdisk.
Comment: An administrator-assigned string of up to 40
characters that can be set and changed using the voledit
command. LSM does not interpret the comment field. The
comment cannot contain newline characters.
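Subdisks cannot overlap on a disk's public region, so a configuration check only needs to compare extents on the same disk. The Python sketch below is not an LSM utility. It assumes each subdisk on one disk is described by a hypothetical (name, disk offset, length) triple, and it reports every overlapping pair.

```python
from typing import List, Tuple

# Hypothetical extent description: (subdisk name, disk offset, length),
# all on the same disk's public region. Not an LSM interface.
Extent = Tuple[str, int, int]


def find_overlaps(extents: List[Extent]) -> List[Tuple[str, str]]:
    """Return every pair of subdisks whose disk regions overlap."""
    ordered = sorted(extents, key=lambda e: e[1])  # sort by disk offset
    overlaps = []
    for i, (name_a, off_a, len_a) in enumerate(ordered):
        end_a = off_a + len_a
        for name_b, off_b, _ in ordered[i + 1:]:
            if off_b >= end_a:
                break  # later extents start even further along the disk
            overlaps.append((name_a, name_b))
    return overlaps


print(find_overlaps([("sd1", 0, 100), ("sd2", 100, 50)]))  # prints: []
print(find_overlaps([("sd1", 0, 100), ("sd2", 10, 5), ("sd3", 20, 5)]))
# prints: [('sd1', 'sd2'), ('sd1', 'sd3')]
```

Sorting by disk offset lets the inner loop stop at the first extent that starts past the current one's end. Subdisks that merely touch (one ends where the next begins) are not reported.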
Disk Media Records
Disk media records define a specific disk within a disk
group. The name of a disk media record (the disk media
name) is assigned when a disk is first added to a disk
group. Disk media records can be assigned to specific
physical disks by associating the disk media record with
the current disk access record for the physical disk.
Disk media records have the following fundamental
attributes:
Disk ID: A 64-byte unique identifier assigned to the
physical disk associated with the disk media record. The
disk ID can be cleared to indicate that the disk is in the
REMOVED state. A removed disk has no current association
with any physical disk.
Disk access name: The disk access name currently used to
access the physical disk referenced by the disk ID. If the
disk ID is defined but no physical disk with that ID can
be found, the disk access name is null and the disk state
is NODAREC (inaccessible). A disk can become inaccessible
either because the indicated disk is not currently
attached to the system or because I/O failures on the
physical disk prevented LSM from identifying or using it.
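The relationship between the disk ID and disk access name attributes and the REMOVED and NODAREC states fits in a few lines. This Python sketch is illustrative only. The function name and the "ASSOCIATED" return value are hypothetical and are not LSM state names.

```python
from typing import Optional


def disk_media_state(disk_id: Optional[str], disk_access_name: Optional[str]) -> str:
    """Derive a disk media record's condition from its two key attributes."""
    if not disk_id:
        # A cleared disk ID means the disk is in the REMOVED state.
        return "REMOVED"
    if not disk_access_name:
        # Disk ID is set, but no physical disk with that ID was found.
        return "NODAREC"
    # Hypothetical label: both attributes are defined, so the record has
    # an active association with a physical disk.
    return "ASSOCIATED"


print(disk_media_state("id-1", None))  # prints: NODAREC
```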
A disk media record that has an active association with a
physical disk (both the disk ID and the disk access name
attributes are defined) inherits several properties from
the underlying physical disk. These attributes are taken
from the disk header, which is stored in the private
region of the disk. These inherited attributes are:
Public region length: The length of the region of the
physical disk available for subdisk allocations.
Private region length: The length of the region of the
physical disk reserved for storing private Logical Storage
Manager information.
Sector size: The fundamental I/O size for the disk, in
bytes. All I/Os destined for this disk must be multiples
of this size. LSM requires that all disks have the same
sector size. On Tru64 UNIX systems, the sector size is 512
bytes.
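Two sector-size rules apply: every I/O must be a whole number of sectors, and all disks must share one sector size. The Python helpers below are a hypothetical sketch of those two rules, using the 512-byte Tru64 UNIX sector size stated above. They are not LSM functions.

```python
from typing import Iterable

SECTOR_SIZE = 512  # bytes; the Tru64 UNIX sector size given above


def is_valid_io_size(nbytes: int, sector_size: int = SECTOR_SIZE) -> bool:
    """An I/O size is acceptable only if it is a multiple of the sector size."""
    return nbytes % sector_size == 0


def bytes_to_sectors(nbytes: int, sector_size: int = SECTOR_SIZE) -> int:
    """Convert a byte count to sectors, rejecting partial sectors."""
    if nbytes % sector_size:
        raise ValueError(f"{nbytes} bytes is not a multiple of {sector_size}")
    return nbytes // sector_size


def common_sector_size(sizes: Iterable[int]) -> int:
    """Return the single sector size shared by all disks, or raise if they differ."""
    distinct = set(sizes)
    if len(distinct) != 1:
        raise ValueError(f"disks must share one sector size, got {sorted(distinct)}")
    return distinct.pop()


print(bytes_to_sectors(8192))  # prints: 16
```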
Disk Access Records
Disk access records define an address, or access path, for
a disk. LSM uses the disk access records to locate physical
disks. Disk access records do not define specific
physical disks, because physical disks can be moved on a
system. When a physical disk is moved, a different disk
access record might be necessary to locate it.
Disk access records are stored in the rootdg disk group
configuration. Unlike other record types, the names of
disk access records can conflict with the names of other
records. For example, a specialty disk (such as a RAM
disk) can use the same name for both the disk access
record and the disk media record that points to it.
Disk access records can be defined explicitly. Some (sometimes
all) disk access records might be configured automatically
by LSM, based on available information in the
operating system. Such automatically configured disks are
not stored persistently in the on-disk root disk group
configuration, but instead are regenerated every time LSM
starts.
Disk access records have the following fundamental
attributes:
Name: The name of the disk access record is typically a
disk address of some kind. Disk names are usually of the
form dsknp, where dsk is the device mnemonic for disk
devices, n is the sequence number of the disk, and p is
the partition identifier (in the range a through h).
Type: Each disk access record has a type, which identifies
certain key characteristics of LSM's interaction with the
disk. Available types are sliced, simple, and nopriv. See
voldisk(8) for more information on disk types. Typically,
most or all of the disks are of type sliced. Specialty
disks (such as RAM disks) might be created with type
nopriv.
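A regular expression can check the dsknp naming convention described above. This is a minimal Python sketch, and the function and tuple names are hypothetical. It recognizes only the dsknp form given in the text. Any other name, such as a specialty disk or a name without a partition letter, returns None.

```python
import re
from typing import NamedTuple, Optional

# "dsk", then the disk sequence number n, then a partition letter a-h.
_DSK_NAME = re.compile(r"dsk(\d+)([a-h])")


class DiskAddress(NamedTuple):
    number: int     # sequence number of the disk (n)
    partition: str  # partition identifier (p), a through h


def parse_disk_access_name(name: str) -> Optional[DiskAddress]:
    """Split a dsknp-style disk access name into its parts."""
    match = _DSK_NAME.fullmatch(name)
    if match is None:
        return None
    return DiskAddress(int(match.group(1)), match.group(2))


print(parse_disk_access_name("dsk3g"))  # prints: DiskAddress(number=3, partition='g')
```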
If the physical disk represented by the disk access record
is currently associated with a disk media record, the
following fields are also defined:
Disk group: The name of the disk group containing the disk
media record.
Disk media name: The name of the disk media record that
points to the physical disk.
Disk types can define additional attributes. See voldisk(8)
for a list of the additional attributes defined by the
standard disk types.
The usage type of a volume represents a class of rules for
operating on a volume. Each usage type is defined by a set
of executables under the directory /sbin/lsm.d/usage_type,
where usage_type is the name given to the usage type. The
required executables are: volinfo, volmake, volmend, volplex,
volsd, and volume. These executables are invoked by
LSM administrative utilities with the same names. The executables
under /sbin/lsm.d/usage_type should not normally be
executed directly.
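The usage-type directory layout described above can be checked from a script. The Python sketch below is an illustration under stated assumptions, not an LSM tool. It only tests whether each required executable exists and is executable under /sbin/lsm.d/usage_type, and the function names are hypothetical.

```python
import os
from typing import List

USAGE_TYPE_ROOT = "/sbin/lsm.d"
# Executables that every usage type must provide, per the list above.
REQUIRED_EXECUTABLES = ("volinfo", "volmake", "volmend", "volplex", "volsd", "volume")


def usage_type_dir(usage_type: str, root: str = USAGE_TYPE_ROOT) -> str:
    """Directory that holds the executables for one usage type."""
    return os.path.join(root, usage_type)


def missing_executables(usage_type: str, root: str = USAGE_TYPE_ROOT) -> List[str]:
    """Names of required executables that are absent or not executable."""
    directory = usage_type_dir(usage_type, root)
    return [
        name
        for name in REQUIRED_EXECUTABLES
        if not os.access(os.path.join(directory, name), os.X_OK)
    ]


print(usage_type_dir("fsgen"))  # prints: /sbin/lsm.d/fsgen
```

The sketch checks presence only. These executables are meant to be invoked by the LSM utilities of the same names, not run directly.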
The usage types provided with LSM are: gen, fsgen, root,
cluroot, swap, and raid5. It is likely that new usage
types will be added in future releases. It is also possible
for third-party products to install additional usage
types.
The usage types provided with LSM store state information
in the volume and plex usage-type state fields.
The volume states are:
EMPTY: The volume is not yet initialized. This is the
initial state for volumes created by volmake.
CLEAN: The volume has been stopped and the contents of all
plexes are consistent.
ACTIVE: The volume has been started and is running
normally, or was running normally when the system was
stopped. If the system crashes in this state, the volume
might require plex consistency recovery.
NEEDSYNC: The volume requires recovery. A volume is
typically set to this state after a system failure to
indicate that the plexes in the volume might be
inconsistent and require recovery. (See the resync
operation in volume(8).)
SYNC: Plex consistency recovery is currently being done on
the volume. The volume resync operation sets this state
when it starts to recover plex consistency on a volume
that was in the NEEDSYNC state.
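The volume state descriptions imply a handful of transitions between the LSM volume states EMPTY, CLEAN, ACTIVE, NEEDSYNC, and SYNC. The Python table below is a hypothetical sketch that records only transitions implied by the text: start, stop, a system failure, and the start of a resync. It is not how LSM stores or changes state, and the event names are illustrative labels.

```python
from enum import Enum


class VolumeState(Enum):
    EMPTY = "EMPTY"        # not yet initialized (created by volmake)
    CLEAN = "CLEAN"        # stopped; all plexes consistent
    ACTIVE = "ACTIVE"      # started, or was running when the system stopped
    NEEDSYNC = "NEEDSYNC"  # plexes might be inconsistent; recovery required
    SYNC = "SYNC"          # plex consistency recovery in progress


# (event, current state) -> next state. Only transitions implied by the
# state descriptions are listed; event names are hypothetical labels.
TRANSITIONS = {
    ("start", VolumeState.CLEAN): VolumeState.ACTIVE,
    ("stop", VolumeState.ACTIVE): VolumeState.CLEAN,
    ("system_failure", VolumeState.ACTIVE): VolumeState.NEEDSYNC,
    ("resync", VolumeState.NEEDSYNC): VolumeState.SYNC,
}


def next_state(event: str, state: VolumeState) -> VolumeState:
    """Look up the next state, rejecting transitions the model does not cover."""
    try:
        return TRANSITIONS[(event, state)]
    except KeyError:
        raise ValueError(f"no {event!r} transition from {state.name}") from None


print(next_state("system_failure", VolumeState.ACTIVE).name)  # prints: NEEDSYNC
```

Transitions the text does not describe, such as what ends the SYNC state, are deliberately left out. The lookup raises ValueError instead of guessing.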
The plex states are:
EMPTY: The plex is not yet initialized. This state is set
when the volume state is also EMPTY.
CLEAN: The plex was running normally when the volume was
stopped. The plex will be enabled without requiring
recovery when the volume is started.
ACTIVE: The plex is running normally on a started volume.
The plex condition flags (NODAREC, REMOVED, RECOVER, and
IOFAIL) can apply if the system is rebooted and the volume
restarted.
STALE: The plex was detached, either by a volplex det
operation or by an I/O failure. The volume start operation
will change the state for a plex to STALE if any of the
plex condition flags are set. STALE plexes will be
reattached automatically when a volume is started.
OFFLINE: The plex was disabled explicitly by the volmend
off operation. See volmend(8) for more information.
SNAPATT: Applies to a snapshot plex that is being attached
by the volassist snapstart operation. When the attach is
complete, the state for the plex will be changed to
SNAPDONE. If the system fails before the attach completes,
the plex and all of its subdisks will be removed.
SNAPDONE: Applies to a snapshot plex created by the
volassist snapstart operation that is fully attached. A
plex in this state can be turned into a snapshot volume
with the volassist snapshot operation. See volassist(8)
for more information. If the system fails before the
attach completes, the plex and all of its subdisks will be
removed.
SNAPTMP: Applies to a snapshot plex being attached by the
volplex snapstart operation. When the attach is complete,
the state for the plex will be changed to SNAPDIS. If the
system fails before the attach completes, the plex will be
dissociated from the volume.
SNAPDIS: Applies to a snapshot plex created by a volplex
snapstart operation that is fully attached. A plex in this
state can be turned into a snapshot volume with the
volplex snapshot operation. See volplex(8) for more
information. If the system fails before the attach
completes, the plex will be dissociated from the volume.
TEMP: Applies to a plex that is being associated and
attached to a volume with the volplex att operation. If
the system fails before the attach completes, the plex
will be dissociated from the volume.
TEMPRM: Applies to a plex that is being associated and
attached to a volume with the volplex att operation. If
the system fails before the attach completes, the plex
will be dissociated from the volume and removed. Any
subdisks in the plex will be kept.
TEMPRMSD: Applies to a plex that is being associated and
attached to a volume with the volplex att operation. If
the system fails before the attach completes, the plex and
its subdisks will be dissociated from the volume and
removed.
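Two rules in the plex state descriptions can be expressed as lookups. First, the volume start operation marks a plex STALE when any of the NODAREC, REMOVED, RECOVER, or IOFAIL condition flags is set. Second, each transient attach state (SNAPATT, SNAPTMP, TEMP, TEMPRM, TEMPRMSD) has a defined cleanup if the system fails before the attach completes. The Python sketch below models only those rules. The Cleanup flag names and the function names are hypothetical.

```python
from enum import Flag, auto
from typing import Set


class Cleanup(Flag):
    """Hypothetical labels for what happens to a plex after a failed attach."""
    NONE = 0
    DISSOCIATE_PLEX = auto()
    REMOVE_PLEX = auto()
    REMOVE_SUBDISKS = auto()


# Cleanup applied if the system fails before the attach completes.
CLEANUP_AFTER_FAILURE = {
    "SNAPATT": Cleanup.REMOVE_PLEX | Cleanup.REMOVE_SUBDISKS,
    "SNAPTMP": Cleanup.DISSOCIATE_PLEX,
    "TEMP": Cleanup.DISSOCIATE_PLEX,
    "TEMPRM": Cleanup.DISSOCIATE_PLEX | Cleanup.REMOVE_PLEX,  # subdisks are kept
    "TEMPRMSD": Cleanup.DISSOCIATE_PLEX | Cleanup.REMOVE_PLEX | Cleanup.REMOVE_SUBDISKS,
}

CONDITION_FLAGS = {"NODAREC", "REMOVED", "RECOVER", "IOFAIL"}


def cleanup_after_failure(plex_state: str) -> Cleanup:
    """Cleanup for a transient attach state; other states map to NONE in this model."""
    return CLEANUP_AFTER_FAILURE.get(plex_state, Cleanup.NONE)


def state_on_volume_start(plex_state: str, flags: Set[str]) -> str:
    """The volume start operation marks a plex STALE if any condition flag is set."""
    return "STALE" if flags & CONDITION_FLAGS else plex_state


print(state_on_volume_start("ACTIVE", {"IOFAIL"}))  # prints: STALE
```

Using a Flag lets TEMPRM and TEMPRMSD differ by a single member, REMOVE_SUBDISKS, which mirrors the one difference the text describes between them.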
The majority of LSM utilities use a common set of exit
codes, which shell scripts and other programs can use to
react to specific problems detected by the utilities. For
C programmers, these exit status codes are defined in the
include file volclient.h. The number and macro name for
each distinct exit code are described in the following
list. Shell script writers must compare against the
numbers directly.
The command is not reporting any error through the exit
code.
Some command-line arguments were invalid.
A syntax error occurred in a command line or description,
or a specified record name is too long or contains invalid
characters. This code is returned only by utilities that
implement a command or description language. It can also
be returned for errors in search patterns.
The volume daemon might not be running.
An unexpected error was encountered while communicating
with the volume daemon.
An unexpected error was returned by a system call or by
the C library. This can also indicate that the command ran
out of memory.
The status of a commit was lost: the volume daemon was
killed and restarted during the commit of a transaction,
and after the restart the volume daemon could not
determine whether the commit succeeded or failed.
The command encountered an error that it should not have
encountered. This generally implies a condition that the
command should have tested for but did not, or a condition
that results from the volume daemon returning a value that
did not make sense.
VEX_UNKNOWN: An unknown or internal error was encountered.
This code can be used, for example, when the volume daemon
returns an unrecognized error number.
The time required to complete a transaction exceeded 60
seconds, causing the transaction locks to be lost. Because
most utilities reattempt the transaction at least once if
a timeout occurs, this usually implies that a transaction
timed out two or more times.
No disk group could be identified for an operation. This
results either from specifying a disk group that does not
exist or from supplying names on a command line that are
in different disk groups or in multiple disk groups.
A change made to the database by another process caused
the command to stop. This code is also returned by a
usage-type-dependent command if it is given a record that
has a different usage type.
A requested subdisk, plex, or volume record was not found
in the configuration database. This can also mean that a
record was of an inappropriate type.
A name used to create a new configuration record matches
the name of an existing record.
A subdisk, plex, or volume is locked against concurrent
access. This code is used for intertransaction locks
associated with usage-type utilities. The code is also
used for the dissociated-plex or subdisk lock convention,
which writes a nonblank string to the tutil[0] field in a
plex or subdisk structure to indicate that the record is
being used.
No usage type could be determined for a command that
requires a usage type.
An invalid usage type was specified.
A plex or subdisk is associated, but the operation
requires a dissociated record.
A plex or subdisk is dissociated, but the operation
requires an associated record. This code can also be used
to indicate that a subdisk or plex is not associated with
a specific plex or volume.
A plex or subdisk was not dissociated because it was the
last record associated with a volume or plex.
Association of a plex or subdisk would exceed the maximum
number that can be associated with a volume or plex.
A specified operation is invalid with the parameters
specified.
An I/O error was encountered that caused the operation to
abort.
A volume involved in an operation did not have any
associated plexes, although at least one was required.
A plex involved in an operation did not have any
associated subdisks, although at least one was required.
A volume could not be started by the volume start
operation because the configuration of the volume and its
plexes prevented the operation.
A specified volume was already started.
A specified volume was not started. For example, this code
is returned by the volume stop operation if the operation
is given a volume that is not started.
A volume or plex involved in an operation is in the
detached state, thus preventing a successful operation.
A volume or plex involved in an operation is in the
disabled state, thus preventing a successful operation.
A volume or plex involved in an operation is in the
enabled state, thus preventing a successful operation.
An unrecognized error was encountered. This code is
currently unused.
An operation failed because a volume device was open or
mounted, or because a subdisk was associated with an open
or mounted volume or plex.
Exit codes 32 through 64 are reserved for use by usage
types. Codes greater than 64 can be reserved for use by
specific utilities.
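Only the exit-code ranges are fixed in this section, but a script can still tell which part of the exit-code space a status comes from. The Python sketch below is hypothetical. It classifies an exit status using the 32-through-64 and above-64 ranges stated above. Treating every status below 32 as one of the common codes from volclient.h is an assumption of the sketch.

```python
def exit_code_range(status: int) -> str:
    """Classify an LSM utility exit status by the reserved ranges."""
    if not 0 <= status <= 255:
        raise ValueError(f"not a valid exit status: {status}")
    if 32 <= status <= 64:
        return "usage-type"        # reserved for use by usage types
    if status > 64:
        return "utility-specific"  # may be reserved by specific utilities
    return "common"                # assumed: one of the common codes in volclient.h


print(exit_code_range(40))  # prints: usage-type
```

In practice the status would come from something like subprocess.run(...).returncode. The sketch does not map individual numbers to meanings, because those are defined in volclient.h.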
Commands: mount(8), volassist(8), volclonedg(8), vold(8),
voldctl(8), voldg(8), voldisk(8), voldiskadd(8), voldiskadm(8), voldisksetup(8), voledit(8), volencap(8), volevac(8), volinfo(8), volinstall(8), voliod(8), vollogcnvt(8), volmake(8), volmend(8), volmigrate(8), volmirror(8), volnotify(8), volplex(8), volprint(8), volreattach(8), volrecover(8), volreconfig(8), volrestore(8),
volrootmir(8), volsave(8), volsd(8), volsetup(8), volstat(8), voltrace(8), volume(8), volunmigrate(8), volunroot(8), volwatch(8)
Functions: ioctl(2)
Files: vol_pattern(4), volmake(4)
volintro(8)