NAME
raidctl - configuration utility for the RAIDframe disk driver
SYNOPSIS
raidctl [-v] [-afFgrR component] [-BGipPsSu] [-cC config_file] [-A [yes | no | root]] [-I serial_number] dev
DESCRIPTION
raidctl is the user-land control program for raid(4), the RAIDframe disk device. raidctl is primarily used to dynamically configure and unconfigure RAIDframe disk devices. For more information about the RAIDframe disk device, see raid(4).
This document assumes the reader has at least rudimentary knowledge of RAID and RAID concepts.
The device used by raidctl is specified by dev. dev may be either the full name of the device, e.g. /dev/rraid0c, or simply raid0 (for /dev/rraid0c).
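The naming rule above can be illustrated with a small Python sketch. The function name device_path is hypothetical, and this is not how raidctl itself parses its argument; it only shows the mapping from a short name such as raid0 to the raw `c' partition described above.

```python
import re


def device_path(dev: str) -> str:
    """Map a raidctl-style dev argument to a device path (illustrative only).

    A full path such as "/dev/rraid0c" is returned unchanged; a short
    name such as "raid0" is expanded to the raw 'c' partition.
    """
    if dev.startswith("/dev/"):
        return dev
    match = re.fullmatch(r"raid(\d+)", dev)
    if match is None:
        raise ValueError(f"not a RAIDframe device name: {dev!r}")
    return f"/dev/rraid{match.group(1)}c"


print(device_path("raid0"))         # /dev/rraid0c
print(device_path("/dev/rraid0c"))  # /dev/rraid0c
```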
For several commands (-BGipPsSu), raidctl can accept the word all as the dev argument. If all is used, raidctl will execute the requested action for all the configured raid(4) devices.
The command-line options for raidctl are as follows:
-a component dev
Add component as a hot spare for the device dev.
-A yes dev
Make the RAID set auto-configurable. The RAID set will be automatically configured at boot before the root file system is mounted. Note that all components of the set must be of type RAID in the disklabel.
-A no dev
Turn off auto-configuration for the RAID set.
-A root dev
Make the RAID set auto-configurable, and also mark the set as being eligible to contain the root partition. A RAID set configured this way will override the use of the boot disk as the root device. All components of the set must be of type RAID in the disklabel. Note that the kernel being booted must currently reside on a non-RAID set and, in order to have the root file system correctly mounted from it, the RAID set must have its `a' partition (aka raid[0..n]a) set up.
-B dev Initiate a copyback of reconstructed data from a spare disk to its original disk. This is performed after a component has failed, and the failed drive has been reconstructed onto a spare drive.
-c config_file dev
Configure the RAIDframe device dev according to the configuration given in config_file. A description of the contents of config_file is given later.
-C config_file dev
As for -c, but forces the configuration to take place. This is required the first time a RAID set is configured.
-f component dev
This marks the specified component as having failed, but does not initiate a reconstruction of that component.
-F component dev
Fails the specified component of the device, and immediately begins a reconstruction of the failed disk onto an available hot spare. This is one of the mechanisms used to start the reconstruction process if a component does have a hardware failure.
-g component dev
Get the component label for the specified component.
-G dev Generate the configuration of the RAIDframe device in a format suitable for use with raidctl -c or -C.
-i dev Initialize the RAID device. In particular, (re-)write the parity on the selected device. This MUST be done for all RAID sets before the RAID device is labeled and before file systems are created on the RAID device.
-I serial_number dev
Initialize the component labels on each component of the device. serial_number is used as one of the keys in determining whether a particular set of components belongs to the same RAID set. While not strictly enforced, different serial numbers should be used for different RAID sets. This step MUST be performed when a new RAID set is created.
-p dev Check the status of the parity on the RAID set. Displays a status message, and returns successfully if the parity is up-to-date.
-P dev Check the status of the parity on the RAID set, and initialize (re-write) the parity if the parity is not known to be up-to-date. This is normally used after a system crash (and before a fsck(8)) to ensure the integrity of the parity.
-r component dev
Remove the spare disk specified by component from the set of available spare components.
-R component dev
Fails the specified component, if necessary, and immediately begins a reconstruction back to component. This is useful for reconstructing back onto a component after it has been replaced following a failure.
-s dev Display the status of the RAIDframe device for each of the components and spares.
-S dev Check the status of parity re-writing, component reconstruction, and component copyback. The output indicates the amount of progress achieved in each of these areas.
-u dev Unconfigure the RAIDframe device.
-v Be more verbose. For operations such as reconstructions, parity
re-writing, and copybacks, provide a progress indicator.
Configuration file
The format of the configuration file is complex, and only an abbreviated treatment is given here. In the configuration files, a `#' indicates the beginning of a comment.
There are 4 required sections of a configuration file, and 2 optional sections. Each section begins with a `START', followed by the section name, and the configuration parameters associated with that section. The first section is the `array' section, and it specifies the number of rows, columns, and spare disks in the RAID set. For example:
START array
1 3 0
indicates an array with 1 row, 3 columns, and 0 spare disks. Note that although multi-dimensional arrays may be specified, they are NOT supported in the driver.
The second section, the `disks' section, specifies the actual components
of the device. For example:
START disks
/dev/sd0e
/dev/sd1e
/dev/sd2e
specifies the three component disks to be used in the RAID device. If any of the specified drives cannot be found when the RAID device is configured, then they will be marked as `failed', and the system will operate in degraded mode. Note that it is imperative that the order of the components in the configuration file does not change between configurations of a RAID device. Changing the order of the components will result in data loss if the set is configured with the -C option. In normal circumstances, the RAID set will not configure if only -c is specified and the components are out of order.
The next section, which is the `spare' section, is optional, and, if present, specifies the devices to be used as `hot spares': devices which are on-line, but are not actively used by the RAID driver unless one of the main components fails. A simple `spare' section might be:
START spare
/dev/sd3e
for a configuration with a single spare component. If no spare drives are to be used in the configuration, then the `spare' section may be omitted.
The next section is the `layout' section. This section describes the general layout parameters for the RAID device, and provides such information as sectors per stripe unit, stripe units per parity unit, stripe units per reconstruction unit, and the parity configuration to use. This section might look like:
START layout
# sectPerSU SUsPerParityUnit SUsPerReconUnit RAID_level
32 1 1 5
The sectors per stripe unit specifies, in blocks, the interleave factor; i.e. the number of contiguous sectors to be written to each component for a single stripe. Appropriate selection of this value (32 in this example) is the subject of much research in RAID architectures. The stripe units per parity unit and stripe units per reconstruction unit are normally each set to 1. While certain values above 1 are permitted, a discussion of valid values and the consequences of using anything other than 1 is outside the scope of this document. The last value in this section (5 in this example) indicates the parity configuration desired. Valid entries include:
0 RAID level 0. No parity, only simple striping.
1 RAID level 1. Mirroring. The parity is the mirror.
4 RAID level 4. Striping across components, with parity stored on the last component.
5 RAID level 5. Striping across components, parity distributed
across all components.
There are other valid entries here, including those for Even-Odd parity, RAID level 5 with rotated sparing, Chained declustering, and Interleaved declustering, but as of this writing the code for those parity operations has not been tested with OpenBSD.
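To make the interleave factor concrete, the following Python sketch maps a logical sector of a one-row RAID 0 set to a column and a sector offset on that component. It is a simplification, not RAIDframe's mapping code: it assumes each component's data area starts at its sector 0 (ignoring any space the driver may reserve on a component), and the function name raid0_map is hypothetical.

```python
def raid0_map(sector: int, sect_per_su: int, num_col: int) -> tuple:
    """Return (column, sector within that component) for a RAID 0 set.

    Simplified sketch: one row, no parity, and each component's data area
    is assumed to start at its sector 0.
    """
    stripe_unit = sector // sect_per_su   # global index of the stripe unit
    offset = sector % sect_per_su         # position inside that stripe unit
    column = stripe_unit % num_col        # stripe units rotate across columns
    stripe = stripe_unit // num_col       # full stripes that precede this one
    return column, stripe * sect_per_su + offset


# With sectPerSU = 32 and 3 columns, sectors 0-31 go to column 0,
# 32-63 to column 1, 64-95 to column 2, and 96 wraps back to column 0.
for s in (0, 31, 32, 64, 96, 100):
    print(s, raid0_map(s, 32, 3))
```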
The next required section is the `queue' section. This is most often specified as:
START queue
fifo 100
where the queuing method is specified as FIFO (First-In, First-Out), and the size of the per-component queue is limited to 100 requests. Other queuing methods may also be specified, but a discussion of them is beyond the scope of this document.
The final section, the `debug' section, is optional. For more details on this the reader is referred to the RAIDframe documentation discussed in the HISTORY section. See EXAMPLES for a more complete configuration file example.
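Because the section layout is regular, a configuration file like the ones in this manual can be produced programmatically. The Python helper below (raid_config is a hypothetical name, not part of raidctl) is a sketch that emits the four required sections and the optional `spare' section for a one-row set; it does not emit a `debug' section.

```python
def raid_config(disks, spares=(), sect_per_su=32, su_per_pu=1,
                su_per_ru=1, level=5, queue="fifo", queue_len=100):
    """Render a one-row RAIDframe configuration file as a string."""
    lines = [
        "START array",
        "# numRow numCol numSpare",
        f"1 {len(disks)} {len(spares)}",
        "START disks",
        *disks,
    ]
    if spares:  # the `spare' section is optional
        lines += ["START spare", *spares]
    lines += [
        "START layout",
        "# sectPerSU SUsPerParityUnit SUsPerReconUnit RAID_level",
        f"{sect_per_su} {su_per_pu} {su_per_ru} {level}",
        "START queue",
        f"{queue} {queue_len}",
    ]
    return "\n".join(lines) + "\n"


# The RAID 5 example given later in this manual: three components, one hot spare.
print(raid_config(["/dev/sd1e", "/dev/sd2e", "/dev/sd3e"], ["/dev/sd4e"]), end="")
```

The result could then be written to a file such as raid0.conf and passed to raidctl -C or -c.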
EXAMPLES
It is highly recommended that before using the RAID driver for real file systems the system administrator(s) become quite familiar with the use of raidctl, and that they understand how the component reconstruction process works. The examples in this section will focus on configuring a number of different RAID sets of varying degrees of redundancy. By working through these examples, administrators should be able to develop a good feel for how to configure a RAID set, and how to initiate reconstruction of failed components.
In the following examples `raid0' will be used to denote the RAID device. `/dev/rraid0c' may be used in place of `raid0'.
Initialization and Configuration
The initial step in configuring a RAID set is to identify the components that will be used in the RAID set. All components should be the same size. Each component should have a disklabel type of FS_RAID, and a typical disklabel entry for a RAID component might look like:
f: 1800000 200495 RAID # (Cyl. 405*- 4041*)
While FS_BSDFFS (e.g. 4.2BSD) will also work as the component type, the type FS_RAID (e.g. RAID) is preferred for RAIDframe use, as it is required for features such as auto-configuration. As part of the initial configuration of each RAID set, each component will be given a `component label'. A `component label' contains important information about the component, including a user-specified serial number, the row and column of that component in the RAID set, the redundancy level of the RAID set, a `modification counter', and whether the parity information (if any) on that component is known to be correct. Component labels are an integral part of the RAID set, since they are used to ensure that components are configured in the correct order, and used to keep track of other vital information about the RAID set. Component labels are also required for the auto-detection and auto-configuration of RAID sets at boot time. For a component label to be considered valid, that particular component label must be in agreement with the other component labels in the set. For example, the serial number, `modification counter', number of rows and number of columns must all be in agreement. If any of these are different, then the component is not considered to be part of the set. See raid(4) for more information about component labels.
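The agreement rule above can be sketched in Python. This is a simplification under stated assumptions: the ComponentLabel type and the odd_ones_out function are hypothetical, only the fields named in the text are compared, and the most common combination of values is taken as the reference, which need not match exactly how the driver decides. See raid(4) for the authoritative behavior.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class ComponentLabel:
    # Only the fields that the text says must agree, plus the position.
    serial_number: int
    mod_counter: int
    num_rows: int
    num_columns: int
    row: int
    column: int


def set_key(label):
    """The values that must be identical on every component of a set."""
    return (label.serial_number, label.mod_counter,
            label.num_rows, label.num_columns)


def odd_ones_out(labels):
    """Return the labels that disagree with the most common set_key."""
    if not labels:
        return []
    reference, _count = Counter(set_key(lab) for lab in labels).most_common(1)[0]
    return [lab for lab in labels if set_key(lab) != reference]


labels = [
    ComponentLabel(13432, 65, 1, 3, 0, 0),
    ComponentLabel(13432, 65, 1, 3, 0, 1),
    ComponentLabel(13432, 61, 1, 3, 0, 2),  # stale modification counter
]
print(odd_ones_out(labels))
```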
Once the components have been identified, and the disks have appropriate labels, raidctl is then used to configure the raid(4) device. To configure the device, a configuration file which looks something like:
START array
# numRow numCol numSpare
1 3 1
START disks
/dev/sd1e
/dev/sd2e
/dev/sd3e
START spare
/dev/sd4e
START layout
# sectPerSU SUsPerParityUnit SUsPerReconUnit RAID_level_5
32 1 1 5
START queue
fifo 100
is created in a file. The above configuration file specifies a RAID 5 set consisting of the components /dev/sd1e, /dev/sd2e, and /dev/sd3e, with /dev/sd4e available as a `hot spare' in case one of the three main drives should fail. A RAID 0 set would be specified in a similar way:
START array
# numRow numCol numSpare
1 4 0
START disks
/dev/sd10e
/dev/sd11e
/dev/sd12e
/dev/sd13e
START layout
# sectPerSU SUsPerParityUnit SUsPerReconUnit RAID_level_0
64 1 1 0
START queue
fifo 100
In this case, devices /dev/sd10e, /dev/sd11e, /dev/sd12e, and /dev/sd13e are the components that make up this RAID set. Note that there are no hot spares for a RAID 0 set, since there is no way to recover data if any of the components fail.
For a RAID 1 (mirror) set, the following configuration might be used:
START array
# numRow numCol numSpare
1 2 0
START disks
/dev/sd20e
/dev/sd21e
START layout
# sectPerSU SUsPerParityUnit SUsPerReconUnit RAID_level_1
128 1 1 1
START queue
fifo 100
In this case, /dev/sd20e and /dev/sd21e are the two components of the mirror set. While no hot spares have been specified in this configuration, they easily could be, just as they were specified in the RAID 5 case above. Note as well that RAID 1 sets are currently limited to only 2 components. At present, n-way mirroring is not possible.
The first time a RAID set is configured, the -C option must be used:
# raidctl -C raid0.conf raid0
where `raid0.conf' is the name of the RAID configuration file. The -C option forces the configuration to succeed, even if any of the component labels are incorrect. The -C option should not be used lightly in situations other than initial configurations: if the system is refusing to configure a RAID set, there is probably a very good reason for it. After the initial configuration is done (and appropriate component labels are added with the -I option), raid0 can be configured normally with:
# raidctl -c raid0.conf raid0
When the RAID set is configured for the first time, it is necessary to initialize the component labels, and to initialize the parity on the RAID set. Initializing the component labels is done with:
# raidctl -I 112341 raid0
where `112341' is a user-specified serial number for the RAID set. This initialization step is required for all RAID sets. Also, using different serial numbers between RAID sets is strongly encouraged, as using the same serial number for all RAID sets will only serve to decrease the usefulness of the component label checking.
Initializing the RAID set is done via the -i option. This initialization MUST be done for all RAID sets, since among other things it verifies that the parity (if any) on the RAID set is correct. Since this initialization may be quite time-consuming, the -v option may also be used in conjunction with -i:
# raidctl -iv raid0
This will give more verbose output on the status of the initialization:
Initiating re-write of parity
Parity Re-write status:
10% |**** | ETA: 06:03 /
The output provides a `Percent Complete' in both a numeric and graphical format, as well as an estimated time to completion of the operation.
Since it is the parity that provides the `redundancy' part of RAID, it is critical that the parity is correct as much as possible. If the parity is not correct, then there is no guarantee that data will not be lost if a component fails.
Once the parity is known to be correct, it is then safe to perform disklabel(8), newfs(8), or fsck(8) on the device or its file systems, and then to mount the file systems for use.
Under certain circumstances (e.g. the additional component has not arrived, or data is being migrated off of a disk destined to become a component) it may be desirable to configure a RAID 1 set with only a single component. This can be achieved by configuring the set with a physically existing component (as either the first or second component) and with a `fake' component. In the following:
START array
# numRow numCol numSpare
1 2 0
START disks
/dev/sd6e
/dev/sd0e
START layout
# sectPerSU SUsPerParityUnit SUsPerReconUnit RAID_level_1
128 1 1 1
START queue
fifo 100
/dev/sd0e is the real component, and will be the second disk of a RAID 1 set. The component /dev/sd6e, whose device node must exist but which has no physical device associated with it, is simply used as a placeholder. Configuration (using -C and -I 12345 as above) proceeds normally, but initialization of the RAID set will have to wait until all physical components are present. After configuration, this set can be used normally, but will be operating in degraded mode. Once a second physical component is obtained, it can be hot-added, the existing data mirrored, and normal operation resumed.
Maintenance of the RAID set
After the parity has been initialized for the first time, the command:
# raidctl -p raid0
can be used to check the current status of the parity. To check the parity and rebuild it if necessary (for example, after an unclean shutdown), the command:
# raidctl -P raid0
is used. Note that re-writing the parity can be done while other operations on the RAID set are taking place (e.g. while doing an fsck(8) on a file system on the RAID set). However, for maximum effectiveness of the RAID set, the parity should be known to be correct before any data on the set is modified.
To see how the RAID set is doing, the following command can be used to show the RAID set's status:
# raidctl -s raid0
The output will look something like:
Components:
/dev/sd1e: optimal
/dev/sd2e: optimal
/dev/sd3e: optimal
Spares:
/dev/sd4e: spare
Parity status: clean
Reconstruction is 100% complete.
Parity Re-write is 100% complete.
Copyback is 100% complete.
This indicates that all is well with the RAID set. Of importance here are the component lines which read `optimal', and the `Parity status' line which indicates that the parity is up-to-date. Note that if there are file systems open on the RAID set, the individual components will not be `clean' but the set as a whole can still be clean.
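When checking sets from a script (for example, a periodic job), it can be handy to parse this output. The Python sketch below assumes the raidctl -s output format shown above (not the longer -sv format); the function names parse_status and healthy are hypothetical, and the exact output may differ between versions of raidctl.

```python
import re


def parse_status(text):
    """Parse `raidctl -s` output (format as shown above) into a dict."""
    status = {"components": {}, "spares": {}, "parity": None, "progress": {}}
    section = None
    for raw in text.splitlines():
        line = raw.strip()
        if line == "Components:":
            section = "components"
        elif line == "Spares:":
            section = "spares"
        elif line.startswith("Parity status:"):
            status["parity"] = line.split(":", 1)[1].strip()
            section = None
        elif (m := re.fullmatch(r"(.+?) is (\d+)% complete\.", line)):
            status["progress"][m.group(1)] = int(m.group(2))
        elif section and ":" in line:
            name, state = line.rsplit(":", 1)
            status[section][name.strip()] = state.strip()
    return status


def healthy(status):
    """True if every component is `optimal' and the parity is clean."""
    return (bool(status["components"])
            and all(s == "optimal" for s in status["components"].values())
            and status["parity"] == "clean")


sample = """Components:
/dev/sd1e: optimal
/dev/sd2e: optimal
/dev/sd3e: optimal
Spares:
/dev/sd4e: spare
Parity status: clean
Reconstruction is 100% complete.
Parity Re-write is 100% complete.
Copyback is 100% complete.
"""
print(healthy(parse_status(sample)))
```

Lines such as `No spares.' or `[...]' contain no `name: state' pair and are simply skipped.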
The -v option may also be used in conjunction with -s:
# raidctl -sv raid0
In this case, the components' label information (see the -g option) will be given as well:
Components:
/dev/sd1e: optimal
/dev/sd2e: optimal
/dev/sd3e: optimal
Spares:
/dev/sd4e: spare
Component label for /dev/sd1e:
Row: 0 Column: 0 Num Rows: 1 Num Columns: 3
Version: 2 Serial Number: 13432 Mod Counter: 65
Clean: No Status: 0
sectPerSU: 32 SUsPerPU: 1 SUsPerRU: 1
RAID Level: 5 blocksize: 512 numBlocks: 1799936
Autoconfig: No
Last configured as: raid0
Component label for /dev/sd2e:
Row: 0 Column: 1 Num Rows: 1 Num Columns: 3
Version: 2 Serial Number: 13432 Mod Counter: 65
Clean: No Status: 0
sectPerSU: 32 SUsPerPU: 1 SUsPerRU: 1
RAID Level: 5 blocksize: 512 numBlocks: 1799936
Autoconfig: No
Last configured as: raid0
Component label for /dev/sd3e:
Row: 0 Column: 2 Num Rows: 1 Num Columns: 3
Version: 2 Serial Number: 13432 Mod Counter: 65
Clean: No Status: 0
sectPerSU: 32 SUsPerPU: 1 SUsPerRU: 1
RAID Level: 5 blocksize: 512 numBlocks: 1799936
Autoconfig: No
Last configured as: raid0
Parity status: clean
Reconstruction is 100% complete.
Parity Re-write is 100% complete.
Copyback is 100% complete.
To check the component label of /dev/sd1e, the following is used:
# raidctl -g /dev/sd1e raid0
The output of this command will look something like:
Component label for /dev/sd1e:
Row: 0 Column: 0 Num Rows: 1 Num Columns: 3
Version: 2 Serial Number: 13432 Mod Counter: 65
Clean: No Status: 0
sectPerSU: 32 SUsPerPU: 1 SUsPerRU: 1
RAID Level: 5 blocksize: 512 numBlocks: 1799936
Autoconfig: No
Last configured as: raid0
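Scripts that need individual fields of a component label can split this output into its `Name: value' pairs. The following Python sketch assumes, as in the sample above, that every value is a single word; parse_label is a hypothetical name.

```python
import re

# "Name: value" pairs; names may contain spaces ("Num Rows", "Last configured as").
FIELD = re.compile(r"([A-Za-z][A-Za-z ]*?):\s+(\S+)")


def parse_label(text):
    """Return the fields of `raidctl -g` output as a dict."""
    fields = {}
    for line in text.splitlines():
        for name, value in FIELD.findall(line):
            fields[name] = int(value) if value.isdigit() else value
    return fields


sample = """Component label for /dev/sd1e:
Row: 0 Column: 0 Num Rows: 1 Num Columns: 3
Version: 2 Serial Number: 13432 Mod Counter: 65
Clean: No Status: 0
sectPerSU: 32 SUsPerPU: 1 SUsPerRU: 1
RAID Level: 5 blocksize: 512 numBlocks: 1799936
Autoconfig: No
Last configured as: raid0
"""
label = parse_label(sample)
print(label["Serial Number"], label["Mod Counter"], label["Clean"])
```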
Dealing with Component Failures
If for some reason (perhaps to test reconstruction) it is necessary to pretend a drive has failed, the following will perform that function:
# raidctl -f /dev/sd2e raid0
The system will then be performing all operations in degraded mode, where
missing data is re-computed from existing data and the parity. In this
case, obtaining the status of raid0 will return (in part):
Components:
/dev/sd1e: optimal
/dev/sd2e: failed
/dev/sd3e: optimal
Spares:
/dev/sd4e: spare
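For the parity-based levels, re-computing missing data from the existing data and the parity amounts to an XOR. Assuming the usual XOR parity of RAID 4 and RAID 5, the parity unit of a stripe is the XOR of its data units, so any single missing unit equals the XOR of all the surviving units. A minimal Python illustration, with made-up four-byte blocks standing in for stripe units:

```python
from functools import reduce


def xor_blocks(blocks):
    """Bytewise XOR of equal-length blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# One stripe of a 3-component RAID 5 set: two data units and their parity.
data = [b"\x01\x02\x03\x04", b"\xaa\xbb\xcc\xdd"]
parity = xor_blocks(data)

# Suppose the component holding data[1] has failed (e.g. after raidctl -f).
# Reading that unit in degraded mode XORs everything that survived.
rebuilt = xor_blocks([data[0], parity])
print(rebuilt == data[1])  # True
```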
Note that with the use of -f a reconstruction has not been started. To both fail the disk and start a reconstruction, the -F option must be used:
The -f option may be used first, and then the -F option used later, on the same disk, if desired. Immediately after the reconstruction is started, the status will report:
Components:
/dev/sd1e: optimal
/dev/sd2e: reconstructing
/dev/sd3e: optimal
Spares:
/dev/sd4e: used_spare
[...]
Parity status: clean
Reconstruction is 10% complete.
Parity Re-write is 100% complete.
Copyback is 100% complete.
This indicates that a reconstruction is in progress. To find out how the reconstruction is progressing the -S option may be used. This will indicate the progress in terms of the percentage of the reconstruction that is completed. When the reconstruction is finished the -s option will show:
Components:
/dev/sd1e: optimal
/dev/sd2e: spared
/dev/sd3e: optimal
Spares:
/dev/sd4e: used_spare
[...]
Parity status: clean
Reconstruction is 100% complete.
Parity Re-write is 100% complete.
Copyback is 100% complete.
At this point there are at least two options. First, if /dev/sd2e is known to be good (i.e. the failure was either caused by -f or -F, or the failed disk was replaced), then a copyback of the data can be initiated with the -B option. In this example, this would copy the entire contents of /dev/sd4e to /dev/sd2e. Once the copyback procedure is complete, the status of the device would be (in part):
Components:
/dev/sd1e: optimal
/dev/sd2e: optimal
/dev/sd3e: optimal
Spares:
/dev/sd4e: spare
and the system is back to normal operation.
The second option after the reconstruction is to simply use /dev/sd4e in place of /dev/sd2e in the configuration file. For example, the configuration file (in part) might now look like:
START array
1 3 0
START disks
/dev/sd1e
/dev/sd4e
/dev/sd3e
This can be done as /dev/sd4e is completely interchangeable with /dev/sd2e at this point. Note that extreme care must be taken when changing the order of the drives in a configuration. This is one of the few instances where the devices and/or their orderings can be changed without loss of data! In general, the ordering of components in a configuration file should never be changed.
If a component fails and there are no hot spares available on-line, the status of the RAID set might (in part) look like:
Components:
/dev/sd1e: optimal
/dev/sd2e: failed
/dev/sd3e: optimal
No spares.
In this case there are a number of options. The first option is to add a
hot spare using:
# raidctl -a /dev/sd4e raid0
After the hot add, the status would then be:
Components:
/dev/sd1e: optimal
/dev/sd2e: failed
/dev/sd3e: optimal
Spares:
/dev/sd4e: spare
Reconstruction could then take place using -F as described above.
A second option is to rebuild directly onto /dev/sd2e. Once the disk containing /dev/sd2e has been replaced, one can simply use:
# raidctl -R /dev/sd2e raid0
to rebuild the /dev/sd2e component. As the rebuilding is in progress, the status will be:
Components:
/dev/sd1e: optimal
/dev/sd2e: reconstructing
/dev/sd3e: optimal
No spares.
and when completed, will be:
Components:
/dev/sd1e: optimal
/dev/sd2e: optimal
/dev/sd3e: optimal
No spares.
In circumstances where a particular component is completely unavailable after a reboot, a special component name will be used to indicate the missing component. For example:
Components:
/dev/sd2e: optimal
component1: failed
No spares.
indicates that the second component of this RAID set was not detected at all by the auto-configuration code. The name `component1' can be used anywhere a normal component name would be used. For example, to add a hot spare to the above set, and rebuild to that hot spare, the following could be done:
# raidctl -a /dev/sd3e raid0
# raidctl -F component1 raid0
at which point the data missing from `component1' would be reconstructed onto /dev/sd3e.
RAID on RAID
RAID sets can be layered to create more complex and much larger RAID sets. A RAID 0 set, for example, could be constructed from four RAID 5 sets. The following configuration file shows such a setup:
START array
# numRow numCol numSpare
1 4 0
START disks
/dev/raid1e
/dev/raid2e
/dev/raid3e
/dev/raid4e
START layout
# sectPerSU SUsPerParityUnit SUsPerReconUnit RAID_level_0
128 1 1 0
START queue
fifo 100
A similar configuration file might be used for a RAID 0 set constructed from components on RAID 1 sets. In such a configuration, the mirroring provides a high degree of redundancy, while the striping provides additional speed benefits.
Auto-configuration and Root on RAID
RAID sets can also be auto-configured at boot. To make a set auto-configurable, simply prepare the RAID set as above, and then do a:
# raidctl -A yes raid0
to turn on auto-configuration for that set. To turn off auto-configuration, use:
# raidctl -A no raid0
RAID sets which are auto-configurable will be configured before the root file system is mounted. These RAID sets are thus available for use as a root file system, or for any other file system. A primary advantage of using auto-configuration is that RAID components become more independent of the disks they reside on. For example, SCSI IDs can change, but auto-configured sets will always be configured correctly, even if the SCSI IDs of the component disks have become scrambled.
Having a system's root file system (/) on a RAID set is also allowed, with the `a' partition of such a RAID set being used for /. To use raid0a as the root file system, simply use:
# raidctl -A root raid0
To return raid0 to be just an auto-configuring set, simply use the -A yes arguments.
Note that kernels can't be directly read from a RAID component. To support the root file system on RAID sets, some mechanism must be used to get a kernel booting. For example, a small partition containing only the secondary boot-blocks and an alternate kernel (or two) could be used. Once a kernel is booting, however, and an auto-configured RAID set is found that is eligible to be root, then that RAID set will be auto-configured and its `a' partition (aka raid[0..n]a) will be used as the root file system. If two or more RAID sets claim to be root devices, then the user will be prompted to select the root device. At this time, RAID 0, 1, 4, and 5 sets are all supported as root devices.
A typical RAID 1 setup with root on RAID might be as follows:
1. wd0a - a small partition, which contains a complete, bootable, basic OpenBSD installation.
2. wd1a - also contains a complete, bootable, basic OpenBSD installation.
3. wd0e and wd1e - a RAID 1 set, raid0, used for the root file system.
4. wd0f and wd1f - a RAID 1 set, raid1, which will be used only for swap space.
5. wd0g and wd1g - a RAID 1 set, raid2, used for /usr, /home, or other data, if desired.
6. wd0h and wd1h - a RAID 1 set, raid3, if desired.
RAID sets raid0, raid1, and raid2 are all marked as auto-configurable. raid0 is marked as being a root-able raid. When new kernels are installed, the kernel is not only copied to /, but also to wd0a and wd1a. The kernel on wd0a is required, since that is the kernel the system boots from. The kernel on wd1a is also required, since that will be the kernel used should wd0 fail. The important point here is to have redundant copies of the kernel available, in the event that one of the drives fails.
There is no requirement that the root file system be on the same disk as the kernel. For example, obtaining the kernel from wd0a, and using sd0e and sd1e for raid0, and the root file system, is fine. It is critical, however, that there be multiple kernels available, in the event of media failure.
Multi-layered RAID devices (such as a RAID 0 set made up of RAID 1 sets) are not supported as root devices or auto-configurable devices at this point. (Multi-layered RAID devices are supported in general, however, as mentioned earlier.) Note that in order to enable component auto-detection and auto-configuration of RAID devices, the line:
option RAID_AUTOCONFIG
must be in the kernel configuration file. See raid(4) for more details.
Unconfiguration
The final operation performed by raidctl is to unconfigure a raid(4) device. This is accomplished via a simple:
# raidctl -u raid0
at which point the device is ready to be reconfigured.
Performance Tuning
Selection of the various parameter values which result in the best performance can be quite tricky, and often requires a bit of trial-and-error to get those values most appropriate for a given system. A whole range of factors come into play, including:
1. Types of components (e.g. SCSI vs. IDE) and their bandwidth
2. Types of controller cards and their bandwidth
3. Distribution of components among controllers
4. IO bandwidth
5. File system access patterns
6. CPU speed
As with most performance tuning, benchmarking under real-life loads may be the only way to measure expected performance. Understanding some of the underlying technology is also useful in tuning. The goal of this section is to provide pointers to those parameters which may make significant differences in performance.
For a RAID 1 set, a SectPerSU value of 64 or 128 is typically sufficient. Since data in a RAID 1 set is arranged in a linear fashion on each component, selecting an appropriate stripe size is somewhat less critical than it is for a RAID 5 set. However, a stripe size that is too small will cause large IOs to be broken up into a number of smaller ones, hurting performance. At the same time, a large stripe size may cause problems with concurrent accesses to stripes, which may also affect performance. Thus values in the range of 32 to 128 are often the most effective.
Tuning RAID 5 sets is trickier. In the best case, IO is presented to the RAID set one stripe at a time. Since the entire stripe is available at the beginning of the IO, the parity of that stripe can be calculated before the stripe is written, and then the stripe data and parity can be written in parallel. When the amount of data being written is less than a full stripe worth, the `small write' problem occurs. Since a `small write' means only a portion of the stripe on the components is going to change, the data (and parity) on the components must be updated slightly differently. First, the `old parity' and `old data' must be read from the components. Then the new parity is constructed, using the new data to be written, and the old data and old parity. Finally, the new data and new parity are written. All this extra data shuffling results in a serious loss of performance, and is typically 2 to 4 times slower than a full stripe write (or read). To combat this problem in the real world, it may be useful to ensure that stripe sizes are small enough that a `large IO' from the system will use exactly one large stripe write. As is seen later, there are some file system dependencies which may come into play here as well.
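The `small write' sequence above relies on a property of XOR parity: the new parity equals the old parity XOR the old data XOR the new data, so the other data units in the stripe never need to be read. The Python sketch below checks this against a full recomputation; it assumes XOR parity as in RAID 4/5, and the helper name xor is made up for illustration.

```python
import os
from functools import reduce


def xor(*blocks):
    """Bytewise XOR of any number of equal-length blocks."""
    return bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*blocks))


# A stripe on a 5-drive RAID 5 set: 4 data units plus parity (16 bytes each here).
stripe = [os.urandom(16) for _ in range(4)]
old_parity = xor(*stripe)

# Small write: replace only data unit 2.
new_data = os.urandom(16)
old_data = stripe[2]

# Read old data and old parity, compute the new parity, write both back.
new_parity = xor(old_parity, old_data, new_data)
stripe[2] = new_data

# Same result as recomputing the parity from the whole stripe.
print(new_parity == xor(*stripe))  # True
```

This is why a small write costs two reads and two writes regardless of how many components the set has.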
Since the size of a `large IO' is often (currently) only 32K or 64K, on a 5-drive RAID 5 set it may be desirable to select a SectPerSU value of 16 blocks (8K) or 32 blocks (16K). Since there are 4 data stripe units per stripe (the fifth holds parity), the maximum data per stripe is 64 blocks (32K) or 128 blocks (64K). Again, empirical measurement will provide the best indicators of which values will yield better performance.
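The arithmetic in the previous paragraph can be written out as a small Python sketch. It assumes 512-byte blocks (the `blocksize: 512' shown in the component labels above) and one parity stripe unit per stripe, as in RAID 5; full_stripe_bytes is a hypothetical helper.

```python
BLOCK_SIZE = 512  # bytes per block, as in "blocksize: 512" above


def full_stripe_bytes(sect_per_su, num_components, parity_units=1):
    """Data bytes in one full stripe (parity stripe units excluded)."""
    data_units = num_components - parity_units
    return sect_per_su * data_units * BLOCK_SIZE


for sect_per_su in (16, 32):
    su_kib = sect_per_su * BLOCK_SIZE // 1024
    stripe_kib = full_stripe_bytes(sect_per_su, 5) // 1024
    print(f"sectPerSU={sect_per_su}: {su_kib}K per unit, {stripe_kib}K per stripe")
# sectPerSU=16: 8K per unit, 32K per stripe
# sectPerSU=32: 16K per unit, 64K per stripe
```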
The parameters used for the file system are also critical to good performance. For newfs(8), for example, increasing the block size to 32K or 64K may improve performance dramatically. Also, changing the cylinders-per-group parameter from 16 to 32 or higher is often not only necessary for larger file systems, but may also have positive performance implications.
Summary
Despite the length of this man page, configuring a RAID set is a relatively straightforward process. All that needs to be done is to follow these steps:
1. Use disklabel(8) to create the components (of type RAID).
2. Construct a RAID configuration file: e.g. `raid0.conf'
3. Configure the RAID set with:
# raidctl -C raid0.conf raid0
4. Initialize the component labels with:
# raidctl -I 123456 raid0
5. Initialize other important parts of the set with:
# raidctl -i raid0
6. Get the default label for the RAID set:
# disklabel raid0 > /tmp/label
7. Edit the label:
# vi /tmp/label
8. Put the new label on the RAID set:
# disklabel -R -r raid0 /tmp/label
9. Create the file system:
# newfs /dev/rraid0e
10. Mount the file system:
# mount /dev/raid0e /mnt
11. Use:
# raidctl -c raid0.conf raid0
to re-configure the RAID set the next time it is needed, or put raid0.conf into /etc where it will automatically be started by the /etc/rc scripts.
WARNINGS
Certain RAID levels (1, 4, 5, 6, and others) can protect against some data loss due to component failure. However, the loss of two components of a RAID 4 or 5 system, or the loss of a single component of a RAID 0 system, will result in the entire file system being lost. RAID is NOT a substitute for good backup practices.
Recomputation of parity MUST be performed whenever there is a chance that it may have been compromised. This includes after system crashes, or before a RAID device has been used for the first time. Failure to keep parity correct will be catastrophic should a component ever fail; it is better to use RAID 0 and get the additional space and speed than it is to use parity but not keep the parity correct. At least with RAID 0 there is no perception of increased data security.
FILES
/dev/{,r}raid* raid device special files.
SEE ALSO
ccd(4), raid(4), rc(8)
HISTORY
RAIDframe is a framework for rapid prototyping of RAID structures developed by the folks at the Parallel Data Laboratory at Carnegie Mellon University (CMU). A more complete description of the internals and functionality of RAIDframe is found in the paper "RAIDframe: A Rapid Prototyping Tool for RAID Systems", by William V. Courtright II, Garth Gibson, Mark Holland, LeAnn Neal Reilly, and Jim Zelenka, published by the Parallel Data Laboratory of Carnegie Mellon University.
The raidctl command first appeared as a program in CMU's RAIDframe v1.1 distribution. This version of raidctl is a complete rewrite, and first appeared in NetBSD 1.4, from where it was ported to OpenBSD 2.5.
BUGS
Hot-spare removal is currently not available.
The RAIDframe Copyright is as follows:
Copyright (c) 1994-1996 Carnegie-Mellon University.
All rights reserved.
Permission to use, copy, modify and distribute this software and its documentation is hereby granted, provided that both the copyright notice and this permission notice appear in all copies of the software, derivative works or modified versions, and any portions thereof, and that both notices appear in supporting documentation.
CARNEGIE MELLON ALLOWS FREE USE OF THIS SOFTWARE IN ITS "AS IS" CONDITION. CARNEGIE MELLON DISCLAIMS ANY LIABILITY OF ANY KIND FOR ANY DAMAGES WHATSOEVER RESULTING FROM THE USE OF THIS SOFTWARE.
Carnegie Mellon requests users of this software to return to
Software Distribution Coordinator or [email protected]
School of Computer Science
Carnegie Mellon University
Pittsburgh PA 15213-3890
any improvements or extensions that they make and grant Carnegie the rights to redistribute these changes.
OpenBSD 3.6 July 10, 2001