diskalign(1) diskalign(1)
NAME
diskalign - XLV Aligned Disk Striping Utility
SYNOPSIS
diskalign -n<name> -r<n>[k|m|g] -a<n>[k|m|g] '<template>' ..
DESCRIPTION
This utility is designed to assist in creating striped XLV disk volumes
for data streaming applications. There are many factors that must be
taken into account when creating a striped XLV volume such as stripe
alignment, restrictions imposed by a filesystem on the volume and by the
operating system I/O functions such as readv(). This tool, in conjunction
with diskprep and diskperf, will help extract maximum performance from a
striped XLV disk configuration for data streaming applications.
The output of diskalign can be piped directly into xlv_make to create an
XLV volume.
OPTIONS
-n name
Specifies the name of the XLV device for the volume which will
appear in the /dev/xlv directory. This will also be the name used to
reference this volume from the XLV management tools such as xlv_mgr.
-r size[k|m|g]
Specifies the exact desired I/O request size for a particular
application in bytes, kilobytes, megabytes or gigabytes ( using
suffixes ). To achieve correct alignment, diskalign may round this
value up to the nearest appropriate boundary. The final adjusted
request size will be reflected in the output script for xlv_make.
For example, in a video streaming application this would be the
number of bytes per frame.
-a alignment[k|m|g]
Specifies the required alignment for the request size in bytes,
kilobytes, megabytes or gigabytes ( using suffixes ). This is a
critically important hardware and application dependent parameter.
See REQUEST SIZE ALIGNMENT below for details on choosing an
appropriate alignment size.
<template>
Specifies all the disk devices which compose this XLV striped volume
and the order in which they appear. This template format is
described below. Because the shell interprets the square brackets
used in the device template syntax, you must enclose each template
string inside single quotes. Any number of template arguments may be
supplied.
This tool is designed to make life easier when configuring disks into a
striped XLV volume for high performance data streaming applications. For
the purpose of this tool, a data streaming application is characterized
by I/O requests of a fixed size ( or a multiple of a fixed request size
). Obvious examples of such applications are uncompressed video streaming
to or from disk, database tiled texture paging and telemetry applications
with very high disk bandwidth requirements.
To ensure maximum performance there are many factors which must be taken
into account including :
o Hardware requirements and constraints.
o Kernel disk driver constraints.
o XLV striped driver operation.
o Page alignment constraints of readv()/writev().
o Alignment constraints imposed by using direct I/O.
o Application specific requirements.
To satisfy all these constraints, compromises have to be made which
ultimately culminate in using an I/O request size rounded up to an
appropriate alignment boundary. That is, you have to add padding to each
element of your data set ( e.g. a frame of video ) to guarantee alignment,
and hence optimal disk performance, for all elements of the data set.
Calculation of this padding factor is not a trivial problem. This tool
automates the process all the way to the point of generating the
appropriate script file for xlv_make to create the volume.
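As a rough illustration of the padding arithmetic described above, the
following Python sketch rounds a request size up to the next multiple of
the alignment and reports the padding and its overhead. The function name
pad_request and the plain round-up rule are assumptions made for
illustration; diskalign itself may apply further constraints.

```python
def pad_request(request_size, alignment):
    """Round request_size up to a multiple of alignment (both in bytes).

    Returns (aligned_size, padding, overhead_percent). This is an assumed
    round-up rule for illustration, not diskalign's actual algorithm.
    """
    if request_size <= 0 or alignment <= 0:
        raise ValueError("sizes must be positive")
    # Ceiling division, then scale back up to a whole number of units.
    aligned = -(-request_size // alignment) * alignment
    padding = aligned - request_size
    overhead = 100.0 * padding / request_size
    return aligned, padding, overhead


if __name__ == "__main__":
    # One 720 x 243 x 2 byte video field, aligned to 16KB.
    aligned, padding, overhead = pad_request(349920, 16 * 1024)
    print(aligned, padding, f"{overhead:.2f} %")
```

With these inputs the sketch yields an aligned size of 360448 bytes and
10528 bytes of padding, which agrees with the figures in EXAMPLE #1.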
REQUEST SIZE ALIGNMENT
Choosing an appropriate alignment for the request size is critical to
achieving optimal disk performance. Badly chosen values can cause poor
performance or even violate hard constraints which will cause I/O errors
to be returned to the application. The rule for choosing a correct
alignment size is to take the maximum of all the following values :
Device dependent minimum request size
Check whether the disk devices have any constraints on the minimum
allowed request size. For example, MaxStrat Gen5 disk arrays can be
configured with 64KB block size which is the minimum allowed
transfer size.
Filesystem block size
The use of direct I/O requires that the request size be a multiple
of the filesystem block size. This block size is chosen at the time
the filesystem is built with mkfs.
eg. # mkfs -b size=16384 /dev/xlv/video
System page size if using readv()/writev()
If the application makes use of the scatter/gather I/O mechanism
provided by the readv()/writev() system calls then the operating
system requires all requests to be system page size aligned. The
system page size is typically 16KB, but can be determined from the
shell using sysconf with the PAGESIZE argument.
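The rule above (take the maximum of the applicable constraints) can be
sketched in Python as follows. The function and parameter names are
hypothetical. The page size lookup uses Python's os.sysconf, which is only
a stand-in for the sysconf shell command mentioned above.

```python
import os


def choose_alignment(device_min_request, fs_block_size, uses_readv,
                     page_size=None):
    """Return the largest applicable alignment constraint, in bytes.

    device_min_request: minimum transfer size allowed by the disk device.
    fs_block_size:      filesystem block size chosen at mkfs time.
    uses_readv:         True if the application uses readv()/writev().
    page_size:          system page size; queried if not supplied.
    """
    candidates = [device_min_request, fs_block_size]
    if uses_readv:
        if page_size is None:
            page_size = os.sysconf("SC_PAGESIZE")
        candidates.append(page_size)
    return max(candidates)


if __name__ == "__main__":
    # Assumed values: 512 byte device minimum, 4KB filesystem blocks,
    # scatter/gather I/O on a system with 16KB pages.
    print(choose_alignment(512, 4096, True, page_size=16384))
```

Taking the maximum satisfies every constraint only when each value divides
the largest one, which holds when all of them are powers of two.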
DEVICE TEMPLATE FORMAT
The device template format is a syntax for specifying lists of devices in
a very compact and convenient way. A template is a string with embedded
numeric patterns, which allow a single string to represent many device
names. This is especially useful when specifying groups of disk devices
making up a striped volume. An example of a template representing
partition 7 on disks 1 to 4 on SCSI controller 9 is :
/dev/dsk/dks9d[1-4]s7
The template syntax allows numeric patterns to be inserted into a string
using the square bracket delimiters. The pattern may also contain control
sequences inside the square brackets which modify the way the pattern is
evaluated. For example, '[z3,1-3]' causes all numbers generated by this
pattern to be zero padded to three digits and thus represents the
sequence "001","002","003".
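The template syntax, including the pattern controls described in the rest
of this section, can be sketched in Python as follows. This is a minimal
reimplementation written from the descriptions and examples in this page;
the function names are hypothetical and the behaviour of the real tool may
differ in edge cases.

```python
import itertools
import re


def expand_pattern(body):
    """Expand the text inside one [...] pattern.

    Returns (values, priority), where values is a list of strings.
    """
    increment, width, priority = 1, 0, 1  # default priority is one
    items = []
    for token in (t.strip() for t in body.split(",")):
        if not token:
            continue
        if token[0] in "izp":  # control: increment, zero pad, priority
            value = int(token[1:])
            if token[0] == "i":
                increment = value
            elif token[0] == "z":
                width = value
            else:
                priority = value
        else:
            items.append(token)

    numbers = []
    for token in items:
        if "-" in token:  # range m-n, possibly running backwards
            start, stop = (int(x) for x in token.split("-"))
            if stop >= start:
                numbers.extend(range(start, stop + 1, increment))
            else:
                numbers.extend(range(start, stop - 1, -increment))
        else:
            numbers.append(int(token))
    return [str(n).zfill(width) for n in numbers], priority


def expand_template(template):
    """Return the list of names a template represents, in order."""
    parts = re.split(r"\[([^\]]*)\]", template)
    literals = parts[0::2]  # text between the patterns
    patterns = [expand_pattern(body) for body in parts[1::2]]
    # Evaluation order: lowest priority first, ties from right to left.
    # The pattern evaluated first is the one that varies fastest.
    order = sorted(range(len(patterns)),
                   key=lambda i: (patterns[i][1], -i))
    # itertools.product varies its last argument fastest.
    slowest_first = order[::-1]
    names = []
    for combo in itertools.product(*(patterns[i][0] for i in slowest_first)):
        chosen = dict(zip(slowest_first, combo))
        name = literals[0]
        for i in range(len(patterns)):
            name += chosen[i] + literals[i + 1]
        names.append(name)
    return names


if __name__ == "__main__":
    for name in expand_template("/dev/dsk/dks[p0,1,2]d[3,4]s7"):
        print(name)
```

The two-pass design in expand_pattern reflects the statement that the i<n>
and z<n> controls apply to the whole pattern, regardless of where they
appear inside the brackets.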
The supported pattern controls are as follows :
<n> This is the simplest of components and simply represents a single
number. Any number of these controls may appear in a pattern. For
example,
'test[1,57,13]' produces "test1","test57","test13"
<m>-<n>
Appends the range of numbers from m to n with increment ( set with
i<n> ) to the sequence. n may be less than m, implying that the range
will run backwards from m to n with the specified increment. Note that n
may not actually appear in the output sequence for increments
greater than one, as no numbers outside the specified range will be
produced. For example,
'x[1-3,99-97]' produces "x1","x2","x3","x99","x98","x97"
i<n> Sets the increment for all range controls in this pattern. Only one
of these controls may appear in a single pattern.
'[i3,1-8,666,99-97]' produces "1","4","7","666","99"
z<n> Sets the number of digits to which all numbers produced by the
pattern will be zero padded. For example,
'[i3,z3,1-5,666,99-97]' produces "001","004","666","099"
p<n> As multiple patterns may appear in a single template string, it is
sometimes important to be able to control the order of evaluation of
the patterns. This is especially important when specifying
disk devices for striping, as the order of device specification to
xlv_make is critical to achieving optimal performance. This control
sets the priority of evaluation for the pattern. The default
priority for a pattern is one and evaluation order of patterns with
equal priority is from right to left in the string. Patterns with
the lowest priority values are evaluated first. Only one of these
controls may appear in a single pattern. For example,
'/dev/dsk/dks[1,2]d[3,4]s7' produces
"/dev/dsk/dks1d3s7"
"/dev/dsk/dks1d4s7"
"/dev/dsk/dks2d3s7"
"/dev/dsk/dks2d4s7"
whereas '/dev/dsk/dks[p0,1,2]d[3,4]s7' produces
"/dev/dsk/dks1d3s7"
"/dev/dsk/dks2d3s7"
"/dev/dsk/dks1d4s7"
"/dev/dsk/dks2d4s7"
EXAMPLE #1 : STRIPING FOR VIDEO STREAMING
This example shows how to configure an XLV striped volume for storing
uncompressed CCIR-601 NTSC fields. The fields are stored in native YCrCb
( sometimes referred to as YUV ) color space, which requires 2 bytes per
pixel. To achieve real time playback at 60 fields per second the volume
must sustain a bandwidth of approximately 22MB/s. The volume is
configured using four UltraSCSI disks on the internal controller 0 of an
Origin2000 or Onyx2.
Parameters
The parameters of interest for configuration of the XLV striped volume
are as follows :
Image width = 720 pixels ( CCIR-601 )
Image height = 243 pixels ( CCIR-601 NTSC field )
Bytes/pixel = 2 bytes ( YCrCb color space )
Controllers = 1 UltraSCSI
Disks/Ctlr = 4 UltraSCSI disks
Calculate Request Size
The first parameter we have to calculate is the size of a single CCIR-601
NTSC field. The calculation is simple :
Request Size = Width * Height * Bytes_Per_Pixel
= 720 * 243 * 2
= 349920 bytes/field
Determine Alignment Size
Now although we would like to use a 4KB filesystem block size, we would
also like to have the flexibility of using scatter/gather DMA to improve
performance. This requires alignment to 16KB page size boundaries for I/O
requests. So, we must choose an alignment factor of 16KB.
Constructing the XLV Volume
Here is the transcript for construction of this volume.
# diskalign -n video -r349920 -a16k '/dev/dsk/dks0d[2-5]s7' | tee /tmp/xlv.script
# Number of devices = 4
# Filesystem block size = 16384 bytes
# Desired request size = 349920 bytes
# Aligned request size = 360448 bytes
# Alignment padding = 10528 bytes
# Padding I/O overhead = 3.01 %
#
vol video
data
plex
ve -force -stripe -stripe_unit 176 \
/dev/dsk/dks0d2s7 \
/dev/dsk/dks0d3s7 \
/dev/dsk/dks0d4s7 \
/dev/dsk/dks0d5s7
end
exit
# xlv_make < /tmp/xlv.script
video
video.data
video.data.0
video.data.0.0
Object specification completed
# mkfs /dev/xlv/video
# mkdir /video
# chmod 777 /video
# mount /dev/xlv/video /video
Interpretation of Results
As can be seen in the script comments, padding was added to the size of
each field to achieve the required alignment. The application must read
or write 360448 bytes for each field, which includes the field data as
well as the padding to maintain alignment and hence optimal disk
performance for this configuration. The padding adds only 3.01%
overhead in size and bandwidth.
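To make the consequence for the application concrete, here is a small
Python sketch of how padded fields might be addressed within a file on
this volume. It assumes, purely for illustration, that fields are stored
back to back as fixed-size records with the padding at the end of each
record. The names are hypothetical.

```python
FIELD_BYTES = 349920   # payload of one field (720 * 243 * 2)
RECORD_BYTES = 360448  # aligned request size reported by diskalign


def field_offset(index):
    """Byte offset of field number index (0-based) within the file."""
    if index < 0:
        raise ValueError("index must be non-negative")
    return index * RECORD_BYTES


def field_payload(record):
    """Return the field data from one full record, dropping the padding."""
    if len(record) != RECORD_BYTES:
        raise ValueError("a full aligned record is required")
    return record[:FIELD_BYTES]


if __name__ == "__main__":
    # Every record starts on a 16KB boundary because RECORD_BYTES is a
    # multiple of 16384.
    print(field_offset(3), field_offset(3) % 16384)
```

Because every offset is a multiple of the aligned request size, each read
or write of a whole record starts and ends on an alignment boundary.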
EXAMPLE #2 : STRIPING FOR HIGH RESOLUTION STREAMING
This example shows how to configure an XLV striped volume
for storing uncompressed high resolution images for real time preview
purposes. The resolution of the images is 2048 pixels by 1120 lines. The
images are stored using 8-bit RGB color space which requires 3 bytes per
pixel. The disk storage subsystem is composed of 20 fibre channel disks
connected to a dual channel XIO fibre channel adapter, with 10 disks
connected to each channel.
Parameters
The parameters of interest for configuration of the XLV striped volume
are as follows :
Image width = 2048 pixels
Image height = 1120 lines
Bytes/pixel = 3 bytes ( 8-bit RGB color space )
Controllers = 2 XIO Fibre Channel
Disks/Ctlr = 10 Fibre Channel disks
Calculate Request Size
The first parameter we have to calculate is the size of a single high
resolution frame. The calculation is simple :
Request Size = Width * Height * Bytes_Per_Pixel
= 2048 * 1120 * 3
= 6881280 bytes/frame
Determine Alignment Size
Because of the large request size we choose a 16KB filesystem block size
which also makes scatter/gather DMA possible. Thus, we select a 16KB
alignment.
Constructing the XLV Volume
Here is the transcript for construction of this volume.
# diskalign -n film -r6881280 -a16k '/dev/dsk/dks[p0,10,11]d[0-9]s7' | tee /tmp/xlv.script
# Number of devices = 20
# Filesystem block size = 16384 bytes
# Desired request size = 6881280 bytes
# Aligned request size = 6881280 bytes
# Alignment padding = 0 bytes
# Padding I/O overhead = 0.00 %
#
vol film
data
plex
ve -force -stripe -stripe_unit 672 \
/dev/dsk/dks10d0s7 \
/dev/dsk/dks11d0s7 \
/dev/dsk/dks10d1s7 \
/dev/dsk/dks11d1s7 \
/dev/dsk/dks10d2s7 \
/dev/dsk/dks11d2s7 \
/dev/dsk/dks10d3s7 \
/dev/dsk/dks11d3s7 \
/dev/dsk/dks10d4s7 \
/dev/dsk/dks11d4s7 \
/dev/dsk/dks10d5s7 \
/dev/dsk/dks11d5s7 \
/dev/dsk/dks10d6s7 \
/dev/dsk/dks11d6s7 \
/dev/dsk/dks10d7s7 \
/dev/dsk/dks11d7s7 \
/dev/dsk/dks10d8s7 \
/dev/dsk/dks11d8s7 \
/dev/dsk/dks10d9s7 \
/dev/dsk/dks11d9s7
end
exit
# xlv_make < /tmp/xlv.script
film
film.data
film.data.0
film.data.0.0
Object specification completed
# mkfs -b size=16384 /dev/xlv/film
# mkdir /film
# chmod 777 /film
# mount /dev/xlv/film /film
Interpretation of Results
As can be seen in the script comments, the frame size was already a
multiple of the 16KB alignment. The requests were therefore already
aligned and there is no padding overhead.
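Both transcripts above include a -stripe_unit value. The sketch below
checks one reading of those numbers: that the stripe unit is the aligned
request size divided evenly across the devices and expressed in 512-byte
blocks. The block size and the formula are inferred from the two
transcripts and are assumptions, not documented diskalign behaviour.

```python
BLOCK_BYTES = 512  # assumed unit of the -stripe_unit argument


def stripe_unit(aligned_request, num_devices):
    """Blocks per device so that one request spans every device once."""
    unit, remainder = divmod(aligned_request, num_devices * BLOCK_BYTES)
    if remainder:
        raise ValueError("request does not divide evenly across devices")
    return unit


if __name__ == "__main__":
    print(stripe_unit(360448, 4))    # EXAMPLE #1: 4 disks
    print(stripe_unit(6881280, 20))  # EXAMPLE #2: 20 disks
```

Under this reading the sketch reproduces the values 176 and 672 shown in
the two generated scripts.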
Here are a few tips for getting the most from a disk configuration.
Homogeneous Disks
Ensure that all the disks in the volume are the same model. The
performance of the striped volume is directly dependent on the slowest
disk in the volume. One slow disk can affect the performance of the
entire volume.
Firmware Revisions
Confirm that all the disks in the volume have the same firmware revision.
Different revisions may have different performance characteristics which
may adversely affect performance. The firmware revision of a disk can be
checked with fx. The diskprep utility can be used with SGI IBM Scorpion
UltraSCSI disks to automatically download the latest firmware revision.
Disk Parameter Settings
Ensure that all disks have the same parameter settings. For example, if
you enable write buffering on all the disks of a striped XLV volume
except one, the write performance will be constrained to the performance
of this single slow disk. The same applies for number of cache segments
and many other parameters. These can be checked with fx. The diskprep
utility can be used with SGI IBM Scorpion UltraSCSI disks to
automatically set all the parameters to SGI manufacturing defaults.
Enable Write Buffering
To achieve good write performance you can enable write buffering on all
the disks in the volume. Note that this does open a window of
vulnerability for disk corruption so you should carefully evaluate the
data integrity needs of your application before enabling write buffering.
This can be set manually using fx or automatically using the diskprep
utility if using SGI IBM Scorpion UltraSCSI disks.
Set Number Of Cache Segments
The effect of this parameter is disk vendor specific, but it is applicable to
the SGI IBM Scorpion UltraSCSI disks. For data streaming applications
setting the number of cache segments to 1 can give a significant
performance boost due to much better onboard disk cache utilization. This
can be set manually using fx or automatically using the diskprep utility
if using SGI IBM Scorpion UltraSCSI disks.
Iterate Controllers First
Because of the way the striped XLV driver works, it is much more
efficient to iterate across controllers first and disks second when
specifying devices to be striped. The pattern priority control p<n> can
be used to achieve this. To illustrate, the two templates below specify
the same devices for a volume, but in different orders. The volume
generated by the first pattern achieves better performance than the
second.
'/dev/dsk/dks[p0,1-3]d[4-6]s7' which represents
/dev/dsk/dks1d4s7
/dev/dsk/dks2d4s7
/dev/dsk/dks3d4s7
/dev/dsk/dks1d5s7
/dev/dsk/dks2d5s7
/dev/dsk/dks3d5s7
/dev/dsk/dks1d6s7
/dev/dsk/dks2d6s7
/dev/dsk/dks3d6s7
performs better than '/dev/dsk/dks[1-3]d[4-6]s7'
/dev/dsk/dks1d4s7
/dev/dsk/dks1d5s7
/dev/dsk/dks1d6s7
/dev/dsk/dks2d4s7
/dev/dsk/dks2d5s7
/dev/dsk/dks2d6s7
/dev/dsk/dks3d4s7
/dev/dsk/dks3d5s7
/dev/dsk/dks3d6s7
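A device list can be checked against this tip before it is handed to
xlv_make. The Python sketch below counts adjacent devices that share a
controller, using the dks<controller>d<disk>s<partition> naming seen in
the examples. The count is a hypothetical heuristic for illustration, with
zero indicating that consecutive stripe members always change controller.

```python
import re

# Matches names such as /dev/dsk/dks1d4s7 (controller 1, disk 4, part 7).
_DEVICE = re.compile(r"dks(\d+)d(\d+)s(\d+)$")


def controller_of(path):
    """Return the controller number encoded in a dks device name."""
    match = _DEVICE.search(path)
    if match is None:
        raise ValueError(f"unrecognised device name: {path}")
    return int(match.group(1))


def same_controller_neighbours(devices):
    """Count adjacent pairs in the list that sit on the same controller."""
    ctlrs = [controller_of(d) for d in devices]
    return sum(1 for a, b in zip(ctlrs, ctlrs[1:]) if a == b)


if __name__ == "__main__":
    first = [f"/dev/dsk/dks{c}d{d}s7" for d in (4, 5, 6) for c in (1, 2, 3)]
    second = [f"/dev/dsk/dks{c}d{d}s7" for c in (1, 2, 3) for d in (4, 5, 6)]
    print(same_controller_neighbours(first))
    print(same_controller_neighbours(second))
```

For the two orderings listed above, the first gives 0 and the second gives
6, which matches the recommendation to iterate across controllers first.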
Verify Volume Performance
The diskperf utility can be used to measure the performance of a striped
volume once it has been configured. This will allow you to determine in
advance if a configuration is adequate for a particular application.
SEE ALSO
diskprep(1M), diskperf(1M), read(2), write(2), readv(2), writev(2),
xlv_make(1M), mkfs(1M), sysconf(1)
None
AUTHOR
Will McGovern ( [email protected] )
Advanced Entertainment Systems Division
Silicon Graphics Inc.