BLAS(3F) BLAS(3F)
BLAS - Basic Linear Algebra Subprograms
BLAS is a library of routines that perform basic operations involving
matrices and vectors. They were designed as a way of achieving efficiency
in the solution of linear algebra problems. The BLAS, as they are now
commonly called, have been very successful and have been used in a wide
range of software, including LINPACK, LAPACK and many of the algorithms
published in the ACM Transactions on Mathematical Software. They are an
aid to clarity, portability, modularity and maintenance of software, and
have become the de facto standard for elementary vector and matrix
operations.
The BLAS promote modularity by identifying frequently occurring
operations of linear algebra and by specifying a standard interface to
these operations. Efficiency is achieved through optimization within the
BLAS without altering the higher-level code that has referenced them.
There are three levels of BLAS. The original set of BLAS, commonly
referred to as the Level 1 BLAS, perform low-level operations such as the
dot product and adding a multiple of one vector to another. Typically
these operations involve O(N) floating point operations and O(N) data
items moved (loaded or stored), where N is the length of the vectors. The
Level 1 BLAS permit efficient implementation on scalar machines, but the
ratio of floating-point operations to data movement is too low to achieve
effective use of most vector or parallel hardware.
The Level 2 BLAS perform matrix-vector operations that occur frequently
in the implementation of many of the most common linear algebra
algorithms. They involve O(N^2) floating point operations. Algorithms
that use Level 2 BLAS can be very efficient on vector computers, but are
not well suited to computers with a hierarchy of memory (such as cache
memory).
The Level 3 BLAS are targeted at matrix-matrix operations. These
operations generally involve O(N^3) floating point operations, while only
creating O(N^2) data movement. These operations permit efficient reuse of
data that resides in cache and create what is often called the
surface-to-volume effect for the ratio of computations to data movement. In
addition, matrices can be partitioned into blocks, and operations on
distinct blocks can be performed in parallel, and within the operations
on each block, scalar or vector operations may be performed in parallel.
BLAS2 and BLAS3 modules have been optimized and parallelized to take
advantage of SGI's RISC parallel architecture. The best performance is
achieved by the BLAS3 routines (e.g. DGEMM), where "outer-loop" unrolling
and "blocking" techniques were applied to take advantage of the memory
cache. The performance of BLAS2 routines (e.g. DGEMV) is sensitive to the
size of the problem: for large sizes the high rate of cache misses slows
down the algorithms.
LAPACK algorithms preferentially use BLAS3 modules and are the most
efficient. LINPACK uses only BLAS1 modules and is therefore less
efficient than LAPACK.
To link with "libblas", it is advised to use "f77" to load all the
required Fortran libraries; otherwise, include -lftn in your link line.
For R8000 and R10000 based machines, you should use the mips4 version.
This is accomplished by using -mips4 when linking:
f77 -mips4 -o foobar.out foo.o bar.o -lblas
To use the parallelized version, use
f77 -mips4 -mp -o foobar.out foo.o bar.o -lblas_mp
BLAS Level 1:
function                  prefix,suffix              root name
dot product               s- d- c-u c-c z-u z-c      -dot
y = a*x + y               s- d- c- z-                -axpy
setup Givens rotation     s- d-                      -rotg
apply Givens rotation     s- d- cs- zd-              -rot
copy x into y             s- d- c- z-                -copy
swap x and y              s- d- c- z-                -swap
Euclidean norm            s- d- sc- dz-              -nrm2
sum of absolute values    s- d- sc- dz-              -asum
x = a*x                   s- d- cs- c- zd- z-        -scal
index of max abs value    is- id- ic- iz-            -amax
BLAS Level 2:
MV Matrix vector multiply
R Rank one update to a matrix
R2 Rank two update to a matrix
SV Solving certain triangular matrix problems.
single precision Level 2 BLAS | Double precision Level 2 BLAS
-----------------------------------------------------------------------
MV R R2 SV | MV R R2 SV
SGE x x | DGE x x
SGB x | DGB x
SSP x x x | DSP x x x
SSY x x x | DSY x x x
SSB x | DSB x
STR x x | DTR x x
STB x x | DTB x x
STP x x | DTP x x
complex Level 2 BLAS | Double precision complex Level 2 BLAS
-----------------------------------------------------------------------
MV R RC RU R2 SV| MV R RC RU R2 SV
CGE x x x | ZGE x x x
CGB x | ZGB x
CHE x x x | ZHE x x x
CHP x x x | ZHP x x x
CHB x | ZHB x
CTR x x | ZTR x x
CTB x x | ZTB x x
CTP x x | ZTP x x
BLAS Level 3:
MM Matrix matrix multiply
RK Rank-k update to a matrix
R2K Rank-2k update to a matrix
SM Solving a triangular matrix equation with many right-hand sides.
single precision Level 3 BLAS | Double precision Level 3 BLAS
-----------------------------------------------------------------------
MM RK R2K SM | MM RK R2K SM
SGE x | DGE x
SSY x x x | DSY x x x
STR x x | DTR x x
complex Level 3 BLAS | Double precision complex Level 3 BLAS
-----------------------------------------------------------------------
MM RK R2K SM | MM RK R2K SM
CGE x | ZGE x
CSY x x x | ZSY x x x
CHE x x x | ZHE x x x
CTR x x | ZTR x x
There is a C interface for the BLAS library. The implementation is based
on the proposed specification for BLAS routines in C [1].
The argument lists follow closely the equivalent Fortran ones. The main
changes are that enumeration types are used instead of character types
for option specification, and two-dimensional arrays are stored in
one-dimensional C arrays in the same column-major fashion as a Fortran
array. Therefore, a matrix A would be stored as:
double (*a)[lda*n];
/* */
/* a is a pointer to an array of size lda*n */
/* */
where element A(i+1,j) of matrix A is stored immediately after the
element A(i,j), while A(i,j+1) is lda elements apart from A(i,j). The
element A(i,j) of the matrix can be accessed directly by reference to a[
(j-1)*lda + (i-1) ].
The names of the C versions of the BLAS are the same as the Fortran
versions, since the compiler puts the Fortran names in upper case and
adds an underscore after the name.
The argument lists use the following data types:
Integer: an integer data type of 32 bits.
float: the regular single precision floating-point type.
double: the regular double precision floating-point type.
Complex: a single precision complex type.
Zomplex: a double precision complex type.
plus the enumeration types given by
typedef enum { NoTranspose, Transpose, ConjugateTranspose }
MatrixTranspose;
typedef enum { UpperTriangle, LowerTriangle }
MatrixTriangle;
typedef enum { UnitTriangular, NotUnitTriangular }
MatrixUnitTriangular;
typedef enum { LeftSide, RightSide }
OperationSide;
The complex data types are stored in cartesian form, i.e., as real and
imaginary parts. For example
typedef struct { float real;
float imag;
} Complex;
typedef struct { double real;
double imag;
} Zomplex;
The operations performed by the C BLAS are identical to those performed
by the corresponding Fortran BLAS, as specified in [2], [3] and [4].
To use the C BLAS, link with "libblas". It is advised to use "f77" to
load all the required Fortran libraries:
f77 -o foobar.out foo.o bar.o -lblas
/usr/lib/libblas.a
/usr/lib/libblas_mp.a
/usr/include/cblas.h
The original Fortran source code comes from netlib.
[1] S.P. Datardina, J.J. Du Croz, S.J. Hammarling and M.W. Pont, "A
Proposed Specification of BLAS Routines in C", NAG Technical Report
TR6/90.

[2] C. Lawson, R. Hanson, D. Kincaid, and F. Krogh, "Basic Linear
Algebra Subprograms for Fortran Usage", ACM Trans. on Math. Soft. 5
(1979) 308-323.

[3] J. Dongarra, J. Du Croz, S. Hammarling, and R. Hanson, "An Extended
Set of Fortran Basic Linear Algebra Subprograms", ACM Trans. on Math.
Soft. 14, 1 (1988) 1-32.

[4] J. Dongarra, J. Du Croz, I. Duff, and S. Hammarling, "A Set of Level
3 Basic Linear Algebra Subprograms", ACM Trans. on Math. Soft. 16, 1
(1990) 1-17.