regcomp, regerror, regexec, regfree - Compare string to
regular expression
#include <sys/types.h>
#include <regex.h>

int regcomp(
        regex_t *preg,
        const char *pattern,
        int cflags );

size_t regerror(
        int errcode,
        const regex_t *preg,
        char *errbuf,
        size_t errbuf_size );

int regexec(
        const regex_t *preg,
        const char *string,
        size_t nmatch,
        regmatch_t *pmatch,
        int eflags );

void regfree(
        regex_t *preg );
Standard C Library (libc)
Interfaces documented on this reference page conform to
industry standards as follows:
regcomp(), regexec(), regerror(), regfree(): POSIX.2,
XPG4, XPG4-UNIX
Refer to the standards(5) reference page for more information
about industry standards and associated tags.
cflags
    Specifies the options for regcomp(). The cflags parameter
    is the bitwise inclusive OR of zero or more of the following
    options, which are defined in the /usr/include/regex.h
    file.

    REG_EXTENDED
        Uses extended regular expressions.
    REG_ICASE
        Ignores case in match.
    REG_NOSUB
        Reports only success or failure in regexec(); does
        not report subexpressions.
    REG_NEWLINE
        Treats newline as a special character marking the end
        and beginning of lines.

pattern
    Contains the basic or extended regular expression to be
    compiled by regcomp().
preg
    The structure that contains the compiled basic or extended
    regular expression.
errcode
    Identifies the error code.
errbuf
    Points to the buffer where regerror() stores the message
    text.
errbuf_size
    Specifies the size of the errbuf buffer.
string
    Contains the data to be matched.
nmatch
    Contains the number of subexpressions to match.
pmatch
    Contains the array of offsets into the string parameter
    that match the corresponding subexpression in the preg
    parameter.
eflags
    Specifies the options controlling the customizable behavior
    of the regexec() function. The eflags parameter modifies
    the interpretation of the contents of the string parameter.
    The value for this parameter is formed by bitwise inclusive
    ORing zero or more of the following options, which are
    defined in the /usr/include/regex.h file.

    REG_NOTBOL
        The first character of the string pointed to by the
        string parameter is not the beginning of the line.
        Therefore, the ^ (circumflex), when taken as a special
        character, does not match the beginning of the string
        parameter.
    REG_NOTEOL
        The last character of the string pointed to by the
        string parameter is not the end of the line. Therefore,
        the $ (dollar sign), when taken as a special character,
        does not match the end of the string parameter.
The regcomp(), regerror(), regexec(), and regfree() functions
perform regular expression matching. The regcomp()
function compiles a regular expression and the regexec()
function compares the compiled regular expression to a
string. The regerror() function returns text associated
with an error condition encountered by regcomp() or
regexec(). The regfree() function frees the internal storage
allocated for the compiled regular expression.
The regcomp() function compiles the basic or extended regular
expression specified by the pattern parameter and
places the output in the preg structure. The default regular
expression type for the pattern parameter is a basic
regular expression. An application can specify extended
regular expressions with the REG_EXTENDED option.
If the REG_NOSUB option is not set in cflags, the regcomp()
function sets the re_nsub member of the preg structure to
the number of parenthetic subexpressions (delimited by \(
and \) in basic regular expressions or by () in extended
regular expressions) found in pattern.
The regexec() function compares the null-terminated string
in the string parameter against the compiled basic or
extended regular expression in the preg parameter. If a
match is found, the regexec() function returns a value of
0 (zero). The regexec() function returns REG_NOMATCH if
there is no match. Any other nonzero value returned indicates
an error.
If the value of the nmatch parameter is 0 (zero) or if the
REG_NOSUB option was set on the call to the regcomp()
function, the regexec() function ignores the pmatch parameter.
Otherwise, the pmatch parameter points to an array
of at least the number of elements specified by the nmatch
parameter. The regexec() function fills in the elements of
the array pointed to by the pmatch parameter with offsets
of the substrings of the string parameter. The elements of
the pmatch array correspond to the parenthetic subexpressions
of the original pattern parameter that was specified
to the regcomp() function. The pmatch[i].rm_so member
is the byte offset of the beginning of the substring, and
the pmatch[i].rm_eo member is one greater than the byte
offset of the end of the substring. Subexpression i begins
at the ith matched open parenthesis, counting from 1. The
0 (zero) element of the array corresponds to the entire
pattern. Unused elements of the pmatch parameter, up to
the value pmatch[nmatch-1], are filled with -1. If the
number of subexpressions exceeds the number specified by
the nmatch parameter (the pattern parameter itself counts
as a subexpression), only the first nmatch-1 are recorded.
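The offsets in the pmatch array can be turned back into substrings as in the following sketch. The helper names copy_match() and split_pair() and the fixed pattern are assumptions made for illustration. The pattern has two subexpressions, so the array needs three elements: one for the entire match and one for each subexpression.

```c
#include <sys/types.h>
#include <regex.h>
#include <string.h>

#define NMATCH 3    /* Entire match plus two subexpressions */

/* Copies the substring described by m into dst, truncating it to
   size - 1 bytes if necessary. The size argument must be nonzero. */
static void copy_match(const char *text, const regmatch_t *m,
                       char *dst, size_t size)
{
    size_t len = (size_t)(m->rm_eo - m->rm_so);  /* rm_eo is one past the end */

    if (len >= size)
        len = size - 1;
    memcpy(dst, text + m->rm_so, len);
    dst[len] = '\0';
}

/* Hypothetical helper: finds the first name=number pair in text and
   copies the two parts into key and val. Returns 0 on success and a
   nonzero value if nothing matches or the pattern does not compile. */
static int split_pair(const char *text, char *key, char *val, size_t size)
{
    regex_t preg;
    regmatch_t pm[NMATCH];
    int rc;

    if (regcomp(&preg, "([a-z]+)=([0-9]+)", REG_EXTENDED) != 0)
        return -1;
    rc = regexec(&preg, text, NMATCH, pm, 0);
    regfree(&preg);
    if (rc != 0)
        return rc;

    copy_match(text, &pm[1], key, size);    /* First subexpression */
    copy_match(text, &pm[2], val, size);    /* Second subexpression */
    return 0;
}
```

For the input "x: width=42;", this sketch stores "width" in key and "42" in val. pm[0] covers the whole of "width=42" and is not copied.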
When matching a basic or extended regular expression, any
given parenthetic subexpression of the pattern parameter
can participate in the match of several different substrings
of the string parameter; however, it may not match
any substring even though the pattern as a whole did
match. The following rules are used to determine which
substrings to report in the pmatch parameter when matching
regular expressions:

    If subexpression i in a regular expression participated
    in the match several times, the byte offsets of the last
    matching substring are reported in pmatch[i].

    If subexpression i did not participate in the match, the
    byte offsets in pmatch[i] are -1.

    If subexpression i is contained in subexpression j and a
    match of subexpression j is reported in pmatch[j], the
    match or nonmatch reported in pmatch[i] follows the two
    preceding rules, but within the substring reported in
    pmatch[j] rather than within the whole string.

    If subexpression i is contained in subexpression j and
    the byte offsets in pmatch[j] are -1, the byte offsets in
    pmatch[i] are also -1.

    If subexpression i matched a zero-length string, both
    byte offsets in pmatch[i] refer to the byte immediately
    following the zero-length string.
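Two of these rules can be observed with a short sketch. The helper name match_offsets() is hypothetical, and the patterns in the note that follows are examples chosen for illustration, not taken from the original page.

```c
#include <sys/types.h>
#include <regex.h>

/* Hypothetical helper: compiles pattern as an extended regular
   expression, matches it against text, and stores up to nmatch
   offset pairs in pm. Returns 0 on a match and a nonzero value
   otherwise (no match, or the pattern did not compile). */
static int match_offsets(const char *pattern, const char *text,
                         size_t nmatch, regmatch_t *pm)
{
    regex_t preg;
    int rc;

    if (regcomp(&preg, pattern, REG_EXTENDED) != 0)
        return -1;
    rc = regexec(&preg, text, nmatch, pm, 0);
    regfree(&preg);
    return rc;
}
```

Matching (a|b)* against "ab" uses the subexpression twice, so pm[1] describes the last use, the "b" at offsets 1 to 2, while pm[0] covers offsets 0 to 2. Matching (a)|(b) against "b" leaves the first subexpression out of the match, so both offsets in pm[1] are -1 and pm[2] covers offsets 0 to 1.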
If the REG_NOSUB option was set in the cflags parameter in
the call to the regcomp() function and the nmatch parameter
is not equal to 0 (zero) in the call to the regexec()
function, the content of the pmatch array is unspecified.
If the REG_NEWLINE option was not set in the cflags parameter
when the regcomp() function was called, a newline
character in the pattern or string parameter is treated as
an ordinary character. If the REG_NEWLINE option was set
when the regcomp() function was called, the newline character
is treated as an ordinary character, except as follows:

    A newline character in the string parameter is not
    matched by a . (dot) outside of a bracket expression or
    by any form of a nonmatching list.

    A ^ (circumflex) in the pattern parameter, when used to
    specify expression anchoring, matches the zero-length
    string immediately after a newline character in the
    string parameter, regardless of the setting of the
    REG_NOTBOL option.

    A $ (dollar sign) in the pattern parameter, when used to
    specify expression anchoring, matches the zero-length
    string immediately before a newline character in the
    string parameter, regardless of the setting of the
    REG_NOTEOL option.
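The three exceptions can be checked against a two-line string with the sketch below. The helper name first_match_offset() is hypothetical. It returns the starting byte offset of the first match so that the position of a ^ or $ anchor can be seen directly.

```c
#include <sys/types.h>
#include <regex.h>

/* Hypothetical helper: compiles pattern as a basic regular
   expression with the given cflags and returns the byte offset at
   which it first matches in text, or -1 if there is no match or
   the pattern does not compile. */
static long first_match_offset(const char *pattern, const char *text,
                               int cflags)
{
    regex_t preg;
    regmatch_t pm;
    int rc;

    if (regcomp(&preg, pattern, cflags) != 0)
        return -1;
    rc = regexec(&preg, text, 1, &pm, 0);
    regfree(&preg);

    return (rc == 0) ? (long)pm.rm_so : -1;
}
```

For the string "a\nb", the pattern ^b matches at offset 2 only when REG_NEWLINE is set, the pattern a$ matches at offset 0 only when REG_NEWLINE is set, and the pattern a.b matches only when REG_NEWLINE is not set, because the dot then accepts the newline character.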
The regerror() function returns the text associated with
the specified error code. If the regcomp() or regexec()
function fails, it returns a nonzero error code. If this
return value is assigned to the errcode parameter, the
regerror() function returns the text of the associated
message.
If the errbuf_size parameter is not 0, regerror() places
the generated string into the buffer of errbuf_size
bytes pointed to by errbuf. If the string (including the
terminating null) cannot fit in the buffer, regerror()
truncates the string and null-terminates the result.
If errbuf_size is 0, regerror() ignores the errbuf parameter
and returns the size of the buffer needed to hold the
generated string.
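The two calling modes of regerror() can be combined to avoid truncation: ask for the required size first, then fetch the text. The following is a sketch under that assumption. The helper name error_text() is hypothetical, and the caller is expected to free the returned buffer.

```c
#include <sys/types.h>
#include <regex.h>
#include <stdlib.h>

/* Hypothetical helper: returns a malloc()ed copy of the complete
   message for errcode, or NULL if no size is reported or memory
   cannot be allocated. The caller frees the result. */
static char *error_text(int errcode, const regex_t *preg)
{
    /* With errbuf_size of 0, errbuf is ignored and only the
       required size (including the terminating null) is returned. */
    size_t need = regerror(errcode, preg, NULL, 0);
    char *buf;

    if (need == 0)
        return NULL;
    buf = malloc(need);
    if (buf != NULL)
        (void)regerror(errcode, preg, buf, need);
    return buf;
}
```

A typical use is to pass the nonzero value returned by regcomp() or regexec() together with the address of the regex_t that was used in that call, print the result, and then free it.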
The regfree() function frees any memory allocated by the
regcomp() function associated with the preg parameter. An
expression defined by the preg parameter is no longer
treated as a compiled basic or extended regular expression
after it is given to the regfree() function.
Upon successful completion, the regcomp() function returns
a value of 0 (zero). Otherwise, regcomp() returns an integer
value indicating an error as described below, and the
contents of the preg parameter are undefined. If the regcomp()
function detects an illegal basic or extended regular
expression, it returns REG_BADPAT or an error code
that more precisely describes the error.
If the regexec() function finds a match, the function
returns a value of 0 (zero). Otherwise, it returns
REG_NOMATCH to indicate no match or REG_ENOSYS to indicate
that the function is not supported.
Upon successful completion, the regerror() function
returns the number of bytes needed to hold the entire generated
string. This value may be greater than the value of
the errbuf_size parameter. If regerror() fails, it returns 0
(zero) to indicate that the function is not implemented.
The regfree() function returns no value.
The following constants are defined as error return values:

REG_BADBR
    The contents within the pair \{ and \} are invalid: not
    a number, number too large, more than two numbers, or
    first number larger than second.
REG_BADPAT
    The pattern contains an invalid regular expression.
REG_BADRPT
    The ?, *, or + symbols are not preceded by a valid
    regular expression.
REG_EBRACE
    The use of a pair of \{ and \} or {} is unbalanced.
REG_EBRACK
    The use of [] is unbalanced.
REG_ECOLLATE
    An invalid collating element was referenced.
REG_ECTYPE
    An invalid character class type was referenced.
REG_EESCAPE
    The pattern contains a trailing \ (backslash).
REG_ENOSYS
    The function is unsupported.
REG_EPAREN
    The use of a pair of \( and \) or () is unbalanced or
    exceeds the allowable range. The range is set in the
    _REG_SUBEXP_MAX parameter of regex.h and is usually 49.
REG_ERANGE
    An endpoint in the range expression is invalid.
REG_ESPACE
    Insufficient memory space is available.
REG_ESUBREG
    The number in \digit is invalid or in error.
[constant name missing]
    The pattern contains too many parenthetic subexpressions.
REG_NOMATCH
    The regexec() function did not find a match.

These functions do not set errno to indicate an error.
The following example demonstrates how the REG_NOTBOL
option can be used with the regexec() function to find all
substrings in a line that match a pattern supplied by a
user. The main() function in the example accepts two input
strings from the user. The match() function in the example
uses regcomp() and regexec() to search for matches.
#include <sys/types.h>
#include <regex.h>
#include <locale.h>
#include <stdio.h>
#include <string.h>
#include <nl_types.h>
#include "reg_example.h"

#define SLENGTH 128

nl_catd catd;               /* Message catalog, shared with match() */

int match(char *pattern, char *string);

int main(void)
{
    char patt[SLENGTH], strng[SLENGTH];
    char *eol;

    (void)setlocale(LC_ALL, "");
    catd = catopen("reg_example.cat", NL_CAT_LOCALE);

    printf(catgets(catd, SET1, INPUT,
                   "Enter a regular expression:"));
    if (fgets(patt, SLENGTH, stdin) == NULL ||
        (eol = strchr(patt, '\n')) == NULL)
        return 1;           /* No input, or line entered too long */
    *eol = '\0';            /* Replace newline with null */

    printf(catgets(catd, SET1, COMPARE,
                   "Enter string to compare\nString: "));
    if (fgets(strng, SLENGTH, stdin) == NULL ||
        (eol = strchr(strng, '\n')) == NULL)
        return 1;           /* No input, or line entered too long */
    *eol = '\0';            /* Replace newline with null */

    match(patt, strng);
    return 0;
}

int match(char *pattern, char *string)
{
    char message[SLENGTH];
    char *start_search;
    int error, count;
    size_t msize;
    regex_t preg;
    regmatch_t pmatch;

    error = regcomp(&preg, pattern, REG_ICASE | REG_EXTENDED);
    if (error) {
        msize = regerror(error, &preg, message, SLENGTH);
        printf("%s\n", message);
        if (msize > SLENGTH)
            printf(catgets(catd, SET1, LOST,
                           "Additional text lost\n"));
        return error;
    }

    error = regexec(&preg, string, 1, &pmatch, 0);
    if (error == REG_NOMATCH) {
        printf(catgets(catd, SET1, NO_MATCH,
                       "No matches in string\n"));
        regfree(&preg);
        return error;
    } else if (error != 0) {
        msize = regerror(error, &preg, message, SLENGTH);
        printf("%s\n", message);
        if (msize > SLENGTH)
            printf(catgets(catd, SET1, LOST,
                           "Additional text lost\n"));
        regfree(&preg);
        return error;
    }

    /* Count each match, then search again from the end of it.
       REG_NOTBOL keeps ^ from matching at the restart point. */
    count = 0;
    start_search = string;
    while (error == 0) {
        count++;
        start_search += pmatch.rm_eo;
        if (pmatch.rm_eo == pmatch.rm_so) {
            /* Zero-length match: step one byte to make progress */
            if (*start_search == '\0')
                break;
            start_search++;
        }
        error = regexec(&preg, start_search, 1, &pmatch,
                        REG_NOTBOL);
    }

    printf(catgets(catd, SET1, MATCH,
                   "There are %i matches\n"), count);
    regfree(&preg);
    catclose(catd);
    return 0;
}
The following example finds out which subexpressions in
the regular expression have matches in the string. This
example uses the same main() program as the preceding
example. This example does not specify REG_EXTENDED in
the call to regcomp() and, consequently, uses basic regular
expressions, not extended regular expressions.
#define MAX_MATCH 10

/* Uses the includes, SLENGTH, and the catd message catalog
   descriptor from the preceding example. */
int match(char *pattern, char *string)
{
    char message[SLENGTH];
    int error, count, matches_tocheck;
    size_t msize;
    regex_t preg;
    regmatch_t pmatch[MAX_MATCH];

    error = regcomp(&preg, pattern, REG_ICASE);
    if (error) {
        msize = regerror(error, &preg, message, SLENGTH);
        printf("regcomp: %s\n", message);
        if (msize > SLENGTH)
            printf(catgets(catd, SET1, LOST,
                           "Additional text lost\n"));
        return error;
    }

    /* pmatch[0] describes the entire match, so at most
       MAX_MATCH - 1 subexpressions can be reported. */
    if (preg.re_nsub > MAX_MATCH - 1) {
        matches_tocheck = MAX_MATCH - 1;
        printf(catgets(catd, SET1, SUBEXPR,
                       "There are %1$i subexpressions, checking %2$i\n"),
               (int)preg.re_nsub, matches_tocheck);
    } else {
        matches_tocheck = (int)preg.re_nsub;
        printf(catgets(catd, SET1, SUB_EXPR_NUM,
                       "There are %i subexpressions in the regular expression\n"),
               matches_tocheck);
    }

    error = regexec(&preg, string, MAX_MATCH, &pmatch[0], 0);
    if (error == REG_NOMATCH) {
        printf(catgets(catd, SET1, NO_MATCH_ENT,
                       "String did not contain match for entire regular expression\n"));
        regfree(&preg);
        return error;
    } else if (error != 0) {
        msize = regerror(error, &preg, message, SLENGTH);
        printf("regexec: %s\n", message);
        if (msize > SLENGTH)
            printf(catgets(catd, SET1, LOST,
                           "Additional text lost\n"));
        regfree(&preg);
        return error;
    } else
        printf(catgets(catd, SET1, MATCH_ENT,
                       "String contained match for the entire regular expression\n"));

    /* Element 0 is the entire pattern; 1 and up are subexpressions */
    for (count = 0; count <= matches_tocheck; count++) {
        if (pmatch[count].rm_so != -1) {
            printf(catgets(catd, SET1, SUB_EXPR_MATCH,
                           "Subexpression %i matched in string\n"),
                   count);
            printf(catgets(catd, SET1, MATCH_WHERE,
                           "Match starts at %1$i. Byte after match is %2$i\n"),
                   (int)pmatch[count].rm_so,
                   (int)pmatch[count].rm_eo);
        } else
            printf(catgets(catd, SET1, NO_MATCH_SUB,
                           "Subexpression %i had NO match\n"),
                   count);
    }

    regfree(&preg);
    catclose(catd);
    return 0;
}
Commands: grep(1)
Standards: standards(5)