regcmp, regex - Compile and execute regular expression
#include <libgen.h>

char *regcmp(
        const char *string1,
        ... /*, (char *)0 */ );

char *regex(
        const char *re,
        const char *subject,
        ... );
Standard C Library (libc)
Interfaces documented on this reference page conform to
industry standards as follows:
regcmp(), regex(): XPG4-UNIX
Refer to the standards(5) reference page for more information
about industry standards and associated tags.
string1   Points to the string that is to be matched or converted.
re        Points to a compiled regular expression string.
subject   Points to the string that is to be matched against re.
The regcmp() function compiles a regular expression consisting
of the concatenated arguments and returns a
pointer to the compiled form. The end of arguments is
indicated by a null pointer. The malloc() function is used
to create space for the compiled form. It is the responsibility
of the process to free unneeded space so allocated.
A null pointer returned from regcmp() indicates an invalid
argument.
The regex() function executes a compiled pattern against
the subject string. Additional arguments of type char *
must be passed to receive matched subexpressions back. A
global character pointer, __loc1, points to the first
matched character in the subject string.
The regcmp() and regex() functions support the simple regular
expressions which are defined in the grep(1) reference
page, but the syntax and semantics are slightly different.
The following are the valid symbols and their associated
meanings:

[ ] * . ^
     The left and right bracket, asterisk, period, and
     circumflex symbols retain their meanings as defined in
     the grep(1) reference page.

$    A dollar sign matches the end of the string; \n matches
     a newline.

-    Used within brackets, the hyphen signifies an ASCII
     character range. For example, [a-z] is equivalent to
     [abcd...xyz]. The - (hyphen) can represent itself only
     if used as the first or last character. For example,
     the character class expression []-] matches the
     characters ] (right bracket) and - (hyphen).

+    A regular expression followed by a + (plus sign) is
     matched one or more times. For example, [0-9]+ is
     equivalent to [0-9][0-9]*.

{m} {m,} {m,u}
     Integer values enclosed in {} braces indicate the
     number of times the preceding regular expression can be
     applied. The value m is the minimum number and u is a
     number, less than 256, which is the maximum. The syntax
     {m} indicates the exact number of times the regular
     expression can be applied. The syntax {m,} is analogous
     to {m,infinity}. The + (plus sign) and * (asterisk)
     operations are equivalent to {1,} and {0,},
     respectively.

( ... )$n
     The value of the enclosed regular expression is
     returned. The value is stored in the (n+1)th argument
     following the subject argument. A maximum of ten
     enclosed regular expressions are allowed. The regex()
     function makes its assignments unconditionally.

( ... )
     Parentheses are used for grouping. An operator, such as
     *, +, or {}, can work on a single character or a regular
     expression enclosed in parentheses. For example,
     (a*(cb+)*)$0.
Since all of the symbols defined above are special characters,
they must be escaped to be used as themselves.
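
The operator equivalences described above can be checked with a
short sketch. Because regcmp() and regex() are not available on
many current systems, the sketch uses the POSIX regcomp() and
regexec() functions with extended regular expressions, which
share the bracket, +, *, and {} operators; it is an illustration
under that assumption, not a demonstration of regcmp() itself.
The helper name matches() is hypothetical.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>

/*
 * Hypothetical helper: return 1 if `subject` matches the POSIX extended
 * regular expression `pattern`, 0 if it does not match, and -1 if the
 * pattern cannot be compiled.
 */
int matches(const char *pattern, const char *subject)
{
    regex_t compiled;
    int rc;

    assert(pattern != NULL && subject != NULL);

    /* REG_NOSUB: only a yes/no answer is needed, no match offsets. */
    if (regcomp(&compiled, pattern, REG_EXTENDED | REG_NOSUB) != 0)
        return -1;

    rc = regexec(&compiled, subject, 0, NULL, 0);
    regfree(&compiled);          /* release the compiled form */

    return rc == 0 ? 1 : 0;
}
```

With this helper, the patterns ^[0-9]+$, ^[0-9][0-9]*$, and
^[0-9]{1,}$ should all accept "2024" and reject the empty string,
and an escaped operator such as \+ should match a literal plus
sign.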
The regcmp() and regex() interfaces are scheduled to be
withdrawn from a future version of the X/Open CAE Specification.
These interfaces are obsolete; they are guaranteed to
function properly only in the C/POSIX locale and so should
be avoided. Use the POSIX regcomp() interface instead of
regcmp() and regex().
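
As a rough guide to migrating, the following sketch shows one
way the work of a regcmp() and regex() pair might be expressed
with the POSIX regcomp(), regexec(), and regfree() functions. It
assumes POSIX extended regular expression syntax, in which a
subexpression is captured by plain parentheses rather than by a
( ... )$n suffix. The helper name match_and_capture() and its
parameters are hypothetical.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: match `pattern` (a POSIX extended regular
 * expression) against `subject`. On success, copy the text of the first
 * parenthesized subexpression into `sub` (truncated to sub_size - 1 bytes
 * and NUL-terminated) and return a pointer to the next unmatched character
 * in `subject`, which is the value regex() is documented to return.
 * Return NULL if the pattern does not compile or does not match.
 */
const char *match_and_capture(const char *pattern, const char *subject,
                              char *sub, size_t sub_size)
{
    regex_t compiled;
    regmatch_t m[2];             /* m[0]: whole match, m[1]: first group */
    const char *next = NULL;

    assert(pattern != NULL && subject != NULL);

    if (regcomp(&compiled, pattern, REG_EXTENDED) != 0)
        return NULL;

    if (regexec(&compiled, subject, 2, m, 0) == 0) {
        if (sub != NULL && sub_size > 0) {
            size_t len = 0;

            /* rm_so is -1 when the group did not take part in the match. */
            if (m[1].rm_so != -1)
                len = (size_t)(m[1].rm_eo - m[1].rm_so);
            if (len >= sub_size)
                len = sub_size - 1;
            if (len > 0)
                memcpy(sub, subject + m[1].rm_so, len);
            sub[len] = '\0';
        }
        next = subject + m[0].rm_eo;
    }

    regfree(&compiled);          /* replaces free() of the regcmp() result */
    return next;
}
```

In this sketch, subject + m[0].rm_so plays the role of __loc1,
and regfree() takes the place of freeing the pointer returned by
regcmp().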
Upon successful completion, the regcmp() function returns
a pointer to the compiled regular expression. Otherwise, a
null pointer is returned and errno may be set to indicate
the error.
Upon successful completion, the regex() function returns a
pointer to the next unmatched character in the subject
string. Otherwise, a null pointer is returned.
Commands: grep(1)
Functions: malloc(3), regcomp(3)
Standards: standards(5)