regcomp(3)

NAME

       regcomp,  regerror,  regexec,  regfree - Compare string to
       regular expression

SYNOPSIS

       #include <sys/types.h>
       #include <regex.h>

       int regcomp(
               regex_t *preg,
               const char *pattern,
               int cflags );

       size_t regerror(
               int errcode,
               const regex_t *preg,
               char *errbuf,
               size_t errbuf_size );

       int regexec(
               const regex_t *preg,
               const char *string,
               size_t nmatch,
               regmatch_t *pmatch,
               int eflags );

       void regfree(
               regex_t *preg );

LIBRARY

       Standard C Library (libc)

STANDARDS

       Interfaces documented on this reference  page  conform  to
       industry standards as follows:

       regcomp(),   regexec(),  regerror(),  regfree():  POSIX.2,
       XPG4, XPG4-UNIX

       Refer to the standards(5) reference page for more
       information about industry standards and associated tags.

PARAMETERS

       cflags    Specifies the options for regcomp(). The cflags
                 parameter is the bitwise inclusive OR of zero or
                 more of the following options, which are defined
                 in the /usr/include/regex.h file.

                 REG_EXTENDED
                        Uses extended regular expressions.

                 REG_ICASE
                        Ignores case in match.

                 REG_NOSUB
                        Reports only success or failure in
                        regexec(); does not report subexpressions.

                 REG_NEWLINE
                        Treats newline as a special character
                        marking the end and beginning of lines.

       pattern   Contains the basic or extended regular expression
                 to be compiled by regcomp().

       preg      The structure that contains the compiled basic or
                 extended regular expression.

       errcode   Identifies the error code.

       errbuf    Points to the buffer where regerror() stores the
                 message text.

       errbuf_size
                 Specifies the size of the errbuf buffer.

       string    Contains the data to be matched.

       nmatch    Contains the number of subexpressions to match,
                 that is, the number of elements in the pmatch
                 array.

       pmatch    Contains the array of offsets into the string
                 parameter that match the corresponding
                 subexpression in the preg parameter.

       eflags    Specifies the options controlling the customizable
                 behavior of the regexec() function. The eflags
                 parameter modifies the interpretation of the
                 contents of the string parameter. The value for
                 this parameter is formed by bitwise inclusive
                 ORing zero or more of the following options, which
                 are defined in the /usr/include/regex.h file.

                 REG_NOTBOL
                        The first character of the string pointed
                        to by the string parameter is not the
                        beginning of the line. Therefore, the ^
                        (circumflex), when taken as a special
                        character, does not match the beginning of
                        the string parameter.

                 REG_NOTEOL
                        The last character of the string pointed
                        to by the string parameter is not the end
                        of the line. Therefore, the $ (dollar
                        sign), when taken as a special character,
                        does not match the end of the string
                        parameter.

DESCRIPTION

       The regcomp(), regerror(), regexec(), and regfree()
       functions perform regular expression matching. The regcomp()
       function compiles a regular expression and the regexec()
       function compares the compiled regular expression to a
       string. The regerror() function returns text associated
       with an error condition encountered by regcomp() or
       regexec(). The regfree() function frees the internal
       storage allocated for the compiled regular expression.

       The regcomp() function compiles the basic or extended
       regular expression specified by the pattern parameter and
       places the output in the preg structure. The default
       regular expression type for the pattern parameter is a
       basic regular expression. An application can specify
       extended regular expressions with the REG_EXTENDED option.

       If the REG_NOSUB option is not set in cflags, the regcomp()
       function sets the re_nsub member of the preg structure to
       the number of parenthetic subexpressions (delimited by \(
       and \) in basic regular expressions or by () in extended
       regular expressions) found in pattern.
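
       The subexpression count can be read back after a successful
       compile. The sketch below is only an illustration:
       count_subexpressions() is a hypothetical helper, not a
       library function, and it relies on the re_nsub member of
       regex_t that the EXAMPLES section also uses.

```c
#include <regex.h>

/* Hypothetical helper: returns the number of parenthetic
 * subexpressions regcomp() found in pattern, or -1 if the
 * pattern does not compile. */
int count_subexpressions(const char *pattern, int cflags)
{
    regex_t preg;
    int n;

    if (regcomp(&preg, pattern, cflags) != 0)
        return -1;              /* preg is undefined after a failure */

    n = (int)preg.re_nsub;      /* re_nsub has type size_t */
    regfree(&preg);
    return n;
}
```

       For instance, count_subexpressions("(a)(b)", REG_EXTENDED)
       should yield 2, while the same pattern compiled with cflags
       set to 0 should yield 0, because unescaped parentheses are
       ordinary characters in a basic regular expression.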

       The regexec() function compares the null-terminated string
       in the string parameter against the compiled basic or
       extended regular expression in the preg parameter. If a
       match is found, the regexec() function returns a value of
       0 (zero). The regexec() function returns REG_NOMATCH if
       there is no match. Any other nonzero value returned
       indicates an error.

       If the value of the nmatch parameter is 0 (zero) or if the
       REG_NOSUB option was set on the call to the regcomp()
       function, the regexec() function ignores the pmatch
       parameter. Otherwise, the pmatch parameter points to an
       array of at least the number of elements specified by the
       nmatch parameter. The regexec() function fills in the
       elements of the array pointed to by the pmatch parameter
       with offsets of the substrings of the string parameter. The
       elements of the pmatch array correspond to the parenthetic
       subexpressions of the original pattern parameter that was
       specified to the regcomp() function. The pmatch[i].rm_so
       member is the byte offset of the beginning of the
       substring, and the pmatch[i].rm_eo member is one greater
       than the byte offset of the end of the substring.
       Subexpression i begins at the ith matched open parenthesis,
       counting from 1. The 0 (zero) element of the array
       corresponds to the entire pattern. Unused elements of the
       pmatch parameter, up to the value pmatch[nmatch-1], are
       filled with -1. If the number of subexpressions exceeds the
       number specified by the nmatch parameter (the pattern
       parameter itself counts as a subexpression), only the first
       nmatch-1 are recorded.
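
       The rm_so and rm_eo offsets can be turned into a substring
       as sketched below. This is an illustration under stated
       assumptions: extract_group() and MAX_GROUPS are hypothetical
       names, and the pattern is always compiled as an extended
       regular expression.

```c
#include <regex.h>
#include <string.h>

#define MAX_GROUPS 10   /* illustrative limit; slot 0 is the whole match */

/* Hypothetical helper: copies the text matched by subexpression
 * `group` (0 = entire pattern) into out. Returns 0 on success and
 * -1 if the pattern is invalid, nothing matched, the group did
 * not participate, or out is too small. */
int extract_group(const char *pattern, const char *text, size_t group,
                  char *out, size_t outsize)
{
    regex_t preg;
    regmatch_t pmatch[MAX_GROUPS];
    int rc = -1;

    if (group >= MAX_GROUPS || outsize == 0)
        return -1;
    if (regcomp(&preg, pattern, REG_EXTENDED) != 0)
        return -1;

    if (regexec(&preg, text, MAX_GROUPS, pmatch, 0) == 0 &&
        pmatch[group].rm_so != -1) {
        /* rm_eo is one past the last matched byte */
        size_t len = (size_t)(pmatch[group].rm_eo - pmatch[group].rm_so);

        if (len < outsize) {
            memcpy(out, text + pmatch[group].rm_so, len);
            out[len] = '\0';
            rc = 0;
        }
    }

    regfree(&preg);
    return rc;
}
```

       Because rm_eo is one greater than the offset of the last
       matched byte, rm_eo - rm_so is the length of the substring
       and no adjustment by one is needed.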

       When matching a basic or extended regular expression, any
       given parenthetic subexpression of the pattern parameter
       can participate in the match of several different
       substrings of the string parameter; however, it may not
       match any substring even though the pattern as a whole did
       match. The following rules are used to determine which
       substrings to report in the pmatch parameter when matching
       regular expressions:

       o  If a subexpression in a regular expression participated
          in the match several times, the offsets of the last
          matching substring are reported in the pmatch parameter.

       o  If a subexpression did not participate in a match, the
          byte offsets in the pmatch parameter have a value of -1.

       o  If a subexpression is contained in another
          subexpression, the data reported in the pmatch parameter
          for the contained subexpression refers to its match
          within the last substring matched by the enclosing
          subexpression.

       o  If a subexpression is contained in another subexpression
          and the byte offsets reported for the enclosing
          subexpression have a value of -1, the offsets reported
          for the contained subexpression also have a value of -1.

       o  If a subexpression matched a zero-length string, both
          offsets in the pmatch parameter refer to the byte
          immediately following the zero-length string.
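
       Two of these rules, the last-match rule for repeated
       subexpressions and the -1 offsets for subexpressions that
       did not participate, can be observed with a sketch like the
       following. group_span() and NSLOTS are hypothetical names
       introduced only for this illustration.

```c
#include <regex.h>

#define NSLOTS 4    /* illustrative: whole match plus three subexpressions */

/* Hypothetical helper: stores the start and end offsets reported
 * for subexpression `group` in *start and *end. Returns 0 if the
 * pattern as a whole matched, -1 otherwise. */
int group_span(const char *pattern, const char *text, size_t group,
               long *start, long *end)
{
    regex_t preg;
    regmatch_t pmatch[NSLOTS];
    int rc = -1;

    if (group >= NSLOTS)
        return -1;
    if (regcomp(&preg, pattern, REG_EXTENDED) != 0)
        return -1;

    if (regexec(&preg, text, NSLOTS, pmatch, 0) == 0) {
        *start = (long)pmatch[group].rm_so;   /* -1 if not participating */
        *end = (long)pmatch[group].rm_eo;
        rc = 0;
    }

    regfree(&preg);
    return rc;
}
```

       With the pattern "(ab|cd)+" and the string "abcd", the
       subexpression takes part twice and the offsets of the last
       substring, "cd", are reported. With "(a)|(b)" and the
       string "b", subexpression 1 does not participate, so both
       of its offsets are -1.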

       If the REG_NOSUB option was set in the cflags parameter in
       the call to the regcomp() function and the nmatch
       parameter is not equal to 0 (zero) in the call to the
       regexec() function, the content of the pmatch array is
       unspecified.

       If the REG_NEWLINE option was not set in the cflags
       parameter when the regcomp() function was called, a newline
       character in the pattern or string parameter is treated as
       an ordinary character. If the REG_NEWLINE option was set
       when the regcomp() function was called, the newline
       character is treated as an ordinary character, except as
       follows:

       o  A newline character in the string parameter is not
          matched by a . (dot) outside of a bracket expression or
          by any form of a nonmatching list.

       o  A ^ (circumflex) in the pattern parameter, when used to
          specify expression anchoring, matches the zero-length
          string immediately after a newline character in the
          string parameter, regardless of the setting of the
          REG_NOTBOL option.

       o  A $ (dollar sign) in the pattern parameter, when used to
          specify expression anchoring, matches the zero-length
          string immediately before a newline character in the
          string parameter, regardless of the setting of the
          REG_NOTEOL option.
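
       The effect of REG_NEWLINE on anchors and on . (dot) can be
       checked with a small sketch such as the one below. The
       helper find_offset() is a hypothetical name used only for
       this illustration.

```c
#include <regex.h>

/* Hypothetical helper: returns the byte offset at which the first
 * match of pattern starts in text, or -1 if there is no match or
 * the pattern does not compile. */
long find_offset(const char *pattern, const char *text, int cflags)
{
    regex_t preg;
    regmatch_t m;
    long off = -1;

    if (regcomp(&preg, pattern, cflags) != 0)
        return -1;

    if (regexec(&preg, text, 1, &m, 0) == 0)
        off = (long)m.rm_so;

    regfree(&preg);
    return off;
}
```

       For the string "hello\nworld", the pattern "^world" should
       find nothing unless REG_NEWLINE is included in cflags, in
       which case the match starts at offset 6, just after the
       newline. Conversely, "o.w" should match at offset 4 without
       REG_NEWLINE, because the dot then matches the newline, and
       should not match with it.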

       The  regerror()  function returns the text associated with
       the specified error code. If the  regcomp()  or  regexec()
       function  fails,  it returns a nonzero error code. If this
       return value is assigned to  the  errcode  parameter,  the
       regerror()  function  returns  the  text of the associated
       message.

       If the errbuf_size parameter is not 0, regerror() places
       the generated string into the buffer of errbuf_size bytes
       pointed to by errbuf. If the string (including the
       terminating null) cannot fit in the buffer, regerror()
       truncates the string and null-terminates the result.

       If errbuf_size is 0, regerror() ignores the errbuf
       parameter and returns the size of the buffer needed to hold
       the generated string.
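
       The size query can be used to avoid truncation by sizing
       the buffer before fetching the message. The following is a
       sketch only; compile_error_text() is a hypothetical helper,
       and using malloc() for the buffer is a choice made for this
       illustration.

```c
#include <regex.h>
#include <stdlib.h>

/* Hypothetical helper: returns NULL if pattern compiles.
 * Otherwise returns a malloc()ed copy of the complete error
 * message, which the caller must free(). NULL is also returned
 * if the message buffer cannot be allocated. */
char *compile_error_text(const char *pattern, int cflags)
{
    regex_t preg;
    size_t need;
    char *msg;
    int rc;

    rc = regcomp(&preg, pattern, cflags);
    if (rc == 0) {
        regfree(&preg);
        return NULL;
    }

    /* errbuf_size of 0: errbuf is ignored, required size returned */
    need = regerror(rc, &preg, NULL, 0);
    if (need == 0)
        return NULL;

    msg = malloc(need);
    if (msg != NULL)
        (void)regerror(rc, &preg, msg, need);   /* fits exactly */
    return msg;
}
```

       A fixed-size buffer, as used in the EXAMPLES section, is
       simpler; in that case the return value of regerror() can be
       compared with the buffer size to detect truncation.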

       The regfree() function frees any memory allocated  by  the
       regcomp()  function associated with the preg parameter. An
       expression defined by the  preg  parameter  is  no  longer
       treated as a compiled basic or extended regular expression
       after it is given to the regfree() function.

RETURN VALUES

       Upon successful completion, the regcomp() function returns
       a value of 0 (zero). Otherwise, regcomp() returns an
       integer value indicating an error as described below, and
       the contents of the preg parameter are undefined. If the
       regcomp() function detects an invalid basic or extended
       regular expression, it returns REG_BADPAT or an error code
       that more precisely describes the error.

       If the regexec() function  finds  a  match,  the  function
       returns  a  value  of  0  (zero).  Otherwise,  it  returns
       REG_NOMATCH to indicate no match or REG_ENOSYS to indicate
       that the function is not supported.

       Upon successful completion, the regerror() function
       returns the number of bytes needed to hold the entire
       generated string. This value may be greater than the value
       of the errbuf_size parameter. If regerror() fails, it
       returns 0 (zero) to indicate that the function is not
       implemented.

       The regfree() function returns no value.

       The following constants are defined as error return
       values:

       REG_BADBR     The contents within the pair \{ and \} are
                     invalid: not a number, number too large, more
                     than two numbers, or first number larger than
                     second.

       REG_BADPAT    The pattern contains an invalid regular
                     expression.

       REG_BADRPT    The ?, *, or + symbols are not preceded by a
                     valid regular expression.

       REG_EBRACE    The use of a pair of \{ and \} or {} is
                     unbalanced.

       REG_EBRACK    The use of [] is unbalanced.

       REG_ECOLLATE  An invalid collating element was referenced.

       REG_ECTYPE    An invalid character class type was
                     referenced.

       REG_EESCAPE   The pattern contains a trailing \
                     (backslash).

       REG_ENOSYS    The function is unsupported.

       REG_EPAREN    The use of a pair of \( and \) or () is
                     unbalanced or exceeds the allowable range.
                     The range is set by _REG_SUBEXP_MAX in
                     regex.h and is usually 49.

       REG_ERANGE    An endpoint in the range expression is
                     invalid.

       REG_ESPACE    Insufficient memory space is available.

       REG_ESUBREG   The number in \digit is invalid or in error.

       REG_NOMATCH   The regexec() function did not find a match.

       A further error value indicates that the pattern contains
       too many parenthetic subexpressions.

ERRORS

       These functions do not set errno to indicate an error.

EXAMPLES

       The following  example  demonstrates  how  the  REG_NOTBOL
       option can be used with the regexec() function to find all
       substrings in a line that match a pattern  supplied  by  a
       user. The main() function in the example accepts two input
       strings from the user. The match() function in the example
       uses regcomp() and regexec() to search for matches.

       #include <sys/types.h>
       #include <regex.h>
       #include <locale.h>
       #include <stdio.h>
       #include <string.h>
       #include <nl_types.h>
       #include "reg_example.h"

       #define SLENGTH 128

       nl_catd catd;    /* Message catalog, shared with match() */

       int match(char *pattern, char *string);

       int main(void)
       {
           char    patt[SLENGTH], strng[SLENGTH];
           char    *eol;

           (void)setlocale(LC_ALL, "");
           catd = catopen("reg_example.cat", NL_CAT_LOCALE);

           printf(catgets(catd,SET1,INPUT,
                  "Enter a regular expression:"));
           if (fgets(patt, SLENGTH, stdin) == NULL)
               return 1;  /* No input */
           if ((eol = strchr(patt, '\n')) != NULL)
               *eol = '\0';  /* Replace newline with null */
           else
               return 1;  /* Line entered too long */

           printf(catgets(catd,SET1,COMPARE,
                  "Enter string to compare\nString: "));
           if (fgets(strng, SLENGTH, stdin) == NULL)
               return 1;  /* No input */
           if ((eol = strchr(strng, '\n')) != NULL)
               *eol = '\0';  /* Replace newline with null */
           else
               return 1;  /* Line entered too long */

           return match(patt, strng);
       }

       int match(char *pattern, char *string)
       {
           char    message[SLENGTH];
           char    *start_search;
           int     error, count;
           size_t  msize;
           regex_t preg;
           regmatch_t pmatch;

           error = regcomp(&preg, pattern,
                   REG_ICASE | REG_EXTENDED);
           if (error) {
               msize = regerror(error, &preg, message, SLENGTH);
               printf("%s\n", message);
               if (msize > SLENGTH)
                   printf(catgets(catd,SET1,LOST,
                          "Additional text lost\n"));
               catclose(catd);
               return 1;  /* preg is undefined; nothing to free */
           }

           count = 0;
           start_search = string;

           /* The first search starts at the beginning of the line */
           error = regexec(&preg, start_search, 1, &pmatch, 0);
           while (error == 0) {
               count++;
               if (pmatch.rm_eo > 0)
                   start_search += pmatch.rm_eo;
               else if (*start_search != '\0')
                   start_search++;  /* Empty match: step one byte */
               else
                   break;           /* Empty match at end of string */

               /* Later searches do not start at beginning of line */
               error = regexec(&preg, start_search, 1, &pmatch,
                       REG_NOTBOL);
           }

           if (error != 0 && error != REG_NOMATCH) {
               msize = regerror(error, &preg, message, SLENGTH);
               printf("%s\n", message);
               if (msize > SLENGTH)
                   printf(catgets(catd,SET1,LOST,
                          "Additional text lost\n"));
           } else if (count == 0) {
               printf(catgets(catd,SET1,NO_MATCH,
                      "No matches in string\n"));
           } else {
               printf(catgets(catd,SET1,MATCH,
                      "There are %i matches\n"), count);
           }

           regfree(&preg);
           catclose(catd);
           return (error == 0 || error == REG_NOMATCH) ? 0 : 1;
       }

       The following example finds out which subexpressions in
       the regular expression have matches in the string. This
       example uses the same main() program as the preceding
       example. This example does not specify REG_EXTENDED in the
       call to regcomp() and, consequently, uses basic regular
       expressions, not extended regular expressions.

       #define MAX_MATCH 10

       int match(char *pattern, char *string)
       {
           char    message[SLENGTH];
           int     error, count, matches_tocheck;
           int     status = 0;
           size_t  msize;
           regex_t preg;
           regmatch_t pmatch[MAX_MATCH];

           error = regcomp(&preg, pattern, REG_ICASE);
           if (error) {
               msize = regerror(error, &preg, message, SLENGTH);
               printf("regcomp: %s\n", message);
               if (msize > SLENGTH)
                   printf(catgets(catd,SET1,LOST,
                          "Additional text lost\n"));
               catclose(catd);
               return 1;  /* preg is undefined; nothing to free */
           }

           /* pmatch[0] describes the entire pattern, so only
              MAX_MATCH - 1 subexpressions fit in the array */
           if (preg.re_nsub > MAX_MATCH - 1) {
               matches_tocheck = MAX_MATCH - 1;
               printf(catgets(catd,SET1,SUBEXPR,
                   "There are %1$i subexpressions, checking %2$i\n"),
                    (int)preg.re_nsub, matches_tocheck);
           } else {
               matches_tocheck = (int)preg.re_nsub;
               printf(catgets(catd,SET1,SUB_EXPR_NUM,
                   "There are %i subexpressions in the regular expression\n"),
                    matches_tocheck);
           }

           error = regexec(&preg, string, MAX_MATCH, &pmatch[0], 0);
           if (error == REG_NOMATCH) {
               printf(catgets(catd,SET1,NO_MATCH_ENT,
                   "String did not contain match for entire regular expression\n"));
           } else if (error != 0) {
               msize = regerror(error, &preg, message, SLENGTH);
               printf("regexec: %s\n", message);
               if (msize > SLENGTH)
                   printf(catgets(catd,SET1,LOST,
                          "Additional text lost\n"));
               status = 1;
           } else {
               printf(catgets(catd,SET1,MATCH_ENT,
                   "String contained match for the entire regular expression\n"));

               /* Element 0 is the entire pattern; 1 and up are
                  the parenthetic subexpressions */
               for (count = 0; count <= matches_tocheck; count++) {
                   if (pmatch[count].rm_so != -1) {
                       printf(catgets(catd,SET1,SUB_EXPR_MATCH,
                              "Subexpression %i matched in string\n"),
                              count);
                       printf(catgets(catd,SET1,MATCH_WHERE,
                              "Match starts at %1$i. Byte after match is %2$i\n"),
                              (int)pmatch[count].rm_so,
                              (int)pmatch[count].rm_eo);
                   } else
                       printf(catgets(catd,SET1,NO_MATCH_SUB,
                              "Subexpression %i had NO match\n"),
                              count);
               }
           }

           regfree(&preg);
           catclose(catd);
           return status;
       }

SEE ALSO

      
      
       Commands: grep(1)

       Standards: standards(5)


