Command: regfree | Section: 3 | Source: Digital UNIX | File: regfree.3.gz
regcomp(3) Library Functions Manual regcomp(3)
NAME
regcomp, regerror, regexec, regfree - Compares string to regular
expression
LIBRARY
Standard C Library (libc.so, libc.a)
SYNOPSIS
#include <sys/types.h>
#include <regex.h>
int regcomp(regex_t *preg, const char *pattern, int cflags);
size_t regerror(int errcode, const regex_t *preg, char *errbuf,
        size_t errbuf_size);
int regexec(const regex_t *preg, const char *string, size_t nmatch,
        regmatch_t *pmatch, int eflags);
void regfree(regex_t *preg);
STANDARDS
Interfaces documented on this reference page conform to industry stan-
dards as follows:
regcomp(), regexec(), regerror(), regfree(): POSIX.2, XPG4, XPG4-UNIX
Refer to the standards(5) reference page for more information about in-
dustry standards and associated tags.
PARAMETERS
cflags       Specifies the flags for regcomp(). The cflags parameter is
             the bitwise inclusive OR of zero or more of the following
             flags, which are defined in the /usr/include/regex.h file.
             REG_EXTENDED  Uses extended regular expressions.
             REG_ICASE     Ignores case in match.
             REG_NOSUB     Reports only success or failure in regexec();
                           does not report subexpressions.
             REG_NEWLINE   Treats newline as a special character marking
                           the end and beginning of lines.
pattern      Contains the basic or extended regular expression to be
             compiled by regcomp().
preg         The structure that contains the compiled basic or extended
             regular expression.
errcode      Identifies the error code.
errbuf       Points to the buffer where regerror() stores the message
             text.
errbuf_size  Specifies the size of the errbuf buffer.
string       Contains the data to be matched.
nmatch       Contains the number of subexpressions to match.
pmatch       Contains the array of offsets into the string parameter
             that match the corresponding subexpression in the preg
             parameter.
eflags       Specifies the flags controlling the customizable behavior
             of the regexec() function. The eflags parameter modifies
             the interpretation of the contents of the string parameter.
             The value for this parameter is formed by bitwise inclusive
             ORing zero or more of the following flags, which are
             defined in the /usr/include/regex.h file.
             REG_NOTBOL    The first character of the string pointed to
                           by the string parameter is not the beginning
                           of the line. Therefore, the ^ (circumflex),
                           when taken as a special character, does not
                           match the beginning of the string parameter.
             REG_NOTEOL    The last character of the string pointed to
                           by the string parameter is not the end of the
                           line. Therefore, the $ (dollar sign), when
                           taken as a special character, does not match
                           the end of the string parameter.
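The effect of the two eflags values can be seen with a small helper. The following is only a sketch: the function name match_with_eflags() is hypothetical and not part of the library, and the pattern is compiled as an extended regular expression with REG_NOSUB purely for brevity.

```c
#include <stddef.h>
#include <sys/types.h>
#include <regex.h>

/* Hypothetical helper (not part of libc): compiles pattern as an
 * extended regular expression and matches it against string using
 * the given eflags. Returns 1 on a match, 0 on no match, and -1 on
 * any error. */
int match_with_eflags(const char *pattern, const char *string, int eflags)
{
    regex_t preg;
    int rc;

    if (regcomp(&preg, pattern, REG_EXTENDED | REG_NOSUB) != 0)
        return -1;
    rc = regexec(&preg, string, 0, NULL, eflags);
    regfree(&preg);
    if (rc == 0)
        return 1;
    return (rc == REG_NOMATCH) ? 0 : -1;
}
```

With this helper, match_with_eflags("^abc", "abcdef", 0) should report a match, while the same call with REG_NOTBOL should not, because ^ is no longer allowed to match at the start of the string.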
DESCRIPTION
The regcomp(), regerror(), regexec(), and regfree() functions perform
regular expression matching. The regcomp() function compiles a regular
expression and the regexec() function compares the compiled regular ex-
pression to a string. The regerror() function returns text associated
with an error condition encountered by regcomp() or regexec(). The
regfree() function frees the internal storage allocated for the com-
piled regular expression.
The regcomp() function compiles the basic or extended regular expres-
sion specified by the pattern parameter and places the output in the
preg structure. The default regular expression type for the pattern
parameter is a basic regular expression. An application can specify ex-
tended regular expressions with the REG_EXTENDED flag.
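As an illustration of the difference between the two expression types, the sketch below compiles the same pattern with caller-supplied cflags. The helper name regex_matches() is an assumption made for this example only. In a basic regular expression the | character is ordinary, whereas with REG_EXTENDED it means alternation.

```c
#include <stddef.h>
#include <sys/types.h>
#include <regex.h>

/* Hypothetical helper: returns 1 if string matches pattern compiled
 * with cflags, 0 if it does not, and -1 if compilation or matching
 * fails. */
int regex_matches(const char *pattern, const char *string, int cflags)
{
    regex_t preg;
    int rc;

    if (regcomp(&preg, pattern, cflags | REG_NOSUB) != 0)
        return -1;
    rc = regexec(&preg, string, 0, NULL, 0);
    regfree(&preg);
    if (rc == 0)
        return 1;
    return (rc == REG_NOMATCH) ? 0 : -1;
}
```

Under these assumptions, regex_matches("a|b", "b", REG_EXTENDED) finds a match, but regex_matches("a|b", "b", 0) does not, because the basic expression only matches the literal three-character text a|b.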
If the REG_NOSUB flag is not set in cflags, the regcomp() function
sets the re_nsub member of the preg structure to the number of
parenthetic subexpressions (delimited by \( and \) in basic regular
expressions, or () in extended regular expressions) found in pattern.
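The subexpression count is available in the re_nsub member of the regex_t structure (the member read by the second program in EXAMPLES). A minimal sketch follows; count_subexpressions() is a hypothetical name chosen for this illustration.

```c
#include <sys/types.h>
#include <regex.h>

/* Hypothetical helper: returns the number of parenthetic
 * subexpressions that regcomp() found in pattern, or -1 if the
 * pattern does not compile. REG_NOSUB must not be set in cflags. */
long count_subexpressions(const char *pattern, int cflags)
{
    regex_t preg;
    long n;

    if (regcomp(&preg, pattern, cflags) != 0)
        return -1;
    n = (long)preg.re_nsub;
    regfree(&preg);
    return n;
}
```

A caller could use this value plus one (for the entire pattern) to size the pmatch array before calling regexec().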
The regexec() function compares the null-terminated string in the
string parameter against the compiled basic or extended regular expres-
sion in the preg parameter. If a match is found, the regexec() func-
tion returns a value of 0 (zero). The regexec() function returns
REG_NOMATCH if there is no match. Any other nonzero value returned in-
dicates an error.
If the value of the nmatch parameter is 0 (zero), or if the REG_NOSUB
flag was set on the call to the regcomp() function, the regexec() func-
tion ignores the pmatch parameter. Otherwise, the pmatch parameter
points to an array of at least the number of elements specified by the
nmatch parameter. The regexec() function fills in the elements of the
array pointed to by the pmatch parameter with offsets of the substrings
of the string parameter. The elements of the pmatch array correspond
to the parenthetic subexpressions of the original pattern parameter
that was specified to the regcomp() function. The pmatch[i].rm_so
member is the byte offset of the beginning of the substring, and the
pmatch[i].rm_eo member is one greater than the byte offset of the
end of the substring. Subexpression i begins at the ith matched open
parenthesis, counting from 1. The 0 (zero) element of the array corre-
sponds to the entire pattern. Unused elements of the pmatch parameter,
up to the value pmatch[nmatch-1], are filled with -1. If there are more
than the number of subexpressions specified by the nmatch parameter
(the pattern parameter itself counts as a subexpression), only the
first nmatch-1 are recorded.
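To show how the rm_so and rm_eo offsets translate into text, the sketch below copies one matched subexpression out of the input string. The names extract_group() and MAX_GROUPS are assumptions made for this example, and the pattern is treated as an extended regular expression.

```c
#include <string.h>
#include <sys/types.h>
#include <regex.h>

#define MAX_GROUPS 10   /* Illustrative limit: entire match plus 9 groups */

/* Hypothetical helper: copies the text matched by subexpression idx
 * (0 means the entire pattern) into out, truncating to out_size - 1
 * bytes. Returns the number of bytes copied, or -1 if there is no
 * match, the subexpression did not participate, or an error occurs. */
int extract_group(const char *pattern, const char *string, size_t idx,
                  char *out, size_t out_size)
{
    regex_t preg;
    regmatch_t pmatch[MAX_GROUPS];
    int result = -1;

    if (idx >= MAX_GROUPS || out_size == 0)
        return -1;
    if (regcomp(&preg, pattern, REG_EXTENDED) != 0)
        return -1;
    if (regexec(&preg, string, MAX_GROUPS, pmatch, 0) == 0 &&
        pmatch[idx].rm_so != -1) {
        /* rm_eo is one past the last byte, so the difference is the length */
        size_t len = (size_t)(pmatch[idx].rm_eo - pmatch[idx].rm_so);

        if (len >= out_size)
            len = out_size - 1;
        memcpy(out, string + pmatch[idx].rm_so, len);
        out[len] = '\0';
        result = (int)len;
    }
    regfree(&preg);
    return result;
}
```

Because unused pmatch elements are filled with -1, asking for a subexpression number larger than the pattern contains simply yields -1 from this helper.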
When matching a basic or extended regular expression, any given paren-
thetic subexpression of the pattern parameter can participate in the
match of several different substrings of the string parameter; however,
it may not match any substring even though the pattern as a whole did
match. The following rules are used to determine which substrings to
report in the pmatch parameter when matching regular expressions:
  o  If a subexpression in a regular expression participated in the
     match several times, the offset of the last matching substring is
     reported in the pmatch parameter.
  o  If a subexpression did not participate in a match, then the byte
     offset in the pmatch parameter is a value of -1.
  o  If a subexpression is contained in a subexpression, the data in
     the pmatch parameter refers to the last such subexpression.
  o  If a subexpression is contained in a subexpression and the byte
     offsets in the pmatch parameter have a value of -1, the pointers
     in the pmatch parameter also have a value of -1.
  o  If a subexpression matched a zero-length string, the offsets in
     the pmatch parameter refer to the byte immediately following the
     matching string.
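Some of these reporting rules can be observed directly. The following sketch returns the raw offsets for one subexpression; group_offsets() and MAX_SUBS are hypothetical names, and extended regular expressions are assumed.

```c
#include <sys/types.h>
#include <regex.h>

#define MAX_SUBS 10     /* Illustrative size for the pmatch array */

/* Hypothetical helper: stores the rm_so and rm_eo offsets reported
 * for subexpression idx in *so and *eo. Returns 0 if the pattern as
 * a whole matched, and -1 on no match or error. */
int group_offsets(const char *pattern, const char *string, size_t idx,
                  long *so, long *eo)
{
    regex_t preg;
    regmatch_t pmatch[MAX_SUBS];
    int rc;

    if (idx >= MAX_SUBS)
        return -1;
    if (regcomp(&preg, pattern, REG_EXTENDED) != 0)
        return -1;
    rc = regexec(&preg, string, MAX_SUBS, pmatch, 0);
    if (rc == 0) {
        *so = (long)pmatch[idx].rm_so;
        *eo = (long)pmatch[idx].rm_eo;
    }
    regfree(&preg);
    return (rc == 0) ? 0 : -1;
}
```

For the pattern (a)|(b) matched against "b", subexpression 1 does not participate and both of its offsets are -1. For (a|b)+ matched against "ab", subexpression 1 participates twice and the offsets describe the last substring, the b at offsets 1 to 2.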
If the REG_NOSUB flag was set in the cflags parameter in the call to
the regcomp() function, and the nmatch parameter is not equal to 0
(zero) in the call to the regexec() function, the content of the pmatch
array is unspecified.
If the REG_NEWLINE flag was not set in the cflags parameter when the
regcomp() function was called, then a newline character in the pattern
or string parameter is treated as an ordinary character. If the
REG_NEWLINE flag was set when the regcomp() function was called, the
newline character is treated as an ordinary character, except as
follows:
  o  A newline character in the string parameter is not matched by a .
     (dot) outside of a bracket expression or by any form of a
     nonmatching list.
  o  A ^ (circumflex) in the pattern parameter, when used to specify
     expression anchoring, matches the zero-length string immediately
     after a newline character in the string parameter, regardless of
     the setting of the REG_NOTBOL flag.
  o  A $ (dollar sign) in the pattern parameter, when used to specify
     expression anchoring, matches the zero-length string immediately
     before a newline character in the string parameter, regardless of
     the setting of the REG_NOTEOL flag.
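The REG_NEWLINE behavior can be checked with multi-line input. This is a sketch only; matches_multiline() is a hypothetical helper, and the pattern is assumed to be an extended regular expression.

```c
#include <stddef.h>
#include <sys/types.h>
#include <regex.h>

/* Hypothetical helper: matches pattern against string, setting
 * REG_NEWLINE only when newline_aware is nonzero. Returns 1 on a
 * match, 0 on no match, and -1 on error. */
int matches_multiline(const char *pattern, const char *string,
                      int newline_aware)
{
    regex_t preg;
    int cflags = REG_EXTENDED | REG_NOSUB;
    int rc;

    if (newline_aware)
        cflags |= REG_NEWLINE;
    if (regcomp(&preg, pattern, cflags) != 0)
        return -1;
    rc = regexec(&preg, string, 0, NULL, 0);
    regfree(&preg);
    if (rc == 0)
        return 1;
    return (rc == REG_NOMATCH) ? 0 : -1;
}
```

With the string "a\nb\nc", the pattern ^b$ matches only when newline_aware is set, because the anchors then apply at each newline. Conversely, the pattern a.b matches "a\nb" only when it is not set, because the dot may then match the newline character.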
The regerror() function returns the text associated with the specified
error code. If the regcomp() or regexec() function fails, it returns a
nonzero error code. If this return value is assigned to the errcode pa-
rameter, the regerror() function returns the text of the associated
message.
If the errbuf_size parameter is not 0, regerror() places the generated
string into the buffer of errbuf_size bytes pointed to by errbuf. If
the string (including the terminating null) cannot fit in the buffer,
regerror() truncates the string and null-terminates the result.
If errbuf_size is 0, regerror() ignores the errbuf parameter, and re-
turns the size of the buffer needed to hold the generated string.
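The size query makes it possible to allocate a buffer that is exactly large enough, so that no message text is truncated. The sketch below shows one way to do this; compile_error_text() is a hypothetical helper, and the caller is assumed to release the returned string with free().

```c
#include <stdlib.h>
#include <sys/types.h>
#include <regex.h>

/* Hypothetical helper: returns a malloc()ed copy of the regerror()
 * message for a pattern that fails to compile. Returns NULL if the
 * pattern compiles successfully, or if no message or memory is
 * available. */
char *compile_error_text(const char *pattern, int cflags)
{
    regex_t preg;
    char *text;
    size_t needed;
    int errcode = regcomp(&preg, pattern, cflags);

    if (errcode == 0) {
        regfree(&preg);         /* Nothing to report */
        return NULL;
    }
    /* First call: errbuf_size is 0, so only the required size is returned */
    needed = regerror(errcode, &preg, NULL, 0);
    if (needed == 0)
        return NULL;
    text = malloc(needed);
    if (text != NULL)
        (void)regerror(errcode, &preg, text, needed);   /* Second call fills it */
    return text;
}
```

The exact wording of the message is implementation specific, so callers should display it rather than compare it against fixed text.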
The regfree() function frees any memory allocated by the regcomp()
function associated with the preg parameter. An expression defined by
the preg parameter is no longer treated as a compiled basic or extended
regular expression after it is given to the regfree() function.
EXAMPLES
The following example demonstrates how the REG_NOTBOL flag can be used
with the regexec() function to find all substrings in a line that match
a pattern supplied by a user. The main() function in the example ac-
cepts two input strings from the user. The match() function in the ex-
ample uses regcomp() and regexec() to search for matches.
#include <sys/types.h>
#include <regex.h>
#include <locale.h>
#include <stdio.h>
#include <string.h>
#include <nl_types.h>
#include "reg_example.h"
#define SLENGTH 128
nl_catd catd;                   /* Message catalog used by main() and match() */
int match(char *pattern, char *string);
int main(void) {
    char patt[SLENGTH], strng[SLENGTH];
    char *eol;
    (void)setlocale(LC_ALL, "");
    catd = catopen("reg_example.cat", NL_CAT_LOCALE);
    printf(catgets(catd,SET1,INPUT,
        "Enter a regular expression:"));
    if (fgets(patt, SLENGTH, stdin) == NULL)
        return 1;               /* End of file or read error */
    if ((eol = strchr(patt, '\n')) != NULL)
        *eol = '\0';            /* Replace newline with null */
    else
        return 1;               /* Line entered too long */
    printf(catgets(catd,SET1,COMPARE,
        "Enter string to compare\nString: "));
    if (fgets(strng, SLENGTH, stdin) == NULL)
        return 1;               /* End of file or read error */
    if ((eol = strchr(strng, '\n')) != NULL)
        *eol = '\0';            /* Replace newline with null */
    else
        return 1;               /* Line entered too long */
    return match(patt, strng);
}
int match(char *pattern, char *string) {
    char message[SLENGTH];
    char *start_search;
    int error, count;
    size_t msize;
    regex_t preg;
    regmatch_t pmatch;
    error = regcomp(&preg, pattern,
        REG_ICASE | REG_EXTENDED);
    if (error) {
        msize = regerror(error, &preg, message, SLENGTH);
        printf("%s\n", message);
        if (msize > SLENGTH)
            printf(catgets(catd,SET1,LOST,"Additional text lost\n"));
        catclose(catd);
        return error;
    }
    /* Only the first search starts at the beginning of the line */
    count = 0;
    start_search = string;
    error = regexec(&preg, start_search, 1, &pmatch, 0);
    while (error == 0) {
        count++;
        start_search = start_search + pmatch.rm_eo;
        if (pmatch.rm_so == pmatch.rm_eo) {
            /* Empty match: step one byte forward to avoid looping forever */
            if (*start_search == '\0')
                break;
            start_search++;
        }
        error = regexec(&preg, start_search, 1, &pmatch,
            REG_NOTBOL);
    }
    if (error != 0 && error != REG_NOMATCH) {
        msize = regerror(error, &preg, message, SLENGTH);
        printf("%s\n", message);
        if (msize > SLENGTH)
            printf(catgets(catd,SET1,LOST,
                "Additional text lost\n"));
    } else if (count == 0) {
        printf(catgets(catd,SET1,NO_MATCH,
            "No matches in string\n"));
    } else {
        printf(catgets(catd,SET1,MATCH,
            "There are %i matches\n"), count);
    }
    regfree(&preg);
    catclose(catd);
    return (error == REG_NOMATCH) ? 0 : error;
}
The following example finds out which subexpressions in the regular ex-
pression have matches in the string. This example uses the same main()
program as the preceding example. This example does not specify
REG_EXTENDED in the call to regcomp() and, consequently, uses basic regular
expressions, not extended regular expressions.
#define MAX_MATCH 10
int match(char *pattern, char *string) {
    char message[SLENGTH];
    int error, count, matches_tocheck;
    size_t msize;
    regex_t preg;
    regmatch_t pmatch[MAX_MATCH];
    error = regcomp(&preg, pattern, REG_ICASE);
    if (error) {
        msize = regerror(error, &preg, message, SLENGTH);
        printf("regcomp: %s\n", message);
        if (msize > SLENGTH)
            printf(catgets(catd,SET1,LOST,
                "Additional text lost\n"));
        catclose(catd);
        return error;
    }
    /* pmatch[0] describes the entire match, so the array has room */
    /* for at most MAX_MATCH - 1 subexpressions */
    if (preg.re_nsub > MAX_MATCH - 1) {
        printf(catgets(catd,SET1,SUBEXPR,
            "There are %1$i subexpressions, checking %2$i\n"),
            (int)preg.re_nsub, MAX_MATCH - 1);
        matches_tocheck = MAX_MATCH - 1;
    } else {
        printf(catgets(catd,SET1,SUB_EXPR_NUM,
            "There are %i subexpressions in the regular expression\n"),
            (int)preg.re_nsub);
        matches_tocheck = (int)preg.re_nsub;
    }
    error = regexec(&preg, string, MAX_MATCH, &pmatch[0], 0);
    if (error == REG_NOMATCH) {
        printf(catgets(catd,SET1,NO_MATCH_ENT,
            "String did not contain match for entire regular expression\n"));
    } else if (error != 0) {
        msize = regerror(error, &preg, message, SLENGTH);
        printf("regexec: %s\n", message);
        if (msize > SLENGTH)
            printf(catgets(catd,SET1,LOST,
                "Additional text lost\n"));
    } else {
        printf(catgets(catd,SET1,MATCH_ENT,
            "String contained match for the entire regular expression\n"));
        /* Element 0 is the entire pattern; 1 and up are subexpressions */
        for (count = 0; count <= matches_tocheck; count++) {
            if (pmatch[count].rm_so != -1) {
                printf(catgets(catd,SET1,SUB_EXPR_MATCH,
                    "Subexpression %i matched in string\n"), count);
                printf(catgets(catd,SET1,MATCH_WHERE,
                    "Match starts at %1$i. Byte after match is %2$i\n"),
                    (int)pmatch[count].rm_so, (int)pmatch[count].rm_eo);
            } else
                printf(catgets(catd,SET1,NO_MATCH_SUB,
                    "Subexpression %i had NO match\n"), count);
        }
    }
    regfree(&preg);
    catclose(catd);
    return (error == REG_NOMATCH) ? 0 : error;
}
RETURN VALUES
Upon successful completion, the regcomp() function returns a value of 0
(zero). Otherwise, regcomp() returns an integer value indicating an
error as described below, and the contents of the preg parameter are
undefined. If the regcomp() function detects an illegal basic or
extended regular expression, it may return REG_BADPAT, or it may return
an error code that more precisely describes the error.
If the regexec() function finds a match, the function returns a value
of 0 (zero). Otherwise, it returns REG_NOMATCH to indicate no match,
or REG_ENOSYS to indicate that the function is not supported.
Upon successful completion, the regerror() function returns the number
of bytes needed to hold the entire generated string. This value may be
greater than the value of the errbuf_size parameter. If regerror()
fails, it returns 0 (zero) to indicate that the function is not imple-
mented.
The regfree() function returns no value.
The following constants are defined as error return values:
REG_BADBR     The contents within the pair \{ and \} are invalid: Not a
              number, number too large, more than two numbers, or first
              number larger than second.
REG_BADPAT    There is an invalid regular expression.
REG_BADRPT    The ?, *, or + symbols are not preceded by a valid regular
              expression.
REG_EBRACE    The use of a pair of \{ and \} or {} is unbalanced.
REG_EBRACK    The use of [] is unbalanced.
REG_ECOLLATE  There is an invalid collating element referenced.
REG_ECTYPE    There is an invalid character class type referenced.
REG_EESCAPE   There is a trailing \ (backslash) in the pattern.
REG_ENOSYS    The function is unsupported.
REG_EPAREN    The use of a pair of \( and \) or () is unbalanced.
REG_ERANGE    There was an invalid endpoint in the range expression.
REG_ESPACE    There is insufficient memory space.
REG_ESUBREG   The number in \digit is invalid or in error.
REG_NOMATCH   The regexec() function did not find a match.
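For logging or debugging it can be useful to print the symbolic name of a return value in addition to the regerror() text. The sketch below assumes the standard POSIX constant names from <regex.h>; regex_error_name() is a hypothetical helper. REG_ENOSYS is left out on purpose, because some implementations do not define it in every compilation environment.

```c
#include <sys/types.h>
#include <regex.h>

/* Hypothetical helper: maps a regcomp() or regexec() return value to
 * the name of the corresponding constant. Unrecognized values,
 * including REG_ENOSYS where it exists, map to "unknown". */
const char *regex_error_name(int errcode)
{
    switch (errcode) {
    case 0:            return "success";
    case REG_NOMATCH:  return "REG_NOMATCH";
    case REG_BADBR:    return "REG_BADBR";
    case REG_BADPAT:   return "REG_BADPAT";
    case REG_BADRPT:   return "REG_BADRPT";
    case REG_EBRACE:   return "REG_EBRACE";
    case REG_EBRACK:   return "REG_EBRACK";
    case REG_ECOLLATE: return "REG_ECOLLATE";
    case REG_ECTYPE:   return "REG_ECTYPE";
    case REG_EESCAPE:  return "REG_EESCAPE";
    case REG_EPAREN:   return "REG_EPAREN";
    case REG_ERANGE:   return "REG_ERANGE";
    case REG_ESPACE:   return "REG_ESPACE";
    case REG_ESUBREG:  return "REG_ESUBREG";
    default:           return "unknown";
    }
}
```

Which constant a particular malformed pattern produces can vary between implementations, since regcomp() may return REG_BADPAT in place of a more specific code.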
ERRORS
These functions do not set errno to indicate an error.
RELATED INFORMATION
Commands: grep(1)
Standards: standards(5)