Command: re | Section: 3 | Source: UNIX v10 | File: re.3
RE(3) Library Functions Manual RE(3)
NAME
re_bm, re_cw, re_re - string and pattern matching
SYNOPSIS
#include <re.h>
re_bm *re_bmcomp(b, e, map)
char *b, *e;
unsigned char map[256];
int re_bmexec(pat, rdfn, matchfn)
re_bm *pat;
int (*rdfn)(), (*matchfn)();
void re_bmfree(pat);
re_bm *pat;
re_cw *re_cwinit(map)
unsigned char map[256];
void re_cwadd(pat, b, e)
re_cw *pat;
char *b, *e;
void re_cwcomp(pat)
re_cw *pat;
int re_cwexec(pat, rdfn, matchfn)
re_cw *pat;
int (*rdfn)(), (*matchfn)();
void re_cwfree(pat);
re_cw *pat;
re_re *re_recomp(b, e, map)
char *b, *e;
unsigned char map[256];
re_reexec(pat, b, e, match)
re_re *pat;
char *b, *e, *match[10][2];
void re_refree(pat);
re_re *pat;
void re_error(str);
char *str;
DESCRIPTION
These routines search for patterns in strings. The re_re routines
search for general regular expressions (defined below) using a lazily
evaluated deterministic finite automaton. The more specialized and
faster re_cw routines search for multiple literal strings using the
Commentz-Walter algorithm. The still more specialized and efficient
re_bm routines search for a single string using the Boyer-Moore
algorithm. The routines handle strings designated by pointers to the
first character of the string and to the character following the string.
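The begin/end pointer convention and the effect of a character map can be sketched with a small stand-alone search routine. This is not the library's code: it uses the simpler Boyer-Moore-Horspool variant (bad-character shifts only), and the name bmh_search and its ANSI-style signature are hypothetical, chosen only for illustration.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper, not part of re(3): find the first occurrence of
 * the pattern [pb, pe) in the text [tb, te), comparing characters after
 * mapping.  Returns a pointer to the match, or 0 if there is none.
 */
const char *bmh_search(const char *pb, const char *pe,
                       const char *tb, const char *te,
                       const unsigned char map[256])
{
    size_t m, n, i, pos;
    size_t shift[256];

    assert(pb <= pe && tb <= te);
    m = (size_t)(pe - pb);
    n = (size_t)(te - tb);
    if (m == 0)
        return tb;                  /* empty pattern matches at once */
    if (m > n)
        return 0;

    /* Horspool shift table, indexed by mapped character. */
    for (i = 0; i < 256; i++)
        shift[i] = m;
    for (i = 0; i + 1 < m; i++)
        shift[map[(unsigned char)pb[i]]] = m - 1 - i;

    pos = 0;
    while (pos + m <= n) {
        /* Compare right to left, as Boyer-Moore style searches do. */
        i = m;
        while (i > 0 &&
               map[(unsigned char)tb[pos + i - 1]] ==
               map[(unsigned char)pb[i - 1]])
            i--;
        if (i == 0)
            return tb + pos;
        pos += shift[map[(unsigned char)tb[pos + m - 1]]];
    }
    return 0;
}
```

With an identity map the search is exact; with a map that sends upper-case letters to lower case, the same routine matches without regard to case.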
To use the re_bm routines, first build a recognizer by calling
re_bmcomp, which takes the search string and a character map; all
characters are compared after mapping. Typically, map is initialized
by a loop similar to
for(i = 0; i < 256; i++) map[i] = i;
and its value is no longer required after the call to re_bmcomp. The
recognizer can be run (multiple times) by calling re_bmexec, which
stops and returns the first non-positive return from either rdfn or
matchfn. The recognizer calls the supplied function rdfn to obtain
input and matchfn to report text matching the search string.
Rdfn
should be declared as
int rdfn(pb, pe)
char **pb, **pe;
where *pb and *pe delimit an as yet unprocessed text fragment (none if
*pb==0) to be saved across the call to rdfn. On return, *pb and *pe
point to the new text, including the saved fragment. Rdfn returns 0 for
EOF, negative for error, and positive otherwise. The first call to rdfn
from each invocation of re_bmexec has *pb==0.
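The rdfn contract can be sketched as a reader that refills a fixed buffer from an in-memory source. Everything here except the rdfn calling convention is an assumption made for illustration: the helper set_source, the buffer size, the choice to restart the source whenever *pb==0, and the ANSI-style definition (the synopsis uses the older declaration style).

```c
#include <assert.h>
#include <string.h>

#define BUFSZ 16    /* deliberately tiny so that refills are easy to see */

static const char *src;         /* hypothetical in-memory input */
static size_t srclen, srcpos;
static char buf[BUFSZ];

/* Hypothetical helper: point the reader at a new input string. */
void set_source(const char *s)
{
    assert(s != 0);
    src = s;
    srclen = strlen(s);
    srcpos = 0;
}

int rdfn(char **pb, char **pe)
{
    size_t keep = 0, room, n;

    if (*pb == 0)
        srcpos = 0;                     /* first call of a run: start over */
    else
        keep = (size_t)(*pe - *pb);     /* fragment to save across the call */
    if (keep >= BUFSZ)
        return -1;                      /* error: no room for new text */
    if (keep > 0)
        memmove(buf, *pb, keep);        /* fragment may already lie in buf */

    room = BUFSZ - keep;
    n = srclen - srcpos;
    if (n > room)
        n = room;
    if (n == 0)
        return 0;                       /* EOF */
    memcpy(buf + keep, src + srcpos, n);
    srcpos += n;

    *pb = buf;                          /* new text, saved fragment first */
    *pe = buf + keep + n;
    return 1;
}
```

A real reader would more likely call read(2) on a file descriptor. The important part is that the saved fragment is moved to the front of the buffer before new text is appended.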
Matchfn should be declared as
int matchfn(pb, pe)
char **pb, **pe;
where *pb and *pe delimit the matched text. Matchfn sets *pb, *pe, and
returns a value in the same way as rdfn.
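A matchfn that stops the search at the first match might look like the sketch below. The variables first_match and nmatches are hypothetical. The text above does not spell out which text *pb and *pe should delimit on return, so this sketch leaves them untouched and returns 0 to stop the recognizer; treat that as an assumption.

```c
#include <assert.h>
#include <string.h>

#define MAXMATCH 64

static char first_match[MAXMATCH];  /* copy of the first reported match */
static int nmatches;                /* number of calls seen so far */

int matchfn(char **pb, char **pe)
{
    size_t n;

    assert(*pb != 0 && *pe >= *pb);
    n = (size_t)(*pe - *pb);
    if (n >= MAXMATCH)
        n = MAXMATCH - 1;           /* truncate rather than overflow */
    memcpy(first_match, *pb, n);
    first_match[n] = '\0';
    nmatches++;
    return 0;                       /* non-positive: stop the recognizer */
}
```

Since re_bmexec returns the first non-positive value from either callback, a caller using this sketch would check nmatches to distinguish "stopped on a match" from "reached EOF".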
To use the re_cw routines, first build the recognizer by calling
re_cwinit, then re_cwadd for each string, and finally re_cwcomp. The
recognizer is run by re_cwexec analogously to re_bmexec.
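The character map taken by re_bmcomp, re_cwinit, and re_recomp can do more than the identity mapping. As a sketch, the hypothetical helpers below build an identity map and a case-folding map, and compare two delimited strings after mapping, as the description says the recognizers do. None of these helper names come from re.h.

```c
#include <assert.h>
#include <ctype.h>

/* Identity map: every character compares as itself. */
void map_identity(unsigned char map[256])
{
    int i;

    for (i = 0; i < 256; i++)
        map[i] = (unsigned char)i;
}

/* Case-folding map: upper-case letters compare as lower case. */
void map_foldcase(unsigned char map[256])
{
    int i;

    for (i = 0; i < 256; i++)
        map[i] = (unsigned char)(isupper(i) ? tolower(i) : i);
}

/* Return 1 if [b1, e1) and [b2, e2) are equal after mapping, else 0. */
int mapped_equal(const char *b1, const char *e1,
                 const char *b2, const char *e2,
                 const unsigned char map[256])
{
    assert(b1 <= e1 && b2 <= e2);
    if (e1 - b1 != e2 - b2)
        return 0;
    for (; b1 < e1; b1++, b2++)
        if (map[(unsigned char)*b1] != map[(unsigned char)*b2])
            return 0;
    return 1;
}
```

Passing the result of map_foldcase to a compile routine would be the natural way to request a case-insensitive search.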
A full regular expression recognizer is compiled by re_recomp and
executed by re_reexec, which returns 1 if there was a match and 0 if
there wasn't. The strings that match subexpressions are returned in
array match using the above convention. match[0] refers to the whole
matched expression. If match is zero, then no match delimiters are set.
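Because each entry of match is a begin/end pointer pair rather than a NUL-terminated string, a caller usually has to copy a subexpression out before treating it as a C string. The helper copy_group below is hypothetical and stands alone. It assumes that unset entries hold null pointers, which the text above does not guarantee.

```c
#include <assert.h>
#include <string.h>

/*
 * Hypothetical helper: copy subexpression n from a match array into
 * out, truncating to fit, and NUL-terminate it.  Returns the number of
 * characters copied, or -1 if n is out of range or the entry is unset.
 */
int copy_group(char *match[10][2], int n, char *out, size_t outsz)
{
    size_t len;

    assert(out != 0);
    if (n < 0 || n > 9 || outsz == 0)
        return -1;
    if (match[n][0] == 0 || match[n][1] == 0 || match[n][1] < match[n][0])
        return -1;
    len = (size_t)(match[n][1] - match[n][0]);  /* end points past the text */
    if (len >= outsz)
        len = outsz - 1;
    memcpy(out, match[n][0], len);
    out[len] = '\0';
    return (int)len;
}
```

After a successful re_reexec call, copy_group(match, 0, buf, sizeof buf) would yield the whole matched text under these assumptions.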
The routine re_error prints its argument on standard error and exits.
You may supply your own version for specialized error handling. If
re_error returns rather than exits, the compiling routines (e.g.
re_bmcomp) will return 0.
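A replacement re_error that returns instead of exiting could be as small as the sketch below. It records the message so the caller can report it after a compile routine returns 0. The buffer last_error is a hypothetical name, and the ANSI-style definition is used here for portability to current compilers.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

static char last_error[128];    /* most recent message; empty if none */

/* Record the message and return, so that compilation fails softly. */
void re_error(char *str)
{
    assert(str != 0);
    /* snprintf truncates if necessary and always NUL-terminates. */
    snprintf(last_error, sizeof last_error, "%s", str);
}
```

A caller would then test the result of, say, re_recomp against 0 and print last_error in whatever form suits the program.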
The recognizers that these routines construct occupy storage obtained
from malloc(3). The storage can be deallocated by the matching free
routine: re_bmfree, re_cwfree, or re_refree.
Regular Expressions
The syntax for a regular expression e0 is
e3: literal | charclass | '.' | '^' | '$' | '\'n | '(' e0 ')'
e2: e3
| e2 REP
REP: '*' | '+' | '?' | '\{' RANGE '\}'
RANGE: int | int ',' | int ',' int
e1: e2
| e1 e2
e0: e1
| e0 ALT e1
ALT: '|' | newline
A literal is any non-metacharacter or a metacharacter (one of
.*+?[]()|\^$) preceded by \.
A charclass is a nonempty string s bracketed [s] (or [^s]); it matches
any character in (or not in) s. In s, the metacharacters other than ]
have no special meaning, and ] may only appear as the first letter. A
substring a-b, with a and b in ascending ASCII order, stands for the
inclusive range of ASCII characters between a and b.
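The charclass rules can be sketched as a small membership test over the text between the brackets. The function class_match is hypothetical and is not how the library evaluates classes, since the library compiles them into its automaton. A - that does not sit between two characters is treated as a literal here, which is an assumption.

```c
#include <assert.h>

/*
 * Hypothetical helper: [s, e) is the text between the brackets,
 * including a leading ^ if present.  Returns 1 if character c matches
 * the class, else 0.
 */
int class_match(const char *s, const char *e, int c)
{
    int negate = 0, hit = 0;
    const char *p = s;

    assert(s < e);                      /* a charclass is nonempty */
    if (*p == '^') {                    /* [^s]: match characters not in s */
        negate = 1;
        p++;
    }
    while (p < e) {
        unsigned char lo = (unsigned char)*p, hi = lo;

        if (p + 2 < e && p[1] == '-') { /* a-b: inclusive range */
            hi = (unsigned char)p[2];
            p += 3;
        } else {
            p++;                        /* single character, ] included */
        }
        if (lo <= c && c <= hi)
            hit = 1;
    }
    return hit != negate;
}
```

For the class [^0-9] the caller passes the four characters ^0-9, and any non-digit matches.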
A \ followed by a digit n matches a copy of the string that the
parenthesized subexpression beginning with the nth (, counting from 1,
matched.
A . matches any character.
A ^ matches the beginning of the input string; $ matches the end.
The REP operators match zero or more (*), one or more (+), zero or one
(?), exactly m (\{m\}), m or more (\{m,\}), and any number between m
and n inclusive (\{m,n\}), instances respectively of the preceding
regular expression e2.
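The three RANGE forms can be sketched as a parser over the text between \{ and \}. The names parse_int and parse_range are hypothetical, and -1 is an arbitrary marker chosen here for "no upper bound". The sketch does not guard against integer overflow on very long digit strings.

```c
#include <assert.h>
#include <ctype.h>

/* Parse decimal digits at *pp, stopping at e; -1 if there are none. */
static int parse_int(const char **pp, const char *e)
{
    const char *p = *pp;
    int v = 0;

    if (p == e || !isdigit((unsigned char)*p))
        return -1;
    while (p < e && isdigit((unsigned char)*p))
        v = v * 10 + (*p++ - '0');
    *pp = p;
    return v;
}

/*
 * Parse a RANGE in [b, e): "m", "m," or "m,n".  On success store the
 * bounds (max is -1 when unbounded) and return 0; return -1 on error.
 */
int parse_range(const char *b, const char *e, int *min, int *max)
{
    const char *p = b;
    int lo, hi;

    assert(min != 0 && max != 0);
    lo = parse_int(&p, e);
    if (lo < 0)
        return -1;
    if (p == e) {                   /* \{m\}: exactly m */
        *min = *max = lo;
        return 0;
    }
    if (*p != ',')
        return -1;
    p++;
    if (p == e) {                   /* \{m,\}: m or more */
        *min = lo;
        *max = -1;
        return 0;
    }
    hi = parse_int(&p, e);          /* \{m,n\}: between m and n */
    if (hi < lo || p != e)
        return -1;
    *min = lo;
    *max = hi;
    return 0;
}
```

In these terms * corresponds to the bounds (0, unbounded), + to (1, unbounded), and ? to (0, 1).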
A concatenated regular expression, e1 e2, matches a match to e1
followed by a match to e2.
An alternative regular expression, e0 ALT e1, matches either a match to
e0 or a match to e1.
A match to any part of a regular expression extends as far as possible
without preventing a match to the remainder of the regular expression.
SEE ALSO
regexp(3), gre(1)
DIAGNOSTICS
Routines that return pointers return 0 on error.
BUGS
Between re(3) and regexp(3) there are too many routines.