Nergahak ManpageViewer

RE(3)                      Library Functions Manual                      RE(3)

NAME
       re_bm, re_cw, re_re - string and pattern matching

SYNOPSIS
       #include <re.h>

       re_bm *re_bmcomp(b, e, map)
       char *b, *e;
       unsigned char map[256];

       int re_bmexec(pat, rdfn, matchfn)
       re_bm *pat;
       int (*rdfn)(), (*matchfn)();

       void re_bmfree(pat)
       re_bm *pat;

       re_cw *re_cwinit(map)
       unsigned char map[256];

       void re_cwadd(pat, b, e)
       re_cw *pat;
       char *b, *e;

       void re_cwcomp(pat)
       re_cw *pat;

       int re_cwexec(pat, rdfn, matchfn)
       re_cw *pat;
       int (*rdfn)(), (*matchfn)();

       void re_cwfree(pat)
       re_cw *pat;

       re_re *re_recomp(b, e, map)
       char *b, *e;
       unsigned char map[256];

       re_reexec(pat, b, e, match)
       re_re *pat;
       char *b, *e, *match[10][2];

       void re_refree(pat)
       re_re *pat;

       void re_error(str)
       char *str;

DESCRIPTION
       These routines search for patterns in strings. The re_re routines
       search for general regular expressions (defined below) using a lazily
       evaluated deterministic finite automaton. The more specialized and
       faster re_cw routines search for multiple literal strings using the
       Commentz-Walter algorithm. The still more specialized and efficient
       re_bm routines search for a single string using the Boyer-Moore
       algorithm.

       The routines handle strings designated by pointers to the first
       character of the string and to the character following the string.

       To use the re_bm routines, first build a recognizer by calling
       re_bmcomp, which takes the search string and a character map; all
       characters are compared after mapping. Typically, map is initialized
       by a loop similar to

              for(i = 0; i < 256; i++)
                      map[i] = i;

       and its value is no longer required after the call to re_bmcomp. The
       recognizer can be run (multiple times) by calling re_bmexec, which
       calls the supplied function rdfn to obtain input and matchfn to report
       text matching the search string. Re_bmexec stops and returns the first
       non-positive value returned by either rdfn or matchfn.
       Rdfn should be declared as

              int rdfn(pb, pe)
              char **pb, **pe;

       where *pb and *pe delimit an as yet unprocessed text fragment (if
       any) to be saved across the call to rdfn. On return, *pb and *pe point
       to the new text, including the saved fragment. Rdfn returns 0 for
       EOF, negative for error, and positive otherwise. The first call to
       rdfn from each invocation of re_bmexec has *pb==0.

       Matchfn should be declared as

              int matchfn(pb, pe)
              char **pb, **pe;

       where *pb and *pe delimit the matched text. Matchfn sets *pb and *pe
       and returns a value in the same way as rdfn.

       To use the re_cw routines, first build the recognizer by calling
       re_cwinit, then re_cwadd for each string, and finally re_cwcomp. The
       recognizer is run by re_cwexec analogously to re_bmexec.

       A full regular expression recognizer is compiled by re_recomp and
       executed by re_reexec, which returns 1 if there was a match and 0
       otherwise. The strings that match subexpressions are returned in
       array match using the pointer convention above: match[i][0] points
       to the first character of a substring and match[i][1] to the
       character following it. match[0] refers to the whole matched
       expression. If match is zero, then no match delimiters are set.

       The routine re_error prints its argument on standard error and
       exits. You may supply your own version for specialized error
       handling. If re_error returns rather than exits, the compiling
       routines (e.g. re_bmcomp) will return 0.

       The recognizers that these routines construct occupy storage
       obtained from malloc(3). The storage can be deallocated by
       re_bmfree, re_cwfree, or re_refree, as appropriate.

   Regular Expressions
       The syntax for a regular expression e0 is

              e3:    literal | charclass | '.' | '^' | '$' | '\'n | '(' e0 ')'
              e2:    e3 | e2 REP
              REP:   '*' | '+' | '?' | '\{' RANGE '\}'
              RANGE: int | int ',' | int ',' int
              e1:    e2 | e1 e2
              e0:    e1 | e0 ALT e1
              ALT:   '|' | newline

       A literal is any non-metacharacter or a metacharacter (one of
       .*+?[]()|\^$) preceded by \. A charclass is a nonempty string s
       bracketed [s] (or [^s]); it matches any character in (or not in) s.
       In s, the metacharacters other than ] have no special meaning, and ]
       may only appear as the first letter. A substring a-b, with a and b in
       ascending ASCII order, stands for the inclusive range of ASCII
       characters between a and b.

       A \ followed by a digit n matches a copy of the string that the
       parenthesized subexpression beginning with the nth (, counting from
       1, matched.

       A . matches any character. A ^ matches the beginning of the input
       string; $ matches the end.

       The REP operators match zero or more (*), one or more (+), zero or
       one (?), exactly m (\{m\}), m or more (\{m,\}), and any number
       between m and n inclusive (\{m,n\}) instances, respectively, of the
       preceding regular expression e2.

       A concatenated regular expression, e1 e2, matches a match to e1
       followed by a match to e2. An alternative regular expression,
       e0 ALT e1, matches either a match to e0 or a match to e1.

       A match to any part of a regular expression extends as far as
       possible without preventing a match to the remainder of the regular
       expression.

SEE ALSO
       regexp(3), gre(1)

DIAGNOSTICS
       Routines that return pointers return 0 on error.

BUGS
       Between re(3) and regexp(3) there are too many routines.

Browse: [Browse UNIX v10] [Section 3]


* UNIX MANUAL PAGE BROWSER *