NAME regcmp, regex - compile and execute regular expression SYNTAX char *regcmp (string1 [, string2, ...], (char *)0) char *string1, *string2, ...; char *regex (re, subject[, ret0, ...]) char *re, *subject, *ret0, ...; extern char *__loc1; DESCRIPTION Regcmp compiles a regular expression and returns a pointer to the compiled form. Malloc(3C) is used to create space for the vector. It is the user's responsibility to free unneeded space so allocated. A NULL return from regcmp indicates an incorrect argument. Regcmp(1) has been written to generally preclude the need for this routine at execution time. Regex executes a compiled pattern against the subject string. Additional arguments are passed to receive values back. Regex returns NULL on failure or a pointer to the next unmatched character on success. A global character pointer __loc1 points to where the match began. Regcmp and regex were mostly borrowed from the editor, ed(1); however, the syntax and semantics have been changed slightly. The following are the valid symbols and their associated meanings. []*.^ These symbols retain their current meaning. $ Matches the end of the string; \n matches a new- line. - Within brackets the minus means through. For example, [a-z] is equivalent to [abcd...xyz]. The - can appear as itself only if used as the first or last character. For example, the character class expression []-] matches the characters ] and -. + A regular expression followed by + means one or more times. For example, [0-9]+ is equivalent to [0-9][0-9]*. {m} {m,} {m,u} Integer values enclosed in {} indicate the number of times the preceding regular expression is to be applied. The value m is the minimum number and u is a number, less than 256, which is the maximum. If only m is present (e.g., {m}), it indicates the exact number of times the regular expression is to be applied. The value {m,} is analogous to {m,infinity}. The plus (+) and star (*) operations are equivalent to {1,} and {0,} respectively. ( ... )$n The value of the enclosed regular expression is to be returned. The value will be stored in the (n+1)th argument following the subject argument. At most ten enclosed regular expressions are allowed. Regex makes its assignments unconditionally. ( ... ) Parentheses are used for grouping. An operator, e.g., *, +, {}, can work on a single character or a regular expression enclosed in parentheses. For example, (a*(cb+)*)$0. By necessity, all the above defined symbols are special. They must, therefore, be escaped to be used as themselves. EXAMPLES Example 1: char *cursor, *newcursor, *ptr; ... newcursor = regex((ptr = regcmp("^\n", 0)), cursor); free(ptr); This example will match a leading new-line in the subject string pointed at by cursor. Example 2: char ret0[9]; char *newcursor, *name; ... name = regcmp("([A-Za-z][A-za-z0-9_]{0,7})$0", '(char *)0'); newcursor = regex(name, "123Testing321", ret0); This example will match through the string ``Testing3'' and will return the address of the character after the last matched character (cursor+11). The string ``Testing3'' will be copied to the character array ret0. Example 3: #include "file.i" char *string, *newcursor; ... newcursor = regex(name, string); This example applies a precompiled regular expression in file.i [see regcmp(1)] against string. This routine is kept in /lib//libPW.a, where module is either small or large. SEE ALSO malloc(3C). ed(1), regcmp(1) in the UNIX System V User Reference Manual. BUGS The user program may run out of memory if regcmp is called iteratively without freeing the vectors no longer required. The following user-supplied replacement for malloc(3C) reuses the same vector, saving time and space: /* user's program */ ... char * malloc(n) unsigned n; { static char rebuf[512]; return (n <= sizeof rebuf) ? rebuf : NULL; }