Unix is cursed with a number of incompatible syntaxes for regular expression patterns, used by different programs, and with various features. The shell globbing patterns are used most frequently. These are simple and terse, but they are not fully general regular expressions. Q's solution extends the conventional globbing syntax, as in the Korn Shell.
These are the most important special characters:
The builtin function match tries to match a string against a quoted pattern:
"abcd" match a*d # Succeeds
"abcd" match a*d A*D ==> "AbcD"
The glob function takes a single string, interprets it as a globbing pattern, and returns a sorted vector of matching file names. The result is the empty vector if there are no matches.
One unique feature of glob is that it knows how to search through an unbounded number of sub-directories. To find every Makefile in any sub-directory of dld-*, do:
glob "dld-*/*(*/)Makefile"
The algorithm works by scanning the filenames in a directory. Each filename (prepended by the name of the current directory) is matched against the pattern. If the pattern matches the entire filename, we have found a match. Otherwise the match has failed, and the regular expression matcher has been modified to distinguish two kinds of failure. A prefix-partial-match happens when the matcher runs out of characters in the candidate. This means that the candidate is not a valid match, but it might be a prefix of a valid match. In that case, if the candidate names a directory, we continue by recursively scanning that directory. Any other kind of match failure tells us to give up on this particular file.
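The scan described above can be sketched in Python. This is only an illustration under stated assumptions, not Q's implementation: the matcher is a small hand-written one rather than a modified regular expression engine, it understands only `*`, `?`, and literal characters, and the token `**/` is a hypothetical stand-in for Q's `*(*/)` (zero or more directory components). The names `Outcome`, `match_outcome`, and `scan` are invented for this sketch.

```python
import os
from enum import IntEnum


class Outcome(IntEnum):
    FAIL = 0    # give up on this candidate
    PREFIX = 1  # candidate ran out of characters; it may be a prefix of a match
    MATCH = 2   # the pattern matched the entire candidate


STAR, ANY, DIRS = object(), object(), object()  # non-literal pattern tokens


def tokenize(pattern):
    """Split a pattern into literal characters and the three special tokens."""
    tokens, i = [], 0
    while i < len(pattern):
        if pattern.startswith("**/", i):  # hypothetical spelling of *(*/)
            tokens.append(DIRS)
            i += 3
        else:
            c = pattern[i]
            tokens.append(STAR if c == "*" else ANY if c == "?" else c)
            i += 1
    return tokens


def match_outcome(pattern, candidate):
    """Match the whole candidate, reporting MATCH, PREFIX, or FAIL."""
    toks, s = tokenize(pattern), candidate

    def m(i, j):
        if i == len(toks):
            return Outcome.MATCH if j == len(s) else Outcome.FAIL
        t = toks[i]
        if t is STAR:  # zero or more characters within one path component
            best = m(i + 1, j)
            if best is not Outcome.MATCH and j < len(s) and s[j] != "/":
                best = max(best, m(i, j + 1))
            return best
        if t is DIRS:  # zero or more "component/" groups
            best = m(i + 1, j)
            if best is Outcome.MATCH:
                return best
            k = s.find("/", j)
            if k == -1:  # candidate ended inside a component
                return max(best, Outcome.PREFIX)
            if k > j:
                best = max(best, m(i, k + 1))
            return best
        if j == len(s):  # pattern wants more, candidate is exhausted
            return Outcome.PREFIX
        if (t is ANY and s[j] != "/") or t == s[j]:
            return m(i + 1, j + 1)
        return Outcome.FAIL

    return m(0, 0)


def scan(pattern, root=".", directory=""):
    """Return sorted matches, descending only into prefix-matching directories."""
    found = []
    for name in sorted(os.listdir(os.path.join(root, directory))):
        candidate = f"{directory}/{name}" if directory else name
        outcome = match_outcome(pattern, candidate)
        if outcome is Outcome.MATCH:
            found.append(candidate)
        elif outcome is Outcome.PREFIX and os.path.isdir(os.path.join(root, candidate)):
            found.extend(scan(pattern, root, candidate))
    return sorted(found)
```

With this sketch, `scan("dld-*/**/Makefile")` never lists a directory such as `other/`, because the candidate `other` fails outright instead of producing a prefix-partial-match.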
Note that the above example runs 2-3 times faster than GNU find on:
find . -regex "dld-.*Makefile" -print
This is because find scans everything under ., while Q's glob only looks at sub-directories matching dld-*.
The function globlist does "shell-style" globbing, using the routine glob. It takes a vector of strings, does tilde-expansion, and calls glob on each pattern. Any empty result from glob is replaced by a one-element vector containing the original pattern, but with quotes and parentheses removed.
(See full paper for Q's algorithm.)
globlist takes a vector of strings, does tilde-expansion, and calls glob on each pattern. Where there is no match, the answer is replaced by the original pattern, but with quotes and parentheses removed. The sub-answers are concatenated into one vector.
globlist (quote x(3+4)y p"ar"se* f(oo).*)
["x3+4y" "parserule.o" "parsemacros.o" "parse.o" "foo.*"]
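The behaviour of globlist can be approximated in Python as follows. This is a rough sketch, not Q's routine: it relies on Python's standard `glob` module, which does not understand Q's extended pattern syntax or its quoting rules, so only the three steps named above (tilde-expansion, globbing each pattern, and falling back to the cleaned-up pattern) are modelled.

```python
import glob
import os
import re


def globlist(patterns):
    """Expand each pattern; keep unmatched patterns, minus quotes and parentheses."""
    result = []
    for pattern in patterns:
        expanded = os.path.expanduser(pattern)   # tilde-expansion
        matches = sorted(glob.glob(expanded))    # one sub-answer per pattern
        if matches:
            result.extend(matches)
        else:
            # No match: fall back to the original pattern with quotes
            # and parentheses removed.
            result.append(re.sub(r'["()]', "", pattern))
    return result
```

For instance, in a directory containing no file that matches `f(oo).*`, `globlist(["f(oo).*"])` yields `["foo.*"]`.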
Here is a no-frills implementation of echo:
Q1> :(macro echo :X@)= parse "__echo (quote " X@ ")"
Q2> :(__echo :L)= sprintf "%{%s%^ %}\n" (globlist L)
Q3> echo parse*
parserule.o parsemacros.o parse.o
__echo does the actual work: it calls globlist to do the globbing, and then concatenates the results using the sprintf routine. (The "%{...%}" format directives loop over a sequence, just like Common Lisp's ~{...~} directives.)
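As a rough illustration of what the format string "%{%s%^ %}\n" does, the Python sketch below loops over a sequence, formats each element with %s, and puts a space between elements but not after the last one. It assumes that %^ behaves like Common Lisp's ~^ (leave the loop when no elements remain), which the output shown above suggests but the text does not state. The name `format_loop` is invented for this sketch.

```python
def format_loop(items, element="%s", separator=" ", end="\n"):
    """Format each item, separated by `separator`, followed by `end`."""
    return separator.join(element % (item,) for item in items) + end
```

Calling `format_loop(["parserule.o", "parsemacros.o", "parse.o"])` produces the single line printed by the echo example.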
In Unix, the shell traditionally does the globbing. This is usually convenient, but sometimes the standard expansion is inappropriate, as with the patterns passed to grep and find. Non-Unix systems may instead provide globbing under application control, which gives more flexibility. The Q approach provides the same flexibility in a Unix framework.
As an example, consider ren, an intelligent (and simplified) mv:
:(__ren :src :dst)=( :X=(glob src) {run mv $(X?) $(X? match $src $dst)} do)
:(macro ren :args@) = parse "__ren (quote " args@ ")@"
The __ren routine takes two patterns. It first finds the filenames matching the first pattern. Then, for each match, it renames the file by running mv, using the matching filename X? as the source and X? match $src $dst as the new name.
The ren macro allows you to write:
ren *.c.BAK BAK/*.c
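The same idea can be sketched in Python. This is a simplified illustration, not a translation of the Q code: it handles only `*` in the two patterns, it assumes that the n-th `*` of the destination is filled with whatever the n-th `*` of the source matched (as in the `"abcd" match a*d A*D` example), and it calls `os.rename` directly instead of running mv. The helper names are invented for this sketch.

```python
import os
import re


def _to_regex(src):
    # Each '*' becomes a capture group; every other character is literal.
    return "".join("(.*)" if c == "*" else re.escape(c) for c in src)


def _to_template(dst):
    # The n-th '*' in dst refers back to the n-th group captured from src.
    parts, n = [], 0
    for c in dst:
        if c == "*":
            n += 1
            parts.append(rf"\g<{n}>")
        else:
            parts.append(c.replace("\\", r"\\"))
    return "".join(parts)


def rewrite(name, src, dst):
    """Return name rewritten from pattern src to pattern dst, or None."""
    m = re.fullmatch(_to_regex(src), name)
    return None if m is None else m.expand(_to_template(dst))


def ren(src, dst, directory="."):
    """Rename every entry of `directory` matching src; return (old, new) pairs."""
    renamed = []
    for name in sorted(os.listdir(directory)):
        new_name = rewrite(name, src, dst)
        if new_name is None:
            continue  # this entry does not match the source pattern
        os.rename(os.path.join(directory, name),
                  os.path.join(directory, new_name))
        renamed.append((name, new_name))
    return renamed
```

Like mv, this sketch does not create the target directory, so `ren("*.c.BAK", "BAK/*.c")` expects `BAK/` to exist already.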