Shell-style programming in Kawa
A number of programming languages have facilities to allow access to system processes (commands). For example, Java has java.lang.Process and java.lang.ProcessBuilder. However, they're not as convenient as the old Bourne shell, or as elegant in composing commands.
If we ignore syntax, the shell's basic model is that a command is a function that takes an input string (standard input) along with some string-valued command-line arguments, and whose primary result is a string (standard output). The command also has some secondary outputs (including standard error, and the exit code). However, the elegance comes from treating standard output as a string that can be passed to another command, or used in other contexts that require a string (e.g. command substitution). This article presents how Kawa solves this problem.
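To see what this buys, compare a minimal sketch of the same "run a command, capture standard output as a string" pattern written against the raw java.lang.ProcessBuilder API (the class and helper names here are invented for illustration):

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;

public class RunDate {
    // Drain a stream into a String -- the "standard output as a string" step
    // that a shell (or Kawa, as described below) performs for you.
    static String readAll(InputStream in) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        byte[] chunk = new byte[4096];
        int n;
        while ((n = in.read(chunk)) > 0)
            buf.write(chunk, 0, n);
        return buf.toString("UTF-8");
    }

    public static void main(String[] args) throws Exception {
        // The equivalent of the shell's `date --utc`, minus the convenience.
        Process p = new ProcessBuilder("date", "--utc").start();
        String output = readAll(p.getInputStream());
        p.waitFor();
        System.out.print(output);
    }
}
```

Composing two such commands requires repeating all of this plumbing; the shell model treats it as simple function composition.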
Process is auto-convertible to string
To run a command like date, you can use the run-process procedure:

#|kawa:2|# (define p1 (run-process "date --utc"))

Equivalently, you can use the &`{command} syntactic sugar:

#|kawa:2|# (define p1 &`{date --utc})
But what kind of value is p1? It's an instance of gnu.kawa.functions.LProcess, which is a class that extends java.lang.Process. You can see this if you invoke toString or call the write procedure:

#|kawa:2|# (p1:toString)
gnu.kawa.functions.LProcess@377dca04
#|kawa:3|# (write p1)
gnu.kawa.functions.LProcess@377dca04
An LProcess is automatically converted to a string or a bytevector in a context that requires it. More precisely, an LProcess can be converted to a blob because it implements Lazy&lt;Blob&gt;. This means you can convert it to a string (or bytevector):

#|kawa:9|# (define s1 ::string p1)
#|kawa:10|# (write s1)
"Wed Jan 1 01:18:21 UTC 2014\n"
#|kawa:11|# (define b1 ::bytevector p1) (write b1)
#u8(87 101 100 32 74 97 110 ... 52 10)
However, the display procedure prints it in "human" form, as a string:

#|kawa:4|# (display p1)
Wed Jan 1 01:18:21 UTC 2014
This is also the default REPL formatting:
#|kawa:5|# &`{date --utc}
Wed Jan 1 01:18:22 UTC 2014
Command arguments
The general form for run-process is:

(run-process keyword-argument... command)
The command is the process command-line. It can be an array of strings, in which case those are used as the command arguments directly:
(run-process ["ls" "-l"])

The command can also be a single string, which is split (tokenized) into command arguments separated by whitespace. Quotation marks group words together, just as in traditional shells:
(run-process "cmd a\"b 'c\"d k'l m\"n'o") ⇒ (run-process ["cmd" "ab 'cd" "k'l m\"no"])
Using string templates is more readable as it avoids having to quote quotation marks:
(run-process &{cmd a"b 'c"d k'l m"n'o})

You can also use the abbreviated form:

&`{cmd a"b 'c"d k'l m"n'o}

This syntax is the same as that of SRFI-108 named quasi-literals. In general, the following are roughly equivalent (the difference is that the former does smart quoting of embedded expressions, as discussed later):
&`{command}
(run-process &{command})

Similarly, the following are also roughly equivalent:

&`[keyword-argument...]{command}
(run-process keyword-argument... &{command})
A keyword-argument can specify various properties of the process. For example you can specify the working directory of the process:
(run-process directory: "/tmp" "ls -l")

You can use the shell keyword to specify that we want to use the shell to split the string into arguments. For example:

(run-process shell: #t "command line")

is equivalent to:
(run-process ["/bin/sh" "-c" "command line"])

You can also use the abbreviation &sh:

&sh{rm *.class}

which is equivalent to:

&`{/bin/sh -c "rm *.class"}
In general, the abbreviated syntax:

&sh[args...]{command}

is equivalent to:

&`[shell: #t args...]{command}
Command and variable substitution
Traditional shells allow you to insert the output from a command into the command arguments of another command. For example:

echo The directory is: `pwd`

The equivalent Kawa syntax is:
&`{echo The directory is: &`{pwd}}
This is just a special case of substituting the result from evaluating an expression. The above is a short-hand for:
&`{echo The directory is: &[&`{pwd}]}
In general, the syntax:

...&[expression]...

evaluates the expression, converts the result to a string, and combines it with the literal string. (We'll see the details in the next section.) This general form subsumes command substitution, variable substitution, and arithmetic expansion.
Tokenization of substitution result
Things get more interesting when considering the interaction between substitution and tokenization. This is not simple string interpolation. For example, if an interpolated value contains a quote character, we want to treat it as a literal quote, rather than as a token delimiter. This matches the behavior of traditional shells. There are multiple cases, depending on whether the interpolation result is a string or a vector/list, and depending on whether the interpolation is inside quotes.
- If the value is a string, and we're not inside quotes, then all non-whitespace characters (including quotes) are literal, but whitespace still separates tokens:

(define v1 "a b'c ")
&`{cmd x y&[v1]z} ⟾ (run-process ["cmd" "x" "ya" "b'c" "z"])
- If the value is a string, and we are inside single quotes, all characters (including whitespace) are literal:

&`{cmd 'x y&[v1]z'} ⟾ (run-process ["cmd" "x ya b'c z"])

Double quotes work the same, except that a newline is an argument separator. This is useful when you have one filename per line, and the filenames may contain spaces, as in the output from find:

&`{ls -l "&`{find . -name '*.pdf'}"}

If the string ends with one or more newlines, those are ignored. This rule (which also applies in the previous not-inside-quotes case) matches traditional shell behavior.
- If the value is a vector or list (of strings), and we're not inside quotes, then each element of the array becomes its own argument, as-is:

(define v2 ["a b" "c\"d"])
&`{cmd &[v2]} ⟾ (run-process ["cmd" "a b" "c\"d"])

However, if the enclosed expression is adjacent to non-space non-quote characters, those are prepended to the first element, or appended to the last element, respectively:

&`{cmd x&[v2]y} ⟾ (run-process ["cmd" "xa b" "c\"dy"])
&`{cmd x&[[]]y} ⟾ (run-process ["cmd" "xy"])

This behavior is similar to how shells handle "$@" (or "${name[@]}" for general arrays), though in Kawa you would leave off the quotes. Note the equivalence:

&`{&[array]} ⟾ (run-process array)
- If the value is a vector or list (of strings), and we are inside quotes, it is equivalent to interpolating a single string resulting from concatenating the elements, separated by a space:

&`{cmd "&[v2]"} ⟾ (run-process ["cmd" "a b c\"d"])

This behavior is similar to how shells handle "$*" (or "${name[*]}" for general arrays).

- If the value is the result of a call to unescaped-data, then it is parsed as if it were literal. For example, a quote in the unescaped data may match a quote in the literal:

(define vu (unescaped-data "b ' c d '"))
&`{cmd 'a &[vu]z'} ⟾ (run-process ["cmd" "a b " "c" "d" "z"])
- If we're using a shell to tokenize the command, then we add quotes or backslashes as needed so that the shell will tokenize it as described above for &`{command}:

&sh{cmd x y&[v1]z} ⟾ (run-process ["/bin/sh" "-c" "cmd x y'a' 'b'\\'''c' z'"])
You can of course use string templates with run-process:

(run-process &{echo The directory is: &`{pwd}})

However, in that case there is no smart tokenization: the template is evaluated to a string, and then the resulting string is tokenized, with no knowledge of where expressions were substituted.
Input/output redirection
You can use various keyword arguments to specify the standard input, output, and error streams. For example, to lower-case the text in in.txt, writing the result to out.txt, you can do:

&`[in-from: "in.txt" out-to: "out.txt"]{tr A-Z a-z}

or:

(run-process in-from: "in.txt" out-to: "out.txt" "tr A-Z a-z")
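For comparison, the same redirection can be expressed directly with Java's ProcessBuilder redirect methods; a sketch using the file names from the example above:

```java
import java.io.File;

public class Redirect {
    public static void main(String[] args) throws Exception {
        // Equivalent of (run-process in-from: "in.txt" out-to: "out.txt" "tr A-Z a-z")
        ProcessBuilder pb = new ProcessBuilder("tr", "A-Z", "a-z");
        pb.redirectInput(new File("in.txt"));              // in-from: "in.txt"
        pb.redirectOutput(new File("out.txt"));            // out-to: "out.txt"
        pb.redirectError(ProcessBuilder.Redirect.INHERIT); // err-to: 'inherit
        pb.start().waitFor();
    }
}
```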
These options are supported:
in: value
- The value is evaluated, converted to a string (as if using display), and copied to the input file of the process. The following are equivalent:

&`[in: "text\n"]{command}
&`[in: &`{echo "text"}]{command}

You can pipe the output from command1 to the input of command2 as follows:

&`[in: &`{command1}]{command2}
in-from: path
- The process reads its input from the specified path, which can be any value coercible to a filepath.

out-to: path
- The process writes its output to the specified path.

err-to: path
- Similarly for the error stream.
out-append-to: path
err-append-to: path
- Similar to out-to and err-to, but append to the file specified by path, instead of replacing it.

in-from: 'pipe
out-to: 'pipe
err-to: 'pipe
- Does not set up redirection. Instead, the specified stream is available using the methods getOutputStream, getInputStream, or getErrorStream, respectively, on the resulting Process object, just like Java's ProcessBuilder.Redirect.PIPE.

in-from: 'inherit
out-to: 'inherit
err-to: 'inherit
- Inherits the standard input, output, or error stream from the current JVM process.
out-to: port
err-to: port
- Redirects the standard output or error of the process to the specified port.

out-to: 'current
err-to: 'current
- Same as out-to: (current-output-port) or err-to: (current-error-port), respectively.

in-from: port
in-from: 'current
- Redirects standard input to read from the port (or (current-input-port)). It is unspecified how much is read from the port. (The implementation uses a thread that reads from the port and sends the data to the process, so it might read to the end of the port, even if the process doesn't read it all.)

err-to: 'out
- Redirects the standard error of the process to be merged with the standard output.
The default for the error stream (if neither err-to nor err-append-to is specified) is equivalent to err-to: 'current.
Note: Writing to a port is implemented by copying the output or error stream of the process. This is done in a thread, which means we have no guarantee of when the copying is finished. A possible approach is to have process-exit-wait (discussed later) wait not only for the process to finish, but also for these helper threads to finish.
Here documents

A here document is a form of literal string, typically multi-line, commonly used in shells for the standard input of a process. Kawa's string literals or string quasi-literals can be used for this. For example, this passes the string "line1\nline2\nline3\n" to the standard input of command:

(run-process [in: &{
    &|line1
    &|line2
    &|line3
    }] "command")

The &{...} delimits a string; the &| indicates that the preceding indentation is ignored.
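For comparison, feeding a multi-line literal to a process's standard input in plain Java takes noticeably more machinery. This sketch uses a Java 15+ text block in place of the here document, with cat standing in for the command:

```java
import java.io.OutputStream;

public class HereDoc {
    static String run() throws Exception {
        // A text block playing the role of the here document; like &|,
        // the incidental indentation is stripped.
        String text = """
                line1
                line2
                line3
                """;
        // `cat` just echoes its input; any command reading stdin works the same way.
        Process p = new ProcessBuilder("cat").start();
        try (OutputStream stdin = p.getOutputStream()) {
            stdin.write(text.getBytes("UTF-8")); // like in: &{...}
        }
        String out = new String(p.getInputStream().readAllBytes(), "UTF-8");
        p.waitFor();
        return out;
    }

    public static void main(String[] args) throws Exception {
        System.out.print(run());
    }
}
```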
Pipe-lines
Writing a multi-stage pipe-line quickly gets ugly:
&`[in: &`[in: "My text\n"]{tr a-z A-Z}]{wc}
Aside: This would be nicer in a language with infix operators, assuming &` is treated as a left-associative infix operator, with the input as the optional left operand:

"My text\n" &`{tr a-z A-Z} &`{wc}
The convenience macro pipe-process makes this much nicer:

(pipe-process "My text\n" &`{tr a-z A-Z} &`{wc})
All but the first sub-expression must be (optionally sugared) run-process forms. The first sub-expression is an arbitrary expression which becomes the input to the second process expression; that process's output becomes the input to the third process expression; and so on. The result of the pipe-process call is the result of the last sub-expression.
Copying the output of one process to the input of the next is optimized: it uses a copying loop in a separate thread. Thus you can safely pipe long-running processes that produce huge output. This isn't quite as efficient as using an operating system pipe, but is portable and works pretty well.
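That copying loop can be sketched in plain Java: a background thread drains one process's standard output into the next process's standard input, closing it so the downstream process sees end-of-file. A simplified sketch (real code would also report errors):

```java
import java.io.InputStream;
import java.io.OutputStream;

public class Pipe {
    // Copy src to dst in a background thread, closing both when done
    // so the downstream process sees end-of-file on its stdin.
    static Thread copy(InputStream src, OutputStream dst) {
        Thread t = new Thread(() -> {
            try (src; dst) {
                byte[] buf = new byte[8192];
                int n;
                while ((n = src.read(buf)) > 0)
                    dst.write(buf, 0, n);
            } catch (Exception e) {
                // ignored: the downstream process may have exited early
            }
        });
        t.start();
        return t;
    }

    public static void main(String[] args) throws Exception {
        // Roughly (pipe-process "My text\n" &`{tr a-z A-Z} &`{wc})
        Process tr = new ProcessBuilder("tr", "a-z", "A-Z").start();
        Process wc = new ProcessBuilder("wc").start();
        try (OutputStream in = tr.getOutputStream()) {
            in.write("My text\n".getBytes("UTF-8"));
        }
        copy(tr.getInputStream(), wc.getOutputStream());
        System.out.print(new String(wc.getInputStream().readAllBytes(), "UTF-8"));
        wc.waitFor();
    }
}
```

An operating-system pipe would avoid the extra copy, but this thread-per-stage approach is what makes the pipeline portable.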
Setting the process environment
By default the new process inherits the system environment of the current (JVM) process, as returned by System.getenv(), but you can override it.
env-name: value
- In the process environment, set the name to the specified value. For example:

&`[env-CLASSPATH: ".:classes"]{java MyClass}
NAME: value
- Same as using the env-name option, but only if NAME is uppercase (i.e. if uppercasing NAME yields the same string). For example, the previous example could be written:

(run-process CLASSPATH: ".:classes" "java MyClass")
environment: env
- The env is evaluated and must yield a HashMap. This map is used as the system environment of the process.
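In plain Java these options correspond to ProcessBuilder.environment(), which returns a mutable copy of the parent environment; a sketch (the helper name is invented for illustration):

```java
import java.util.Map;

public class Env {
    // Build a ProcessBuilder whose environment overrides one variable;
    // environment() starts out as a copy of System.getenv().
    static ProcessBuilder withEnv(String name, String value, String... command) {
        ProcessBuilder pb = new ProcessBuilder(command);
        Map<String, String> env = pb.environment();
        env.put(name, value); // counterpart of env-name: value
        // env.clear() followed by env.putAll(map) would mimic environment: env
        return pb;
    }

    public static void main(String[] args) throws Exception {
        // Equivalent of (run-process CLASSPATH: ".:classes" "java MyClass")
        Process p = withEnv("CLASSPATH", ".:classes", "java", "MyClass").start();
        p.waitFor();
    }
}
```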
Process-based control flow
Traditional shells provide logic control flow operations based on the exit code of a process: 0 is success (true), while non-zero is failure (false). Thus you might see:

if grep Version Makefile >/dev/null
then echo found Version
else echo no Version
fi
One idea is to have a process be auto-convertible to a boolean, in addition to being auto-convertible to strings or bytevectors: in a boolean context, we'd wait for the process to finish, and return #t if the exit code is 0, and #f otherwise. This idea may be worth exploring later.
Currently Kawa provides process-exit-wait, which waits for a process to exit, and then returns the exit code as an int. The convenience function process-exit-ok? returns true iff process-exit-wait returns 0.
(process-exit-wait (run-process "echo foo")) ⟾ 0

The previous sh example could be written:

(if (process-exit-ok? &`{grep Version Makefile})
    &`{echo found}
    &`{echo not found})

Note that unlike the sh version, this ignores the output from the grep (because no one has asked for it).
To match the output from the shell, you can use out-to: 'inherit:

(if (process-exit-ok? &`[out-to: 'inherit]{grep Version Makefile})
    &`{echo found}
    &`{echo not found})
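process-exit-wait and process-exit-ok? map directly onto Process.waitFor() in Java; a sketch of the same grep-based branch (the helper name is invented for illustration):

```java
public class ExitCode {
    // Counterpart of process-exit-ok?: true iff the command exits with code 0.
    static boolean exitOk(String... command) throws Exception {
        Process p = new ProcessBuilder(command).start();
        return p.waitFor() == 0; // counterpart of process-exit-wait
    }

    public static void main(String[] args) throws Exception {
        if (exitOk("grep", "Version", "Makefile"))
            System.out.println("found");
        else
            System.out.println("not found");
    }
}
```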