Beyond scripting languages: Choosing and designing languages

Per Bothner

Table of Contents

1. Why are scripting languages popular?
2. Problems with scripting languages
3. Best of both implementation: Implicit compilation
4. 60-second Scheme introduction
5. Required declarations, optional typing
6. Nesting expressions
7. Extensible syntax
8. Getting and setting object properties
9. Object construction
10. Text from templates
11. Creating XML and HTML
12. Conclusion

Abstract

We will compare "scripting languages" with more mainstream "programming languages", discuss their advantages and disadvantages, consider how we can get the best of both, and describe some desirable features of such languages. While the examples will mostly focus on Kawa, the main focus will be general issues in designing or selecting a scripting language.

This is a draft. The latest version of this paper will be on here. Please let <per@bothner.com> know if you find any errors.

1. Why are scripting languages popular?

Scripting languages have received a lot of attention recently, though distinguishing a "scripting language" from a more mainstream "programming language" isn't easy. Here are some reasons people prefer so-called scripting languages:

No compilation step needed. Traditional programming languages require you to compile your program to some executable form before you can actually execute it. If you're debugging an application it can be tedious to repeatedly edit, re-compile, re-load, and re-test, though a good IDE (Integrated Development Environment) can make this much easier.
Easier to write, to read, and to learn. As a rough generalization, "scripting" languages have a simpler and less verbose syntax.
Simpler for trivial programs. In many non-scripting languages a program consists of classes or modules. For example, in Java even "Hello World" requires defining at least one class containing at least one method.
Don't need to declare variables.
Don't need to specify types of variables.

2. Problems with scripting languages

There are also some disadvantages with most popular scripting languages:

Poor performance. Many scripting languages are implemented using a simple interpreter instead of a compiler. Sometimes poor language design choices make it needlessly difficult for a compiler to generate reasonable code.
Errors caught at run-time rather than compile-time. Some languages are so "friendly" that almost anything is a valid program. This seems nice when dashing off a quick program, but makes programs harder to debug. Consider standard regular expression syntax: This is basically a very terse special-purpose programming language. But the cost of the terseness is that the only way to write and debug a "script" is through trial and error: the compiler catches almost no errors for you.
A focus on easy-to-write rather than easy-to-read or easy-to-maintain. Even for a simple script you will spend much more time reading and debugging it than typing it in. Thus readability and the ability to catch errors are very important. A compact high-level language is also easier to read and debug. However, note that the number of tokens is more important than the number of characters.
Ad-hoc design and syntax. Many languages are designed by people without wide knowledge of language research or of non-traditional languages, and it shows. A common mistake is to not do lexical scoping properly.
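
As an illustration of what proper lexical scoping provides, here is a minimal sketch in standard Scheme. The names make-counter, a, and b are invented for this example. Each procedure returned by make-counter keeps its own private count variable, because the variable is looked up in the scope where the procedure was created, not where it is called.

```scheme
;; Each call to make-counter creates a fresh binding of count.
;; The returned procedure closes over that binding (lexical scoping).
(define (make-counter)
  (let ((count 0))
    (lambda ()
      (set! count (+ count 1))
      count)))

(define a (make-counter))
(define b (make-counter))
(a)            ; a's count is now 1
(a)            ; a's count is now 2
(b)            ; b has its own count, which is now 1
```

In a language that gets scoping wrong, a and b would typically end up sharing one count, or count would not be visible at all once make-counter had returned.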

3. Best of both implementation: Implicit compilation

Keep compilability and static error-checking in mind when designing a language.
Implement using a compiler rather than an interpreter. This keeps the language designer and implementer honest.
Catch whatever errors you can at compile-time.
However, hide the compiler from non-expert users. When a file is “executed”, compile it first. Likewise when a user types in an expression. Note that when a function definition is “executed”, the result is a compiled function, so calling the function calls compiled code.
The Kawa compiler toolkit makes it relatively easy to write a compiler that runs fast and generates efficient bytecode.
The Java class-loader lets the newly-generated code be loaded into a running JVM without needing any temporary files. This mechanism is efficient and portable.
Speed similar to Java. The result is that Kawa code runs about as fast as Java code. This is much faster than most scripting language implementations.
Standard security and execution environment of Java platform. Using the same execution engine as Java programs has other benefits: You can manage security in the same ways as you would for Java applications, since you're using the same execution stack and class-loader framework. Stack traces are integrated, showing both Java methods and “scripting” methods (e.g. Scheme functions), with line numbers. One can use standard Java debuggers (though some language-specific customization makes it easier). One can use the same management, instrumentation, and deployment tools as for Java.
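
From the user's side, implicit compilation is invisible. The sketch below uses only the standard procedures eval and interaction-environment; the names source and square are invented for the example. It shows the user-visible behavior only: how the expression is processed behind the scenes (interpreted, or compiled to bytecode and loaded as described above) is up to the implementation.

```scheme
;; A definition arrives as data, e.g. typed at a console or read from a file.
(define source '(lambda (n) (* n n)))

;; "Executing" it yields an ordinary procedure. An implementation that
;; compiles implicitly hands back compiled code here; the caller cannot tell.
(define square (eval source (interaction-environment)))

(square 12)    ; => 144
```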

4. 60-second Scheme introduction

We will use the Kawa implementation of the Scheme language in our examples. The main idea to understand is that in Scheme the function/operator is always first, and parentheses surround the function as well as the arguments:

(fun arg1 arg2) instead of fun(arg1, arg2)
(> exp1 exp2) instead of exp1 > exp2
(if c t e) instead of (c ? t : e)
(set! var exp) instead of var = exp;
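
A small sketch that puts the four forms together may help. The names max-of and largest are invented for the example, and define (which introduces a new name) is the only form used that is not in the list above.

```scheme
;; (if c t e) and (> exp1 exp2) appear in the body of max-of.
(define (max-of a b)
  (if (> a b) a b))

(define largest 0)

;; (fun arg1 arg2) calls max-of; (set! var exp) assigns the result.
(set! largest (max-of 3 7))
largest        ; => 7
```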

5. Required declarations, optional typing

Require declarations of variables. Language designers are often tempted to leave out variable declarations, or make them optional, with the goal of making it easier to write programs. This then gets them into trouble with nested scopes; at the very least you need declarations for local variables. Since you have to initialize a variable before you can use it, it is usually easy to declare it in the same place. Explicit declarations make the program easier to understand, and provide a natural place to document a variable. In Kawa, you can write:

(define counter 0)

This admittedly uses more characters than ideal (Scheme tends to use long unabbreviated names), but it is terse in number of tokens (words), which is more important when it comes to reading programs.
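
The nested-scope problem can be sketched in standard Scheme. The names total, add-to-total!, and shadowed are invented for the example. Because declaration (define, let) and assignment (set!) are different forms, it is always clear whether code is creating a new local variable or updating an outer one; a language without declarations has to guess.

```scheme
(define total 0)

(define (add-to-total! n)
  ;; set! assigns to the existing outer variable.
  (set! total (+ total n)))

(define (shadowed n)
  ;; let declares a new local variable that happens to share the name.
  (let ((total n))
    (* total 2)))

(add-to-total! 5)
(shadowed 100)   ; => 200
total            ; => 5, untouched by the call to shadowed
```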

Optional type specification. There is a stronger case for not requiring a type specification when declaring a variable. It is easy to specify that a variable must be an integer, but it is quite a bit harder for complex data structures. Such type systems are still an open area for research. On the other hand, type declarations can make it easier to catch mistakes earlier, help a compiler generate efficient code, and provide useful documentation. So my recommendation is that declarations can have optional type specifiers. For example, in Kawa:

(define counter :: <integer> 0)

Kawa types can be Scheme-level types or Java types. For example:

(define i :: <int> 0)
(define arr :: <java.lang.Object[]> (<java.lang.Object[]> 10))
(set! (arr i) "foo")

Kawa uses the declarations to compile this into efficient bytecode. Such declarations help make Kawa programs run about as fast as Java programs.

6. Nesting expressions

Many languages make a distinction between expressions and statements. Examples include C/C++, Java, PHP, and Python. Much better is to unify these, so all statements are also expressions. This has a number of advantages:

Conditional expressions and conditional statements are the same. For example in Scheme conditional expressions:
```
(set! var (if bool-expression then-exp else-exp))
```
look similar to conditional statements:
```
(if bool-expression
  (set! var then-exp)
  (set! var else-exp))
```

Can freely nest declarations and statements in expressions. For example compare needing a temporary variable in Java:

char next;
if (i < str.length()
    && (next = str.charAt(i)) >= '0' && next <= '9')
  ...;

In Scheme you can use an expression-local declaration:

(if (and (< i (str:length))
         (let ((next (str:charAt i)))
            (and (char>=? next #\0) (char<=? next #\9))))
  ...)

More flexible iteration and mapping forms. In an expression language a classical loop is an expression that is evaluated for its side-effects and returns a void value. But one can also define other iteration forms that may return a sequence value or an accumulation result. Such forms can then be nested inside other expressions without needing temporary variables.
Makes language/syntax extension (macros) work better. Syntax extension is easier if we don't have to treat expressions and statements differently. One reason is that one may want to write a macro to be used in expressions, but whose implementation may use loops or blocks.
Encourages a better side-effect-free style. Many people believe side-effect-free code is better style, easier to understand, and allows more implementation flexibility. A side-effect-free coding style is easier if we can nest blocks and iterative operations inside other expressions.
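
The iteration and side-effect-free points can be sketched with two standard Scheme forms. The procedure names sum-of-squares and count-digits are invented for the example. Both loops are expressions that yield a value, so neither needs a mutable temporary variable.

```scheme
;; map is an iteration form whose value is a new list, so it can be
;; nested directly inside another expression.
(define (sum-of-squares numbers)
  (apply + (map (lambda (n) (* n n)) numbers)))

;; A named let is a loop that returns an accumulated result.
(define (count-digits str)
  (let loop ((i 0) (count 0))
    (if (< i (string-length str))
        (loop (+ i 1)
              (if (char-numeric? (string-ref str i)) (+ count 1) count))
        count)))

(sum-of-squares '(1 2 3))   ; => 14
(count-digits "a1b22")      ; => 3
```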

7. Extensible syntax

Special-purpose syntax is useful for many applications. Common examples include SQL queries, regular expressions, XML/HTML page construction, or various mathematical notations. Sometimes you will want to mix multiple syntaxes in the same program. Hence an extensible syntax is very useful.

Macro invocations should look the same as function calls. You often want to be able to replace a function by a macro or vice versa.

Control structures should look similar to macro invocation. The language isn't really extensible if user-defined operations look different from built-in operations.

Provide a simple general core syntax. More complex control structures can be implemented as extensions.

Avoid having reserved words. Having as few reserved words as possible (preferably none) makes it easier to evolve a language, both over time and to create special-purpose dialects. It also makes the language easier to learn and remember. “Why can't I create a Java field with the name transient?”

Traditional Lisp/Scheme syntax satisfies these goals well. All function calls, macro invocations, and control structures have the same general structure:

(operator operand1 operand2 ... operandN)
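
As a minimal sketch of such an extension, the following defines a new control structure with the standard syntax-rules facility. The macro name repeat and the variable hits are invented for the example. A use of repeat has the same shape as a use of a built-in form such as if or a call to an ordinary function.

```scheme
;; A user-defined control structure: evaluate the body n times.
(define-syntax repeat
  (syntax-rules ()
    ((_ n body ...)
     (let loop ((remaining n))
       (when (> remaining 0)
         body ...
         (loop (- remaining 1)))))))

(define hits 0)
(repeat 3 (set! hits (+ hits 1)))
hits   ; => 3
```

Because the macro expands into an expression (a named let), repeat can itself be nested inside other expressions.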

8. Getting and setting object properties

Use field access syntax. Most programming languages support objects that have named properties. Typically properties are implemented as fields, but sometimes one needs to do some computation when reading or writing a property. Because what is implemented as a field one day may need a method calculation the next day, good Java programming style uses private fields accessed using getter and setter methods. But this is bad language design, since using getters and setters is more verbose and harder to read compared to plain field accesses. A solution implemented by many languages, including C#, is to define a property using a getter/setter pair, but access it as a field.

Thus in Java to copy a Swing button's text to another button we do:

button2.setText(button1.getText());

In Kawa we can do:

(set! button2:text button1:text)

Kawa compiles this into exactly the same bytecode as the Java code, assuming button1 and button2 have been declared as JButton variables. If they haven't, Kawa will generate code that uses reflection at run-time. (The compiler has an option to warn when it has to do this.)

Of course you can also use method-call syntax:

(button2:setText (button1:getText))

9. Object construction

Construct new objects using the same syntax as factory methods. This allows switching an implementation to use caching, for example. Instead of using a special new reserved word, a language could use make as a conventional static method name. (Explicit object allocation should only be allowed using a private default constructor.) For example, a Java-like language could use this syntax:

Long ten = Long.make(10);
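
The caching argument can be sketched in standard Scheme. All names here (point, raw-make-point, make-point, origin) are invented for the example. Callers always construct points by calling make-point as an ordinary function, so the implementation is free to start returning a cached object without any call site changing.

```scheme
;; define-record-type gives point a plain constructor, raw-make-point.
(define-record-type point
  (raw-make-point x y)
  point?
  (x point-x)
  (y point-y))

;; Callers use make-point, which looks like any other function call.
;; This version caches the origin instead of allocating it each time.
(define origin (raw-make-point 0 0))
(define (make-point x y)
  (if (and (= x 0) (= y 0))
      origin
      (raw-make-point x y)))

(eq? (make-point 0 0) (make-point 0 0))   ; => #t, the cached object
(point-x (make-point 3 4))                ; => 3
```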

Use class name directly as object construction function. The word new/make isn't really needed: In a language with first-class functions one can treat the class name as the name of the class's constructor function:

(define-alias Long <java.lang.Long>)
(define ten (Long 10))

Here the name Long is an abbreviation for <java.lang.Long>, and when the class is used as a function it is “coerced” to its constructor method. (Kawa finds the constructor at compile-time.)

Keyword parameters are useful for setting properties. A few languages support named (keyword) parameters. These are very useful for creating objects that have many optionally-settable properties. For example in Kawa:

(define-alias JButton <javax.swing.JButton>)
(define my-button
   (JButton icon: my-icon
            text: "my-button"
            tool-tip-text: "click here for action"))

Nested object creation. A more complex example, using a wrapper library:

(Window title: "Hello counter!"
        content:
        (Column
          (Button text: "click here"
                  foreground: 'red
                  action: handle-click)
          count-label))

10. Text from templates

Template languages are awkward for general programming. In a “template language”, such as PHP, JSP, and BRL, the main body of a file is literal text, while executable expressions are written inside some escape construct. (Shell languages are similar in that literal text does not need to be quoted.) Such languages are convenient for generating text, but are less suited for general programming. For one, having to start a library of functions with an opening script tag is just ugly. Typing expressions at a console is also awkward.

Explicit string quoting is more flexible, but can be awkward. Non-template languages instead use string literals combined with string concatenation. This can be awkward in different ways: Embedded (nested) quote symbols need to be escaped, which can get ugly. It is also not visually obvious whether a quotation mark is the beginning or end of a string. It may be a good idea to use different symbols for the beginning and end of a string, which solves both problems. Unfortunately, if one sticks with ASCII there are few available characters. One option is to use curly braces, since they're not needed for statement blocks if statements are expressions. String concatenation can be made implicit using adjacent expressions. Thus instead of JSP:

The total is: <%= total%>.

one could write:

{The total is: }total{.}
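
For comparison, here is the same kind of output written with conventional string literals and explicit concatenation in standard Scheme. The names total and message are invented for the example. It shows both awkward points mentioned above: the nested quotation marks must be escaped, and the concatenation and number conversion are spelled out.

```scheme
(define total 42)

;; Explicit quoting: nested quotation marks must be escaped with a backslash,
;; and the pieces are joined with an explicit call to string-append.
(define message
  (string-append "The \"total\" is: " (number->string total) "."))

message   ; the text: The "total" is: 42.
```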

11. Creating XML and HTML

Use markup objects (DOM), not text. Textual templates are limited even for the primary usage of generating web pages because text does not lend itself to further processing. If you need to select data from an XML file, you want to work at the level of elements and attributes (i.e. DOM objects), not the raw XML text. If you have a convenient syntax for creating DOM objects, you might as well use that for output as well: It's just a matter of printing (serializing) the DOM in the preferred format, such as HTML. This reduces the problem of nested quotes, mentioned above. Even better, you don't have to worry about HTML-escaping text (and the security risks if you forget), since this is handled by the serializer.
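
The escaping point can be sketched in a few lines of standard Scheme. This is not Kawa's implementation: the representation (an element is a list whose first item is the tag symbol, text is a string) and the names escape-text and serialize are assumptions made for the example, and attributes are left out to keep it short. The program builds a tree, and only the serializer knows about HTML escaping.

```scheme
;; Escape the characters that are special in HTML text.
(define (escape-text str)
  (apply string-append
         (map (lambda (ch)
                (case ch
                  ((#\<) "&lt;")
                  ((#\>) "&gt;")
                  ((#\&) "&amp;")
                  (else (string ch))))
              (string->list str))))

;; A node is either a string (text) or a list (tag child ...).
(define (serialize node)
  (if (string? node)
      (escape-text node)
      (let ((tag (symbol->string (car node))))
        (string-append "<" tag ">"
                       (apply string-append (map serialize (cdr node)))
                       "</" tag ">"))))

(serialize '(p "1 < 2 " (b "& so on")))
;; => "<p>1 &lt; 2 <b>&amp; so on</b></p>"
```

Because the text children are plain data until serialization, there is no place in the calling code where forgetting to escape is possible.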

W3C's new XQuery language follows this model. While most people think of XQuery as a data query language, it is also a nice language for XML/HTML generation.

Create DOM objects using regular object-creation syntax. There is no need for special XML syntax for creating XML/HTML elements. Instead use regular function call syntax. This makes the language smaller and more consistent, and solves a number of problems. For example creating an element with a computed (non-literal) tag can be handled the same way as calling a computed function. Attributes can be defined using keyword parameters.

You still need a way to refer to the element constructor functions, which doesn't involve explicitly defining a function for each tag. Kawa's solution is to make use of namespace prefixes: You can define a prefix as an “XML namespace”:

(define-xml-namespace svg "http://www.w3.org/2000/svg")

This makes the svg prefix “magic” in that each name in that prefix defines an element constructor function (when first referenced). I.e. you can do:

(svg:g transform: "translate(300 200)"
       (svg:ellipse rx: 250 ry: 100 fill: "red"))

One could add validation to this framework, or static type-checking to catch errors at compile-time. (Neither of these is implemented yet for Kawa.)

Kawa predefines the html prefix:

(html:a href: "news.html" "Click " (html:b "here") " for news.")

12. Conclusion

Some so-called scripting languages may be good languages. However, the idea of scripting languages as distinct from other programming languages is misguided. Instead, we should think about good language design in general, in a way that makes the programmer productive, helps catch errors, and can be implemented efficiently.