64Digits

What's Lisp?

Lisp is a functional programming language with an extremely simple syntax (based on S-expressions) that's quite easy to implement. (Here's a Ruby implementation in about 50 lines I made last year.)

However, since GML doesn't natively support data structures than can contain strings, things get a lot harder: every S-exp must be represented as a string, and parsing them involves a bunch of messy string operations. Obviously, this isn't something you'd normally want to do in any programming language, but I thought it was a cute idea so I did it anyway. This implementation (like that Ruby one) is based on Peter Norvig's "lispy", but instead of parsing the code into tokens before evaluating them, everything happens in the strings themselves.

List operations

First of all, I wrote definitions for car, cdr and cons. Here's my car script:

// car("(a b c)") -> "a"

var list, i, left, right;

list = strip(argument0);

if (string_char_at(list, 1) != '(')
    show_error("syntax error: car() expects a list", true)

i = 2;

while (string_char_at(list, i) == ' ') i += 1;

left = i;
if (string_char_at(list, i) == '(') {
    i += 1;
    nest = 0;
    while (nest >= 0) {
        if (string_char_at(list, i) == '(') nest += 1;
        if (string_char_at(list, i) == ')') nest -= 1;
        i += 1;
    }
    right = i;
} else {
    i += 1;
    while (string_char_at(list, i) != '('
        && string_char_at(list, i) != ')'
        && string_char_at(list, i) != ' ') i += 1;
    right = i;
}

return string_copy(argument0, left, right - left);

First, this calls strip on argument0, which removes spaces from the start and the end of the string, in order to extract valid results from S-exps like " (a b) ". Then, it makes sure it was passed a list, by checking if the first character in the string is a (. It then skips over any spaces that might follow the first left paren. Now, we've found the index of the start of our first token. If it's a list, we look for the matching ); otherwise, we look for the first non-token character. This is the index of the end of the first token. The substring between these indices is returned.

cdr works pretty much the same way, except after having found the end of the first token, it cuts off everything before that and replaces it with (. Here's the code:

// cdr("(a b c)") -> "(b c)"

var list, i, right;

list = strip(argument0);

if (string_char_at(list, 1) != '(')
    show_error("syntax error: cdr() expects a list", true)

i = 2;

while (string_char_at(list, i) == ' ') i += 1;

if (string_char_at(list, i) == '(') {
    i += 1;
    nest = 0;
    while (nest >= 0) {
        if (string_char_at(list, i) == '(') nest += 1;
        if (string_char_at(list, i) == ')') nest -= 1;
        i += 1;
    }
    right = i;
} else {
    i += 1;
    while (string_char_at(list, i) != '('
        && string_char_at(list, i) != ')'
        && string_char_at(list, i) != ' ') i += 1;
    right = i;
}

return "(" + strip(string_copy(list, right, string_length(list) - right + 1));

Now, we need a way to construct new lists easily: the cons script, which prepends an element to a list. Its implementation is simple: strip the string, then insert a new head element after the first left paren.

// cons("a", "(b c d)") -> "(a b c d)"

return "(" + argument0 + " " + string_delete(strip(argument1), 1, 1);

I also ended up needing a function that determines whether or not a list is empty. Guess what I called it! Right. Basically, it just checks if the first ( is followed immediately by a ), ignoring spaces:

// empty(list) -> is list empty?

var code, i;
code = strip(argument0);
i = 2;
while (string_char_at(argument0, i) == ' ') i += 1;
return (string_char_at(argument0, i) == ')');

Environments

Now that lists were working fine, I needed to implement environments. Basically, they're lists of key-value pairs, with an extra element that points to the ID of its "parent environment", to be searched through whenever a lookup fails. All environments are stored in a global array, and are created/destroyed when evaluating lambda blocks. I represented them as lisp arrays, like this: (parent (k0 v0) (k1 v1) .. (kn vn))

To define the global environment, together with the environment array, I wrote a little init_lisp script, containing just this code:

global.envs = 1
global.env[0] = "(-1)"

It's empty at the start of the program, and its parent id is -1, because, well, there's nothing above the global environment.

First of all, I needed a script that returned the matching value for a given key, or "" if none was found. I called it assoc. How Lispy.

// assoc(env id, key) -> value

var list;
list = cdr(global.env[argument0]);

while (!empty(list)) {
    if (car(car(list)) == argument1)
        return car(cdr(car(list)));
    list = cdr(list);
}

return "";

The first cdr call cuts off the parent id; then, it just cdrs down the list looking for the key.

The contains script is just a small wrapper around assoc that compares the result to "", for readability:

// contains(env id, key) -> is key directly in env?

return (assoc(argument0, argument1) != "");

The set_value script prepends a new key-value pair to an environment, without paying attention to the old value: the new key will be placed at the front of the list, so the new value will be found first by assoc.

// set_value(env id, key, value) -> new env

var list;
list = global.env[argument0];

var new_pair;
new_pair = cons(argument1, cons(argument2, "()"));

global.env[argument0] = cons(car(list), cons(new_pair, cdr(list)));

Finally, there's the core of this whole environment code: get_env. Given an environment id and a key, it looks for the nearest parent of this environment that directly contains the given key. When it finds -1, that means it failed to find the key in the global environment, so the variable must be undefined.

// get_env(env id, key) -> id of nearest parent env containing key

var i;

while (!contains(argument0, argument1)) {
    argument0 = real(car(global.env[argument0]));
    if (argument0 == -1) show_error("Undefined variable: " + argument1, true);
}

return argument0;

Writing eval

Time for the exciting part! The eval script has the most important job of all: given a Lisp expression and an environment, evaluate it within this environment and return the resulting value. I'll explain this one line by line in the code comments.

// eval(lisp obj, env id) -> lisp obj 

var code, env;
code = strip(argument0);
env = argument1;

// If the code doesn't start with a '(', it's either a number or a symbol.
// If it's a number, just return the number itself. Otherwise, look for the
// environment that contains the symbol and return the matching value.
if (string_char_at(code, 1) != '(') {
    if (string_pos(string_char_at(code, 1), "-.0123456789"))
        return string(real(code));
    return assoc(get_env(env, code), code);
} else {
    // If it *is* a '(', we're dealing with some kind of function call.
    // First, split the list into the function and its arguments.
    var func, args;
    func = car(code);
    args = cdr(code);
    switch (func) {
        // base syntax
        case "quote":
            // `quote` just returns the argument passed to it.
            return car(args);
        case "if":
            // `if` evalutates its first argument, and then returns either
            // its second or third argument, evaluated, depending on the
            // truth value of the first result.
            var cond; cond = eval(car(args), env);
            if (real(cond)) return eval(car(cdr(args)), env);
            return eval(car(cdr(cdr(args))), env);
        case "set!":
            // `(set! k v)` changes some previously defined variable.
            var key; key = car(args);
            set_value(get_env(env, key), key, eval(car(cdr(args)), env));
            return 0;
        case "define":
            // `(define k v)` defines a new value.
            set_value(env, car(args), eval(car(cdr(args)), env));
            return 0;
        case "lambda":
            // `lambda` blocks, when evaluated, just return their entire
            // code. We handle them later in the default case.
            return cons("lambda", args);
        case "begin":
            // `begin` evaluates a bunch of expressions in order,
            // returning the result of the last one. Basically, it executes
            // a Lisp program, which is just a list of expressions.
            var value;
            while (!empty(args)) {
                value = eval(car(args), env);
                args = cdr(args);
            }
            return value;
        case "list":
            // `list` returns a list containing all of its arguments. This
            // implementation is the worst thing in the universe.
            var rev_list; rev_list = "()";
            while (!empty(args)) {
                rev_list = cons(eval(car(args), env), rev_list);
                args = cdr(args);
            }
            var new_list; new_list = "()";
            while (!empty(rev_list)) {
                new_list = cons(car(rev_list), new_list);
                rev_list = cdr(rev_list);
            }
            return new_list;
        
        // Arithmetic is mostly trivial, but + and * should accept any amount
        // of arguments.
        case "+":
            var accum; accum = 0;
            while (!empty(args)) {
                accum += real(eval(car(args), env));
                args = cdr(args);
            }
            return string(accum);
        case "*":
            var accum; accum = 1;
            while (!empty(args)) {
                accum *= real(eval(car(args), env));
                args = cdr(args);
            }
            return string(accum);
        case "-":
            return string(real(eval(car(args), env)) -
                          real(eval(car(cdr(args)), env)));
        case "/":
            return string(real(eval(car(args), env)) /
                          real(eval(car(cdr(args)), env)));
        case "mod":
            return string(real(eval(car(args), env)) mod
                          real(eval(car(cdr(args)), env)));
        case "=":
            return string(real(eval(car(args), env)) ==
                          real(eval(car(cdr(args)), env)));
        case "<":
            return string(real(eval(car(args), env)) <
                          real(eval(car(cdr(args)), env)));
        case ">":
            return string(real(eval(car(args), env)) >
                          real(eval(car(cdr(args)), env)));
        case "<=":
            return string(real(eval(car(args), env)) <=
                          real(eval(car(cdr(args)), env)));
        case ">=":
            return string(real(eval(car(args), env)) >=
                          real(eval(car(cdr(args)), env)));
        
        // Some boring list functions.
        case "car":
            return car(eval(car(args), env));
        case "cdr":
            return cdr(eval(car(args), env));
        case "cons":
            return cons(eval(car(args), env),
                        eval(car(cdr(args)), env));
        case "empty?":
            return string(empty(eval(car(args), env)));
            
        // A more interesting list function: `map` applies a function to
        // every element in a list, then returns the new list. For example,
        // (map (lambda (x) (* x x)) (list 1 2 3)) returns (list 1 4 9).
        case "map":
            var f; f = eval(car(args), env);
            var xs; xs = eval(car(cdr(args)), env);
            
            var rev_list; rev_list = "()";
            while (!empty(xs)) {
                var fx; fx = cons(f, cons(car(x), "()"));
                rev_list = cons(eval(fxs, env), rev_list);
                xs = cdr(xs);
            }
            var new_list; new_list = "()";
            while (!empty(rev_list)) {
                new_list = cons(car(rev_list), new_list);
                rev_list = cdr(rev_list);
            }
            return new_list;
        
        // func isn't a built-in keyword, so it must be a lambda.
        default:
            func = eval(func, env);
            
            // func now looks like this:
            // "(lambda  (x y)    (* x y y))"
            //   ^ car   ^ l_args ^ l_body
            
            // If this somehow isn't a function, something has gone wrong.
            if (car(func) != "lambda")
                show_error("Invalid function?", true);
            
            // Otherwise, extract the argument variables and the body
            // expression.
            var l_args; l_args = car(cdr(func));
            var l_body; l_body = car(cdr(cdr(func)));
            
            // Zip symbols and evaluated arguments together, then make a new
            // environment out of them, in which we will evaluate the body.
            var sub_env_list; sub_env_list = "()";
            var new_pair;
            while (!empty(args)) {
                new_pair = cons(car(l_args), cons(eval(car(args), env), "()"));
                sub_env_list = cons(new_pair, sub_env_list);
                l_args = cdr(l_args);
                args = cdr(args);
            }
            
            // Let the current environment be the new one's parent.
            sub_env_list = cons(string(env), sub_env_list);
            
            // Allocate a new environment id, evalute the body, then free it.
            sub_env = global.envs;
            global.envs += 1;
            global.env[sub_env] = sub_env_list;
            
            var l_result; l_result = eval(l_body, sub_env);
            global.envs -= 1;
            
            return l_result;
    }
}

Finally, eval can evaluate a Lisp program using begin, like this:

eval("(begin (define foo 31) (define sqr (lambda (x) (* x x))) (sqr foo))") == "961"

Here's the .gmk file (yeah, 8.0) if you want to mess around with it: https://dl.dropbox.com/u/15495351/lisp.gmk

You could run it in debug mode and evaluate expressions manually, or program your own little read-eval-print loop, I dunno. Have fun with it.

Astryl 12 years, 2 months ago

I label this as completely impractical yet awesome. Nice job.

Personally, I was thinking of creating a DLL based Prolog interpreter, just for kicks.

sirxemic 12 years, 2 months ago

~~Could you get the sidebars back? Else I have to do it by force.~~

Done.

F1ak3r 12 years, 2 months ago

Heh, reminds me of Lists and Lists. Dat functional programming.

Lisp in GML

Comments