Static local variables in Common Lisp

Static local variables are variables with local scope and indefinite extent. That means, they're accessible only within a specific block of instructions, as if they were regular local variables, but otherwise they behave like global variables - their memory is allocated just once, and they exist for as long as they're accessible by the program, retaining their value between subsequent executions of the code that created them. You may recognize this concept from C/C++:

int test_function() {
  static int counter = 0;
  return counter++;
}

//test:
test_function(); // -> 0
test_function(); // -> 1
test_function(); // -> 2

Static local variables are rarely used, and are becoming increasingly absent in modern programming languages. And for good reasons. Such variables share the downsides of global variables - they introduce additional hidden state and make the code dangerous to use in multi-threaded programs. At the same time, they also create another downside: they're lexically scoped to the block of code they're declared in, meaning they're invisible to the outside world. While a global variable could be explicitly made a part of some module's interface, a static local variable can't be; it's invisible.

So why did I seek a way to implement them in Common Lisp? It turns out that despite their drawbacks, they're sometimes very useful. One of such cases is when you want to minimize the number of memory allocations your code does, without impacting its readability. This is important for writing high-performance software, like video games (and this is where I encountered my own need for these variables). Static local variables allow a block of code to preallocate its temporary buffers once, avoiding the penalty of repeated reallocation. Consider:

(defun process-data ()
  (some-level
   (of-processing
    ;;  ...

    (static-let ((query-results (make-array 40000 :element-type 'double-float :fill-pointer 0)))
      (run-a-query "[... some query data ...]" :into query-results)
      (process-results query-results))

    ;; more code
    )))

The STATIC-LET in the code above allocates a QUERY-RESULTS buffer when the function PROCESS-DATA is first executed; subsequent calls to PROCESS-DATA will not allocate this memory again, making the code effectively non-consing (i.e. not allocating memory).

Why do it this way, instead of using a global variable (or perhaps a class-scoped slot when writing a method)? Because it keeps the declaration and initialization close to the code that's using it. The variable is still lexically scoped - code outside of STATIC-LET can't get to it, which partially mitigates the issue of hidden state (in fact, when not used in multi-threaded context, this code would still technically count as functional programming).

What about the magic 40000? Is it wise to allocate buffers of predetermined size? The answer is, in performance-critical code, absolutely. Memory is always finite and most software cannot recover when it runs out. Making buffer size explicit is actually the safer choice, because it forces you to write your code so that it either stays in fixed bounds or gracefully handles memory outage, giving you predictable memory usage patterns.

After this lengthy introduction, let's look at how you can make static local variables in Common Lisp.

The implementation

(defmacro static-let (vars &body body)
  "Like `LET', but makes `VARS' static (in C language sense).

A static variable is like a global lexical variable, but accessible only within the scope
of the `BODY' form, and it retains value between executions of the `BODY' form. Initial
value of each binding is evaluated only once, the first time `BODY' is executed.

The macro supports optional type declarations; instead of usual binding form
`(VAR-NAME INITALIZER)', use `(VAR-NAME TYPE INITALIZER)' to get automatic type declarations.

Be careful with this macro, as it creates a new set of static variables each time it's expanded.
This macro is not thread safe. Also, avoid recursive execution of `BODY'.
If a static variable is initialized or set to `NIL', it will be reinitialized the next time
`BODY' is executed."
  (loop
     for entry in vars
     for name = (if (consp entry) (car entry) entry)
     for type = (when (and (consp entry)
                           (caddr entry))
                  (cadr entry))
     for initform = (when (consp entry)
                      (or (caddr entry)
                          (cadr entry)))
     for sym = (gensym (string name))
     collect `(,sym (load-time-value (list nil))) into letforms
     if type
     collect `(type ,type ,name) into declforms
     collect `(unless (car ,sym) (setf (car ,sym) ,initform)) into initforms
     collect `(,name (car ,sym)) into macroletforms
     finally
       (return
         `(let ,letforms
            ,@initforms
            (symbol-macrolet ,macroletforms
              ,@(if declforms
                    `((declare ,@declforms)))
              ,@body)))))

It's a relatively simple macro, though it may take a bit to parse. Like a proper LET-style macro, it supports defining multiple lexical variables in one block and initialize them with values. As a bonus, the variable form may also look like (NAME TYPE INITIAL-VALUE), which allows me to provide type declarations for the variables in the same place (I tend to like my Lisp thoroughly typed).

Here's an example macroexpansion:

(macroexpand-1 '(static-let ((test-array (make-array 40000 :element-type 'double-float)))
                 (setf (aref test-array 0) 42)))

;;; =>

(LET ((#:TEST-ARRAY730 (LOAD-TIME-VALUE (LIST NIL))))
  (UNLESS (CAR #:TEST-ARRAY730)
    (SETF (CAR #:TEST-ARRAY730)
            (MAKE-ARRAY 40000 :ELEMENT-TYPE 'DOUBLE-FLOAT)))
  (SYMBOL-MACROLET ((TEST-ARRAY (CAR #:TEST-ARRAY730)))
    (SETF (AREF TEST-ARRAY 0) 42)))

The magic which allows to create a static variable is contained in the LOAD-TIME-VALUE form. Quoting the HyperSpec, "load-time-value provides a mechanism for delaying evaluation of form until the expression is in the run-time environment; (…) with the result of this evaluation then being treated as a literal object at run time" (emphasis mine, see full documentation for context).

Key words here are "literal object". Literal objects are usually ones you type directly in the source code, like a "string literal", or '(a quoted form). The details of handling such objects are a bit complicated, but for this purpose, it's sufficient to say that at some point before the code is executed, literal objects are instantiated, and from that point on, any place in the source code that refers to such a literal now refers to the particular instance of it. So, from the point of view of the Lisp runtime, before the above macroexpansion is executed, (LOAD-TIME-VALUE (LIST NIL) has already been replaced by a reference to a particular empty cons cell. This cons cell is our static local variable.

The rest of the macroexpansion executes normally. There's a test that guards the code setting the CAR of the cons cell to the result of initialization form for our static variable, which ensures that the initialization happens the first time control flows through this block of code. Then, there's a SYMBOL-MACROLET that ensures our variable is accessible for both reading and writing under the name given to it in STATIC-LET form (this is the standard Common Lisp idiom for when you want to make a bit of code present itself as if it was just a variable).

Why the empty cons cell, you may ask? Why not put the initialization form into the LOAD-TIME-VALUE directly? The reason is, the evaluation of that form happens well before runtime, so if you compile the file (with a COMPILE-FILE call), the actual result of the initialization form may end up encoded in the compiled file. In our example above, that would be 320KB worth of floats, and in practice this may easily run into hundreds of megabytes of data that will end up stored in the compiled file (or dumped Lisp image). So instead, STATIC-LET only allocates an empty cons cell, a minimal literal object that can store a reference to the data initialize at runtime.

(When delivering executables, one must remember to never call the functions using STATIC-LET before dumping the Lisp image; otherwise, the initialized data will end up being stored in the image file in full form. Also note that this implementation has one caveat: if, at any point, the variable is set to NIL, it will be reinitialized the next time control goes through STATIC-LET. This behavior deviates from what C/C++ does, and it could be corrected with little extra work, but I deemed it not worth it.)

Let's verify STATIC-LET works as described:

(defun test-static-let ()
  (format t "~&Executing test.")
  (static-let ((test-counter (progn (format t "~&Allocating counter.")
                                    0))
               (test-array (progn (format t "~&Allocating array.")
                                  (make-array 40000 :element-type 'double-float))))
    (incf test-counter)
    (incf (aref test-array 0))
    (format t "~&Counter: ~A, array's first element: ~A" test-counter (aref test-array 0)))
  (format t "~&Test done."))

(loop
   repeat 3
   do
     (test-static-let))
Executing test.
Allocating counter.
Allocating array.
Counter: 1, array's first element: 1.0d0
Test done.
Executing test.
Counter: 2, array's first element: 2.0d0
Test done.
Executing test.
Counter: 3, array's first element: 3.0d0
Test done.

Alternative approaches

Hanging the static variable off a cons cell allocated by LOAD-TIME-VALUE is what I found to be the best way to implement it, but it's not the only way. Alternative approaches include:

  • redefining the entire function using static variables as closure, with static variables being regular lexical variables over which that function closes;
  • keeping a global hash table for "static" variables, and using unique symbols (generated by GENSYM) as keys;
  • creating each such variable as DEFVAR, with an unique name generated by GENSYM (an approach taken by with-c-syntax);
  • attaching the variable into the property list of a symbol generated by GENSYM.

All these approaches have certain drawbacks. The closure approach requires change to the very function definition (a macro can't just hoist the code it generates into the surrounding context). Global hash table, magic defvars and property lists all are technically accessible from the outside, and come with small performance penalties. Overall, despite the caveat with NIL values (which is entirely due to my laziness thoughtful engineering decision), I prefer the LOAD-TIME-VALUE approach.