We were clever enough to turn a laundry list into poetry.

— Umberto Eco, Foucault’s Pendulum

2014-01-20 Joe Armstrong's Universal Server in Various Languages

Joe Armstrong’s favorite Erlang program is a pretty fun one to transcribe.

The original, in Erlang, looks like this:

universal_server() ->
   receive
      {become, F} ->
          F()
   end.

In Scheme, using Termite:

(define (universal-server)
  (spawn
   (lambda ()
     (recv (('become f)
            (f))))))

In Go, using channels1:

type Message interface {}

type Become struct {
  f func()
}

func universalServer() chan Message {
  c := make(chan Message)
  go func() {
    switch v := <- c; v.(type) {
      case Become:
        close(c)
        v.(Become).f()
    }
  }()
  return c
}

In Lua, using ConcurrentLua2:

concurrent = require 'concurrent'

-- Expects messages of the form { t = "become", f = function () ... end }
universal_server = function ()
  return concurrent.spawn(function ()
    local msg = concurrent.receive()
    if msg.t == "become" then
      msg.f()
    else
      -- drop
    end
  end)
end

In Scala, using Akka:

case class Become(f : () => _)

class UniversalServer extends Actor {
  def receive = {
    case Become(f) => f()
  }
}

  1. This is slightly off, since it isn’t networked (but you can just pretend c is a net.Read sink), and universalServer receives from c unconditionally. The channel should really be buffered, with universalServer ignoring non-Become Messages.

  2. Again, the server should ignore messages where msg.t != "become".

2014-01-08 Assorted mail scripts

Use Vim as a colorizing pager:

#!/bin/sh
fold -s \
  | vim -R - \
    -c "sil so /usr/local/share/vim/vim73/macros/less.vim" \
    -c "sil set ft=mail" \
    -c "sil set shm+=T"

Safely bogofilter an online mbox:

#!/bin/ksh

set -o monitor

tmp=$(mktemp -q) && {
  /usr/libexec/lockspool |&
  read -p ok
  [ $ok = 1 ] && {
    bogofilter -pM < $MAIL > $tmp
    [ $? -lt 3 ] && cat $tmp > $MAIL || e=1
  }
  kill %?lockspool
  wait %?lockspool
  rm $tmp
  exit $e
}

Check for mail in remote mailboxes (assuming each $host has an heirloom mailx-compatible nail(1) installed):

#!/bin/sh
tput civis
for host in $*; do
  printf "%-16s " $host:
  ssh $host nail -e && echo mail || echo no mail
done
tput cnorm

Other useful programs:

2013-10-11 Asymmetric encryption for s3cmd

s3cmd(1) supports automatic encryption & decryption of data, but it defaults to using a symmetric cypher and passphrase. Happily, it’s straightforward to configure it to use a public key instead, for those who are so inclined.

The important directives in $HOME/.s3cfg are:

gpg_decrypt = %(gpg_command)s -d --verbose --use-agent --batch --yes -o %(output_file)s %(input_file)s
gpg_encrypt = %(gpg_command)s -e --verbose --use-agent --batch --yes -R <key-id> -o %(output_file)s %(input_file)s
gpg_passphrase = Ignored

Replace <key-id> with yours.

Note that gpg_passphrase is required but unused (s3cmd will whine and quit if it’s unset).

(If you don’t have a $HOME/.s3cfg, create it with s3cmd --configure.)

2013-09-10 The limits of size_t and ptrdiff_t

Given a block of memory, you can probably express its size as a value of type size_t. This isn’t actually guaranteed – size_t is only required to hold values 65535 or greater, given by SIZE_MAX – but it’s pretty safe to assume since AFAIK most everybody defines size_t according to the machine’s word length.

However, given two pointers, it’s not nearly as certain that you can express their difference with a ptrdiff_t. Because ptrdiff_t is signed (in order to indicate the direction of the difference), its maximum value is SIZE_MAX / 2 (actually PTRDIFF_MAX, also from stdint.h, although the two values should be equal), with undefined behavior if the result of the subtraction doesn’t fit in its range. This strikes me as a far more likely limitation to run up against.

2013-07-16 Compiling statically-linked CHICKEN Scheme programs with extensions

While static compilation with extensions isn’t officially supported by CHICKEN’s toolchain (chicken-install(1) et al.), it’s usually possible. However, doing so means compiling the extension and all of its dependencies yourself.

This page gives a simple example of this. The general strategy is to:

  1. Fetch each extension’s source.
  2. Compile each extension into units and import files.
  3. Compile your program in the presence of those import files.
  4. Link your program together with those units.

All of the information here is available in more detail in the CHICKEN manual – I’ve just extracted the relevant parts and added a narrative.


Our goal is to produce a statically-linked program that uses the uri-generic extension, which itself uses the matchable and defstruct extensions.

We’ll start with matchable. Fetch its source with chicken-install -retrieve.

$ chicken-install -retrieve matchable > /dev/null
$ ls matchable
matchable.meta
matchable.scm
matchable.setup
matchable-test.scm
match-simple.scm

Each extension includes several files we’re not interested in1, and then one – if we’re lucky – primary source file. For this extension, that’s matchable.scm.

To use this file statically, we compile it into an ELF object as a “unit”, then link that into our final program.

$ cat program.scm; echo
(use matchable)
(match '(1 . 2)
  ((x . y)
   (display "Hooray!")
   (newline)))

$ csc -unit matchable -emit-import-library matchable -c matchable/matchable.scm -o matchable.o
$ ls matchable.*
matchable.import.scm
matchable.o
$ csc -uses matchable -c program.scm -o program.o
$ csc -static program.o matchable.o -o program
$ file program
program: ELF 64-bit LSB  executable, x86-64, version 1 (GNU/Linux), statically linked, for GNU/Linux 2.6.32, BuildID[sha1]=7678fccb131e4f92a0ef199f985b110079df7dbf, not stripped
$ ./program
Hooray!

So far, little has changed from the manual’s example of compiling multiple files. The important differences are:

  1. The addition of the -unit and -uses flags.
  2. The addition of the -emit-import-library flag.

The -unit and -uses flags enforce the compilation model described in the manual as CHICKEN’s basic mode of operation, despite the fact that neither matchable.scm nor program.scm declare their units as described in that section2.

The -emit-import-library flag produces one additional file from matchable.scm besides its object. This file, matchable.import.scm, contains compile-time information about the module that’s used by csc when incorporating it into another program3.

Now, do the same for defstruct.

$ cat program.scm; echo
(use matchable defstruct)
(defstruct point x y)
(match (make-point x: 1 y: 2)
  (($ point 1 2)
   (display "Yippee!")
   (newline)))

$ chicken-install -r defstruct > /dev/null
$ csc -unit defstruct -cJ defstruct/defstruct.scm -o defstruct.o
$ csc -uses matchable,defstruct -c program.scm -o program.o
$ csc -static program.o defstruct.o matchable.o -o program
$ file program
program: ELF 64-bit LSB  executable, x86-64, version 1 (GNU/Linux), statically linked, for GNU/Linux 2.6.32, BuildID[sha1]=4cd22ac0bd74ba3721fbfb6c4dc8dc2eb5e4a170, not stripped
$ ./program
Yippee!

The binary now includes both matchable and defstruct4.

You can see where this is going. For each extension we use, and for each extension any of those extensions use (and so on, recursively – this is Scheme, after all), build an object with -unit and declare it as a dependency of every file that uses it with -uses.

One last time, with uri-generic (remember, that’s why we started down this rabbit hole in the first place). Here, uri-generic uses matchable and defstruct, while our program uses only uri-generic.

$ cat program.scm; echo
(use uri-generic)
(write (uri-path (uri-reference "http://example.com/super/happy/fun/time")))
(newline)

$ chicken-install -r uri-generic > /dev/null
$ csc -unit uri-generic -uses matchable,defstruct -cJ uri-generic/uri-generic.scm -o uri-generic.o
$ csc -uses uri-generic -c program.scm -o program.o
$ csc -static program.o uri-generic.o matchable.o defstruct.o -o program
$ ./program
(/ "super" "happy" "fun" "time")

We now have a static binary containing all of the required extensions.

Note that, when it comes time to link our program, an object for every extension used must be included in the resulting binary. This is tedious, but, as objects, the extensions can be bundled into archives, making them easier to manage.

$ ar rc uri-generic.a uri-generic.o matchable.o defstruct.o
$ rm *.o
$ csc -uses uri-generic -static program.scm uri-generic.a
$ ./program
(/ "super" "happy" "fun" "time")

There you have it. It’s a bit of work, but the same can be done for most extensions, and the ability to produce a self-contained program that makes use of the many eggs in the coop is nice when needed.


  1. The setup and meta files in particular are used by CHICKEN’s package manager.

  2. In fact, the -unit and -uses flags are equivalent to inserting those declarations manually. Again from the manual:

    -unit NAME
        Compile this file as a library unit.
        Equivalent to -prelude "(declare (unit NAME))"
    -uses NAME
        Use definitions from the library unit NAME.
        This is equivalent to -prelude "(declare (uses NAME))".
  3. Indeed, because CHICKEN’s module system is entirely syntactic, this file can be discarded once our program has been compiled. Import libraries are described in more detail here.

  4. Truthfully, a closer look at matchable and defstruct will show that they’re both purely-syntactic extensions, so there’s actually no need to compile them into objects at all. Because macros are included in a module’s import library, it’s enough to produce matchable.import.scm and defstruct.import.scm and have those files available (i.e. in the current directory) when compiling program.scm. Admittedly, this makes them bad examples in particular, but this article gives the more general approach.

2013-02-17 Falsity in JavaScript

I never remember these, because they make no sense. Perhaps writing them down will help.

> !! 0
false
> !! ""
false
> !! {}
true
> !! []
true
> !! NaN
false
> !! null
false
> !! undefined
false

If you really want to feel rage, try comparing some of these with ==, which is mostly useless and entirely dangerous. For example:

> '' == 0
true
> 0 == '0'
true
> '' == '0'
false

What, you thought equality was transitive? Ha, don’t be silly.

Because of things like this, I think there’d be value in a “CoffeeScript lite” that rewrites the absurd parts of JavaScript (such as replacing all instances of == with ===, as CoffeeScript does) without implementing an entirely new syntax.

2013-01-20 Simple dtach session manager

One of the most useful scripts in my bin is dt. It manages named dtach(1) fifos under $HOME/.dtach.

#!/bin/sh

name=$1
dir=$HOME/.dtach

usage() {
  echo "usage: $(basename "$0") [-n name] cmd"
}

if [ $# = 0 ]; then
  ls -1 "$dir"
else
  mkdir -p "$dir"
  while getopts hn: arg; do
    case $arg in
      ?) usage>&2; exit 1;;
      n) name=$OPTARG; shift 2;;
      h) usage; exit;;
    esac
  done
  DTACH="$(date +%s)-$name" \
    dtach -A "$dir/$name" -r winch "$@"
fi

Called without arguments, it lists running sessions. Called with a command line, it either resumes the session with the name of the program to be run, or starts a new one if none exists. An alternate session name can be given with -n.

I use this command a lot:

$ fc -ln 0 | wc -l
9367
$ fc -ln 0 | grep -c '^ dt[ $]'
1264

2014-06-04

abduco(1), by Marc André Tanner of brain-dump.org (author of dvtm(1)), provides all of these features and obsoletes dtach with a much cleaner program.

2013-01-18 Editing Scheme with Vim

For other weirdos like me who refuse to use Emacs.

Indentation

Vim’s built-in Lisp indentation (“set lisp”) is hardcoded to shift inner forms by two spaces. I much prefer the indentation style used by the various emacsen, where identifiers not in lispwords are vertically aligned; this patch against Vim 7.3 emulates this style.

Alternatively, see my schematic-format utility.

Lispwords

Getting lispwords right eases a lot of manual-indentation pain. Most important is…

au FileType scheme setl lispwords-=if

… To vertically align the arms of if expressions with their tests.

Syntax

Vim colors parentheses as Delimiters by default, which is too bright for my taste in most colorschemes. Additionally, I like to have a visual indication of which parentheses are parts of quoted forms, as opposed to normal code. This patch colors “normal” parentheses as Comments (which are light grey in my colorscheme), while parentheses in quote and quasiquote forms remain colored as Delimiters (red, with inner unquotes returning parentheses to the lighter Comment color).

REPL

I’ve written about simple REPL integration here.

Plugins

Surround

If you use vim-surround, the following line will add a binding to the s key that surrounds forms in a function application, so that e.g. yssscons wraps the current line in (cons ...).

let g:surround_115 = "(\1function: \1 \r)"

Smart Input

vim-smartinput is the only autoclose plugin I’ve found that acts like I expect, doesn’t interfere with other plugins, and doesn’t fuck up my undo history.

It also works well for Lisp, with one modification: the following lines in $HOME/.vimrc will prevent it from autoclosing quasiquotes for Lisp and Scheme:

call smartinput#define_rule({
      \ 'at': '\%#',
      \ 'char': "`",
      \ 'input': "`",
      \ 'filetype': ['lisp', 'scheme']
      \ })

Chicken

Documentation

Setting keywordprg gives you access to chicken-doc’s documentation for a given identifier when K is pressed over it.

setl keywordprg=chicken-doc

Completion

Vim

The following script will dump function names from chicken-doc into files named by node under $HOME/.vim/completion/scheme.

#!/bin/sh

set -e

dir="$HOME/.vim/completion/scheme"

first() { cut -d' ' -f1; }

mkdir -p $dir
for n in $(chicken-doc -c chicken | first); do
  chicken-doc -c chicken $n | first | tee $dir/chicken-$n
done

for n in $(chicken-doc -c | first); do
  [ "$n" = "chicken" ] && continue
  chicken-doc -c $n | first | tee $dir/$n
done

These files can be sourced to add completion for Chicken identifiers by adding the following to $HOME/.vim/ftplugin/scheme.vim:

setl complete+=d,k~/.vim/completion/scheme/**

Obviously, the same could be done for any other language, given a way to generate its wordfile(s).

csi

The same wordfiles can be used for tab completion in csi with rlwrap.

cat $HOME/.vim/completion/scheme/* > $HOME/.csi-words
alias csi="rlwrap -b'(){}[]\";' -q'\"' -f$HOME/.csi-words csi"

2013-01-06 An inotify gotcha

To recursively watch a directory, one typically monitors IN_CREATE events and, if the subject is a directory, inotify_add_watches it as well.

However, this introduces the potential for what are essentially TOCTTOU bugs. By the time you’ve inotify_add_watched a newly-created directory, it may have already been the subject of events about which you’re interested. To correct for this, if one wants to monitor file creation events, for example, one must add the watch and immediately walk the directory to act on any files created before the watch was registered – potentially recursively – before returning to handle new events resulting from the watch.

2012-12-01 Python scoping rules

(Hint: it doesn’t)

Python’s an OK language, but its scoping rules are pretty broken.

>>> foo = 1
>>> def incfoo():
...     foo += 1
...     return foo
...
>>> incfoo()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 2, in incfoo
UnboundLocalError: local variable 'foo' referenced before assignment

I’ve been told the “correct” way to do this in Python is using containers…

>>> foo = [1]
>>> def incfoo():
...     foo[0] += 1
...     return foo[0]
...
>>> incfoo()
2

… Which is insane.

Even more sinister is the way list comprehensions brutally pollute their surrounding scope:

>>> x
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
NameError: name 'x' is not defined
>>> for x in [1, 2, 3]:
...     x + 1
...
2
3
4
>>> x
3
>>> [x + 1 for x in [4, 5, 6]]
[5, 6, 7]
>>> x
6

This has been fixed in Python 3, thankfully. For 2.x releases, you can work around it with map (for the latter case, anyway).

2012-08-05 Poor man's SLIME

I’m not an Emacs user and I don’t care for slimv, but the ability to send text from $EDITOR to a running REPL a la SLIME is invaluable. For a while I used a modified version of tslime.vim, but I’ve since ditched it for something much simpler.

In sh(1):

mkfifo repl
cat > repl & PROGRAM < repl

In vim(1):

nmap <C-c><C-c> vap:w >> repl<CR>

That’s it. ^C^C sends the toplevel expression under the cursor to the cmd reading from repl, no plugins or external tools are needed, and concerns are separated correctly: the editor and repl are isolated by pty and the view is managed by the window manager, as it should be.

2012-04-08 Tricksy byte to ASCII hex value conversions


short n = 10;
n["0123456789ABCDEF"]; /* => 'A' */

Too cute by half.

2012-03-18 Three ways to hook a library function in C

Say one wanted to modify malloc such that it punted on allocations larger than 1024 bytes.

At runtime, by library interposition using LD_PRELOAD. This is the most common method. With the following as hook.c:

/*
 * gcc -shared -fPIC -D_GNU_SOURCE -ldl -o hook.so hook.c
 * LD_PRELOAD=./hook.so ./program
 */

#include <dlfcn.h>
#include <stdio.h>

void *malloc(ssize_t size) {
  typeof(malloc) *orig_malloc;

  if (size > 1024) {
    fprintf(stderr, "Whoa, %ld is a lot of bytes... I don't think I'm up for that.\n", size);
    return NULL;
  }

  orig_malloc = dlsym(RTLD_NEXT, "malloc");
  return orig_malloc(size);
}

At compile time by macro-redefinition of malloc:

/*
 * echo '#include "hook.c"' | cat - program.c > hooked.c
 * gcc -o program hooked.c
 * ./program
 */

#include <stdio.h>
#include <stdlib.h>

void *hook_malloc(ssize_t size) {
  if (size > 1024) {
    fprintf(stderr, "Whoa, %ld is a lot of bytes... I don't think I'm up for that.\n", size);
    return NULL;
  }
  return malloc(size);
}

#define malloc(n) hook_malloc(n)

At link time using the linker’s --wrap feature, which rewrites malloc to __wrap_malloc and __real_malloc to malloc:

/*
 * gcc -Wl,--wrap,malloc -o program hook.c program.c
 * ./program
 */

#include <stdio.h>
#include <stdlib.h>

typeof(malloc) __real_malloc;

void *__wrap_malloc(ssize_t size) {
  if (size > 1024) {
    fprintf(stderr, "Whoa, %ld is a lot of bytes... I don't think I'm up for that.\n", size);
    return NULL;
  }
  return __real_malloc(size);
}

2012-03-17 Heroku buildpack for CHICKEN Scheme

Last week I put together a Buildpack for deploying CHICKEN Scheme apps on Heroku’s Cedar stack.

It comes with CHICKEN 4.7.0 and uses the egg packaging infrastructure to manage dependencies.

The code is available at Github; instructions and examples can be found in the README.

2012-02-28 Multivariable negativity test

To test multiple signed values for negativity all at once, one can use…

if ((a|b|c) < 0)

… Which might be marginally more clear than writing out each comparison individually, kind of, maybe?

This also works (i.e. “reads logically”) with &.

2012-02-24 My favorite line of Ruby

… that I’ve ever written:

begin return yield ensure unlock end if block and lock

Totally nonsensical and perfectly valid, like a Ruby version of Perl line noise. For context, it’s from a tiny lockfile class.

class Lockfile
  def initialize(file)
    @file = file
  end
  def locked?
    File.exists?(@file)
  end
  def lock(&block)
    begin return yield ensure unlock end if block and lock
    not locked? and File.open(@file, "w") { true } rescue false
  end
  def unlock
    File.delete(@file) and true rescue false
  end
end

2012-02-20 Swapping values without a temporary variable


a ^= b ^= a ^= b;

Note that, because C’s evaluation order is undefined, the result of this expression is also technically undefined (though it can be rewritten as three equivalent statements with well-defined results).

2012-02-19 First impressions of Clojure

Some stream-of-consciousness thoughts after a little while using Clojure.

Firstly, it’s far more frustrating to know that a function exists but have no idea what it’s called or where it’s located than to have no idea what you’re looking for in the first place. As a Schemer, I’m constantly wanting to type member, find, fold, &c, but finding the equivalents in the Clojure documentation takes more time than I’d like (i.e. greater than 2-3 seconds). This isn’t Clojure’s fault by any means, just irritating. clojure.repl’s apropos helps with this, but from what I can tell it doesn’t reach outside the current namespace to find functions.

The most surface-level observation is the heavy use of reader macros. There’s far too much syntax for me; when reading (what I assume to be) idiomatic Clojure in the standard library, I often feel as though I’m drowning in it. Some forms are admittedly quite useful (the literal map syntax is one I like), but some (e.g. #(), which is basically just cut) seem unnecessary and detrimental to readability.

Speaking of readability, I find the literal sequential syntax ([]) very hard to work around. Admittedly, I am of the One True Parenthese (or is that Two?) in that I find square brackets a hindrance even when they’re equivalent to parentheses, but with the difference in behavior that Clojure introduces, keeping them all wrangled becomes important and thus annoying. I also find them visually difficult to distinguish – were I to write lots of Clojure, I’d probably look for a font with significant differences between the two (sigh, four) characters.

The uniform interface accross the built-in sequence types is very nice. That said, nine times out of ten it’s a list; this either begs the question of the other types’ necessity, or it means I’m not yet using Clojure like it’s meant to be used. The ability to apply datatypes as procedures is nice, and leads to some interesting idioms. I’d be interested to know something of the implementation details of this feature, and what kinds of performance costs are involved.

The metadata stuff is all a bit weird, like property lists on crack. I’m having trouble coming up with any good uses for it that I wouldn’t rather do another way, but I’m having trouble coming up with uses for it at all so maybe time will tell if it’s a useful feature. It also seems like the kind of thing that will be abused over time, like hash arguments in Ruby; it will be interesting to see if the Clojure community reaches some consensus regarding the feature and finds a good role for it (or maybe there already is one, I don’t know it).

I haven’t had a reason to use the any of the concurrency features yet, but they look well thought-out.

Shame about the unhygienic macros.

The Java interop is probably as good as can be expected. The stack traces are useless, but the ability to develop incrementally at a REPL offsets this. The most frustrating issue in this territory is that object methods aren’t first class, so otherwise concise code becomes necessarily more complex as it slows down and passes around Java objects to circumvent this limitation.

Even with the decent integration, however, the dependence on the JVM is really hard to swallow. It’s big and slow to start, and I simply don’t want a JDK on my system. Maybe I’ll go check out Arc – at least I already have Racket installed.

2012-02-04 Pointer tagging

Pointer tagging is a very old but still very clever technique for packing more information into a pointer than just a destination register address.

It works like this: Pointers to heap data are often 4- or 8-byte-aligned, which means that the lowest couple of bits in any given pointer will be 0s. How many bits you’re guaranteed to have depends on your object layout and architecture; for allocated data on an x86-64 machine it’s three. So, instead of letting those low bits sit around unused, you can encode some information about the data pointed to directly in the pointer. Many programming languages add the type of the referenced object, for example, in order to branch earlier on its type or save space in its representation. So, if you originally had an object layout like this:

struct object {
  unsigned type;
  unsigned size;
  void *data;
};

… You could do away with the type slot for seven of your object types (hopefully the seven most common) by moving the type value into the pointers to those objects. Then, just check those first bits and mask them out before making the jump.

#define TYPE_ONE 1
#define TYPE_TWO 2

#define TYPE(p)  ((int)(p) &  7)
#define VALUE(p) ((int)(p) & ~7)

switch(TYPE(p)) {
  case TYPE_ONE:
    handle_type_one(VALUE(p));
    break;
  case TYPE_TWO:
    handle_type_two(VALUE(p));
    break;
  ...
}

This can be taken a step further. If we left-shift integer values and mark their types in those newly-freed low bits as well, we can quickly check for an integer-type object and, when one is found, perform arithmetic directly on the bits of the pointer after right-shifting its value back into place. This allows us to treat numerical objects just like any other while getting the speed boost of nearly-immediate integer operations.

#define IS_INTEGER(p) (((int)(p) & 7) == TYPE_INTEGER)

if(IS_INTEGER(p))
  value = (p >> 3);

There’s an obvious choice to be made here – defining integers by tag 000 will result in slightly faster numerical code, while defining tag 000 to be a standard object pointer will give you fewer bitshifts when tracking down objects on the heap; it all depends on the case you want to optimize.

Since learning this trick, I’ve found that just about every programming language uses it to some degree. To my knowledge, all of SML, Smalltalk, Haskell and Ruby (as well as a dozen or so Lisp and Scheme implementations) have used it, or something like it, at one time or another.

There’s a nice, easy-reading paper discussing this technique and many more like it here.

2012-01-11 Allocating pathname buffers

From what I can tell, there’s no easy, portable way to determine a sensible length for a pathname buffer (and the various nonportable APIs for doing so are a clusterfuck).

ISO C defines a macro, FILENAME_MAX, that seems pretty reasonable at first glance:

FILENAME_MAX expands to an integer constant expression that is the size needed for an array of char large enough to hold the longest file name string that the implementation guarantees can be opened;

However, it also includes this discouraging footnote:

If the implementation imposes no practical limit on the length of file name strings, the value of FILENAME_MAX should instead be the recommended size of an array intended to hold a file name string. Of course, file name string contents are subject to other system-specific constraints; therefore all possible strings of length FILENAME_MAX cannot be expected to be opened successfully.

This makes FILENAME_MAX sound fairly useless. And that’s exactly the case for GNU libc, according to its manual:

Unlike PATH_MAX, this macro is defined even if there is no actual limit imposed. In such a case, its value is typically a very large number. This is always the case on the GNU system.

Usage Note: Don’t use FILENAME_MAX as the size of an array in which to store a file name! You can’t possibly make an array that big! Use dynamic allocation (see section Memory Allocation) instead.

So FILENAME_MAX is pretty useless, especially if GNU/Linux is a target.

There’s also PATH_MAX, but it’s not even guaranteed to be defined, and may be indeterminate when queried via pathconf(3). Moreover, the POSIX standard warns:

Some values, such as PATH_MAX, are sometimes so large that they must not be used to, say, allocate arrays.

So PATH_MAX has the same problem as FILENAME_MAX.

Then there’s MAXPATHLEN (which AFAICT isn’t guaranteed anywhere to be defined or have any bearing on reality) and _POSIX_PATH_MAX (which is just the minimum POSIX-allowed limit), neither of which help, either.

This is frustrating, though the difficulty of retrieving a single value for this purpose makes some sense (since different filesystems impose different upper limits on pathname lenths, which entails a runtime check). There may be an easy way to do this, but if there is I can’t find it.

2011-11-15 On Cons and Icons

When discussing the merits of Lisp, one of the first things people tend to mention is macros. And it’s true, macros are a powerful feature, so it’s easy to stop there and leave happy. However, what’s really important about Lisp isn’t its macro expansion phase – many other languages provide something similar – but its simplicity and homoiconicity1. The saying is that Lisp’s code is data and its data is code; a powerful macro system is a direct consequence of that coequality and just one of many examples of the power it affords. It’s telling that John McCarthy’s first question, when asked about potential similarities between Ruby and Lisp, wasn’t “does Ruby have macros?” or even “does Ruby have first-class functions?”, but rather “does Ruby use list structures as data?”, referring to the double-duty performed by lists in his list processing language.

Consider the following, from Darius Bacon’s A Hacker’s Introduction to Partial Evaluation.

(define (emit-square x) `(* ,x ,x))

(define (emit-power x n)
  (cond ((= n 0) 1)
        ((= n 1) x)
        ((odd? n) `(* ,x ,(emit-power x (- n 1))))
        (else (emit-square (emit-power x (/ n 2))))))

; (emit-power 'x 5) => (* x (* (* x x) (* x x)))

What we have here isn’t just a macro but a compiler, one that generates exponentiation expressions. And – because Lisp’s fundamental compound data type is precisely that of its syntax tree – it’s turtles Lisp all the way down (if “it is better to have 100 functions operate on one data structure than 10 functions on 10 data structures” and you’re going to design around just one, cons cells are a good choice).

While this technique of compilation by emitting a syntax tree isn’t unique to Lisp (the original C++ compiler emitted C, for example), the ease with which it can be done might be2. “Languages differ not so much in what they make possible, but in what they make easy”; Lisp makes self-reference easy, and because the referents, lists and symbols, are such versatile objects, it’s possible to encode a wide variety of ideas in a simple, common tongue.

A wonderful example of such an encoding is given in Shriram Krishnamurthi’s Automata via Macros, where he uses lists to describe a machine accepting the language c[ad]*r and a program to run it:

(define machine
  '((init (c more))
    (more (a more)
          (d more)
          (r end))
    (end)))

(define (run machine init-state stream)
  (define (walker state stream)
    (cond
      ((null? stream) #t)
      (else
       (let ((in (car stream))
             (transitions (cdr (assv state machine))))
         (let ((new-state (assv in transitions)))
           (if new-state
               (walker (first (cdr new-state)) (cdr stream))
               #f))))))
  (walker init-state stream))

; (run machine 'init '(c a d a d d r)) => #t
; (run machine 'init '(c a d a d d r r)) => #f

With Lisp’s orthogonality, one gets a flexible DSL language entirely for free (all at runtime and no macros required). Here, it’s a language for describing state machines, but similar techniques have been used to great effect for everything from defining regular expressions to building XML to querying SQL databases. Even when creating sublanguages, strong axioms remove the need for string-munging and ad-hoc parsers; one can simply (write (eval (read))).

It’s interesting to note that Lisp’s shared representation of code and data is somewhat of an historical accident. As originally conceived, the language would make use of S-expressions internally, but manually-prepared programs would instead use a notation similar to that of FORTRAN (referred to by McCarthy as “M-expressions” for “meta-language”). Translation between the two grammars was planned but never implemented, however, as programmers took a liking to the simplicity of Lisp’s parenthetical syntax3.

It was a fortunate lapse. The flexibility of S-expressions have helped Lisp develop a culture of linguistic experimentation, one from which a powerful macro system was a natural but significant development; the ability to absorb and express a wide variety of paradigms has inspired Lisp’s description as a “language laboratory” (adapted from Steele & Gabriel’s The Evolution of Lisp, wherein Steele recalls using it to implement some ten interpreters a week):

(defmacro switch (value &rest body)
  (let* ((newbody (mapcar #'(lambda (clause)
                              `(,(gensym) ,@(cdr clause)))
                          body))
         (switcher (mapcar #'(lambda (clause newclause)
                               `(,(cadr clause) (go ,(car newclause))))
                           body newbody)))
    `(block switch
       (tagbody (case ,value ,@switcher)
                (break)
                ,@(apply #'nconc newbody)))))

(defmacro break ()
  '(return-from switch))

(switch n
  (case 0 (princ "none") (break))
  (case 1 (princ "one "))
  (case 2 (princ "too "))
  (case 3 (princ "many")))

Thus, to write code in a simply-homoiconic language is to do two things at once: to create a computer program and to create a series of data structures. That one can manipulate the other is the magic of it all. As a final illustration, here’s a Scheme program implementing a key-value data store that rewrites itself to include newly-inserted values (a literal interpretation of the phrase “self-modifying code”):

(define data '()) ; Initial data.

(let ((self (car (command-line)))
      (args (cdr (command-line))))
  (case (string->symbol (car args))
    ((get)
     (let ((pair (assoc (cadr args) data)))
       (when pair ; Print associated value.
         (display (cdr pair))
         (newline))))
    ((set)
     (let ((program ; Read own program.
            (with-input-from-file self
              (lambda () (read) (read)))))
       (with-output-to-file self
         (lambda ()
           (write ; Insert new value.
             `(define data
                '(,(cons (cadr args) (caddr args))
                  ,@data)))
           (write program)))))))

$ scheme data.scm get a
$ scheme data.scm set a 1
$ scheme data.scm set b 2
$ scheme data.scm get a
1
$ scheme data.scm get b
2

At this point, my Scheme bias is probably showing. If you have any examples of effective (or even just amusing) uses of homoiconicity in Scheme, Lisp or any other language, please send them my way.


  1. Both are important. If you wanted to be pedantic, you could argue that “well, programs are written as characters, so if a language can operate on strings then it’s homoiconic too”. That’s technically true, but don’t stop there: strings can be encoded as lists of numbers, so any language that can operate on numerical expressions is also homoiconic. Then, you can Gödel-encode those expressions and say that any Turing-complete language is homoiconic. That’s not the point – the point is that Lisp’s simplicity makes the manipulations above not just possible but trivial.

  2. There are some modern languages, such as Io and REBOL, that do very interesting things involving introspection and self-modification. Lisp, however, predates each, and has certainly served as the foundation for such new ideas.

  3. There have actually been many attempts to remove the parentheses, most of which have floundered. Notable as fairly direct Lisp descendants without its S-expressive syntax are Logo and Dylan.

2011-10-01 Anonymous functions in C99

This post on comp.lang.c describes a preprocessor macro to add convenient support for anonymous functions to C:

#define function(return_type, body_and_args) \
  ({ \
    return_type __fn__ body_and_args \
    __fn__; \
  })

While not technically a lambda (and requiring gcc extensions), it does allow for some fun constructs. One can use it to express forms from other languages that naturally take inline functions, such as Lisp’s with-open-file or Ruby’s each_line:

void with_open_file(const char *filename, void(* fn)(FILE *)) {
  FILE *f;
  if ((f = fopen(filename, "r")) == NULL) {
    perror("Cannot open source file");
    exit(1);
  }
  (* fn)(f);
  fclose(f);
}

void each_line(FILE *f, void(* fn)(char *)) {
  char l[8192];
  while (fgets(l, sizeof(l), f))
    (* fn)(l);
}

int main(int argc, char **argv) {
  with_open_file(argv[1], function(void, (FILE *f) {
    int i = 0;
    each_line(f, function(void, (char *l) {
      printf("%i %s", ++i, l);
    }));
  }));
  return 0;
}

It’s easy to forget some of the cool things C macros can do. As an example, the V7 Unix shell (what would eventually become bash) hardly looks like C in some places.

2011-10-01 Monadic meditations

Half-baked monad1 tutorials are currently in vogue, so here’s mine. This is more a retracing of how I began to grok the fullness than a proper tutorial, but it moves slowly so some may find it helpful.


It isn’t difficult to understand monads on a practical level given the right explanation. For me, that was the following:

Monads provide a functional way to build a specialized evaluation environment, where some implicit behavior is threaded through a series of other computations.

That’s still somewhat abstract, so let’s look at a contrived example. Imagine the series of actions necessary to mail a postcard. A program to perform these actions in an imperative style might look like this:

(define (send postcard)
  (set! postcard (write-message postcard))
  (set! postcard (address postcard))
  (set! postcard (stamp postcard))
  (mail postcard))

Now, because each step modifies the state of the given postcard, order of operations comes into play. In an eager language, this program behaves as expected, since it’s evaluated from top to bottom. However, in a lazy language like Haskell where there’s no such guarantee of evaluation order, the postcard might be mailed before it’s written, addressed, or stamped. In such a case, one must explicitly sequence the operations in order to ensure the postcard is sent correctly.

Finding Your Identity

To do this, one could rewrite the program without mutation as a series of lets, as follows:

(define (send postcard)
  (let ((written (write-message postcard)))
    (let ((addressed (address written)))
      (let ((stamped (stamp addressed)))
        (mail stamped)))))

Or, written more succinctly as a let*:

(define (send postcard)
  (let* ((written   (write-message postcard))
         (addressed (address written))
         (stamped   (stamp addressed)))
    (mail stamped)))

And you know what? let* is, essentially, a monad2. Indeed, it is the simplest monad, the identity monad. To see this, first take a look at the two functions that comprise the identity monad:

(define (bind value function)
  (function value))

(define (result value) value)

Any monad can be thought of as a chain of two functions, bind and result. In the case of the identity monad, these are trivial: bind applies a function to a value, and result simply returns its argument. How do these simple operations combine to do anything useful? Well, with the identity monad’s two procedures, one can bind a value to a name in the following way:

(bind 2 (lambda (x) (result (+ 1 x))))

The above is equivalent to:

((lambda (x) (+ 1 x)) 2)

Which itself is equivalent to:

(let ((x 2))
  (+ x 1))

Indeed, a simple Lisp implementation might define let in precisely this way, as an inversion of the identity monad’s operations. In fact, using these basic building blocks, a monadic let* of arbitrarily many terms can be expressed by a macro like the following:

(define-syntax monad-let*
  (syntax-rules ()
    ((_ () <body> ...)
     (result (begin <body> ...)))
    ((_ ((<term1> <expr1>) (<term2> <expr2>) ...) <body> ...)
     (bind <expr1>
           (lambda (<term1>)
             (monad-let* ((<term2> <expr2>) ...)
               <body> ...))))))

Given this form, the following expansion occurs:

(monad-let* ((a 1)
             (b 2)
             (c (+ a b)))
  (+ a b c))

… Into…

(bind 1 (lambda (a)
          (bind 2 (lambda (b)
                    (bind (+ a b) (lambda (c)
                                    (result (begin (+ a b c)))))))))

… Which correctly evaluates to 6. So, our monad-let* form binds names to computation results, one by one, before finally returning a single value calculated with those names. This is the evaluation environment defined by the identity monad – one of explicit ordering.

If these manipulations seem simple, that’s because, well, they are. But defining such operations as bind and result at a basic level allows for the reliable composition of more complex evaluation environments.

A State of Enlightenment

Returning to our postcard example, we now have the following:

(define (send postcard)
  (monad-let* ((written   (write-message postcard))
               (addressed (address-postcard written))
               (stamped   (stamp addressed)))
    (mail stamped)))

You’ll notice that all of these functions act very much in the same way: take a postcard, return a modified postcard. Each just moves the object it’s given from one state to another. Recognizing this, one might like to specify an evaluation environment where these state changes are implicit and the syntactic burden of passing state from one function to the next lies elsewhere. The state monad, whose bind and result operations are defined below, does just this.

(define (bind value function)
  (lambda (state)
    (let-values (((value* state*) (value state)))
      ((function value*) state*))))

(define (result value)
  (lambda (state)
    (values value state)))

This monad’s operations return unary procedures. You can destructure them simply enough; essentially, bind’s result takes a state and applies bind’s original arguments in turn as functions, while result just curries away its argument, which its resulting procedure returns along with a given state. What’s important to note is that, when these operations are chained in a monadic composition (such as our monad-let* form), there is now a new value, a state, being threaded through it all.

This can be seen with another simple call to monad-let*, this time using the state monad’s bind and result operations. Instead of a value, a procedure is now returned that, when applied to an initial value, threads it through the sequence of computations implicitly3.

(define add-two
  (monad-let* ((a (result 0))
               (b (result 2))
               (value (result (+ a b))))
    value))

(add-two 5)     ; => 2, 5
(add-two "ten") ; => 2, "ten"

Hm, clearly the initial value is just being passed along and returned. In order to actually manipulate the state, we need two simple functions, get and put (along with a monadic add to make things clearer):

(define (get)
  (lambda (state)
    (values state state)))

(define (put value)
  (lambda (state)
    (values value value)))

(define (add . lst)
  (result (apply + lst)))

Now, add-two actually does:

(define add-two
  (monad-let* ((init   (get))
               (value  (add init 2))
               (result (put value)))
    init))

(add-two 5) ; => 5, 7

A redefinition of our postcard-sending routine using the state monad might look something like the following, where each function is defined to act on a state and return a monadic value:

(define send
  (monad-let* ((_      (write))
               (_      (address))
               (_      (stamp))
               (result (mail)))
    result))

(send postcard) ; => success!

A more real-world example of this pattern might be I/O. In fact, Haskell’s I/O monad is a special case of the state monad, where the state in question is an open file. The abstractions are similar:

(define (open)
  (lambda (state)
    (let ((new (open-input-file state)))
      (values new new))))

(define (close)
  (lambda (state)
    (values (void) (close-input-port state))))

(define (rewind)
  (lambda (state)
    (set! (file-position state) 0)
    (values (void) state)))

(define (get-line)
  (lambda (state)
    (let ((line (read-line state)))
      (if (eof-object? line)
          (values "Mu." state)
          (values line state)))))

(define achieve-enlightenment
  (monad-let* ((_       (open))
               (before  (get-line))
               (_       (rewind))
               (after   (get-line))
               (then?   (get-line))
               (_       (close)))
    (print (format "Before enlightenment? ~s" before))
    (print (format "After enlightenment? ~s" after))
    (print (format "And then? ~s" then?))))

(achieve-enlightenment "zen.txt")

; Before enlightenment? "Chop wood, carry water."
; After enlightenment? "Chop wood, carry water."
; And then? "Mu."

Why? Why Not?

Hopefully by now you see what I mean by an evaluation environment. In each of these cases, some implicit behavior accompanied the computation, creating a mini-language to meet a specific requirement. And, by their nature, monads can be composed and combined, allowing for powerful pipelined expressions.

So, while monads might seem to impose a silly level of abstraction on such trivial examples, and while it is true that monads tend to make more sense in lazy languages (when a specific evaluation order is desired), pure languages (when an imperative style is needed), strongly-typed languages (where their correctness can be statically verified), or languages where similar abstractions aren’t provided (a Schemer, for instance, might use macros or parameter objects where state is concerned), the pattern they provide might be useful from time to time. And besides, learning new solutions (even to old problems) is always a good thing.


  1. “Half-baked monads” is an excellent band name.

  2. This isn’t really true. This article glosses heavily over the technicalities of a rigorous definition of monads, such as type requirements and associativity rules. You can find these elsewhere.

  3. result must be used here to provide monadic values from the primitives 0 and 2. A more specialized macro than monad-let* would alleviate the need.

2011-10-01 glut without glutmainloop

Want to use GLUT for window creation but don’t want to hand over your whole main loop? You have three options:

2011-10-01 Dynamically modifying methods in Ruby

(Or, Navigating Nil Without the Tap Dancing).

I recently wanted to limit the range of return values for a few of Ruby’s built-in Array methods. Luckily, the language provides a set of (somewhat obscure) methods to achieve just this.

Aside: Method Chaining in Ruby

Array’s “bang” methods (compact!, uniq!, etc.), which modify an array in place, all return nil when no changes are made to the array. This is occasionally annoying, as it causes the following to raise exception if any method fails to mutate its callee:

[a, r, r, a, y].compact!.uniq!.each { |e| puts e }

What we’d like is for these methods to return self unconditionally. Others have come up with solutions to this problem (like this one, using Object#tap), but if you don’t like the methods the way they are (and you don’t need to maintain the default behavior for anyone else), why not just change them?

Back to Modifying Methods

In order to do this, we could override each method one-by-one, but that’s a pain. Instead, we need a way to append a small amount of code to each of these methods without losing their original definitions in the process:

class DeviantArray < Array
  Array.instance_methods(false).grep(/!/).each do |m|
    orig = instance_method(m)
    define_method(m) do |*args|
      orig.bind(self).call(*args)
      self
    end
  end
end

The first bit,

Array.instance_methods(false).grep(/!/).each do |m|

grabs the subset of Array’s instance methods with a “!” in the name. Inside the loop, things get a little less clear:

orig = instance_method(m)

Module#instance_method, according to ruby-doc, “Returns an UnboundMethod representing the given instance method.” More plainly, it snaps off a Module’s instance method, disassociating it from its object and letting us pass it around without having to refer to it by name. This is useful, since it allows us to hold onto the contents of a method even after redefining a new one under its old name.

The next block performs this redefinition:

define_method(m) do |*args|
  orig.bind(self).call(*args)
  self
end

define_method does just that. The given block is the new method content; inside, our detached method is rebound to the calling object, invoked, and self is returned.

Now, we can do the following without worrying about nil values:

DeviantArray.new([a, r, r, a, y]).flatten!.compact!.reverse!.shuffle!.uniq!.sort!

A Simpler Example

This technique isn’t only useful for changing return values – one might also use it to add a hook to a set of methods. Here’s a trivial example:

class String
  [:chomp, :delete, :gsub, :squeeze, :strip, :sub].each do |m|
    orig = instance_method(m)
    define_method(m) do |*args|
      before = size
      after = orig.bind(self).call(*args)
      puts "Removed #{before - after.size} characters."
      after
    end
  end
end

Now, we get some useful feedback when we call the following:

>> "aabbcc".squeeze
Removed 3 characters.
=> "abc"

>> " abc ".strip
Removed 2 characters.
=> "abc"

>> "abcxyz".delete "xyz"
Removed 3 characters.
=> "abc"

You probably shouldn’t do this kind of thing in a large codebase with multiple authors, but if you have the freedom to tinker this can be a powerful tool.