Pepsi and Coke (No. 5) - squeakerのブログ

I fetched the "head" tree from the SVN repository of idst and tried to make it run on my Windows machine (again). I said "I tried" but the way I did was just to complain to Ian in person. The issue turned out that some macros (such as _B, _W, _L, and _X) defined and used by jolt name-crash with the cygwin gcc compiler's definitions. So, I just undef them. (That was what Ian told me to do). There was still a few more smaller issues, but I managed to compile jolt from the head.

The JavaScript implementation comes with it still doesn't work, but I think I should switch to this version anyway. So, things I wiil write on this blog are based on that version.

I use Meadow (Emacs) on Windows for doing stuff, but until I started playing with Coke, I was just happy with the Mingw32 binaries. I did have cygwin installed but rarely used for anything. In the light of Coke, I set up a bit better environment for the combination of Meadow, Mingw32, and Cygwin. A few small modification to the jolt code let me use jolt and idc under cmuscheme.el and a bash shell running in the shell-mode. (Ian and Alex don't run these interactive stuff under Emacs. That bothers me, and also the fact that they are productive without the support from these great tools bothers me, too.)

Anyway, now it's time to talk about defining new syntax and the Compiler. The syntax thing is somewhat similar to some macro systems in Lispy languages, and there is a section on it in http://www.piumarta.com/pepsi/coke.html. So, I'm not going into the detail, but just jump to an example.

For what I would like to do with Ruby, I need to handle a variable number of arguments for a method. However, the Coke at the language level doesn't support such var-args well. (The built-in "and", "or" and some other operators handle var-args, so it is clearly possible under that level. I'll talk about it later.)

To see if I can do such a thing, I tried to implement the "max" function. The function gets one or more arguments and returns the biggest value. Well, I made some (a lot of) unsatisfying attempts and ended up with asking Ian. One of my wrong attempts was like this:

(define [Object evalWith: compiler]
  [SmallInteger value_: [[compiler compile: self] value]])

(define max1
  (lambda (node compiler)
    (let ((s [node size])
	  (idx '0)
	  (m [[node at: idx] evalWith: compiler]))
      (set idx [idx + '1])
      (while [idx < s]
	     (let ((c [[node at: idx] evalWith: compiler]))
	       (if [m < c]
		   (set m c)))
	     (set idx [idx + '1]))
      m)))

(syntax max ; (max ...)
   (lambda (node compiler)
     (let ((n (max1 [node copyFrom: '1] compiler)))
       n)))

Here, the syntax of "max" doesn't do too much but just passes the arguments to a helper-function called "max1". The first argument passed to max1 is an Expression object (again, it is like an Array). The size of the arguments to "max1" is one fewer than to "max", as the first node is the Symbol "#max" itself. In "max1", the nodes (bound to "node") are evaluated in a brute force (and wrong) way with the compiler and compared to each other to get the answer. This version lets you do stuff like:

.(max '1 '2 '3 '4 11)
 => 11

or if you have a global variable:

.(define x '5)
.(max '1 '2 '3 '4 x)
 => 11

So you can mix the quoted and not-quoted numbers, and use varaibles. However, it doesn't work a case like:

.((lambda (y) (max '1 '2 '3 '4 y)) '5)
undefined: y

My assumption when I was writing this was that the "compiler" argument in the syntax definition knows about the static nature of the environment. In other words, syntax expansion is something that happens at compile time, but variable binding happens at runtime. In the last example, the Compiler that is expanding the "max" form is in the global environment (i.e., its "environment" instance variable is GlobalEnvironment). So, even if you try to evaluate "y" in the environment, it is not defined yet.

Ian's version that does the right thing is like this:

(define make-max
  (lambda (exprs)
    (let ((idx  '0)
          (lim  [exprs size])
          (body [WriteStream on: [Expression new: lim]]))
      (while (< idx lim)
	[body nextPut: `(if (< val (set tmp ,[exprs at: idx])) (set val tmp))]
        (set idx (+ idx 2)))
      [body contents])))

(syntax max
  (lambda (node compiler)
    `(let ((val ,[node second])
           (tmp 0))
        ,@(make-max [node copyFrom: '2])
        val)))

It also has got the syntax definition of "max" and a helper-function "make-max". The difference is that the helper-function, "make-max" returns an Expression (instead of the value) and "max" embeds the returned Expression into another enclosing Expression and returns it. To see what really happens, you can evaluate to display all the syntax expansion:

[Compiler parseOption: '"-vs"]

After evaluating, you can evaluate:

.(max '1 '2 '3)

and you get:

  SYNTAX  -> (#max (#quote 1) (#quote 2) (#quote 3))
  REWRITE => (#let ((#val (#quote 1)) (#tmp 0)) (#if (#< #val (#set #tmp (#quote 2))) (#set #val #tmp)) (#if (#< #val (#set #tmp (#quote 3))) (#set #val #tmp)) #val)
  BUILTIN -> (#quote 1)
  REWRITE => nil
  BUILTIN -> (#quote 2)
  REWRITE => nil
  BUILTIN -> (#set #tmp (#quote 2))
  REWRITE => nil
  BUILTIN -> (#< #val (#set #tmp (#quote 2)))
  REWRITE => nil
  BUILTIN -> (#set #val #tmp)
  REWRITE => nil
  BUILTIN -> (#if (#< #val (#set #tmp (#quote 2))) (#set #val #tmp))
  REWRITE => nil
  BUILTIN -> (#quote 3)
  REWRITE => nil
  BUILTIN -> (#set #tmp (#quote 3))
  REWRITE => nil
  BUILTIN -> (#< #val (#set #tmp (#quote 3)))
  REWRITE => nil
  BUILTIN -> (#set #val #tmp)
  REWRITE => nil
  BUILTIN -> (#if (#< #val (#set #tmp (#quote 3))) (#set #val #tmp))
  REWRITE => nil
  BUILTIN -> (#let ((#val (#quote 1)) (#tmp 0)) (#if (#< #val (#set #tmp (#quote 2))) (#set #val #tmp)) (#if (#< #val (#set #tmp (#quote 3))) (#set #val #tmp)) #val)
  REWRITE => nil
 => 7

The chain of "if" and assignment is generated at the compile time and it is evaluated right away so that you get 7 as the result. Experimenting a syntax definition at the interactive shell's top-level was confusing to me. The syntax expansion is a complie-time behavior, but it is also evaluated right away so that you might get (as I did) many wrong ideas. You might also feel that the execution would be inefficient, because a lot of rewriting has to be done when you type. But, this is only because you are in the interactive shell. If you define a function that uses "max", the expansion is done once, and resulting (fixed length of) "if" chain is embedded into the calling function just once. (I didn't have to use the Compiler explicitly.

Also, you have to be fully aware what a quoted number (like '3) is. When such a number (like (#quote 3)) is evaluated (after reading), an object of 3 is created. But if you try to deal with it at the syntax expansion time in some way, (as I did in my other attempts), you'll never get it right.

A fun mistake that probably you will do some number of times is that you get into an infinite loop of syntax expansion. If you define something like:

(syntax foo
   (lambda (node compiler)
     `(let ((first [,node first]))
        ...

The syntax expander embeds the original form including the leading "foo" in place of ",node" and re-evaluate it.

Another note is that Ian uses primitive value 2 as an increment to get the next SmallInteger very often. For example, there is a syntax of "for" defined, and a version of "vector-map" Ian suggested was basically like this:

(define vector-map
  (lambda (func list)
    (let ((result [list new: [list size]]))
      (for (index '0 2 [[list size]])
        [result at: index put: (func [list at: index])])
      result)))

Here, the initial value for the "index" variable is "'0", but the step is "2". It is indeed correct that "(== (+ '0 2) '1)" is true, but I can't help feeling it as a "bad know-how".

The Compiler compiles Expressions, and also numbers, strings, and symbols into an abstract stack machine instructions. The stack machine architecture is called "VPU". The predefined forms such as "while", "if", "and", and function calls have pre-defined (but as changeable as other things) translation rules. Let's take a look at the definitions of SmallInteger>>compile:, Compiler>>integer:, and Compiler>>form:,

SmallInteger compile: compiler	[ ^compiler integer:    self ]

Compiler integer: anInteger
[
    vpu ldInt: anInteger.
    ^nil
]
Compiler form: aForm
[
    | head syntax |
    aForm isEmpty ifTrue: [self syntaxError: aForm].
    head := aForm first.
    (syntax := environment lookupSyntax: head) ~~ nil	 ifTrue: [^self recompile: (self applySyntax: syntax with: aForm)].
    (syntax := SyntaxTable at: head ifAbsent: []) ~~ nil ifTrue: [^self recompile: (self performSyntax: syntax with: aForm)].
    "apply"
    aForm reverseDo: [:arg | arg compile: self].
    vpu iCalli: aForm size - 1.
    ^nil
]

If you call "compile:" on a SmallInteger, it eventually pushes itself onto the VPU's arithmetic stack by "ldInt:" instruction. Compiler>>form: deals with a full S-expression. The first part of it has something to do with the syntax. The buttom part of it, simply says that the arguments (assuming all of them are integers) are pushed onto the arithmetic stack, and a call instruction (iCallI:)is executed. It is simple. A sequence of VPU instructions is translated into native code and executed. It is pretty efficient.

Another experiment with the Compiler is something like this. You define a function by using compile:

.[[Compiler compile: '(lambda (x) (+ x 3))] print]
VPULabel => 9584804

A VPULabel is an object that represents a label (duh). You can send "value" to obtain the label, and simply call it by:

([[Compiler compile: '(lambda (x) (+ x 3))] value] 5)
 => 8

This means that the function call in Coke is relly close to what a native function call does.

You could imagine to take advantage of this compiler to write a more efficient version of "max". If you look at the implementation of "and" ("xLogAnd:"), it is really simple. (It definitely reminds me of an undergrad assignment for which I needed to write a PL/O compiler and bytecode interpreter.) What it does is very similar to "max" after all. (And Alex has a great idea about doing the complex Ruby argument handling at that level, by adding a VPU instruction. It will make it run really fast.)

I wrote 4 long articles and 1 short one. While the way you have to write code, test, and debug is painful for an average hacker like me, I also feel great potential of the system. Writing these articles helped me a lot, and I hope some readers think they are of some use.