The How of Macros 2: Syntax trees

The last post looked mainly at the different binding maps that are created and used during bytecode generation. This post looks at the syntax trees that are parsed from Elixir text, then delivered to the bytecode generator.

The series: Elixir compilation, syntax trees, literal data, quote, unquote, escape, hygiene and var, what's up with require?

Consider this function call:

inspect 5

The abstract syntax tree for that expression is this:

 [context: MacroExamples, import: Kernel], 

To a first approximation, all syntax trees are made from nested 3-tuples with these elements:

  1. An atom like :inspect. Often, as in this case, the atom is a function name. You'll see other things it can be later.
  2. Metadata, which is information that helps the compiler (or a human) interpret the meaning of the other elements. In the example above, the metadata says that syntax tree to compile comes from the MacroExamples module and that the compiler should look in the Kernel module if inspect isn't defined inside MacroExamples.

    I don't plan to discuss metadata in any great detail during this series.
  3. Nested or related data. In this case, the nested data is [5], which indicates that :inspect is an Elixir function (or macro) that is to be applied to a single argument 5.

Note the similarity of the above form to that of the apply function: apply Kernel, :inspect, [5].

Nested trees

Although I said that syntax trees are nested structures, I didn't show an example. Here's code with a nested function call:

inspect(1 + 4)

Here's the syntax tree:

{:inspect, [context: MacroExamples, import: Kernel],
 [{:+, [context: MacroExamples, import: Kernel], 
   [1, 4]}]}

The nested 3-tuple is a mildly surprising reminder that, in Elixir, + is a function just like any other. For example:

iex(8)> apply Kernel, :+, [3, 4]

A variable's syntax tree

A variable's name - like all names - is also represented as a 3-tuple. The most notable difference between variable 3-tuples and 3-tuples for function calls (like :inspect above) is that the last element of a variable's 3-tuple is an atom, not a list:

{:a, [], MacroExamples}

For the moment, pretend it doesn't matter which atom is in the third position; it could be nil just as well as MacroExamples.

Special forms

An expression like a = 5 turns into this 3-tuple:

{:=, [], [{:a, [], Elixir}, 5]}

I'm going to call such 3-tuples special forms because a lot of them (like :=)  correspond to Elixir's special forms. What that means is that each of them has some special code in the bytecode generator, code picked out by pattern matching on the 3-tuple:

def do_something({:=, metadata, [var, value]}) do
  # Generate bytecodes that, when executed, will add 
  # the var->value binding to the currently active 
  # binding map.

Function calls with explicit modules

Given this:

IO.inspect 5

The syntax tree is this:

  {:., [], 
       [{:__aliases__, [alias: false], [:IO]}, 
  [],    # metadata 

This is unusual in that the top level is a 3-tuple whose first element is a special form (rather than an atom). Let's look at that first element, which describes how to find a function:

    {:__aliases__, [alias: false], [:IO]}, 

Let's not worry about the :. which only serves to select the code that handles the final element of the 3-tuple. Let's look only at that code:

  {:__aliases__, [alias: false], [:IO]}, 

Recall that the bytecode generator in effect converts a function call into an apply like this:

apply IO, :inspect, [5]

You can see that the IO comes from the :__aliases__ 3-tuple. Perhaps confusingly, the metadata says that 3-tuple doesn't contain an alias (:alias is false).  Instead, IO is the complete name of the module.

Contrast the syntax tree for this situation:

  alias String.Chars

  Chars.to_string 5

The whole tree for the second expression looks the same as the ones above except for that deeply-embedded list:

  {:__aliases__, [alias: String.Chars], [:Chars]},

When constructing an apply, the bytecode generator has the information to use String.Chars as the first argument.

That's most of what you need to know about syntax trees, except: if syntax trees are represented as 3-tuples, how do you represent a literal 3-tuple like {1, 2, 3}? That deserves its own post.

Previous: Elixir compilation
Next: Syntax trees for literal data