The How of Macros 4: `quote`

When the Elixir parser sees quote do: content, it produces a 3-tuple of the form {:quote, metadata, [[do: content]]}.

Marick takes a deep breath.

:quote is a special form that takes the abstract syntax tree content (which represents Elixir code) and turns it into an abstract syntax tree for literal data that, when compiled, creates the abstract syntax tree that represents the Elixir code.

Perhaps I should explain more slowly.

The series: Elixir compilation, syntax trees, literal data, quote, unquote, escape, hygiene and var, what's up with require?


Let's start by considering bytecode generation in a bit more detail, starting with the syntax tree for this expression:

elem {1, 100+a, 3}, which

... which is:

{:elem, [context: Elixir, import: Kernel],
 [
   {:{}, [], 
    [
      1, 
      {:+, [context: Elixir, import: Kernel], 
       [
         100,
         {:a, [], Elixir}
       ]
      }, 
      3
    ]
   }, 
   {:which, [], Elixir}
 ]
}

Bytecode generation linearizes this tree into a list of step-by-step instructions, starting from the bottom up (or leaves inward). That is, the generated bytecodes will first add the value of a to 100. Then they will use the result of that, together with 1 and 3, to form a 3-tuple. They will finally call elem with the 3-tuple and the value of which as arguments.

And here's an example of the bytecodes in action:

iex(20)> a = 899; which = 1

iex(21)> elem {1, 100+a, 3}, which
999

Now, let's add a twist and apply elem to some quoted Elixir text:

iex(17)> elem (quote do: inspect a), 0
:inspect

iex(18)> elem (quote do: inspect a), 1
[context: Elixir, import: Kernel]

iex(19)> elem (quote do: inspect a), 2
[{:a, [], Elixir}]

elem is working on the syntax tree, not on the result of compiling and executing that syntax tree. But remember that the syntax tree is a literal structure of nested 3-tuples. As you saw in the previous post, literal structures of that sort aren't stored anywhere for use during bytecode execution; rather, bytecode execution creates them. So what has to happen is:

  1. The quote special form takes a syntax tree.
  2. It must transform that syntax tree into a new syntax tree.
  3. That syntax tree is compiled into bytecodes into the normal way. Those bytecodes, when executed, build the original syntax tree.
  4. After that execution, more bytecodes are executed to apply elem to the syntax tree.

Let's go through that for our example.

The starting syntax tree

The full parse tree for the elem example is this:

{:elem, [context: Elixir, import: Kernel],
 [
   {:quote, [context: Elixir],
    [
      [do: 
        {:inspect, [context: Elixir, import: Kernel], 
         [{:a, [], Elixir}]
        }
      ]
    ]
   },
   0
  ]
 }

Before the bytecode generator generates bytecodes (from the bottom up), it first context-expands the tree, which happens from the top down. A lot of what happens in this phase is error checking that can't be done during parsing because it depends on understanding context. For example, this is invalid:

  def foo(a), do: ^a

... because pinned variables must be within a pattern. However, the parser doesn't keep track of where ^a is found; it just translates it into {:^, [], [{:a, [], Elixir}]}. So the error is checked for during context-expansion.

Such checking is best finished before bytecode generation starts so that the actual generator can be sure it's working with a fully valid syntax tree.

(This separation of concerns is typical in compilers. Context-free parsing is easily automatable, but context sensitivity most usually requires custom code that's best written as a postprocessing step.)

The new syntax tree

In addition to checking, the top-down processing also handles a :quote special form specially. Roughly speaking, it's handled by a function like this:

  def expand({:quote, _metadata, [[do: body]]}) do
    do_quote(body)
  end

do_quote then handles the various different "shapes" of syntax trees. In our case, the top of the tree is a 3-tuple representing a call to inspect, which would be handled by this function:

  def do_quote({name, metadata, args}) when is_list(args) do
    expanded = for arg <- args, do: do_quote(arg)
    { :{}, metadata, expanded }
  end

Because of the for, do_quote is applied to a variable; that is, to this 3-tuple: {:a, [], Elixir}. That processing can be described by this function:

  def do_quote({name, metadata, where}) when is_atom(where) do
    { :{}, metadata, where }
  end

(The code is actually written in Erlang, not Elixir.)

When the :quote tree is processed the result is this:

{:{}, [], 
 [
   :inspect, 
   [context: Elixir, import: Kernel], 
   [
     {:{}, [], [:a, [], Elixir]}
   ]
 ]
}

Since expand works top down, that expanded syntax tree is then embedded within the full parse tree, giving us this:

{:elem, [context: Elixir, import: Kernel], 
  [ 
    # start result of `quote`
    {:{}, [], 
     [
       :inspect, 
       [context: Elixir, import: Kernel], 
       [
         {:{}, [], [:a, [], Elixir]}         # line 9
       ]
     ]
    },
    # end result of `quote`
    0
  ]
}

Bytecode generation

Since code generation is (roughly) bottom up, the first bytecodes are generated from the :{} expression on line 9, and they are bytecodes that produce the tuple {:a, [], Elixir}.

The next set of bytecodes generated will combine that result with :inspect and the metadata keyword list (which contains only atoms) to produce this:

{:inspect, [context: Elixir, import: Kernel], 
 [{:a, [], Elixir}]
}

Ta-da! A roundabout journey, but we've managed to create the literal syntax tree for the original inspect a.

Abstractly, we can think of the bytecode generator as now creating  the bytecodes to apply elem to that recreated literal syntax tree and 0. In actuality, fetching tuple elements is so common that there's a special bytecode for it. So it happens without the expense of a function call.


That's all well and good, but it's rare to see a quote expression without an unquote within it. How does that work? Read on.

Previous: Syntax trees for literal data
Next: unquote