The How of Macros 3: Syntax trees for literal data

The previous post showed how code is represented as syntax trees. This one shows how literal data like %{a: 3} are represented.

The series: Elixir compilation, syntax trees, literal data, quote, unquote, escape, hygiene and var, what's up with require?


Simple data

The syntax trees for simple data like strings and numbers are the strings and numbers themselves:

iex(14)> quote do: "string"
"string"

iex(15)> quote do: 5.0
5.0
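As a quick check, here is a small sketch you can paste into iex or run as a script. The integer and the atom are my additions to the two cases above; they behave the same way.

```elixir
# Quote a handful of simple literals and collect the resulting syntax trees.
simple_trees = [
  quote(do: "string"),
  quote(do: 5.0),
  quote(do: 42),
  quote(do: :an_atom)
]

# Each tree is just the value itself.
IO.inspect(simple_trees)
```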

The interesting cases are collection literals like tuples, lists, and maps.

Tuples

If a syntax tree is composed of 3-tuples, what happens if the code contains a literal 3-tuple? What happens, for example, with this:

{1, 2, 3+4}

The syntax tree for a tuple like this is a 3-tuple whose first element names a type-specific special form (here :{}) and whose last element is a list of subtrees that produce the tuple's values:

{:{}, [], 
 [
   1, 
   2, 
   {:+, [context: Elixir, import: Kernel], [3, 4]}
 ]
}
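You can take that tree apart with ordinary pattern matching. This is only a sketch; the variable names are mine, and I match the metadata with an underscore because its contents differ between Elixir versions.

```elixir
tuple_tree = quote do: {1, 2, 3 + 4}

# The outer node is the :{} constructor; its last element lists the subtrees.
{:{}, _meta, [first, second, sum_tree]} = tuple_tree

# The third element is itself a tree: a call to + with arguments 3 and 4.
{:+, _plus_meta, [left, right]} = sum_tree

IO.inspect({first, second, left, right})
```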

When you want to understand how quote works, it'll be useful to understand this handling of literal collections.

Similarly, this:

inspect {1, 2, 3}

... produces this syntax tree:

{ :inspect, 
  [context: Elixir, import: Kernel], 
  [ {:{}, [], [1, 2, 3]} ]
}

The important thing to know is that a 3-tuple that is a node of a syntax tree can be distinguished from a 3-tuple that is data within the code. The reason is slightly odd: literal collections can be told apart from code because they are themselves turned into code, that is, into an instruction about how to create them.

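Read the :{} 3-tuple above as instructing the compiler to generate bytecode for something like the following (this is pseudocode: :{} is a special form, not a Kernel function that apply can actually call):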

apply Kernel, :{}, [1, 2, 3]
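One way to watch the "instruction" being carried out is to hand the tree to Code.eval_quoted, which compiles and evaluates a syntax tree. A minimal sketch, with a hand-built tree and names of my own choosing:

```elixir
# A hand-written constructor node, the same shape that quote produces.
constructor_tree = {:{}, [], [1, 2, 3]}

# eval_quoted returns the value along with the variable bindings (none here).
{built_tuple, _bindings} = Code.eval_quoted(constructor_tree)

IO.inspect(built_tuple)
IO.inspect(built_tuple == List.to_tuple([1, 2, 3]))
```

List.to_tuple is only there for comparison. I'm not claiming the compiler uses it.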

The same is true of tuples with different lengths:

iex(5)> quote do: {}               
{:{}, [], []}
iex(6)> quote do: {1, 2, 3, 4}
{:{}, [], [1, 2, 3, 4]}

It is not true for 2-tuples, though:

iex(7)> quote do: {1, 2}
{1, 2}

Read on to see why.

Lists

Lists are represented literally, as themselves:

iex(8)> quote do: [1, 2, 3]
[1, 2, 3]

Note, though, that trees are still created for list elements:

iex(9)> quote do: [1 + 2, 3, {4, 5, 6}]
[
 {:+, [context: Elixir, import: Kernel], [1, 2]}, 
 3, 
 {:{}, [], [4, 5, 6]}
]
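A small sketch of the same thing in script form: the list stays a list, its elements are trees, and evaluating the whole thing gives back the data. The variable names are mine.

```elixir
list_tree = quote do: [1 + 2, 3, {4, 5, 6}]

# Still a plain three-element list, but two of its elements are trees.
[{:+, _, [1, 2]}, 3, {:{}, _, [4, 5, 6]}] = list_tree

# Evaluating the tree runs the addition and builds the 3-tuple.
{list_value, _bindings} = Code.eval_quoted(list_tree)
IO.inspect(list_value)
```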

2-tuples are not represented in constructor format because they are used in keyword lists:

# Here's a keyword list, written oddly:
iex(10)> [{:a, 1}, {:b, 2}]
[a: 1, b: 2] # Elixir's `inspect` prints the normal format.

# What is the above list's syntax tree? 
iex(11)> quote do: [a: 1, b: 2]      
[a: 1, b: 2] # The same as the original list.
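Here is the contrast between the literal and constructor representations in one place, as a sketch you can run:

```elixir
keyword_tree = quote do: [a: 1, b: 2]
pair_tree = quote do: {1, 2}
triple_tree = quote do: {1, 2, 3}

# The keyword list and the 2-tuple are their own syntax trees.
IO.inspect(keyword_tree == [{:a, 1}, {:b, 2}])
IO.inspect(pair_tree == {1, 2})

# The 3-tuple is not: it becomes a :{} constructor node.
IO.inspect(match?({:{}, _, [1, 2, 3]}, triple_tree))
```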

Why is that useful? Let's take an example from Lucas San Román. Like many macros, Ecto.Schema's schema macro is used with do:

schema "table" do
  field :name, :string
  field :age, :integer
end

The Elixir parser treats do ... end as an alternative to do: .... Because of that, the schema macro receives two arguments:

  1. the syntax tree for the literal string "table" (which is also "table"), and
  2. the syntax tree for the do ... end, which is a keyword list: [do: ...].
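You can see those two arguments without Ecto by quoting the call. quote only builds the tree, so it doesn't matter that no schema or field macro is defined here. Again a sketch, matching on shape and ignoring metadata:

```elixir
call_tree =
  quote do
    schema "table" do
      field :name, :string
      field :age, :integer
    end
  end

# Two arguments: the string, and a one-element keyword list keyed by :do.
{:schema, _meta, [table_tree, [do: body_tree]]} = call_tree

# A multi-line body is wrapped in a :__block__ node holding each expression.
{:__block__, _, [first_field, _second_field]} = body_tree

IO.inspect(table_tree)
IO.inspect(first_field)
```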

Remember from the first post that a macro defined with defmacro is just a function that takes syntax trees as its arguments. That means normal Elixir pattern matching applies. Consider this:

defmacro schema(table, do: fields) do ...

If 2-tuples were handled consistently, the same way as other tuples, a keyword list would be parsed into [ { :{}, [], [:do, ...] } ]. Because of that, our definition of schema would not match the usage example shown earlier. Instead, the schema definition would have to look something like this:

defmacro schema(table, [{:{}, _, [:do, fields]}]) do ...

It would not be worth living in a world where the definition of a macro looked so different from its use. Hence, the representation of 2-tuples is literal. The format of Elixir's abstract syntax tree is tuned to be convenient for macro writers.
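To make that concrete, here is a runnable toy. SchemaSketch and SchemaSketchUser are hypothetical names of mine, and this is in no way how Ecto implements schema; the macro merely reports what it was given. It returns a 2-tuple of a string and a string, which, conveniently, is already a valid syntax tree.

```elixir
defmodule SchemaSketch do
  # Hypothetical stand-in for a schema-like macro. The `do: fields` pattern
  # matches because the do ... end block arrives as the keyword list [do: ...].
  defmacro schema(table, do: fields) do
    # `fields` is a syntax tree; render it as source text at compile time.
    # A 2-tuple of literals is its own tree, so it can be returned directly.
    {table, Macro.to_string(fields)}
  end
end

defmodule SchemaSketchUser do
  require SchemaSketch

  def describe do
    SchemaSketch.schema "table" do
      field :name, :string
      field :age, :integer
    end
  end
end

{table_name, fields_source} = SchemaSketchUser.describe()
IO.inspect(table_name)
IO.puts(fields_source)
```

The exact text printed for the fields depends on how your Elixir version's Macro.to_string formats code.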

P.S. Sr. San Román has a three-part series on Elixir's AST that I wish I'd found before starting this series.

Maps

Maps also use a constructor notation:

iex(13)> quote do: %{a: 1}
{:%{}, [], [a: 1]}

I don't know why.
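Whatever the reason, the shape is easy to poke at. In this sketch (names mine), the last element of the node is a list of key/value 2-tuples, and both keys and values may themselves be trees:

```elixir
{:%{}, _meta, [a: 1]} = quote do: %{a: 1}

# With => syntax, the pairs are still 2-tuples; here the value is a + tree.
arrow_tree = quote do: %{"k" => 1 + 2}
{:%{}, _, [{"k", {:+, _, [1, 2]}}]} = arrow_tree

# Evaluating the constructor node builds the actual map.
{map_value, _bindings} = Code.eval_quoted(arrow_tree)
IO.inspect(map_value)
```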


With that as background, we're now ready to look at what quote does.
