The How of Macros 3: Syntax trees for literal data
The previous post showed how code is represented as syntax trees. This one shows how literal data like %{a: 3} are represented.
The series: Elixir compilation, syntax trees, literal data, quote, unquote, escape, hygiene and var, what's up with require?
Simple data
The syntax trees for simple data like strings and numbers are the strings and numbers themselves:
iex(14)> quote do: "string"
"string"
iex(15)> quote do: 5.0
5.0
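A quick sketch to confirm this outside of iex. The integer and the atom are my additions (the examples above only show a string and a float); atoms and integers also quote to themselves.

```elixir
# Quote a few simple literals and keep the resulting syntax trees.
string_ast = quote(do: "string")
float_ast = quote(do: 5.0)
integer_ast = quote(do: 42)
atom_ast = quote(do: :ok)

# Each "tree" is just the value itself.
IO.inspect({string_ast, float_ast, integer_ast, atom_ast})
# prints {"string", 5.0, 42, :ok}
```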
The interesting cases are collection literals like tuples, lists, and maps. (Tuples aren't actually Enumerable in Elixir, so "collection" is the safer word.)
Tuples
If a syntax tree is composed of three-tuples, what happens if it contains a literal 3-tuple? What happens, for example, with this:
{1, 2, 3+4}
The syntax tree for a tuple like this is a 3-tuple whose first element names a type-specific special form (here, :{}) and whose last element is a list of subtrees that produce the tuple's values:
{:{}, [],
  [
    1,
    2,
    {:+, [context: Elixir, import: Kernel], [3, 4]}
  ]
}
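If you want to poke at that tree yourself, here is a small sketch. It ignores the metadata lists with `_` patterns, because their exact contents can differ between Elixir versions, and it uses Code.eval_quoted to show that the tree really does build the tuple.

```elixir
ast = quote(do: {1, 2, 3 + 4})

# The outer node is the :{} constructor; its last element holds the subtrees.
{:{}, _tuple_meta, [first, second, sum_ast]} = ast
IO.inspect({first, second})
# prints {1, 2}

# The third element is itself a tree: a call to + with arguments 3 and 4.
{:+, _plus_meta, [3, 4]} = sum_ast

# Evaluating the whole tree produces the actual tuple.
{value, _bindings} = Code.eval_quoted(ast)
IO.inspect(value)
# prints {1, 2, 7}
```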
When you want to understand how quote works, it'll be useful to understand this handling of literal collections.
Similarly, this:
inspect {1, 2, 3}
... produces this syntax tree:
{ :inspect,
  [context: Elixir, import: Kernel],
  [ {:{}, [], [1, 2, 3]} ]
}
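One way to convince yourself that the nested :{} node stands for the literal tuple is to turn the tree back into source text with Macro.to_string. A minimal sketch:

```elixir
ast = quote(do: inspect({1, 2, 3}))

# Pull the single argument out of the inspect call node.
{:inspect, _meta, [tuple_ast]} = ast
IO.inspect(tuple_ast)
# prints {:{}, [], [1, 2, 3]}

# Rendering the tree as source text brings the tuple literal back.
source = Macro.to_string(ast)
IO.puts(source)
# prints inspect({1, 2, 3})
```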
The important thing to know is that 3-tuples that are a syntax tree can be distinguished from the representation of data within a syntax tree. And the reason is slightly odd. Literal collections can be told apart from code because they are turned into code – that is, into an instruction about how to create them.
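Read the :{} 3-tuple above as instructing the bytecode generator to create bytecodes for something like the following. (This is a mental model rather than a call you can actually make, because :{} is a special form that the compiler handles itself.)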
apply Kernel, :{}, [1, 2, 3]
The same is true of tuples with different lengths:
iex(5)> quote do: {}
{:{}, [], []}
iex(6)> quote do: {1, 2, 3, 4}
{:{}, [], [1, 2, 3, 4]}
It is not true for 2-tuples, though:
iex(7)> quote do: {1, 2}
{1, 2}
Read on to see why.
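The pattern across sizes can be checked in one go. This sketch quotes tuples of size 0 through 4 and labels each result; the :constructor and :literal labels are names I made up for illustration.

```elixir
asts = [
  quote(do: {}),
  quote(do: {1}),
  quote(do: {1, 2}),
  quote(do: {1, 2, 3}),
  quote(do: {1, 2, 3, 4})
]

# Label each tree by whether it is a :{} constructor node.
forms =
  for ast <- asts do
    case ast do
      {:{}, _meta, elements} when is_list(elements) -> :constructor
      _other -> :literal
    end
  end

IO.inspect(forms)
# prints [:constructor, :constructor, :literal, :constructor, :constructor]
```

Only the 2-tuple comes back as itself.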
Lists
Lists are represented literally, as themselves:
iex(8)> quote do: [1, 2, 3]
[1, 2, 3]
Note, though, that trees are still created for list elements:
iex(9)> quote do: [1 + 2, 3, {4, 5, 6}]
[
{:+, [context: Elixir, import: Kernel], [1, 2]},
3,
{:{}, [], [4, 5, 6]}
]
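A short sketch showing both halves of that: the quoted form is an ordinary list, and evaluating it runs the trees inside.

```elixir
ast = quote(do: [1 + 2, 3, {4, 5, 6}])

# The quoted form is a plain Elixir list with three elements...
IO.inspect(is_list(ast) and length(ast) == 3)
# prints true

# ...but its first and last elements are trees, so evaluation is still needed.
{value, _bindings} = Code.eval_quoted(ast)
IO.inspect(value)
# prints [3, 3, {4, 5, 6}]
```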
2-tuples are not represented in constructor format because they are used in keyword lists:
# Here's a keyword list, written oddly:
iex(10)> [{:a, 1}, {:b, 2}]
[a: 1, b: 2] # Elixir's `inspect` prints the normal format.
# What is the above list's syntax tree?
iex(11)> quote do: [a: 1, b: 2]
[a: 1, b: 2] # The same as the original list.
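Because the keyword list survives quoting, ordinary keyword patterns work directly on the syntax tree, even when the values are expressions. A sketch (the keys :a and :b are arbitrary):

```elixir
ast = quote(do: [a: 1 + 2, b: {3, 4}])

# Match on the tree with the same keyword syntax used to write the literal.
[a: sum_ast, b: pair_ast] = ast

# The value under :a is a tree for the addition; metadata is ignored here.
{:+, _meta, [1, 2]} = sum_ast

# The value under :b is a 2-tuple, so it is represented literally.
IO.inspect(pair_ast)
# prints {3, 4}

{value, _bindings} = Code.eval_quoted(ast)
IO.inspect(value)
# prints [a: 3, b: {3, 4}]
```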
Why is that useful? Let's take an example from Lucas San Román. Like many macros, Ecto.Schema's schema macro is used with do:
schema "table" do
  field :name, :string
  field :age, :integer
end
The Elixir parser treats do ... end as an alternative to do: ... . Because of that, the schema macro receives two arguments:
- the syntax tree for the literal string "table" (which is also "table"), and
- the syntax tree for the do ... end, which is a keyword list: [do: ...].
Remember from the first post that a macro defined with defmacro is just a function that takes syntax trees as its arguments. That means normal Elixir pattern matching applies. Consider this:
defmacro schema(table, do: fields) do ...
If 2-tuples were handled consistently, the same way as other tuples, a keyword list would be parsed into [ { :{}, [], [:do, ...] } ]. Because of that, our definition of schema would not match the example of use I gave earlier. Instead, the schema definition would have to look something like this:
defmacro schema(table, [{:{}, _, [:do, fields]}]) do ...
It would not be worth living in a world where the definition of a macro looked so different from its use. Hence, the representation of 2-tuples is literal. The format of Elixir's abstract syntax tree is tuned to be convenient for macro writers.
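Here's a runnable sketch of that pattern match. SchemaSketch and UsesSketch are hypothetical modules, not Ecto: this stand-in schema macro does nothing except hand back, as data, the two syntax trees it received, so the field calls are never compiled as code.

```elixir
defmodule SchemaSketch do
  # Hypothetical stand-in for a schema-style macro: the head uses the
  # same `do:` keyword syntax that appears at the call site.
  defmacro schema(table, do: fields) do
    # Escape the received trees so the caller gets them back as plain data.
    Macro.escape({table, fields})
  end
end

defmodule UsesSketch do
  require SchemaSketch

  def received do
    SchemaSketch.schema "table" do
      field :name, :string
      field :age, :integer
    end
  end
end

{table, fields} = UsesSketch.received()

IO.inspect(table)
# prints "table"

# Two expressions in the do block arrive wrapped in a :__block__ node.
IO.inspect(fields)
```

With a single expression in the do block, there would be no :__block__ wrapper; fields would be that one expression's tree.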
P.S. Sr. San Román has a three-part series on Elixir's AST that I wish I'd found before starting this series.
Maps
Maps also use a constructor notation:
iex(13)> quote do: %{a: 1}
{:%{}, [], [a: 1]}
I don't know why.
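Whatever the reason, the shape is easy to explore. This sketch shows that the last element of the :%{} node is a list of key/value 2-tuples whose values can be trees, and that keys other than atoms show up as explicit 2-tuples.

```elixir
ast = quote(do: %{a: 1 + 2, b: 2})

# The node is {:%{}, meta, pairs}, where pairs is a keyword list here.
{:%{}, _map_meta, pairs} = ast
[a: {:+, _plus_meta, [1, 2]}, b: 2] = pairs

# Evaluating the tree builds the map.
{value, _bindings} = Code.eval_quoted(ast)
IO.inspect(value)
# prints %{a: 3, b: 2}

# With string keys, the pairs are written out as 2-tuples.
string_key_ast = quote(do: %{"k" => 1})
IO.inspect(string_key_ast)
# prints {:%{}, [], [{"k", 1}]}
```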
With that as background, we're now ready to look at what quote does.