(I set myself the goal of posting every week, but the latest Skydeck release has kept me busy, and it turned out I didn't understand the following as well as I thought.)
After seeing the examples of Camlp4 quotations in my last post, you may wonder:
- what are all the quotations (
- what are all the antiquotations (
- which antiquotations are allowed where?
A small complication is that there is more than one concrete syntax for OCaml in Camlp4: the original (i.e. normal OCaml syntax) and revised syntaxes. The original syntax parser is given as an extension of the revised syntax one. So we'll begin in camlp4/Camlp4Parsers/Camlp4OCamlRevisedParser.ml (line 588 in the 3.10.2 source):
expr: [ "top" RIGHTA [ (* ... *) | "if"; e1 = SELF; "then"; e2 = SELF; "else"; e3 = SELF -> <:expr< if $e1$ then $e2$ else $e3$ >>You can read the parser more or less as a BNF grammar. This code defines a nonterminal
exprby giving a bunch of cases. The cases are grouped together into levels, which can be labeled and given an associativity (that's what
NONASSOCare). Levels are used to indicate the precedence of operators, and also to provide hooks into the parser for extending it; for our purpose here you can skip over them.
You can read a case like a pattern match: match the stuff to the left of the arrow, return the stuff to the right. (What's being matched is a stream of tokens from the lexer.) A parser pattern can contain literal strings like
"if", backquoted data constructors like
`INT (which can carry additional data), nonterminals, and some special keywords like
SELF. You can bind variables using ordinary pattern-matching syntax within token literals, and use
x = y syntax to bind the result of a call to a nonterminal.
The right side is a piece of AST representing what was parsed, and in most cases it is given as a quotation. This is pretty confusing, because often the left and right sides of a case look very similar, and you can't tell what AST node is produced. However, it gives us lots of examples of tricky quotations, and since we have already seen how to expand quotations we can deal with it. (If you're curious how Camlp4 is written using itself see camlp4/boot.)
Focusing on the
if case: the keywords
else are parsed with an expression after each (at least we know that's the syntax of normal OCaml, and that gives a clue to what
SELF means: parse the current nonterminal); the expressions are bound to a variables; then the pieces are put together into an
ExIfe AST node.
(Some other special keywords you'll see are
OPT, which makes the next item optional, and
LIST1, which parse a list of items separated by the token after
LIST1 means there must be at least one item.)
OCaml allows you to leave off the
else part; where is the code for that? Turns out this is not allowed in revised syntax, and the original syntax overrides this part of the parser. Take a look at camlp4/Camlp4Parsers/Camlp4OCamlParser.ml (line 292):
expr: LEVEL "top" [ [ (* ... *) | "if"; e1 = SELF; "then"; e2 = expr LEVEL "top"; "else"; e3 = expr LEVEL "top" -> <:expr< if $e1$ then $e2$ else $e3$ >> | "if"; e1 = SELF; "then"; e2 = expr LEVEL "top" -> <:expr< if $e1$ then $e2$ else () >>(Notice how the
exprdefinition is qualified with the level in the revised grammar where it should slot in.)
Quotations and antiquotations
Hopefully that is enough about parsing to muddle through; let's move on to quotations. Here's another piece of the revised parser (line 670)--these are still cases of
[ `QUOTATION x -> Quotation.expand _loc x Quotation.DynAst.expr_tagThe
`QUOTATIONtoken contains a record including the body of the quotation and the tag. The record is passed off to the
Quotationmodule to be expanded. The actual expansion happens in camlp4/Camlp4Parsers/Camlp4QuotationCommon.ml. Looking to the bottom of that file, there are several lines like:
add_quotation "sig_item" sig_item_quot ME.meta_sig_item MP.meta_sig_item;This installs a quotation expander for the
sig_itemtag. The expander parses the quotation starting at the
sig_item_quotnonterminal in the parser, then runs the result through the antiquotation expander (see below). (The last two arguments to
add_quotationhave to do with the context where a quotation appears: inside a pattern you get
PaFoonodes while inside an expression you get
ExBarnodes.) So we can answer one of the questions posed at the beginning: what are all the quotation tags? We can see here that there is a quotation for each type in camlp4/Camlp4/Camlp4Ast.partial.ml.
Now let's look at antiquotations, which are more complicated (line 671):
| `ANTIQUOT ("exp"|""|"anti" as n) s -> <:expr< $anti:mk_anti ~c:"expr" n s$ >>The
`ANTIQUOTtoken contains the tag and the body again (and the parser can choose a case based on the tag). The
antiantiquotation creates a special AST node to hold the body of the antiquotation; each type in the AST has a constructor (
TyAnt, etc.) for this purpose. The
mk_antifunction adds another tag, which is not always the same as the one we parsed; the
~cargument adds a suffix giving the context where the antiquotation appeared.
There are two places where antiquotations are interpreted. First, in camlp4/Camlp4Parsers/Camlp4QuotationCommon.ml (line 89):
[ "`int" -> <:expr< string_of_int $e$ >>This is one of a bunch of cases in a map over the syntax tree. It handles antiquotations like
<:expr< $`int:5$ >>, which turns into an
ExInt. You can also see cases here for the
antiantiquotations, and some things to do with
listantiquotations we haven't seen yet (more on this below).
Things that don't match these cases are handled when the AST is pretty-printed. Let's look at camlp4/Camlp4/Printers/OCaml.ml (line 510):
| <:expr< $int:s$ >> -> o#numeric f s ""This case handles antiquotations like
<:expr< $int:"5"$ >>. Again, this produces an
ExInt, but you give it a string instead of an int.
What we have learned
Teaching a person to fish is fine, unless that person starves while trying to finish their PhD in theoretical pescatology. But I hope that you can see how we might go about answering the remaining questions--what are all the antiquotations, and where are they allowed--by examining all the
`ANTIQUOT cases in the parser and puzzling out where they get expanded.
Let's look at a particular example, by way of addressing the comment Nicolas Pouillard (aka Ertai) made on the last post. He points out that the final
of_string can go outside the antiquotation. How could we learn this from the Camlp4 code? Let's find where the antiquotation is expanded, starting at the point where the
function keyword is parsed (Camlp4OCamlParser.ml line 299):
| "function"; a = match_case -> <:expr< fun [ $a$ ] >>(the right side is revised syntax) which uses
match_case: [ [ OPT "|"; l = LIST1 match_case0 SEP "|" -> Ast.mcOr_of_list l ] ]You might think that
match_case0parses a single case, but let's check (Camlp4OCamlRevisedParser.ml line 778):
match_case0: [ [ `ANTIQUOT ("match_case"|"list" as n) s -> <:match_case< $anti:mk_anti ~c:"match_case" n s$ >> | `ANTIQUOT (""|"anti" as n) s -> <:match_case< $anti:mk_anti ~c:"match_case" n s$ >>We're interested in the second case for the moment: here's the antiquotation with no tag used in
of_string. So the list of cases is returned by
match_caseas its tag) and more cases can be parsed following it.
(Now we can see a justification for a puzzling design decision in the AST: instead of collecting match cases in a list, it collects them with
McOr nodes. Many arrangements of
McOr nodes correspond to the same list of cases. As the above possibility shows, this is useful: an antiquotation can return zero, one, or several match cases, and we don't have to worry about splicing them into the list. On the other hand, it makes consuming the AST a little more complicated.)
We can go one step further: if we use the
list antiquotation, the first case in
match_case0 returns an antiquotation with tag
listmatch_case, and we get the following expansion (Camlp4QuotationCommon.ml line 117):
| "listmatch_case" -> <:expr< Ast.mcOr_of_list $e$ >>So our final
let of_string = function $list: List.map (fun c -> <:match_case< $`str:c$ -> $uid:c$ >>) cons$ | _ -> invalid_arg "bad string"Can we do something similar with the generation of the variant type? No, as it turns out. In the revised syntax, the arms of a variant are given inside square brackets, so we can say:
type t = [ $list:List.map (fun c -> <:ctyp< $uid:c$ >>) cons$ ]But in the original syntax, without at least one constructor to make clear that we're defining a variant, there's no context to interpret a list, and this is reflected in the parser, which doesn't allow a
listantiquotation there. This kind of problem is apparently why the revised syntax was introduced.
So far I've talked only about generating OCaml code; next time I'll cover how to use Camlp4 to consume OCaml, and build a simple code analysis tool.