Parsing MetaPost

These days I spend some evenings programming a variant of MetaPost, a DSL for diagrams and illustrations. MetaPost inherits much of its syntax from MetaFont, which has been designed by D. E. Knuth. Mr Knuth is the “father of LR-parsing,” so one would expect some interesting language features in MetaPost—and we are not disappointed!

What MetaPost does

MetaPost’s unique selling point is its ability to solve linear equations. In some situations, you’d rather not specifiy a point position, but let the computer figure out its location. Let’s take a look at an example:

Star drawn by MetaPost

Don’t worry if this looks a bit like your last mathematics exam. The unique part are the lines like A'=whatever[A,C];. This is MetaPost’s way of requesting a point A' to lie somewhere on the line between points A and C. Given that A' should lie on [A,C ] as well as on [B,D ], MetaPost will find the intersection point by solving a set of linear equations. That’s a pretty cool feature! However, that’s not what I want to talk about in this post.

Variables: It’s complicated…

The MetaPost language is designed to be somewhat reminiscent of mathematical writing. This is especially noticeable with its conventions for variable names. You already caught a glimpse of that with those primed point variables (A'), but that’s not the end of it. MetaPost will happily digest identifiers like $z'_r$ (denoted as z'.r) and $x_1$ (x1). Add to this the ability to handle fractions as numeric input, you are free to instruct MetaPost to calculate

$$-{1\over3}x_1$$

by typing -1/3x1. That might not appeal to everyone, but it sure is easy to type and MetaPost will usually understand what you had in mind.

Those variable names look innocent enough, but from a parser’s point they carry slightly different semantics than with other programming languages. Let’s consider the input 7 - -a1r. By now you propably could guess that this means “$7- -a_{1r}$.” Setting aside that you should enclose the second term in parenthesis anyway, put yourself in the shoes of the parser: it will see a sequence of tokens like 7 - - a 1 r. And just when you successfully taught your parser to accept it, someone types in y = 7 - - 1 / 3 x [ n - 1 ], intending to express the equation

$$y = 7 - -{1\over3}x_{n-1}$$

Let’s try with Python:

>>> n = 2    # pre-set subscript variable n
>>> y = 7 - - 1 / 3 x [ n - 1 ]
  File "<stdin>", line 1
	y = 7 - - 1 / 3 x [ n - 1 ]
					^
SyntaxError: invalid syntax

We expected that, didn’t we? The error doesn’t result from the fact that x is an unknown quantity; it’s just that Python cannot handle the syntax: it misses a ‘*’ before the x. Now let’s have MetaPost a go:

This is MetaPost, version 1.999
**\relax
*n:=2;  % pre-set subscript variable n
*y = 7 - - 1 / 3 x [ n - 1 ];
*show y;
>> 0.33333x1+7

The last line is MetaPost’s way of telling us that variable y is dependent on $x_1$, which is of unknown value yet. It is worth noting that [n-1] is not your usual array indexing, but rather the “generation” of a mathematical subscript. Just for fun, create an irrational subscript:

*y=x[sqrt 2];
*showvariable x
  x[]=numeric
  x1.41422=y

Earley’s Attempt

Of course part of the complications arise from the usual unary plus/minus problem. But the notation of scalar prefixes for variables makes it a bit more difficult to implement. John Hobby, the author of MetaPost, has been kind enough to publish a BNF grammar for MetaPost (pp. 92–97). Remarkably, it contains the rule

⟨scalar multiplication op ⟩ → + | −
| ⟨ ‘⟨number or fraction ⟩’ not followed by ‘⟨add op ⟩ ⟨number ⟩’ ⟩

a strong hint that a dose of lookahead has been required to implement scalar prefixes (a.k.a. scalar multiplication op ).

GoRGO’s Earley-parser has (yet) no feature for operator precedence, so I was unsure of how difficult it would be to implement scalar prefixes. Turned out, not difficult at all—it worked first go! (Well, technically that’s a lie, because I ran into a subtle bug of the Earley parser code, but that’s a different story.) One especially pretty feature of Earley parsers is that you can watch them think. Doing so, I found the parser’s default behaviour for certain kinds of ambiguity spot on for the problem. The relevant rules of the MetaPost grammar had to be slightly adapted to get rid of the scanner hack and actually resulted in an easier to read version.

⟨tertiary ⟩ → ⟨secondary ⟩
| ⟨tertiary ⟩ PlusOrMinus ⟨secondary ⟩

⟨primary ⟩ → ⟨atom ⟩
| PlusOrMinus ⟨primary ⟩

⟨atom ⟩ → ⟨variable ⟩
| Signed ⟨variable ⟩

(‘Signed’ is a number or fraction with an optional unary sign).

Up to now it works just fine. Let’s see what other curious language features await.

Last modified on 2020-12-10