A Parsing Toolbox in Go
Notes on GoRGO Development
Why GoRGO?
What is GoRGO and how did it come to be, in a nutshell

Every software developer I know hates repetitive tasks. What’s the point of being a programmer if you can’t delegate your chores to a machine? And as a backend developer you are most likely open to the idea to employ some kind of domain specific language (DSL). That’s where a compiler generator may come into play. There is no shortage of them, ranging from good old bison to ANTLR and various Go variants like gocc. Why develop another one? GoRGO is a little different.

What is it about?

I have used many of the aforementioned tools extensively in my life, plus a handful of situations where I’ve written recursive descent parsers by hand. All these tools have their value and I appreciate the availability of compiler-compilers with a lot of horse power. But on the other hand, sometimes I want something more compact, a smart and lightweight tool to generate an interpreter for a custom language. And I want it in native Go, as this is what I currently use for Open Source. That’s what GoRGO strives to be.

The Name

The “RGO” in GoRGO stands for Right-derivated Grammar Operations, which may sound all gibberish to you (and I wouldn’t blame you). But there’s a hint in this name: we will focus on LR-parsing, i.e., we will construct bottom-up parsers which produce right-derivations. That’s exactly what yacc et al do (unlike ANTLR, which produces LL-parsers), so no surprise there—or is it? In fact it should be, as DSL parsers these days are implemented with PEGs most of the time. But PEGs (parser expression grammars) are top-down, so why clinge to bottom-up parsing?

Gorgosaurus

Gorgo⋅saurus (fierce lizard ) was not quite as big or famous as Tyrannosaurus Rex, but dangerous nevertheless. It lived some 75 million years ago in Northern America, eating anything it came across. We will meet its cousin T-Rex again when we'll talk about TeREx, a term rewriter.

PEG parsers have a number of disadvantages for the careless user. Although I love their simplicity, they lack a certain foolproofness which I expect for ad-hoc DSLs. Most ad-hoc grammars people write contain ambiguous rules, and PEGs hide ambiguity rather than resolving it. On the other hand, we do not want programmers to have to fix shift|reduce-conflicts and the like. The tool should just eat my grammar! Therefore we revert to LR-parsers, acknowledging the fact that we have to convince the parser to not be so tough on its users.

GoRGO follows a few design paradigm which in some aspects differ from your garden-variety parser generator.

No Code Generation

Parser generators (compiler compilers) usually generate source code. That’s often a good idea, as many of these tools can handle a multitude of host languages. Moreover, Go has a mechanism for preprocessing and code-generation (go-generate) which is well suited for parser generators. However, for small DSLs I prefer to skip a code-generation step and rather create the parser in place and immediately run it.

Digesting the grammar always comes first and results in a bunch of LR parse tables. Instead of saving these tables as source code into a file, we keep them in memory and provide them to a parser. Table generation is quick for small languages and we can afford to go through this step once per program execution.

Ambiguous Grammars

Conventional parser generator do not handle ambiguous grammars well. Especially for ad-hoc DSLs, however, grammar ambiguity is quite common. Parser which can handle these kinds of grammars have been known for some time, but rarely gained popularity (bison’s GLR-feature is an exception to this rule and ANTLR mitigates many of these problems, too). GoRGO implementes two flavors of bottom-up parsers which are able to cope with ambiguty: Earley parsing and GLR-parsing.

During tests for various DSL tasks I came to love Earley-parsing. Sketching out a grammar, attaching it into an Earley parser and feeding it some sample input, all with a couple lines of code, is a satisfying experience. In theory Earley parsing can severly degrade in performance, but I have yet to come across such a case in practice. I will talk more on Earley-parsing in a different post.

GoRGO is an ongoing experiment. Stay tuned for some real-world application stories.


Last modified on 2020-12-03