Write your own interpreter
I wrote my own little programming language!
It was a lot of fun, and although I was a bit intimidated by the idea at first, it turns out the project was just simple enough to be realistically possible to implement, while also being complex enough to deliver a proper hit of dopamine when things start to work.
In this article I will lay out a high-level overview over the steps I took to build my little interpreter. My hope is to make it easier for people who are slightly intimidated like I was to get started building without too many spoilers.
I won’t go into the details of an exact implementation, but rather outline the basic structure of the code.
I assume you have a basic knowledge of how interpreters work (i.e. what tokenizer, parser and interpreter are).
A few notes to start
1. A simple language
The language I built uses a Lisp-like syntax and only consists of a few built-in commands and data types.
2. Dynamic typing
Most programming languages should work for this project, but I found myself making heavy use of Ruby’s dynamic typing in my implementation. It made it much easier to pass around data of different types between functions in my implementation.
Statically typed languages usually have means to achieve effectively the same thing, but I imagine they would have gotten in the way of actually interpreting different data types.
3. Small steps
I found that writing a small sample of what I wanted a new feature to look like, and then working to parse that sample was both fun and efficient.
You can take a look at my samples in the Git repository
4. No error handling
I mostly ignored errors that could show up when parsing my language. The implementation assumes that the code passed in is valid and correct.
Let’s go!
0. Take input
First I take a file from the user as input, obviously, and read its contents.
1. Tokenizer
These contents are passed to a tokenizer function. This function iterates over the characters in the text and adds them to a variable that holds the current token. Special characters like parentheses, but also whitespace cause the current token to be pushed to a list that holds all tokens. In the case of special characters, a Ruby symbol holding the character is added to the list (A simple string would do the trick, but symbols have IDE autocompletion and cause less runtime overhead).
Keywords and built-in functions in my language’s syntax are marked as such in this step as well, also using Ruby symbols.
2. Parser
The parser iterates over the list of tokens.
- If it comes across a simple value, it returns that value with some information attaches to it (To be exact, it returns a class
Value
that has thedata
andtype
properties). - If it comes across an opening parenthesis, it treats the next element in the list as an operator and returns an
Operation
class with the correctaction
property. All elements up to the corresponding closing parenthesis are gathered, parsed recursively, and added to thedata
property of the returnedOperation
class.
In either case, the returned value is added to a list. The remaining tokens are parsed again, recursively, the resulting list is appended to the first list. The resulting list is then returned.
3. Interpreter
Like the parser, the interpreter is a recursive function. It takes a node (either a Value
or Operation
class), returns the data
inside Value
or executes the Operation
by mapping its action
to the corresponding native Ruby functions, applying it to all items in its data
property (The items need to be interpreted recursively before they can be used, of course) and returning the result.
The only step left now is to run this function on each of the items the parser returned in step 2.
And violà, the interpreter can now execute this tiny little language!
I hope this post inspired you to build your own interpreter, and maybe even helped you in doing so. It’s really simpler than it might seem at first, and the real challenge is to come up with a good approach and get started at all.