How to make your own Programming Language

Devansh Gupta
4 min readNov 18, 2023

I still remember writting my first “for loop” program in C Language back in 9th Standard. Our teacher taught us to print counting through a C program.

for(int i=0; i< 10000; i++){
printf("\n%d",i+1);
}

I was fascinated by the fact that how this gibberish can print counting upto 10000 in a matter of seconds.

I have played with many languages ever since then, have built different kind of programs, applications but creating a Language has been a childhood dream.

Few days back I started a project called RUGIA — A simple expression language.

A stepping stone towards creating a Programming Language.

I’m building it in Golang as It has the perfect blend of features require to build an Interpreter.

So How does it work? It can be deconstructed in a 3 step process

Step 1 — Lexical Analysis/ Tokenization (less terrifying)

In Lexical Analysis, we pass every symbol in the input one by one and try to make valid tokens out of it (means any allowed symbol of our language).
e.g.

Input: 3 < (5 + 1)

here are the tokens
3 Number
< LessThan
( BraceOpen
5 Number
+ Plus
1 Number
) BraceClose

Observe that we are not considering white spaces. The Algorithm we use to do this is called One Token Look Ahead ( time to open some Compiler Design Books guys )

Step 2 — Parsing ( now the real fun begins )

After we have our list of tokens, we need to get meaning out this symbols.
e.g. Consider the Token List for expression

Expr: 10 ( 20 )
Tokens: [ Number, OpenBrace, Number, CloseBrace ]

You are right this token list doesn’t make any sense. It does nothing.

Consider this now

Expr: 10 + ( 20 )
Tokens: [ Number, Plus, OpenBrace, Number, CloseBrace ]

This expression make sense, now what about this

Expr: -420
Tokens : [ Minus, Number ]

Does this make sense? Hell! Yes
It represents a negative number.

So to make some sense about these tokens, we need to structure these tokens somehow.

Legends found that Binary Trees ( A tree having 2 children at most, I hope you took the data structures class in Undergrad) can help us with this. The philosophy is simple, We need one output after some operation on two expressions.
e.g.

3 + 7 will evaluate to 10
Expr1 operator Expr2 will evaluate to Expr3

So we represent all this mess via a Binary Tree like below

Syntax Tree for 3 + 7
var exp1 Node = newNumber(3)
var exp2 Node = newNumber(7)
var node Node = { leftChild: exp1, rightChild: exp2, token: Plus }

Above example is quite simple but We can broken down complex ones to binary trees easily with the parsing algorithm. RUGIA is gonna have Syntax Tree visualizer in Future.

But what about expression like -2, we can have tree like this

Syntax Tree for -2

A more complex example would be of expression like 2 + 3 * 4 + 1

Syntax Tree for 2 + 3 * 4 + 1

There are many naunces of parsing that We will discuss some other time, like we are using Context Free Grammar, that helps us to resolve precedence conflicts among different operators (consider watching some Compiler Design videos to understand the concepts), We are using Parsing Combinators to parse different kind of expressions.

Now that we know how to build a parser, we are ready to parse expressions into Syntax Trees.

Once we have the Syntax Tree, we can evaluate it’s output.

Step 3 — Evaluation ( Everyone’s favourite )

While creating nodes for syntax tree we know that each node represents some token, so while defining the nodes, we can pass an evaluation function which would perform the real computation.

e.g.

A Plus token node expects it’s children to be numbers and outputs the sum of those two numbers

func evalPlus(x float64, y float64) float64 {
return x + y
}

A Number token node expects a string and converts string to actual number

func evalNumber(num string) float64 {
n,_ := strconv.ParseFloat(num, 64)
return n
}

We would pass these evaluators to the Node Constructor and then use Post Order Traversal to get the output.

Output :

func eval(node *Node) float64 {
x:= eval(node.leftChild)
y:= eval(node.rightChild)
if node.token == Plus {
return evalPlus(x,y)
} else if node.token == Number {
return evalNumber(node.token.val)
}
return 0
}

Voila! We made a small expression language, We are among the smartest living beings on earth now.

You should consider going through RUGIA’s source code and see how we have implemented evaluation logic using Higher order functions ( means Golang treats Functions also as values and we can pass them around, like other values)

We still need to add other features in RUGIA like Variables, Functions and many more.

Please feel free to contribute

RUGIA : https://github.com/devansh42/rugia

Parsing Combinators: https://www.youtube.com/watch?v=dDtZLm7HIJs

Book I’m refering to build the interpreter: https://craftinginterpreters.com/a-tree-walk-interpreter.html

--

--