Table of Contents
From Lisp to LLVM
Background
This page shows the implementation of a compiler which recognizes and translates part of the LISP programming language into the LLVM IR syntax (more information about LLVM can be found here).
The LLVM Intermediate Representation (IR) is placed between the source code in a given programming language and the machine code for a specific architecture. It is independent from both the programming language and the architecture.
A brief introduction
It is important to know that LISP is a family of programming languages born in the early 60s and developed to date. The most common LISP dialect is the Common LISP, which is also the one studied below.
Some useful resources:
- Wikipedia page: https://en.wikipedia.org/wiki/Lisp_(programming_language)
- CLISP resources site: https://clisp.sourceforge.io/
- LISP installation: https://lisp-lang.org/learn/getting-started/
Tutorials:
An interesting article:
- Lisp and the origins of AI: https://opensource.com/article/19/9/command-line-heroes-lisp
Implemented parts
List of LISP parts supported by the compiler:
- Data types:
- Basic data types:
- int
- double
- strings
- Aggregate data types:
- array (1-dimension) - only double and integers
- Reference data types:
- functions (only integer numerical parameters, no return type)
- Variables:
- global variables
- local scoping of variables inside functions
- Operators:
- arithmetic operators:
- sum (+)
- difference (-)
- multiplication (*)
- division (/)
- relational operators:
- equal(=)
- not equal (/=)
- greater than (>)
- less than (<)
- greater than equal to (>=)
- less than equal to (<=)
- bitwise operators:
- AND (LOGAND)
- OR (LOGIOR)
- XOR (LOGXOR)
- assignments:
- DEFVAR
- SETQ
- Control statements:
- decision making:
- if (true condition)
- if (true condition) (false condition)
- if (false condition)
- loops:
- loop statement
- loop for statement
- Useful instructions:
- SETF
- DEFUN
- AREF
- Output:
- PRINT
- WRITE
Compiler
The compiler is made up of two parts: a scanner and a parser.
Scanner
The Scanner, created using jflex, scans an input file and generates a specific token each time that a stream of characters matches with a given rule (that can be a regular expression). In that case, it recognizes all useful symbols of a LISP source code, generating tokens which are later used by the Parser. Symbols cover most important keywords of the LISP language.
Snippet of LISP Scanner
nl = \r|\n|\r\n ws = [ \t] string = (\"[^0-9]*\") id = ([0-9]*[a-zA-Z\?\.\%&!\^\$£\+_\/\<\>\*\\\[\]@#\-]+[a-zA-Z0-9\?\.\%&!\^\$£\+_\/\<\>\*\\\[\]\-@#]*) integer = ([\+\-]*[1-9][0-9]*|0) double = (([0-9]+\.[0-9]*) | ([0-9]*\.[0-9]+)) (e|E('+'|'-')?[0-9]+)? not_evaluate = ('[a-zA-Z\-]+) %% "(" {return symbol(sym.RO);} ")" {return symbol(sym.RC);} "{" {return symbol(sym.BO);} "}" {return symbol(sym.BC);} "=" {return symbol(sym.EQ);} "+" {return symbol(sym.PLUS);} "-" {return symbol(sym.MINUS);} "*" {return symbol(sym.STAR);} "/" {return symbol(sym.DIV);} "<" {return symbol(sym.MIN);} ">" {return symbol(sym.MAJ);} "<=" {return symbol(sym.MIN_EQ);} "=<" {return symbol(sym.EQ_MIN);} ">=" {return symbol(sym.MAJ_EQ);} "=>" {return symbol(sym.EQ_MAJ);} "'" {return symbol(sym.APICE);} "[" {return symbol(sym.SO);} "]" {return symbol(sym.SC);} "int" {return symbol(sym.INT_TYPE);} "double" {return symbol(sym.DOUBLE_TYPE);} //Array make-array {return symbol(sym.MAKEARRAY);} :element-type {return symbol(sym.ELEMENTTYPE);} aref {return symbol(sym.AREF);} //Logical operators logand {return symbol(sym.LOGAND);} logior {return symbol(sym.LOGIOR);} logxor {return symbol(sym.LOGXOR);} //Output print {return symbol(sym.PRINT);} write {return symbol(sym.WRITE);} //Statements if {return symbol(sym.IF);} loop {return symbol(sym.LOOP);} for {return symbol(sym.FOR);} from {return symbol(sym.FROM);} to {return symbol(sym.TO);} do {return symbol(sym.DO);} in {return symbol(sym.IN);} when {return symbol(sym.WHEN);} then {return symbol(sym.THEN);} return-from {return symbol(sym.RETURNFROM);} return {return symbol(sym.RETURN);} //Functions defvar {return symbol(sym.DEFVAR);} defun {return symbol(sym.DEFUN);} setf {return symbol(sym.SETF);} setq {return symbol(sym.SETQ);} list {return symbol(sym.LIST);} , {return symbol(sym.CM);} {integer} {return symbol(sym.INT, new Integer(yytext())); } {double} {return symbol(sym.DOUBLE, new Double(yytext()));} {not_evaluate} {return symbol(sym.NOTEVAL, new String(yytext()));} {id} {return symbol(sym.ID, new String(yytext()));} {string} {return symbol(sym.STRING, new String(yytext()));} ";".* {;} {ws}|{nl} {;}
Parser
The parser is created using CUP. It takes as input the tokens provided by the scanner and recognize specific sequences of symbols. Every time a sequence is reduced, rule is executed. With the support of Java code for rules, the corrisponding LLVM IR code is produced.
Structure:
non_terminal ::= SYMBOLS {: /* rule code */:};
Data structures
In the parser code is contained the Java code used to support the parser in memorizing variables and assigning the labels needed to the LLVM IR instructions.
//code buffer for llvm public static StringBuffer irCode = new StringBuffer(); //global code buffer for llvm //it is used to declare global variables like string //at the beginning of the program public static StringBuffer globalCode = new StringBuffer(); public BufferedWriter file_out; //flag used by addCode to understand in //what buffer code must be written public static boolean writeOnGlobal = false; //used for local scoping inside functions public static Integer tempGLobal = 0; //labelCounter for assigning registers public static Integer varLabelCounter = 1; //labelCounter for global variables public static Integer globalVarLabelCounter = 4; //labelCounter for decision statements public static Integer labelDecision = 0; public HashMap <String, varObject> vars; public HashMap<String, funcObject> functionsMap; public LinkedList<loopVar> loopList; /* addCode() add a string to the buffer code */ public void addCode(String var){ if(!writeOnGlobal){ irCode.append(var); } else{ globalCode.append(var); } } /* getVarLabel() Returns a unique label for a variable */ public String getVarLabel(){ return (varLabelCounter++).toString(); } /* getGlobalVarLabel() Returns a unique label for a global variable */ public String getGlobalVarLabel(){ return (globalVarLabelCounter++).toString(); } /* getDecisionLabel() Returns a unique label for a decision statement */ public String getDecisionLabel(){ return (labelDecision++).toString(); } public class varObject{ public String label; public String align; public String stringValue; public boolean global; /* void constructor is used to initialize values directly written in the code */ public varObject(){ this.label = null; this.stringValue = null; this.align = null; this.global = false; //All variables are declared at beginning as local } public varObject(String align, boolean setLabel){ this.align = align; this.global = false; if(setLabel){ if(align.equals("1")){ //If align == 1 is a string so it's global this.label = getGlobalVarLabel(); } else{ this.label = getVarLabel(); } } } } public class intVar extends varObject{ public Integer intValue; public intVar(){ this.intValue = null; } public intVar(Integer value, String align, boolean setLabel){ super(align, setLabel); this.intValue = value; this.stringValue = value.toString(); } } public class doubleVar extends varObject{ public Double doubleValue; public doubleVar(){ this.doubleValue = null; } public doubleVar(Double value, String align, boolean setLabel){ super(align, setLabel); this.doubleValue = value; this.stringValue = value.toString(); } } public class stringVar extends varObject{ public stringVar(String value, boolean setLabel){ super("1", setLabel); this.stringValue = value; } } public class loopVar extends varObject{ public loopVar(String name){ super(); this.stringValue = name; this.label = getDecisionLabel(); } } public class arrayVar extends varObject{ public Integer size; public String type; public arrayVar(Integer size, boolean setLabel){ super("", setLabel); this.size = size; } public void allocate(String type){ this.type = type; this.label = getVarLabel(); if(this.type.equals("double")){ this.align = "8"; addCode("%"+this.label+" = alloca ["+this.size+" x double], align 8\n"); } else if(this.type.equals("i32")){ this.align = "4"; addCode("%"+this.label+" = alloca ["+this.size+" x i32], align 4\n"); } } } public class funcObject{ public String name; public String returnType; public Integer size; public String[] arguments; public funcObject(String name, Integer size){ this.name = name; this.size = size; this.arguments = new String[size]; } }
varObject
: is a Java class used to model a generic object of the language Every object is used to memorize useful informations (like the Label) to write the correct LLVM IR code. On the following, the specializing classes:intVar
: represents the Integer variablesdoubleVar
: represents the Double variablesstringVar
: represents the String variablesloopVar
: represent a loop or loop for statement of the LISP programming languages. In this objects are stored labels used to perform correctly jumps in the LLVM IR codearrayVar
: represent a 1-dimension array variablefuncObject
: represent a function statement of the LISP programming language
- The first variables in the parser code are used respectively to support the writing of the code in the output file (global variables in LLVM need to be written at the beginning of the code with a sequential order for labels independent from the main code), organize labels for LLVM IR main instructions and memorized variables
loopList
is a list ofloopVar
objects. It is taken always the last element of this list. In this way in the case there are nested loops, the code written will be always referenced to the last loop and label can not be confused.addCode
is a very simple but essential method. It adds the code to the buffer variables (on irCode if writeOnGlobal is false otherwise on globalCode)- To every object is associated a label that will be used in the LLVM code to uniquely identify the data stored
Grammar start
prog
is the very beginning of the grammar and the last symbol to be reduced. Moreover the reducing of this symbol means that no semantic errors where found in the input file.- The first rows written on the output file are necessary to support the output operations on video. @.str.0, @.str.1, @.str.2 and @.str.3 allow to print integer and double variables on video with a newLine (\\0A) or in the same line with a space (\\20) as separator.
- If some strings are declared on the program they are added to the buffer in this first global part
- The last part of code to be written on the file is the main part of the program. In the LLVM code this instructions are inserted in the @main function
lists
andlist
non terminal symbols highlight the structure of a LISP program: essentially a sequence of lists. A list, as in many thers programming languages, is an element inserted between the terminal symbols ( and ). In the next paragraphs some of the main instructions of the LISP are explained. All instructions, functions and variables in LISP are treated as lists.
start with prog; prog ::= lists {: file_out.write("declare i32 @printf(i8*, ...)\n"); file_out.write("@.str.0 = private constant [4 x i8] c"+new String("\"%f\\0A\\00\"")+", align 1\n"); file_out.write("@.str.1 = private constant [4 x i8] c"+new String("\"%d\\0A\\00\"")+", align 1\n"); file_out.write("@.str.2 = private constant [4 x i8] c"+new String("\"%d\\20\\00\"")+", align 1\n"); file_out.write("@.str.3 = private constant [4 x i8] c"+new String("\"%f\\20\\00\"")+", align 1\n"); //Adding strings as global variables at the beginning of the LLVM program writeOnGlobal = true; for(varObject var : vars.values()){ if(var.align != null){ if(var.align.equals("1")){ var.stringValue = var.stringValue.replaceAll("\"", ""); addCode("@.str."+var.label+" = constant ["+new Integer(var.stringValue.length()+2).toString()+" x i8] c\""+var.stringValue+"\\0A\\00\", align 1\n"); } } } writeOnGlobal = false; //Writing global code file_out.write(globalCode.toString()); //Writing ir code inside the main function file_out.write("define void @main(){\n"); file_out.write(irCode.toString()); file_out.write("ret void\n}\n"); file_out.flush(); file_out.close(); System.out.println("Program terminated. File correclty written!"); :}; lists ::= lists list | list; list ::= RO instructions RC | RO glob_vars RC | RO if_stmt RC | RO loop_stmt RC | RO loop_for RC | RO num_list RC | expression | func;
Global variables definition
SETQ instruction
DEFVAR
and SETQ
are the most important instructions in LISP to initialize global variables. In the following will be provided an example with only SETQ
because the implementation in the parser of DEFVAR
is very similar.
global_vars ::= SETQ ID:id RO MAKEARRAY INT:i ELEMENTTYPE NOTEVAL:typ RC{: //If variable does not exist create it if(parser.vars.get(id) == null){ //Inserting variable into map arrayVar var = new arrayVar(i, false); String type = typ.replace("'", ""); if(type.equals("double-float")){ var.allocate("double"); } else if(type.equals("integer")){ var.allocate("i32"); } else{ pSemError("Invalid type!"); } var.global = true; parser.vars.put(id, var); } else{ //else update it arrayVar var = new arrayVar(i, false); String type = typ.replace("'", ""); if(type.equals("double-float")){ var.allocate("double"); } else if(type.equals("integer")){ var.allocate("i32"); } else{ pSemError("Invalid type!"); } var.label = parser.vars.get(id).label; //Keeping old label parser.vars.put(id, var); } RESULT = parser.vars.get(id).label; :} | SETQ ID:id DOUBLE:num {: //If variable does not exist create it if(parser.vars.get(id) == null){ //Inserting variable into map doubleVar var = new doubleVar(num, "8", true); var.global = true; parser.vars.put(id, var); //Adding code to LLVM addCode("%"+var.label+" = alloca double, align 8\n"); addCode("store double "+var.stringValue+", double* %"+var.label+"\n"); } else{ //else update it doubleVar var = new doubleVar(num, "8", false); var.label = parser.vars.get(id).label; //Keeping old label parser.vars.put(id, var); //Adding code to LLVM - updating value addCode("store double "+var.stringValue+", double* %"+var.label+"\n"); } RESULT = num; :} | SETQ ID:id INT:num {: //If variable does not exist create it if(parser.vars.get(id) == null){ //Inserting variable into map intVar var = new intVar(num, "4", true); var.global = true; parser.vars.put(id, var); //Adding code to LLVM addCode("%"+var.label+" = alloca i32, align 4\n"); addCode("store i32 "+var.stringValue+", i32* %"+var.label+"\n"); } else{ //else update it intVar var = new intVar(num, "4", false); var.label = parser.vars.get(id).label; //Keeping old label parser.vars.put(id, var); //Adding code to LLVM - updating value addCode("store i32 "+var.stringValue+", i32* %"+var.label+"\n"); } RESULT = num; :}
- With the
SETQ
if the variable does not exist yet a new one is created. If it exists a newintVar
,arrayVar
ordoubleVar
object is created but the old label is maintaned.
With the following LISP code:
(setq a (make-array 3 :element-type 'integer)) (defvar b 4.0) (setq b 3.0)
The following LLVM Code:
declare i32 @printf(i8*, ...) @.str.0 = private constant [4 x i8] c"%f\0A\00", align 1 @.str.1 = private constant [4 x i8] c"%d\0A\00", align 1 @.str.2 = private constant [4 x i8] c"%d\20\00", align 1 @.str.3 = private constant [4 x i8] c"%f\20\00", align 1 define void @main(){ %1 = alloca [3 x i32], align 4 %2 = alloca double, align 8 store double 4.0, double* %2 store double 3.0, double* %2 ret void }
Nested expressions
LISP uses prefix notation so, operators are written before their operands. For example, the expression:
a * ( b + c ) / d
in LISP is written as:
(/ (* a (+ b c) ) d)
Implementation
Let's consider the following expression:
(setq a (+ 1 (- 3 4) (logand 3 (+ 2 3 4)) (- 4 5)))
A bottom-up approach
The strategy used to evaluate this instruction is to start from the most nested expressions (like (+ 2 3 4)). When an expression
is reduced, in the RESULT variable is memorized the label of the LLVM instruction where the result value is stored. In this way, the upper level expression
is able to access the result and treating it as operand to calculate it's own result value and so the process repeats. If just one of the operands of an expression is a double all int operands are converted. For this purpose the sitofp
LLVM instruction is used.
expression ::= RO operator:x num_list:nl RC {: boolean doubleFlag = false; String previousLabel = ""; String localLabel = ""; String[] operands = new String[2]; String[] tempOperands = new String[2]; //Needed to manage conversions of operands for division sdiv Integer count = 0; String tempLabelConversion = ""; //Check if at least 1 element is double //in that case all other values need to //be converted if are int for(String s: nl){ if(vars.get(s) instanceof doubleVar){ doubleFlag = true; } } if(x.equals("and") || x.equals("or") || x.equals("xor")){ //with logical operators operands can be only integer if(doubleFlag){ pSemError("Logical operations require integer numbers"); } } ListIterator<String> litr = nl.listIterator(nl.size()); if(doubleFlag){ while(litr.hasPrevious()){ String s = litr.previous(); if(previousLabel.equals("")){ if(vars.get(s) instanceof intVar){ if(vars.get(s).label != null){ //It's an int stored variable localLabel = getVarLabel(); addCode("%"+localLabel+" = load i32, i32* %"+vars.get(s).label+", align 4\n"); localLabel = getVarLabel(); addCode("%"+localLabel+" = sitofp i32 %"+localLabel+" to double\n"); operands[count++] = "%"+localLabel; } else{ tempLabelConversion = getVarLabel(); addCode("%"+tempLabelConversion+" = sitofp i32 " + s + " to double\n" ); operands[count++] = "%"+tempLabelConversion; } } else{ if(vars.get(s).label != null){ //It's a double stored variable localLabel = getVarLabel(); addCode("%"+localLabel+" = load double, double* %"+vars.get(s).label+", align 8\n"); operands[count++] = "%"+localLabel; }else{ operands[count++]= s; } } if(count == 2){ previousLabel = "%"+getVarLabel(); if(x.equals("fdiv")){ addCode(previousLabel+" = fdiv double "+operands[0]+", "+operands[1]+"\n"); } else{ addCode(previousLabel+" = f"+x+" double "+operands[0]+", "+operands[1]+"\n"); } count=0; } } else{ operands[count++] = previousLabel; if(vars.get(s) instanceof intVar){ tempLabelConversion = getVarLabel(); addCode("%"+tempLabelConversion+" = sitofp i32 " + s + " to double\n" ); operands[count++] = "%"+tempLabelConversion; } else{ operands[count++]= s; } previousLabel = "%"+getVarLabel(); if(x.equals("fdiv")){ addCode(previousLabel+" = fdiv double "+operands[0]+", "+operands[1]+"\n"); } else{ addCode(previousLabel+" = f"+x+" double "+operands[0]+", "+operands[1]+"\n"); } count=0; } } vars.put(previousLabel, new doubleVar()); } else{ while(litr.hasPrevious()){ String s = litr.previous(); if(previousLabel.equals("")){ if(vars.get(s).label != null){ //It's an int stored variable localLabel = getVarLabel(); addCode("%"+localLabel+" = load i32, i32* %"+vars.get(s).label+", align 4\n"); operands[count++] = "%"+localLabel; } else{ operands[count++]= s; } if(count == 2){ //A division implies a floating value if(x.equals("fdiv")){ tempOperands[0] = "%"+getVarLabel(); tempOperands[1] = "%"+getVarLabel(); addCode(tempOperands[0]+" = sitofp i32 "+operands[0]+" to double\n"); addCode(tempOperands[1]+" = sitofp i32 "+operands[1]+" to double\n"); previousLabel = "%"+getVarLabel(); addCode(previousLabel+" = fdiv double "+tempOperands[0]+", "+tempOperands[1]+"\n"); } else{ previousLabel = "%"+getVarLabel(); addCode(previousLabel+" = "+x+" i32 "+operands[0]+", "+operands[1]+"\n"); } count=0; } } else{ operands[count++] = previousLabel; if(vars.get(s).label != null){ //It's an int stored variable localLabel = getVarLabel(); addCode("%"+localLabel+" = load i32, i32* %"+vars.get(s).label+", align 4\n"); operands[count++] = "%"+localLabel; } else{ operands[count++]= s; } if(x.equals("fdiv")){ tempOperands[0] = "%"+getVarLabel(); tempOperands[1] = "%"+getVarLabel(); addCode(tempOperands[0]+" = sitofp i32 "+operands[0]+" to double\n"); addCode(tempOperands[1]+" = sitofp i32 "+operands[1]+" to double\n"); previousLabel = "%"+getVarLabel(); addCode(previousLabel+" = fdiv double "+tempOperands[0]+", "+tempOperands[1]+"\n"); } else{ previousLabel = "%"+getVarLabel(); addCode(previousLabel+" = "+x+" i32 "+operands[0]+", "+operands[1]+"\n"); } count=0; } } //A division implies a floating value if(x.equals("fdiv")){ vars.put(previousLabel, new doubleVar()); } else{ vars.put(previousLabel, new intVar()); } } RESULT = new String(previousLabel); :};
The resulting LLVM code is:
declare i32 @printf(i8*, ...) @.str.0 = private constant [4 x i8] c"%f\0A\00", align 1 @.str.1 = private constant [4 x i8] c"%d\0A\00", align 1 @.str.2 = private constant [4 x i8] c"%d\20\00", align 1 @.str.3 = private constant [4 x i8] c"%f\20\00", align 1 define void @main(){ %1 = add i32 0,4 %2 = add i32 0,3 %3 = sub i32 %2, %1 %4 = add i32 0,4 %5 = add i32 0,3 %6 = add i32 0,2 %7 = add i32 %6, %5 %8 = add i32 %7, %4 %9 = add i32 0,3 %10 = and i32 %9, %8 %11 = add i32 0,5 %12 = add i32 0,4 %13 = sub i32 %12, %11 %14 = add i32 0,1 %15 = add i32 %14, %3 %16 = add i32 %15, %10 %17 = add i32 %16, %13 %18 = alloca i32, align 4 store i32 %17, i32* %18 %19 = load i32, i32* %18, align 4 %20 = call i32 (i8*, ...) @printf(i8* getelementptr inbounds ([4 x i8], [4 x i8]* @.str.1, i32 0, i32 0), i32 %19) ret void }
Functions
Implementation
//Functions func ::= RO DEFUN ID:str RO num_list:nl NT2 RC lists RC {: funcObject func = new funcObject(str, nl.size()); func.returnType = "void"; //End of the function addCode("ret void \n}\n"); functionsMap.put(str, func); writeOnGlobal = false; varLabelCounter = tempGLobal; :}; NT2 ::= {: String name = (String) stack(-2); String arguments = new String(""); String label = ""; Integer functionCount = 3; //Functions' body in LLVM must declared on the global part writeOnGlobal = true; LinkedList<String> nl = (LinkedList<String>) stack(0); //Counting the arguments for(String s: nl){ arguments += "i32,"; vars.put(s, new intVar()); } StringBuilder tmp = new StringBuilder(arguments); tmp.setCharAt(arguments.length()-1, ')'); arguments = tmp.toString(); addCode("define void @"+ name+"("+arguments+"{\n"); int i = 0; //For each argument store a variable for(String s : nl){ label = (functionCount++).toString(); vars.get(s).label = label; addCode("%"+label+ " = alloca i32, align 4\n"); addCode("store i32 %"+ i+", i32* %"+label+"\n"); i++; } //LLVM Label counting inside functions is //indipendent from the main part tempGLobal = varLabelCounter; varLabelCounter = functionCount; :};
LISP functions with only integer parameters and void return type are supported.
- Building of functions in LLVM is supported by the use of the marker
NT2
. When theRO DEFUN ID:str RO num_list:nl
symbols are recognized, corresponding for example to(defun average(1, 2
it is quite clear that a new function has to be created, so the number of arguments is counted and the corresponding
declare
LLVM instruction is written on the code - Attention: LLVM Label counting inside functions is indipendent from the main part so the
varLabelCounter
is temporary saved and initialized with the function counter value - When also the body of the function is reduced the
funcObject
is created and the final LLVM code is written to end the function
The corresponding LISP code:
(defun division(n1 n2) (if (= n1 0) (print err)) (setq res (/ n2 n1)) (print res) )
is translated in the following LLVM code (simplified):
define void @division(i32,i32){ %3 = alloca i32, align 4 store i32 %0, i32* %3 %4 = alloca i32, align 4 store i32 %1, i32* %4 ... ... ... %16 = call i32 (i8*, ...) @printf(i8* getelementptr inbounds ([4 x i8], [4 x i8]* @.str.0, i32 0, i32 0), double %15) ret void }
Print instruction
The following variants of the PRINT instruction are supported:
print a
- printing of declared variables (strings, integers, doubles)print arr
- printing of array variablesprint (+ 2 3 4)
- printing of direct expressionsprint (aref my-array 1)
- printing of array elements
For semplicity only the parser code for declared variables is provided:
instructions ::= PRINT ID:id {: if(vars.get(id) != null){ //Printing a string if(vars.get(id) instanceof stringVar){ stringVar var = (stringVar) vars.get(id); String length = new Integer(var.stringValue.length()).toString(); addCode("%"+getVarLabel()+" = call i32 (i8*, ...) @printf(i8* getelementptr inbounds (["+length+" x i8], ["+length+" x i8]* @.str."+var.label+", i32 0, i32 0))\n"); } //Printing a double else if(vars.get(id) instanceof doubleVar){ doubleVar var = (doubleVar) vars.get(id); Integer localLabel = new Integer(getVarLabel()); addCode("%"+localLabel+" = load double, double* %"+var.label+", align 8\n"); addCode("%"+getVarLabel()+" = call i32 (i8*, ...) @printf(i8* getelementptr inbounds ([4 x i8], [4 x i8]* @.str.0, i32 0, i32 0), double %"+localLabel+")\n"); } //Printing an integer else if(vars.get(id) instanceof intVar){ intVar var = (intVar) vars.get(id); Integer localLabel = new Integer(getVarLabel()); addCode("%"+localLabel+" = load i32, i32* %"+var.label+", align 4\n"); addCode("%"+getVarLabel()+" = call i32 (i8*, ...) @printf(i8* getelementptr inbounds ([4 x i8], [4 x i8]* @.str.1, i32 0, i32 0), i32 %"+localLabel+")\n"); } //Printing an array else if(vars.get(id) instanceof arrayVar){ arrayVar var = (arrayVar) vars.get(id); String label = ""; for(int i=0; i< var.size; i++){ label = getVarLabel(); addCode("%"+label+" = getelementptr inbounds ["+var.size+" x i32] , ["+var.size+" x i32]* %"+var.label+", i32 0, i32 "+i+"\n"); if(var.type.equals("i32")){ Integer localLabel = new Integer(getVarLabel()); addCode("%"+localLabel+" = load i32, i32* %"+label+", align 4\n"); if(i != var.size-1){ addCode("%"+getVarLabel()+" = call i32 (i8*, ...) @printf(i8* getelementptr inbounds ([4 x i8], [4 x i8]* @.str.2, i32 0, i32 0), i32 %"+localLabel+")\n"); } else{ addCode("%"+getVarLabel()+" = call i32 (i8*, ...) @printf(i8* getelementptr inbounds ([4 x i8], [4 x i8]* @.str.1, i32 0, i32 0), i32 %"+localLabel+")\n"); } } else{ Integer localLabel = new Integer(getVarLabel()); addCode("%"+localLabel+" = load double, double* %"+label+", align 8\n"); if(i != var.size-1){ addCode("%"+getVarLabel()+" = call i32 (i8*, ...) @printf(i8* getelementptr inbounds ([4 x i8], [4 x i8]* @.str.3, i32 0, i32 0), double %"+localLabel+")\n"); } else{ addCode("%"+getVarLabel()+" = call i32 (i8*, ...) @printf(i8* getelementptr inbounds ([4 x i8], [4 x i8]* @.str.0, i32 0, i32 0), double %"+localLabel+")\n"); } } } } } else{ pSemError("Variable not declared!"); } :}
- Strings: as explained in the paragraph Grammar start, strings found in the LISP code must be declared as global in the LLVM IR code. In this way, for printing the string it is only necessary to recover the label form the
stringVar
object and add the@printf
instruction to the LLVM code passing to this one the label (stored inlocalLabel
) - Doubles and Integers: declared variables are stored in the memory.
intVar
anddoubleVar
objects store the LLVM label of typedouble*
andi32*
that locate the memory area where the value is stored. With theload
instruction the value is red from the memory and then passed to the@printf
instruction - Array: if the variable is an array the first thing to do is to extract the pointer to the memory location of the array at a desired offset. For this purpose, the
getelementptr
LLVM instruction is used. Because we want to print all the array (to print a specific cell AREF is used) the variable i is used to iterate over the positions from 0 to the size of the array minus 1 and passed to the instruction
For instruction
In LISP the classical for statement present in other programming languages is implemented with the loop
or loop for
instructions. Here the parser implementation of the loop for
:
loop_for ::= LOOP FOR ID:i FROM INT:a TO INT:b DO {: String oldLabel = ""; String newLabel = ""; String cmpLabel = ""; String localLabel = ""; String localLabel1 = ""; loopList.addLast(new loopVar("for")); loopVar lv = loopList.getLast(); //Variable i is yet used in the code as global //Now becomes local if(vars.get(i) != null){ oldLabel = vars.get(i).label; newLabel = getVarLabel(); vars.get(i).label = newLabel; //To avoid use the global variable we need to use a new Label and store the old one addCode("%"+newLabel+" = alloca i32, align 4\n"); addCode("store i32 "+(a-1)+", i32* %"+newLabel+"\n"); } else{ intVar var = new intVar(a, "4", true); vars.put(i, var); newLabel = vars.get(i).label; addCode("%"+newLabel+" = alloca i32, align 4\n"); addCode("store i32 "+(a-1)+", i32* %"+newLabel+"\n"); } addCode("br label %for.body."+lv.label+"\n"); addCode("for.body."+lv.label+":\n"); localLabel = getVarLabel(); addCode("%"+localLabel+" = load i32, i32* %"+newLabel+", align 4\n"); localLabel1 = getVarLabel(); addCode("%"+localLabel1+" = add i32 %"+localLabel+", 1\n"); addCode("store i32 %"+localLabel1+", i32* %"+newLabel+", align 4\n"); localLabel = getVarLabel(); addCode("%"+localLabel+" = load i32, i32* %"+newLabel+", align 4\n"); cmpLabel = getVarLabel(); addCode("%"+cmpLabel+" = icmp slt i32 %"+localLabel+","+(b+1)+"\n"); addCode("br i1 %"+cmpLabel+", label %for.body2."+lv.label+", label %for.exit."+lv.label+"\n"); addCode("for.body2."+lv.label+":\n"); RESULT = oldLabel; :} lists{: loopVar lv = loopList.getLast(); addCode("br label %for.body."+lv.label+"\n"); addCode("for.exit."+lv.label+":\n"); String oldLabel = (String) stack(-1); //If oldLabel == "" means that ID:i was only a local vaiable in the loop, remove it if(oldLabel.equals("")){ vars.remove(i); } //else means that it had an oldLabel so was used as global -> restore the oldLabel else{ vars.get(i).label = oldLabel; } :};
loopList.addLast
add the current loop to the loop list. This allows us to 'select' the last loop inserted and correctly add to the code the labels. It is necessary when there are nested loops: we need to select always the last inserted.- It is necessary to see if the variable used for the loop is already declared outside. In this case the variable must be substituted temporary because by the end of the loop statement it assumes a local scoping
- With the code
“for.body.”+lv.label+“:\n”
we enter in the loop body. The first thing to do is to load the loop variable, adding to it 1 and storing it. Then it is compared with the second INT to eventually terminate the loop
Error handling
The compiler is able to recognize the following kind of errors:
- Variable not declared
- Variable is not an array
- Index out of range in arrays
- Use of non integer values in logical operations
- Assignent of a wrong type value to an array
Missing functionalities
- Functions support only numerical values as input and no return type
- Condition operators support only 2 operands
- Inside functions and statements is supported the local scoping of variables. The local variables of LISP defined with let instruction are not supported
- Array support only integer and double variables
- Float values are not supported
- In
loop for
andmake-array
are supported only direct numerical values, not variables
A simple example
An example of a LISP program used to calculate the first 22 numbers of the Fibonacci serie:
- fibo.lisp
(defvar a 0) (defvar b 1) (loop for i from 0 to 10 do (write a) (if (= i 10) (print b) (write b)) (setq a (+ a b)) (setq b (+ a b)) )
Execution of program will produce the following output:
0 1 1 2 3 5 8 13 21 34 55 89 144 233 377 610 987 1597 2584 4181 6765 10946
Download and installation
Compiler: lisp_compiler
Examples
- Fibonacci: fibonacci
- Division: division
- Bubble sort: bubble_sort
- Floyd triangle: triangle
Each example is provided separately and contains the LISP source code and the corresponding output in LLVM IR, generated by the compiler. For sure, each source code can be given as input of the compiler, which will produce the same output contained in the example. In general after
How to run it
- Install, if you have not already done so, the llvm package:
sudo apt install llvm
- Download the lisp compiler and extract it
- Open the terminal, go to the folder where the compiler is extracted.
- [OPTION 1 - Compiling a generic LISP source file] In general the following list of commands must be executed:
jflex lisp_scanner.jflex
java java_cup.Main lisp_parser.cup
javac *.java
java Main source_file.lisp
- [OPTION 2 - Compiling provided examples]:
- To run the bubble sort example run
make bubble
- To run the floyd triangle example run
make triagle
- To run the Fibonacci example run
make fibo
- To run the division example run
make division
- Default
make
command will run bubble sort example
- Run output.ll file with lli:
lli output.ll
If you found any error, or if you want to partecipate to the editing of this wiki, please contact: admin [at] skenz.it
You can reuse, distribute or modify the content of this page, but you must cite in any document (or webpage) this url: https://www.skenz.it/compilers/lisp_to_llvm