User Tools

Site Tools


compilers:lisp_to_llvm
Return to Home page

From Lisp to LLVM

Background

This page shows the implementation of a compiler which recognizes and translates part of the LISP programming language into the LLVM IR syntax (more information about LLVM can be found here).

The LLVM Intermediate Representation (IR) is placed between the source code in a given programming language and the machine code for a specific architecture. It is independent from both the programming language and the architecture.

A brief introduction

It is important to know that LISP is a family of programming languages born in the early 60s and developed to date. The most common LISP dialect is the Common LISP, which is also the one studied below.

Some useful resources:

Tutorials:

An interesting article:

Implemented parts

List of LISP parts supported by the compiler:

  • Data types:
    • Basic data types:
      • int
      • double
      • strings
    • Aggregate data types:
      • array (1-dimension) - only double and integers
    • Reference data types:
      • functions (only integer numerical parameters, no return type)
  • Variables:
    • global variables
    • local scoping of variables inside functions
  • Operators:
    • arithmetic operators:
      • sum (+)
      • difference (-)
      • multiplication (*)
      • division (/)
    • relational operators:
      • equal(=)
      • not equal (/=)
      • greater than (>)
      • less than (<)
      • greater than equal to (>=)
      • less than equal to (<=)
    • bitwise operators:
      • AND (LOGAND)
      • OR (LOGIOR)
      • XOR (LOGXOR)
    • assignments:
      • DEFVAR
      • SETQ
  • Control statements:
    • decision making:
      • if (true condition)
      • if (true condition) (false condition)
      • if (false condition)
    • loops:
      • loop statement
      • loop for statement
  • Useful instructions:
    • SETF
    • DEFUN
    • AREF
  • Output:
    • PRINT
    • WRITE

Compiler

The compiler is made up of two parts: a scanner and a parser.

Scanner

The Scanner, created using jflex, scans an input file and generates a specific token each time that a stream of characters matches with a given rule (that can be a regular expression). In that case, it recognizes all useful symbols of a LISP source code, generating tokens which are later used by the Parser. Symbols cover most important keywords of the LISP language.

Snippet of LISP Scanner

  1. nl = \r|\n|\r\n
  2. ws = [ \t]
  3. string = (\"[^0-9]*\")
  4. id = ([0-9]*[a-zA-Z\?\.\%&!\^\$£\+_\/\<\>\*\\\[\]@#\-]+[a-zA-Z0-9\?\.\%&!\^\$£\+_\/\<\>\*\\\[\]\-@#]*)
  5. integer = ([\+\-]*[1-9][0-9]*|0)
  6. double = (([0-9]+\.[0-9]*) | ([0-9]*\.[0-9]+)) (e|E('+'|'-')?[0-9]+)?
  7. not_evaluate = ('[a-zA-Z\-]+)
  8.  
  9. %%
  10. "(" {return symbol(sym.RO);}
  11. ")" {return symbol(sym.RC);}
  12. "{" {return symbol(sym.BO);}
  13. "}" {return symbol(sym.BC);}
  14. "=" {return symbol(sym.EQ);}
  15. "+" {return symbol(sym.PLUS);}
  16. "-" {return symbol(sym.MINUS);}
  17. "*" {return symbol(sym.STAR);}
  18. "/" {return symbol(sym.DIV);}
  19. "<" {return symbol(sym.MIN);}
  20. ">" {return symbol(sym.MAJ);}
  21. "<=" {return symbol(sym.MIN_EQ);}
  22. "=<" {return symbol(sym.EQ_MIN);}
  23. ">=" {return symbol(sym.MAJ_EQ);}
  24. "=>" {return symbol(sym.EQ_MAJ);}
  25. "'" {return symbol(sym.APICE);}
  26.  
  27. "[" {return symbol(sym.SO);}
  28. "]" {return symbol(sym.SC);}
  29.  
  30. "int" {return symbol(sym.INT_TYPE);}
  31. "double" {return symbol(sym.DOUBLE_TYPE);}
  32.  
  33. //Array
  34. make-array {return symbol(sym.MAKEARRAY);}
  35. :element-type {return symbol(sym.ELEMENTTYPE);}
  36. aref {return symbol(sym.AREF);}
  37.  
  38. //Logical operators
  39. logand {return symbol(sym.LOGAND);}
  40. logior {return symbol(sym.LOGIOR);}
  41. logxor {return symbol(sym.LOGXOR);}
  42.  
  43. //Output
  44. print {return symbol(sym.PRINT);}
  45. write {return symbol(sym.WRITE);}
  46.  
  47. //Statements
  48. if {return symbol(sym.IF);}
  49. loop {return symbol(sym.LOOP);}
  50. for {return symbol(sym.FOR);}
  51. from {return symbol(sym.FROM);}
  52. to {return symbol(sym.TO);}
  53. do {return symbol(sym.DO);}
  54. in {return symbol(sym.IN);}
  55. when {return symbol(sym.WHEN);}
  56. then {return symbol(sym.THEN);}
  57. return-from {return symbol(sym.RETURNFROM);}
  58. return {return symbol(sym.RETURN);}
  59.  
  60. //Functions
  61. defvar {return symbol(sym.DEFVAR);}
  62. defun {return symbol(sym.DEFUN);}
  63. setf {return symbol(sym.SETF);}
  64. setq {return symbol(sym.SETQ);}
  65. list {return symbol(sym.LIST);}
  66.  
  67. , {return symbol(sym.CM);}
  68. {integer} {return symbol(sym.INT, new Integer(yytext())); }
  69. {double} {return symbol(sym.DOUBLE, new Double(yytext()));}
  70. {not_evaluate} {return symbol(sym.NOTEVAL, new String(yytext()));}
  71. {id} {return symbol(sym.ID, new String(yytext()));}
  72. {string} {return symbol(sym.STRING, new String(yytext()));}
  73.  
  74. ";".* {;}
  75.  
  76.  
  77. {ws}|{nl} {;}

Parser

The parser is created using CUP. It takes as input the tokens provided by the scanner and recognize specific sequences of symbols. Every time a sequence is reduced, rule is executed. With the support of Java code for rules, the corrisponding LLVM IR code is produced.

Structure:

non_terminal ::= SYMBOLS {: /* rule code */:};

Data structures

In the parser code is contained the Java code used to support the parser in memorizing variables and assigning the labels needed to the LLVM IR instructions.

    //code buffer for llvm
    public static StringBuffer irCode = new StringBuffer();
 
    //global code buffer for llvm
    //it is used to declare global variables like string 
    //at the beginning of the program
    public static StringBuffer globalCode = new StringBuffer();          
 
    public BufferedWriter file_out;
 
    //flag used by addCode to understand in
    //what buffer code must be written 
    public static boolean writeOnGlobal = false;                         
 
    //used for local scoping inside functions                                                                 
    public static Integer tempGLobal = 0;                                
 
    //labelCounter for assigning registers
    public static Integer varLabelCounter = 1; 
 
    //labelCounter for global variables                          
    public static Integer globalVarLabelCounter = 4;  
 
    //labelCounter for decision statements                   
    public static Integer labelDecision = 0;                             
 
    public HashMap <String, varObject> vars;
    public HashMap<String, funcObject> functionsMap;
    public LinkedList<loopVar> loopList;
 
    /*
        addCode()
        add a string to the buffer code
    */
    public void addCode(String var){
        if(!writeOnGlobal){
            irCode.append(var);
        }
        else{
            globalCode.append(var);
        }
    }
 
    /*
        getVarLabel()
        Returns a unique label for a variable
    */
    public String getVarLabel(){
        return (varLabelCounter++).toString();
    }
 
    /*
        getGlobalVarLabel()
        Returns a unique label for a global variable
    */
    public String getGlobalVarLabel(){
        return (globalVarLabelCounter++).toString();
    }
 
    /*
        getDecisionLabel()
        Returns a unique label for a decision statement
    */
    public String getDecisionLabel(){
        return (labelDecision++).toString();
    }
 
    public class varObject{
        public String label;
        public String align;
        public String stringValue;
        public boolean global;
 
        /* 
           void constructor is used to initialize 
           values directly written in the code
        */
        public varObject(){
            this.label = null;
            this.stringValue = null;
            this.align = null;
            this.global = false;                            //All variables are declared at beginning as local
        }
 
        public varObject(String align, boolean setLabel){
            this.align = align;
            this.global = false;
            if(setLabel){
 
                if(align.equals("1")){                      //If align == 1 is a string so it's global
                    this.label = getGlobalVarLabel();
                }
                else{
                    this.label = getVarLabel();
                }
            }
        }
 
    }
 
    public class intVar extends varObject{
        public Integer intValue;
 
        public intVar(){
            this.intValue = null;
        }
 
        public intVar(Integer value, String align, boolean setLabel){
            super(align, setLabel);
            this.intValue = value;
            this.stringValue = value.toString();
        }
    }
 
    public class doubleVar extends varObject{
        public Double doubleValue;
 
        public doubleVar(){
            this.doubleValue = null;
        }
 
        public doubleVar(Double value, String align, boolean setLabel){
            super(align, setLabel);
            this.doubleValue = value;
            this.stringValue = value.toString();
        }
    }
 
    public class stringVar extends varObject{
        public stringVar(String value, boolean setLabel){
            super("1", setLabel);
            this.stringValue = value;
        }
    }
 
    public class loopVar extends varObject{
        public loopVar(String name){
            super();
            this.stringValue = name;
            this.label = getDecisionLabel();
        }
    }
 
    public class arrayVar extends varObject{
        public Integer size;
        public String type;
        public arrayVar(Integer size, boolean setLabel){
            super("", setLabel);
            this.size = size;
        }
 
        public void allocate(String type){
            this.type = type;
            this.label = getVarLabel();
            if(this.type.equals("double")){
                this.align = "8";
                addCode("%"+this.label+" = alloca ["+this.size+" x double], align 8\n");
            }
            else if(this.type.equals("i32")){
                this.align = "4";
                addCode("%"+this.label+" = alloca ["+this.size+" x i32], align 4\n");
            }
        }
    }
 
    public class funcObject{
        public String name;
        public String returnType;
        public Integer size;
        public String[] arguments;
 
 
        public funcObject(String name, Integer size){
            this.name = name;
            this.size = size;
            this.arguments = new String[size];
        }
 
    }
 
 
  • varObject: is a Java class used to model a generic object of the language Every object is used to memorize useful informations (like the Label) to write the correct LLVM IR code. On the following, the specializing classes:
    • intVar: represents the Integer variables
    • doubleVar: represents the Double variables
    • stringVar: represents the String variables
    • loopVar: represent a loop or loop for statement of the LISP programming languages. In this objects are stored labels used to perform correctly jumps in the LLVM IR code
    • arrayVar: represent a 1-dimension array variable
    • funcObject: represent a function statement of the LISP programming language
  • The first variables in the parser code are used respectively to support the writing of the code in the output file (global variables in LLVM need to be written at the beginning of the code with a sequential order for labels independent from the main code), organize labels for LLVM IR main instructions and memorized variables
    • loopList is a list of loopVar objects. It is taken always the last element of this list. In this way in the case there are nested loops, the code written will be always referenced to the last loop and label can not be confused.
    • addCode is a very simple but essential method. It adds the code to the buffer variables (on irCode if writeOnGlobal is false otherwise on globalCode)
    • To every object is associated a label that will be used in the LLVM code to uniquely identify the data stored

Grammar start

  • prog is the very beginning of the grammar and the last symbol to be reduced. Moreover the reducing of this symbol means that no semantic errors where found in the input file.
    • The first rows written on the output file are necessary to support the output operations on video. @.str.0, @.str.1, @.str.2 and @.str.3 allow to print integer and double variables on video with a newLine (\\0A) or in the same line with a space (\\20) as separator.
    • If some strings are declared on the program they are added to the buffer in this first global part
    • The last part of code to be written on the file is the main part of the program. In the LLVM code this instructions are inserted in the @main function
  • lists and list non terminal symbols highlight the structure of a LISP program: essentially a sequence of lists. A list, as in many thers programming languages, is an element inserted between the terminal symbols ( and ). In the next paragraphs some of the main instructions of the LISP are explained. All instructions, functions and variables in LISP are treated as lists.
start with prog;
prog ::= lists
            {:
                file_out.write("declare i32 @printf(i8*, ...)\n");
                file_out.write("@.str.0 = private constant [4 x i8] c"+new String("\"%f\\0A\\00\"")+", align 1\n");
                file_out.write("@.str.1 = private constant [4 x i8] c"+new String("\"%d\\0A\\00\"")+", align 1\n");
                file_out.write("@.str.2 = private constant [4 x i8] c"+new String("\"%d\\20\\00\"")+", align 1\n");
                file_out.write("@.str.3 = private constant [4 x i8] c"+new String("\"%f\\20\\00\"")+", align 1\n");
 
 
                //Adding strings as global variables at the beginning of the LLVM program
                writeOnGlobal = true;
                for(varObject var : vars.values()){
                    if(var.align != null){
                        if(var.align.equals("1")){
                            var.stringValue = var.stringValue.replaceAll("\"", "");
                            addCode("@.str."+var.label+" = constant ["+new Integer(var.stringValue.length()+2).toString()+" x i8] c\""+var.stringValue+"\\0A\\00\", align 1\n");
                        }
                    }
                }
                writeOnGlobal = false;
 
                //Writing global code
                file_out.write(globalCode.toString());
 
                //Writing ir code inside the main function
                file_out.write("define void @main(){\n");
                file_out.write(irCode.toString());
                file_out.write("ret void\n}\n");
                file_out.flush();
                file_out.close();
                System.out.println("Program terminated. File correclty written!");
 
            :};
 
lists ::= lists list | list;
 
list ::= RO instructions RC
        | RO  glob_vars RC 
        | RO if_stmt RC
        | RO loop_stmt RC
        | RO loop_for RC
        | RO num_list RC
        | expression
        | func;

Global variables definition

SETQ instruction

DEFVAR and SETQ are the most important instructions in LISP to initialize global variables. In the following will be provided an example with only SETQ because the implementation in the parser of DEFVAR is very similar.

global_vars ::=  SETQ ID:id RO MAKEARRAY INT:i ELEMENTTYPE NOTEVAL:typ RC{:
            //If variable does not exist create it
            if(parser.vars.get(id) == null){        
                //Inserting variable into map
                arrayVar var = new arrayVar(i, false);
                String type = typ.replace("'", "");
                if(type.equals("double-float")){
                    var.allocate("double");
                }
                else if(type.equals("integer")){
                    var.allocate("i32");
                }
                else{
                    pSemError("Invalid type!");
                }
                var.global = true;
                parser.vars.put(id, var);
            }
            else{
                //else update it
 
                arrayVar var = new arrayVar(i, false);
                String type = typ.replace("'", "");
                if(type.equals("double-float")){
                    var.allocate("double");
                }
                else if(type.equals("integer")){
                    var.allocate("i32");
                }
                else{
                    pSemError("Invalid type!");
                }
                var.label = parser.vars.get(id).label;      //Keeping old label
                parser.vars.put(id, var);
 
            }
            RESULT = parser.vars.get(id).label;
 
        :}
       |  SETQ ID:id DOUBLE:num  {:
            //If variable does not exist create it
            if(parser.vars.get(id) == null){        
                //Inserting variable into map
                doubleVar var = new doubleVar(num, "8", true);
                var.global = true;
                parser.vars.put(id, var);
 
                //Adding code to LLVM
                addCode("%"+var.label+" = alloca double, align 8\n");
                addCode("store double "+var.stringValue+", double* %"+var.label+"\n");
            }
            else{
                //else update it
 
                doubleVar var = new doubleVar(num, "8", false);
                var.label = parser.vars.get(id).label;      //Keeping old label
                parser.vars.put(id, var);
 
                //Adding code to LLVM - updating value
                addCode("store double "+var.stringValue+", double* %"+var.label+"\n");
            }
            RESULT = num;
        :}
       |  SETQ ID:id INT:num  {:
            //If variable does not exist create it
            if(parser.vars.get(id) == null){        
                //Inserting variable into map
                intVar var = new intVar(num, "4", true);
                var.global = true;
                parser.vars.put(id, var);
 
                //Adding code to LLVM
                addCode("%"+var.label+" = alloca i32, align 4\n");
                addCode("store i32 "+var.stringValue+", i32* %"+var.label+"\n");
            }
            else{
                //else update it
 
                intVar var = new intVar(num, "4", false);
                var.label = parser.vars.get(id).label;      //Keeping old label
                parser.vars.put(id, var);
 
 
                //Adding code to LLVM - updating value
                addCode("store i32 "+var.stringValue+", i32* %"+var.label+"\n");
            }
            RESULT = num;
        :}
  • With the SETQ if the variable does not exist yet a new one is created. If it exists a new intVar, arrayVar or doubleVar object is created but the old label is maintaned.

With the following LISP code:

(setq a (make-array 3 :element-type 'integer))
 
(defvar b 4.0)
(setq b 3.0)

The following LLVM Code:

declare i32 @printf(i8*, ...)
@.str.0 = private constant [4 x i8] c"%f\0A\00", align 1
@.str.1 = private constant [4 x i8] c"%d\0A\00", align 1
@.str.2 = private constant [4 x i8] c"%d\20\00", align 1
@.str.3 = private constant [4 x i8] c"%f\20\00", align 1
define void @main(){
%1 = alloca [3 x i32], align 4
%2 = alloca double, align 8
store double 4.0, double* %2
store double 3.0, double* %2
ret void
}

Nested expressions

LISP uses prefix notation so, operators are written before their operands. For example, the expression:

a * ( b + c ) / d

in LISP is written as:

(/ (* a (+ b c) ) d)

Implementation

Let's consider the following expression:

(setq a (+ 1 (- 3 4) (logand 3 (+ 2 3 4)) (- 4 5)))
A bottom-up approach

The strategy used to evaluate this instruction is to start from the most nested expressions (like (+ 2 3 4)). When an expression is reduced, in the RESULT variable is memorized the label of the LLVM instruction where the result value is stored. In this way, the upper level expression is able to access the result and treating it as operand to calculate it's own result value and so the process repeats. If just one of the operands of an expression is a double all int operands are converted. For this purpose the sitofp LLVM instruction is used.

expression ::= RO operator:x num_list:nl RC 
                {:
                    boolean doubleFlag = false;
                    String previousLabel = "";
                    String localLabel = "";
                    String[] operands = new String[2];
                    String[] tempOperands = new String[2];      //Needed to manage conversions of operands for division sdiv
                    Integer count = 0;
                    String tempLabelConversion = "";
 
                    //Check if at least 1 element is double
                    //in that case all other values need to
                    //be converted if are int
                    for(String s: nl){
                        if(vars.get(s) instanceof doubleVar){
                            doubleFlag = true;
                        }
                    }
 
                    if(x.equals("and") || x.equals("or") || x.equals("xor")){
                        //with logical operators operands can be only integer
                        if(doubleFlag){
                            pSemError("Logical operations require integer numbers");
                        }
                    }
 
                    ListIterator<String> litr = nl.listIterator(nl.size());
 
                    if(doubleFlag){
                        while(litr.hasPrevious()){
                            String s = litr.previous();
                            if(previousLabel.equals("")){
                                if(vars.get(s) instanceof intVar){
                                    if(vars.get(s).label != null){
                                        //It's an int stored variable
                                        localLabel = getVarLabel();
                                        addCode("%"+localLabel+" = load i32, i32* %"+vars.get(s).label+", align 4\n");
                                        localLabel = getVarLabel();
                                        addCode("%"+localLabel+" = sitofp i32 %"+localLabel+" to double\n");
                                        operands[count++] = "%"+localLabel;
                                    }
                                    else{
                                        tempLabelConversion = getVarLabel();
                                        addCode("%"+tempLabelConversion+" = sitofp i32 " + s + " to double\n" );
                                        operands[count++] = "%"+tempLabelConversion;        
                                    }                          
 
                                }
                                else{
                                    if(vars.get(s).label != null){
                                    //It's a double stored variable
                                    localLabel = getVarLabel();
                                    addCode("%"+localLabel+" = load double, double* %"+vars.get(s).label+", align 8\n");
                                    operands[count++] = "%"+localLabel;
                                    }else{
                                    operands[count++]= s;
                                    }
                                }
                                if(count == 2){
                                    previousLabel = "%"+getVarLabel();
                                    if(x.equals("fdiv")){
                                        addCode(previousLabel+" = fdiv double "+operands[0]+", "+operands[1]+"\n");
                                    }
                                    else{
                                        addCode(previousLabel+" = f"+x+" double "+operands[0]+", "+operands[1]+"\n");
                                    }
                                    count=0;
                                }
 
                            }
                            else{
                                operands[count++] = previousLabel;
                                if(vars.get(s) instanceof intVar){
                                    tempLabelConversion = getVarLabel();
                                    addCode("%"+tempLabelConversion+" = sitofp i32 " + s + " to double\n" );                                  
                                    operands[count++] = "%"+tempLabelConversion;   
                                }
                                else{
                                    operands[count++]= s;
                                }
                                previousLabel = "%"+getVarLabel();
                                if(x.equals("fdiv")){
                                    addCode(previousLabel+" = fdiv double "+operands[0]+", "+operands[1]+"\n");
                                }
                                else{
                                    addCode(previousLabel+" = f"+x+" double "+operands[0]+", "+operands[1]+"\n");
                                }
                                count=0;
 
                            }
                        }
                        vars.put(previousLabel, new doubleVar());
                    }
                    else{
                        while(litr.hasPrevious()){
                            String s = litr.previous();
                            if(previousLabel.equals("")){
                                if(vars.get(s).label != null){
                                //It's an int stored variable
                                localLabel = getVarLabel();
                                addCode("%"+localLabel+" = load i32, i32* %"+vars.get(s).label+", align 4\n");
                                operands[count++] = "%"+localLabel;
                                }
                                else{
                                operands[count++]= s;
                                }
                                if(count == 2){
                                    //A division implies a floating value
                                    if(x.equals("fdiv")){
                                        tempOperands[0] = "%"+getVarLabel();
                                        tempOperands[1] = "%"+getVarLabel();
                                        addCode(tempOperands[0]+" = sitofp i32 "+operands[0]+" to double\n");
                                        addCode(tempOperands[1]+" = sitofp i32 "+operands[1]+" to double\n");
                                        previousLabel = "%"+getVarLabel();
                                        addCode(previousLabel+" = fdiv double "+tempOperands[0]+", "+tempOperands[1]+"\n");
                                    }
                                    else{
                                        previousLabel = "%"+getVarLabel();
                                        addCode(previousLabel+" = "+x+" i32 "+operands[0]+", "+operands[1]+"\n");
                                    }
                                    count=0;
                                }
 
                            }
                            else{
                                operands[count++] = previousLabel;
                                if(vars.get(s).label != null){
                                //It's an int stored variable
                                localLabel = getVarLabel();
                                addCode("%"+localLabel+" = load i32, i32* %"+vars.get(s).label+", align 4\n");
                                operands[count++] = "%"+localLabel;
                                }
                                else{
                                operands[count++]= s;
                                }
                                 if(x.equals("fdiv")){
                                        tempOperands[0] = "%"+getVarLabel();
                                        tempOperands[1] = "%"+getVarLabel();
                                        addCode(tempOperands[0]+" = sitofp i32 "+operands[0]+" to double\n");
                                        addCode(tempOperands[1]+" = sitofp i32 "+operands[1]+" to double\n");
                                        previousLabel = "%"+getVarLabel();
                                        addCode(previousLabel+" = fdiv double "+tempOperands[0]+", "+tempOperands[1]+"\n");
                                }
                                else{
                                    previousLabel = "%"+getVarLabel();
                                    addCode(previousLabel+" = "+x+" i32 "+operands[0]+", "+operands[1]+"\n");
                                }
                                count=0;
 
                            }
                        }
                        //A division implies a floating value
                        if(x.equals("fdiv")){
                            vars.put(previousLabel, new doubleVar());
                        }
                        else{
                            vars.put(previousLabel, new intVar());
                        }
                    }
 
                    RESULT = new String(previousLabel);
                :};

The resulting LLVM code is:

declare i32 @printf(i8*, ...)
@.str.0 = private constant [4 x i8] c"%f\0A\00", align 1
@.str.1 = private constant [4 x i8] c"%d\0A\00", align 1
@.str.2 = private constant [4 x i8] c"%d\20\00", align 1
@.str.3 = private constant [4 x i8] c"%f\20\00", align 1
define void @main(){
%1 = add i32 0,4
%2 = add i32 0,3
%3 = sub i32 %2, %1
%4 = add i32 0,4
%5 = add i32 0,3
%6 = add i32 0,2
%7 = add i32 %6, %5
%8 = add i32 %7, %4
%9 = add i32 0,3
%10 = and i32 %9, %8
%11 = add i32 0,5
%12 = add i32 0,4
%13 = sub i32 %12, %11
%14 = add i32 0,1
%15 = add i32 %14, %3
%16 = add i32 %15, %10
%17 = add i32 %16, %13
%18 = alloca i32, align 4
store i32 %17, i32* %18
%19 = load i32, i32* %18, align 4
%20 = call i32 (i8*, ...) @printf(i8* getelementptr inbounds ([4 x i8], [4 x i8]* @.str.1, i32 0, i32 0), i32 %19)
ret void
}

Functions

Implementation

//Functions
 
func ::=  RO DEFUN ID:str RO num_list:nl  NT2 RC lists RC {:
        funcObject func = new funcObject(str, nl.size());
        func.returnType = "void";
 
        //End of the function
        addCode("ret void \n}\n");
        functionsMap.put(str, func);
        writeOnGlobal = false;
        varLabelCounter = tempGLobal;
:};
 
NT2 ::= {:
        String name = (String) stack(-2);
        String arguments = new String("");
        String label = "";
        Integer functionCount = 3;
 
        //Functions' body in LLVM must declared on the global part
        writeOnGlobal = true;
        LinkedList<String> nl = (LinkedList<String>) stack(0);
        //Counting the arguments
        for(String s: nl){
            arguments += "i32,";
            vars.put(s, new intVar());
        }
        StringBuilder tmp  = new StringBuilder(arguments);
        tmp.setCharAt(arguments.length()-1, ')');
 
        arguments = tmp.toString();
        addCode("define void @"+ name+"("+arguments+"{\n");
        int i = 0;
        //For each argument store a variable
        for(String s : nl){
            label = (functionCount++).toString();
            vars.get(s).label = label;
            addCode("%"+label+ " = alloca i32, align 4\n");
            addCode("store i32 %"+ i+", i32* %"+label+"\n");
            i++;
        }
 
        //LLVM Label counting inside functions is
        //indipendent from the main part
        tempGLobal = varLabelCounter;
        varLabelCounter = functionCount;
 
 
:};

LISP functions with only integer parameters and void return type are supported.

  • Building of functions in LLVM is supported by the use of the marker NT2. When the RO DEFUN ID:str RO num_list:nl symbols are recognized, corresponding for example to
     (defun average(1, 2 

    it is quite clear that a new function has to be created, so the number of arguments is counted and the corresponding declare LLVM instruction is written on the code

  • Attention: LLVM Label counting inside functions is indipendent from the main part so the varLabelCounter is temporary saved and initialized with the function counter value
  • When also the body of the function is reduced the funcObject is created and the final LLVM code is written to end the function

The corresponding LISP code:

(defun division(n1 n2)
    (if (= n1 0) (print err))
    (setq res (/ n2 n1))
    (print res)
)

is translated in the following LLVM code (simplified):

define void @division(i32,i32){
%3 = alloca i32, align 4
store i32 %0, i32* %3
%4 = alloca i32, align 4
store i32 %1, i32* %4
...
...
...
%16 = call i32 (i8*, ...) @printf(i8* getelementptr inbounds ([4 x i8], [4 x i8]* @.str.0, i32 0, i32 0), double %15)
ret void 
}

The following variants of the PRINT instruction are supported:

  • print a - printing of declared variables (strings, integers, doubles)
  • print arr - printing of array variables
  • print (+ 2 3 4) - printing of direct expressions
  • print (aref my-array 1) - printing of array elements

For semplicity only the parser code for declared variables is provided:

instructions ::= PRINT ID:id {:
        if(vars.get(id) != null){
            //Printing a string
            if(vars.get(id) instanceof stringVar){
                stringVar var = (stringVar) vars.get(id);
                String length = new Integer(var.stringValue.length()).toString();
                addCode("%"+getVarLabel()+" = call i32 (i8*, ...) @printf(i8* getelementptr inbounds (["+length+" x i8], ["+length+" x i8]* @.str."+var.label+", i32 0, i32 0))\n");
            }
            //Printing a double
            else if(vars.get(id) instanceof doubleVar){
                doubleVar var = (doubleVar) vars.get(id);
                Integer localLabel = new Integer(getVarLabel());
                addCode("%"+localLabel+" = load double, double* %"+var.label+", align 8\n");
                addCode("%"+getVarLabel()+" = call i32 (i8*, ...) @printf(i8* getelementptr inbounds ([4 x i8], [4 x i8]* @.str.0, i32 0, i32 0), double %"+localLabel+")\n");
            }
            //Printing an integer
            else if(vars.get(id) instanceof intVar){
                intVar var = (intVar) vars.get(id);
                Integer localLabel = new Integer(getVarLabel());
                addCode("%"+localLabel+" = load i32, i32* %"+var.label+", align 4\n");
                addCode("%"+getVarLabel()+" = call i32 (i8*, ...) @printf(i8* getelementptr inbounds ([4 x i8], [4 x i8]* @.str.1, i32 0, i32 0), i32 %"+localLabel+")\n");
            }
            //Printing an array
            else if(vars.get(id) instanceof arrayVar){
                arrayVar var = (arrayVar) vars.get(id);
                String label = "";
                for(int i=0; i< var.size; i++){
                    label = getVarLabel();
                    addCode("%"+label+" = getelementptr inbounds ["+var.size+" x i32] , ["+var.size+" x i32]* %"+var.label+", i32 0, i32 "+i+"\n");
                    if(var.type.equals("i32")){
                        Integer localLabel = new Integer(getVarLabel());
                        addCode("%"+localLabel+" = load i32, i32* %"+label+", align 4\n");
                        if(i != var.size-1){
                            addCode("%"+getVarLabel()+" = call i32 (i8*, ...) @printf(i8* getelementptr inbounds ([4 x i8], [4 x i8]* @.str.2, i32 0, i32 0), i32 %"+localLabel+")\n");
                        }
                        else{
                            addCode("%"+getVarLabel()+" = call i32 (i8*, ...) @printf(i8* getelementptr inbounds ([4 x i8], [4 x i8]* @.str.1, i32 0, i32 0), i32 %"+localLabel+")\n");
                        }
                    }
                    else{
                        Integer localLabel = new Integer(getVarLabel());
                        addCode("%"+localLabel+" = load double, double* %"+label+", align 8\n");
                        if(i != var.size-1){
                            addCode("%"+getVarLabel()+" = call i32 (i8*, ...) @printf(i8* getelementptr inbounds ([4 x i8], [4 x i8]* @.str.3, i32 0, i32 0), double %"+localLabel+")\n");
                        }
                        else{
                            addCode("%"+getVarLabel()+" = call i32 (i8*, ...) @printf(i8* getelementptr inbounds ([4 x i8], [4 x i8]* @.str.0, i32 0, i32 0), double %"+localLabel+")\n");
                        }
                    }
 
                }
            }
        }
        else{
            pSemError("Variable not declared!");
        }
 
        :}
  • Strings: as explained in the paragraph Grammar start, strings found in the LISP code must be declared as global in the LLVM IR code. In this way, for printing the string it is only necessary to recover the label form the stringVar object and add the @printf instruction to the LLVM code passing to this one the label (stored in localLabel)
  • Doubles and Integers: declared variables are stored in the memory. intVar and doubleVar objects store the LLVM label of type double* and i32* that locate the memory area where the value is stored. With the load instruction the value is red from the memory and then passed to the @printf instruction
  • Array: if the variable is an array the first thing to do is to extract the pointer to the memory location of the array at a desired offset. For this purpose, the getelementptr LLVM instruction is used. Because we want to print all the array (to print a specific cell AREF is used) the variable i is used to iterate over the positions from 0 to the size of the array minus 1 and passed to the instruction

For instruction

In LISP the classical for statement present in other programming languages is implemented with the loop or loop for instructions. Here the parser implementation of the loop for:

loop_for ::= LOOP FOR ID:i FROM INT:a TO INT:b DO {:
            String oldLabel = "";
            String newLabel = "";
            String cmpLabel = "";
            String localLabel = "";
            String localLabel1 = "";
            loopList.addLast(new loopVar("for"));
            loopVar lv = loopList.getLast();
 
            //Variable i is yet used in the code as global
            //Now becomes local
            if(vars.get(i) != null){
                oldLabel = vars.get(i).label;
                newLabel = getVarLabel();
                vars.get(i).label = newLabel;                       //To avoid use the global variable we need to use a new Label and store the old one
                addCode("%"+newLabel+" = alloca i32, align 4\n");
                addCode("store i32 "+(a-1)+", i32* %"+newLabel+"\n");
            }
            else{
                intVar var = new intVar(a, "4", true);
                vars.put(i, var);
                newLabel = vars.get(i).label;
                addCode("%"+newLabel+" = alloca i32, align 4\n");
                addCode("store i32 "+(a-1)+", i32* %"+newLabel+"\n");
            }
            addCode("br label %for.body."+lv.label+"\n");
            addCode("for.body."+lv.label+":\n");
            localLabel = getVarLabel();
            addCode("%"+localLabel+" = load i32, i32* %"+newLabel+", align 4\n");
            localLabel1 = getVarLabel();
            addCode("%"+localLabel1+" = add i32 %"+localLabel+", 1\n");
            addCode("store i32 %"+localLabel1+", i32* %"+newLabel+", align 4\n");
            localLabel = getVarLabel();
            addCode("%"+localLabel+" = load i32, i32* %"+newLabel+", align 4\n");
            cmpLabel = getVarLabel();
            addCode("%"+cmpLabel+" = icmp slt i32 %"+localLabel+","+(b+1)+"\n");
            addCode("br i1 %"+cmpLabel+", label %for.body2."+lv.label+", label %for.exit."+lv.label+"\n");
            addCode("for.body2."+lv.label+":\n");
            RESULT = oldLabel;
:} lists{:
            loopVar lv = loopList.getLast();
            addCode("br label %for.body."+lv.label+"\n");
            addCode("for.exit."+lv.label+":\n");
            String oldLabel = (String) stack(-1);
            //If oldLabel == "" means that ID:i was only a local vaiable in the loop, remove it
            if(oldLabel.equals("")){
                vars.remove(i);
            }
            //else means that it had an oldLabel so was used as global -> restore the oldLabel
            else{
                vars.get(i).label = oldLabel;
            }
:};
  • loopList.addLast add the current loop to the loop list. This allows us to 'select' the last loop inserted and correctly add to the code the labels. It is necessary when there are nested loops: we need to select always the last inserted.
  • It is necessary to see if the variable used for the loop is already declared outside. In this case the variable must be substituted temporary because by the end of the loop statement it assumes a local scoping
  • With the code “for.body.”+lv.label+“:\n” we enter in the loop body. The first thing to do is to load the loop variable, adding to it 1 and storing it. Then it is compared with the second INT to eventually terminate the loop

Error handling

The compiler is able to recognize the following kind of errors:

  • Variable not declared
  • Variable is not an array
  • Index out of range in arrays
  • Use of non integer values in logical operations
  • Assignent of a wrong type value to an array

Missing functionalities

  • Functions support only numerical values as input and no return type
  • Condition operators support only 2 operands
  • Inside functions and statements is supported the local scoping of variables. The local variables of LISP defined with let instruction are not supported
  • Array support only integer and double variables
  • Float values are not supported
  • In loop for and make-array are supported only direct numerical values, not variables

A simple example

An example of a LISP program used to calculate the first 22 numbers of the Fibonacci serie:

fibo.lisp
(defvar a 0)
(defvar b 1)
(loop for i from 0 to 10
    do (write a)
       (if (= i 10) (print b) (write b))
       (setq a (+ a b))
       (setq b (+ a b))
)

Execution of program will produce the following output:

0 1 1 2 3 5 8 13 21 34 55 89 144 233 377 610 987 1597 2584 4181 6765 10946

Download and installation

Compiler: lisp_compiler

Examples

Each example is provided separately and contains the LISP source code and the corresponding output in LLVM IR, generated by the compiler. For sure, each source code can be given as input of the compiler, which will produce the same output contained in the example. In general after

How to run it

  1. Install, if you have not already done so, the llvm package: sudo apt install llvm
  2. Download the lisp compiler and extract it
  3. Open the terminal, go to the folder where the compiler is extracted.
  4. [OPTION 1 - Compiling a generic LISP source file] In general the following list of commands must be executed:
    • jflex lisp_scanner.jflex
    • java java_cup.Main lisp_parser.cup
    • javac *.java
    • java Main source_file.lisp
  5. [OPTION 2 - Compiling provided examples]:
    • To run the bubble sort example run make bubble
    • To run the floyd triangle example run make triagle
    • To run the Fibonacci example run make fibo
    • To run the division example run make division
    • Default make command will run bubble sort example
  6. Run output.ll file with lli: lli output.ll

If you found any error, or if you want to partecipate to the editing of this wiki, please contact: admin [at] skenz.it

You can reuse, distribute or modify the content of this page, but you must cite in any document (or webpage) this url: https://www.skenz.it/compilers/lisp_to_llvm
/web/htdocs/www.skenz.it/home/data/pages/compilers/lisp_to_llvm.txt · Last modified: 2021/06/09 18:14 by andrea