User Tools

Site Tools


compilers:lua_to_llvm
Return to Home page

From Lua to LLVM

Background

This page shows the implementation of a compiler that recognizes and translates part of the Lua programming language into the LLVM IR syntax (more information about LLVM can be found here).

The LLVM Intermediate Representation (IR) is placed between the source code in a given programming language and the machine code for a specific architecture. It is independent of both the programming language and the architecture.

Lua language

Many resources are available if you want to learn Lua. Following, you can find a few of them.

  • Programming in Lua - Online version of the first edition of the book Programming in Lua, a detailed and authoritative introduction to all aspects of Lua programming written by Lua's chief architect. It is an entire book, so it is advised to all those who want to fully learn and understand Lua.
  • Lua tutorial - Lua tutorial. It is advised to all those who want to quickly learn Lua, without going too in depth in the language.

Compiler

The compiler is made up of two parts: a scanner and a parser.

Scanner

The scanner has been written using jFlex. It performs lexical analysis, i.e. converts a sequence of bytes into a sequence of tokens. It recognizes the most important Lua source code symbols and generates tokens later used by the parser. The one implemented here can tokenize multiple files. This last feature allows the implementation of require keyword functionality of Lua: multiple files are compiled (statically) into a single LLVM IR syntax program.

Snippet of Lua scanner

  1. ...
  2.  
  3. id = [A-Za-z_][A-Za-z0-9_]*
  4. number = ([1-9][0-9]*|0) | (([0-9]+\.[0-9]*) | ([0-9]*\.[0-9]+)) (e|E('+'|'-')?[0-9]+)?
  5. string = \"~\"
  6. nl = \r|\n|\r\n|" "
  7. n2 = \r|\n|\r\n
  8. comment = "--"~{n2}|"--[["~"--]]"
  9. FILE = \"[A-Za-z_][A-Za-z0-9_\.]*\"
  10. %%
  11.  
  12. {comment} { ; }
  13.  
  14. "require" {
  15. yybegin(INCL);
  16. return symbol(sym.REQUIRE);
  17. }
  18.  
  19. <INCL>[\ \t]+ {;}
  20.  
  21. <INCL>{FILE} {
  22. String file = yytext().substring(1, yytext().length()-1);
  23. Path path
  24. = Paths.get("./" +
  25. file);
  26.  
  27. if (!Files.isReadable(path)){
  28. System.out.println("File " +path + " does not exists ");
  29. System.exit(-1);
  30. return symbol(sym.FNF);
  31. }
  32.  
  33. try{
  34. yypushStream(new FileReader(file.trim()));
  35. yybegin(YYINITIAL);
  36. return symbol(sym.FILE, file.substring(0, file.length()-4));
  37. }catch(Exception e){
  38.  
  39. return symbol(sym.FNF);
  40.  
  41. }
  42. }
  43.  
  44. <DELETENR>[\t\n\r]+ {yybegin(YYINITIAL);}
  45.  
  46. <<EOF>> {
  47.  
  48. if(yymoreStreams()){
  49. yypopStream();
  50. yybegin(DELETENR);
  51. parser.libraryName="";
  52. }else return symbol(sym.EOF);
  53. }
  54.  
  55.  
  56. "string.format" {return symbol(sym.STRFRT);}
  57. "do" {return symbol(sym.DO);}
  58. "for" {return symbol(sym.FOR);}
  59. "if" {return symbol(sym.IF);}
  60. "else" {return symbol(sym.ELSE);}
  61.  
  62. "end" {return symbol(sym.END);}
  63. "local" {return symbol(sym.LOCAL);}
  64.  
  65. "nil" {return symbol(sym.NIL);}
  66. "print" {return symbol(sym.PRINT);}
  67. "return" {return symbol(sym.RETURN);}
  68. "function" {return symbol(sym.FUNCTION);}
  69. "then" {return symbol(sym.THEN);}
  70. "while" {return symbol(sym.WHILE);}
  71. "until" {return symbol(sym.UNTIL);}
  72. "repeat" {return symbol(sym.REPEAT);}
  73.  
  74. "(" {return symbol(sym.RO);}
  75. ")" {return symbol(sym.RC);}
  76. "{" {return symbol(sym.BO);}
  77. ...
  78. "not" {return symbol(sym.NOT);}
  79. "^" {return symbol(sym.HAT);}
  80. "~=" {return symbol(sym.NOTEQ);}
  81. "[" {return symbol(sym.SO);}
  82. "]" {return symbol(sym.SC);}
  83. "," {return symbol(sym.CM);}
  84.  
  85.  
  86. {id}\.{id} {return symbol(sym.ID, yytext());}
  87. {id} {return symbol(sym.ID, yytext());}
  88. {string} {return symbol(sym.STRING, yytext());}
  89. {number} {return symbol(sym.NUMBER, Double.valueOf(yytext()));}
  90.  
  91.  
  92. {nl} {;}
  93.  
  94. . {System.out.println("SCANNER ERROR: "+yytext());}

The snippet above contains the most important part of the scanner code. Grammar rules occupy most of it: they are the key ingredient to transform a sequence of input bytes into a token. For example, the rule at line 89 allows recognizing a Lua number and giving the parser a symbol of type NUMBER with its numeric value attached.

In order to scan and parse multiple files, states must be used. The symbol <:STATE> before some grammar rules allow to activate them only when the scanner is in the state STATE. In particular, when the require keyword is recognized (line 14), the instruction yybegin(“INCL”) allows changing the state of the scanner to the state INCL. In this way, the two following rules are activated and a sequence of operations are performed in order to switch from the current file to the one specified after the keyword require (get the file name, open it, push it on the stack of the file…). Similarly, when the EOF is reached, the current file is popped from the file stack and the scanner is put in a different state (DELETENR) to restart the scan of the original file.

Parser

The parser performs the syntactical analysis. It asks the scanner for tokens and recognizes the main grammar rules of Lua language. When a sequence of symbols is reduced, an action is performed. This allows generating LLVM IR code from Lua source code, as well as performing some semantic analysis.

Note:

Throughout this parser, intermediate actions are present in the rhs of rules. It is worth to remember that if we want to put an action in the middle of the rhs of a rule in a bottom-up parser, we use a dummy nonterminal, called a marker. For example,

X ::= a { action } b

is equivalent to

X ::= M b

M ::= a { action }

This is done automatically by the CUP parser generator (i.e., we can actually put actions in the middle of a rhs of a rule and CUP will use the above trick to put it at the end of a rule). There is a danger though that the resulting rules may introduce new shift/reduce or reduce/reduce conflicts.

Output structure

Before diving into the translation from Lua source code to LLVM IR and all the support data structures used by the parser, let us understand the global picture. The following is an LLVM IR code translated with this parser (and the related scanner).

  1. @.str.0 = private constant [6 x i8] c"%.0f \00", align 1
  2. @.str.1 = private constant [2 x i8] c"\0A\00", align 1
  3.  
  4. @a = global double 0.0, align 8
  5. @l = global double 20.0, align 8
  6. @i = global double 0.0, align 8
  7. @k = global double 0.0, align 8
  8.  
  9. define double @Math.pow(double, double ) {
  10. ....
  11. }
  12.  
  13.  
  14. define void @main(){
  15. store double 0.0, double* @a, align 8
  16. store double 20.0, double* @l, align 8
  17. store double 0.0, double* @i, align 8
  18. br label %for.cond.1
  19. for.cond.1:
  20. %1 = load double, double* @i, align 8
  21. %2 = load double, double* @l, align 8
  22. %3 = fcmp ole double %1, %2
  23. ...
  24. br label %for.inc.1
  25. for.exit.1:
  26. ret void
  27. }

It is clear, looking at the code, that it is divided into three sections:

  • constant string declarations
  • global variables declarations and functions
  • main function

In order to achieve this organization of the code, three StringBuffer are used throughout the parser:

  • stringDecl (used for constant string declarations)
  • globalDecBuffer (used for global variables declarations)
  • funcBuffer (used for functions)

To simplify the append operation, wrapper functions are defined. They allow to append something and choose if go to a new line or not.

The following wrapper functions are defiend:

  1. public void appendFuncBuffer(String s, boolean newLine) {
  2. funcBuffer.append(s);
  3. if (newLine)
  4. funcBuffer.append("\n");
  5. }
  6.  
  7. public void appendGlobalDecBuffer(String s, boolean newLine) {
  8. globalDecBuffer.append(s);
  9. if (newLine)
  10. globalDecBuffer.append("\n");
  11. }
  12.  
  13. public void appendMainBuffer(String s, boolean newLine) {
  14. currentSymTable.currentBuffer.append(s);
  15. if (newLine)
  16. currentSymTable.currentBuffer.append("\n");
  17. }

Grammar start

When an LLVM IR instruction must be generated after a reduction, it is appended to the proper buffer. When the start symbol is reduced (meaning that the code has been parsed correctly), the actions performed consist of printing the StringBuffers in the proper order, so as to achieved the above mentioned code structure.

  1. prog ::= stmt_list {:
  2. printStrings();
  3. bwr.write(stringDecl.toString()+"\n");
  4. bwr.write(globalDecBuffer.toString()+"\n");
  5. bwr.write(funcBuffer.toString()+"\n");
  6. bwr.write("define void @main(){\n");
  7. bwr.write(mainBuffer.toString());
  8. bwr.write("ret void\n}");
  9. bwr.flush();
  10. //close the stream
  11. bwr.close();
  12. :}
  13. ;

Symbol table

The symbol table is a core data structure in all parsers, and in this one too. Inside of it, numerous data collections can be found, used for different purposes. Comments in the below code help understand the need and use of them. One of the main features that are possible thanks to this structure, is the local scoping.

The constructor at line 32 allows chaining many symbol tables while going deep down in nested statements. It takes as input the previous symbol table and a flag used to distinguish if we are entering a function or not. In the latter case, the code from line 52 to line 58 is executed, allowing to set the currentBuffer to funcBuffer. Also, it is worth noticing that when entering a function, the register number must restart from 1.

  1. public class SymbolTable {
  2. /* Used in multiple assignment */
  3. public ArrayList<ValueObj> varList;
  4. /* Used in multiple assignment */
  5. public ArrayList<ValueObj> expList;
  6. /* Actual Symbol table storing NAME : VALUE */
  7. public HashMap<String, ValueObj> varTable;
  8. /* Counter of llvm registers */
  9. public Integer registerCount; // used as counter for SSA registers
  10. /* reference to previous symbol table - used in local scopes */
  11. SymbolTable prev;
  12. /* stores the current buffer to append to */
  13. StringBuffer currentBuffer;
  14. /*
  15.   *boolean storing if we are in a function or not
  16.   * needed to distinguish local scopes due to blocks from functions
  17.   */
  18. boolean isFunc;
  19.  
  20. public SymbolTable getPrev(boolean insideSameFunction) {
  21. if (insideSameFunction) {
  22. prev.registerCount = registerCount;
  23. return prev;
  24. } else {
  25. return prev;
  26. }
  27. }
  28.  
  29. /*
  30.   * isFunction is needed to distinguish local scopes due to blocks from functions
  31.   */
  32. public SymbolTable(SymbolTable p, boolean isFunction) {
  33. this.varTable = new HashMap<String, ValueObj>();
  34. this.varList = new ArrayList<ValueObj>();
  35. this.expList = new ArrayList<ValueObj>();
  36. this.prev = p;
  37.  
  38. /*
  39.   * when starting parsing, register counter start from 1
  40.   * when entering a local scope, continue with previous counter
  41.   */
  42.  
  43. this.registerCount = p == null ? 1 : p.registerCount;
  44.  
  45. /*
  46.   * when starting parsing, set current buffer to main buffer
  47.   * when entering a local scope, continue with previous buffer
  48.   */
  49.  
  50. currentBuffer = p == null ? currentBuffer : p.currentBuffer;
  51. isFunc=false;
  52. if (isFunction) {
  53. /* if it is a funct, restart from 1 */
  54. registerCount = 1;
  55. /* as it is a func, use func buffer */
  56. currentBuffer = funcBuffer;
  57. isFunc=true;
  58. }
  59. }
  60.  
  61. public ValueObj get(String s) {
  62. /*
  63.   * Given an id, look for it in the current symbol table
  64.   * and in all the ones above
  65.   */
  66.  
  67.  
  68. for (SymbolTable sym = this; sym != null; sym = sym.prev) {
  69. ValueObj found = sym.varTable.get(s);
  70. if (found != null)
  71. return found;
  72. }
  73. return null;
  74. }
  75. }
  76.  

Function declaration

The following code is used by the parser to recognize function declarations.

  1. function_decl ::= FUNCTION {: currentSymTable = new SymbolTable(currentSymTable, true); //use new symbol table
  2. currentSymTable.currentBuffer=funcBuffer; //set buffer to func buffer
  3. :}ID:fName RO func_decl_param RC {:
  4. if(!libraryName.isBlank()){ //to account for module functions
  5. fName=libraryName+"."+fName;
  6. }
  7.  
  8. if(funcTable.containsKey(fName)){
  9. pSemError("FUNCTION ALREADY DECLARED");
  10. }
  11. FuncObj func = new FuncObj(fName); //create new funct object
  12. funcTable.put(fName, func);
  13. func.nargsTot=currentSymTable.varList.size(); //set number of param in the function
  14.  
  15. appendMainBuffer(("define double @" + fName + "("), false); //definition of function
  16.  
  17. for (int i = 0; i < currentSymTable.varList.size(); i++){ //loop through all parameters and append it to the function delcaration
  18. if(i!=currentSymTable.varList.size()-1)
  19. appendMainBuffer("double, ", false);
  20. else
  21. appendMainBuffer("double ", false);
  22. }
  23.  
  24. appendMainBuffer((") {"), true); //end function declaration
  25.  
  26. currentSymTable.registerCount = func.nargsTot + 1;
  27.  
  28.  
  29.  
  30. for (int i = 0; i < currentSymTable.varList.size(); i++){ //inside the function, allocate a value for each parameter
  31. String reg = getRegister();
  32. ValueObj tmp = new ValueObj(reg); //create new value parameter
  33. tmp.setDouble();
  34. tmp.setLocal();
  35. currentSymTable.varTable.put(currentSymTable.varList.get(i).name, tmp); //add to the symbol table
  36.  
  37. appendMainBuffer(("%" + reg + " = alloca " + "double" + ", align " + "8"), true); //allocate parameter
  38. appendMainBuffer(("store " + "double" + " %" + i + ", " + "double* " + "%" + reg + ", align 8"), true); //store passed parameter in the function param
  39. }
  40.  
  41. currentSymTable.varList.clear(); //clear var list
  42. :}stmt_list ret END {: appendMainBuffer("\n}", true); //use directly the regitter
  43.  
  44. currentSymTable = currentSymTable.getPrev(false); currentSymTable.currentBuffer=mainBuffer; //go to previous symbol table e buffer
  45. :}
  46.  
  47. | FUNCTION RO func_decl_param RC stmt_list error {: pSynWarning("MISSING RETURN STATEMENT ");:}
  48. ;
  49. //When entering into a function, it is necessary to switch the StringBuffer --> Need to add stringBuffer in the symbol table
  50.  
  51.  
  52. func_param_list ::= func_param_list:x CM exp:y {: RESULT=x;
  53. x.add(y); :}
  54. | exp:y {: RESULT= new ArrayList<ValueObj>();
  55. RESULT.add(y);
  56. :}| {: RESULT= new ArrayList<ValueObj>();
  57.  
  58. :} ;

In order to understand what it does, let's analyze the following piece of Lua source code from the parser point of view:

  1. function pow(x, y)
  2. ...
  3. end

Upon receiving the token FUNCTION, the parser executes the associated action. A new SymbolTable is created, which is linked to the previous one (currentSymTable) and the flag to signal we are entering in a function is set to true. The current buffer of the new symbol table is set to the functions buffer (the one we have to write to in order to declare functions).

After receiving the name of the function (ID) and the list of parameters (func_decl_param) (line 3), two checks are performed:

  1. If we are parsing a module added with require, the name will be modified accordingly (line 5)
  2. If the function has already been declared, a semantic error is thrown (line 9).

Then, a FuncObj (support data structure for functions) is created and initialized as needed. The function definition (name plus parameters) is appended to the mainBuffer (that is funcBuffer at the moment). From line 15 to 24 the function declaration is performed. From line 25 to line 41, memory is allocated in the stack for the variables passed as parameters.

After, a token stmt_list followed by ret is recevied by the parser. stmt_list represents the body of the function, and contains any statement accepted by the compiler.

Upon receiving the token END, the associated semantic action, consisting of restoring the previous symbol table and the buffer, is perfomed (line 44)

The result has the following structure:

  1. define double @Math.pow(double, double ) {
  2.  
  3. %3 = alloca double, align 8
  4. store double %0, double* %3, align 8
  5. %4 = alloca double, align 8
  6. store double %1, double* %4, align 8
  7.  
  8. //Body of function
  9.  
  10. ret double %17
  11.  
  12. }

DO END statement

  1. block ::= DO {: currentSymTable = new SymbolTable(currentSymTable, false); :}
  2. stmt_list
  3. END {: currentSymTable=currentSymTable.getPrev(true);:};

DO END statement allows to create a local scope. To achieve this feature, it is sufficient to create a new symbol table (attached to the previous one).

After receiving the token DO, the parser can create the new symbol table (line 1). A sequence of statements can be in the body of DO END. After receiving the toke END, the parser can close the local scope by restoring the previous symbol table.

REQUIRE statement

  1. include ::= REQUIRE FILE:x {: libraryName=x;
  2. if(requiredFileList.contains(x)){
  3. pSemError("FILE ALREADY INCLUDED");
  4. }
  5. requiredFileList.add(x);
  6. :}
  7.  
  8. | REQUIRE FNF {: pSynWarning("REQUIRED FILE NOT FOUND"); :}
  9. | REQUIRE error {: pSynError("REQUIRED FILE NOT SPECIFIED"); :}
  10. ;

Almost all of the work needed to implement REQUIRE is done by the scanner. The parser has only the job to check that a file with the same name has not already been included. In such a case, a semantic error is raised.

The parser receives the token REQUIRE, followed by aFILE token (that has the name of the file as an attribute).

The following code contains the supporting data structures:

  1. /* Used to construct function names like libraryName.functionName
  2.   * When not parsing a library, the name is "" because we are in the main file
  3.   */
  4. libraryName = "";
  5.  
  6. /* List used to keep track of of the libraries already inserted and detect duplicates */
  7. requiredFileList=new ArrayList<String>();
  8.  

The requiredFileList ArrayList is used to store the name of all files that have been already included.

The libraryName variable is used in order to save functions declared inside a file with the name used in their declaration preceded by the name of the file in which they are declared. For example, if the file Math.lua contains the function pow(x, y), in order to invoke it, Math.pow(x, y) must be written.

Global declarations-assignements

The following code implements the declaration or assignemnt of 1 or more variables:

  1.  
  2. //global declaration and initialization
  3. global_var_init ::= var_list EQ ass_list {://check is sizes match and generate error if not
  4.  
  5. for(int i=0; i<currentSymTable.varList.size(); i++){ //var list stores the ValueObj of those var
  6. ValueObj ValueObjectOfVar = currentSymTable.varList.get(i); //get valueObj of that var
  7. ValueObj tmp = currentSymTable.get(ValueObjectOfVar.name); //try to get it from symbol table
  8. if(tmp == null){ //if null, it has never been declared
  9. if(currentSymTable.isFunc){
  10. pSemError("CANNOT DECLARE GLOBAL VARIABLES INSIDE FUNCTION");
  11. }
  12. globalSymbolTable.varTable.put(ValueObjectOfVar.name, ValueObjectOfVar); //put in simbol table
  13. initVar(ValueObjectOfVar, currentSymTable.expList.get(i)); //init var
  14. }else{
  15. initVar(tmp, currentSymTable.expList.get(i)); //if it has been already declared, pass the valueObj of the symbol table
  16. }
  17.  
  18. }
  19. currentSymTable.varList.clear();
  20. currentSymTable.expList.clear();
  21.  
  22. :};
  23.  
  24.  

In Lua, the declaration or assignment of global variables looks the same. From the parser's point of view, the declaration and the assignment are different.

After receiving all the tokens needed ( var_list EQ ass_list ), the parser can execute the related action. It consists of a for loop (one iteration for each variable, line 5). Each iteration can be divided into 3 parts:

  • Retrieve the ValueObj (data structure for variables) for the variable we are processing and check if it is null.
  • If it is null, the variable must be declared. Put the variable in the symbol table and initialize it using initVar (function that does the job)
  • If it is not null, the variable must just be initialized with a new value. Function initVar (same as the one used above) is used.

Function initVar (not reported here for sake of brevity), assigns to the variable the corresponding value.

Example

  1. a, b = 5, 6
  2.  
  3. a = 4
  4. b = 10
  5.  
  1. declare i32 @printf(i8*, ...)
  2.  
  3. @a = global double 5.0, align 8
  4. @b = global double 6.0, align 8
  5.  
  6.  
  7. define void @main(){
  8. store double 5.0, double* @a, align 8
  9. store double 6.0, double* @b, align 8
  10. store double 4.0, double* @a, align 8
  11. store double 10.0, double* @b, align 8
  12. ret void
  13. }

Line 1 of Lua code, generates lines 3-4 of the LLVM IR code. The code is not optimized, and for sake of generality, lines 8-9 of IR code is also generated. This could be avoided in order to optimize the code.

Line 3-4 of Lua code, generate lines 10-11 of IR code. The variables are already declared, so it is sufficient to use store instructions to store the desired values.

If Else statements

  1. if_block ::= IF {: currentSymTable = new SymbolTable(currentSymTable, false);
  2. loopCount = ++totLoopCount; loopList.push(loopCount);//when entering a statement, save the loop number on the stack
  3. :} loop_cond:x {:
  4. appendMainBuffer(("br i1 " + x.scope+x.name + ", label %if.body." + loopCount + ", label %if.else." + loopCount), true);
  5. :} THEN {:
  6. appendMainBuffer(("if.body." + loopCount + ":"), true);:} stmt_list {:loopCount=loopList.pop();:} //pop here to retrieve my loop nubmber ;
  7. else_block END
  8.  
  9. | IF error stmt_list else_block END{: pSynWarning("ERROR IN IF CONDITION");:};
  10.  
  11. else_block ::= {:
  12. appendMainBuffer(("br label %if.exit." + loopCount), true);
  13. :} ELSE {:
  14. appendMainBuffer(("if.else." + loopCount + ":"), true);
  15. loopList.push(loopCount);
  16. :}stmt_list {:
  17.  
  18. loopCount=loopList.pop();
  19. appendMainBuffer(("br label %if.exit." + loopCount), true);
  20. appendMainBuffer(("if.exit." + loopCount + ":"), true);
  21. currentSymTable=currentSymTable.getPrev(true);
  22.  
  23.  
  24. :}|
  25. {:
  26. appendMainBuffer(("br label %if.else." + loopCount), true);
  27. appendMainBuffer(("if.else." + loopCount + ":"), true);
  28. currentSymTable=currentSymTable.getPrev(true);
  29. :} ;

This parser supports if-then-else statements nested (any depth). The implementation of this may seem tricky, but it is actually straightforward. Let us analyze the code in detail:

After receiving the IF token (line 1), the parser enters in a new local scope and thus needs to create a new symbol table. At line 2, variable totLoopCount is present. This variable is needed in order to correctly enumerate the labels needed to perform the various jump in the if-else statements. A correct numeration of this is essential, as there cannot be duplicated and each label must identify an exact location in the code.

In order to achieve this, the variable totLoopCount is incremented whenever a new set of labels must be generated and never decremented. This is not enough: if we want to have multiple levels of nestings, upon exiting a nested if-else, it is necessary to restore the correct label number in loopCount, as it is needed in a possible else statement that follows.

A LIFO list (a stack) is used for this purpose. At line 2, the label number assigned to this if statement is used is save in a variable (loopCount) and totLoopCount incremented. Immediately after, loopCount is saved on the stack (loopList). For all the following label of the current if, the saved number is used in the label. When entering another if statement in the body (nested if), the same procedure happens again. Upon exiting the if in the body of the upper level if (line 6, after stmt_list), the label number is popped from the stack.

This procedure can be applied recursively and always guarantees that each if statement retrieves the right label number. In the else_block rule, a similar procedure (identical in the logic) happens. The else_block symbol is always needed in the if_block rule: when there is actually an else statement, this is parsed, otherwise and else block is reduced from the empty string (Line 24). At the end, the previous symbol table must be restored (line 21 or 28, depending on the case).

Repeat until statement

Repeat-until statements corresponds to do-while in C code. The parser code to translate it into LLVM IR code is similar (if not almost identical) to the one used to implement the do-while of Lua.

  1.  
  2. repeat_loop ::= REPEAT {:
  3. currentSymTable = new SymbolTable(currentSymTable, false);
  4. loopCount = ++totLoopCount;
  5. loopList.push(loopCount); //when entering a statement, save the loop number on the stack
  6. appendMainBuffer(("br label %for.body." + loopCount), true);
  7. appendMainBuffer(("for.body." + loopCount + ":"), true);
  8. :}
  9. stmt_list
  10. {:
  11. loopCount=loopList.pop(); //restore it when statement is finished
  12. appendMainBuffer(("br label %for.cond." + loopCount), true);
  13.  
  14. :}
  15. UNTIL
  16. {:
  17. appendMainBuffer(("for.cond." + loopCount + ":"), true);
  18. :}
  19. loop_cond:x
  20. {:
  21. appendMainBuffer(("br i1 " + x.scope+x.name + ", label %for.body." + loopCount + ", label %for.exit." + loopCount), true);
  22. appendMainBuffer(("for.exit." + loopCount + ":"), true);
  23. currentSymTable=currentSymTable.getPrev(true);
  24. :}
  25.  
  26. ;
  27.  
  28.  

The above code does the following:

  • After receiving the REPEAT token (line 2), the related action is performed. It consists of creating a new symbol table for the local scope block in which we are entering. Also, as for the if loop, variable loopCount, totLoopCount and loopList take care of preserving the correctness of label numbering (explanation is the same as for the if statement). LLVM IR code is appended to the main buffer: an unconditional branch instruction to the start of the loop (lines 6-7).
  • A stmt_list token is received, representing the body. Nested statements are possible inside the body. After the entire body is parsed, the correct label number is restored (line 11) and an unconditional jump is appended to the buffer, allowing to go to the evaluation of REPEAT-UNTIL condition.
  • At line 17, the label for evaluating the condition is inserted.
  • Then, the loop_cond token is received. It is a ValueObj object storing the information of the condition evaluated. The semantic action inserts in the main buffer a conditional branch instruction: if the loop_cond is true, we jump to the beginning of the loop, otherwise, we proceed to the exit and restore the previous symbol table (lines 21-23).

An example

Lua source code

i=6
while i>1 do
    j=1
    while j<i do
 
        print("*")
        j=j+1
    end
 
print( "\n")
i=i-1
end

After compilation

declare i32 @printf(i8*, ...)
@.str.0 = private constant [2 x i8] c"*\00", align 1
@.str.1 = private constant [2 x i8] c"\0A\00", align 1
 
@i = global double 6.0, align 8
@j = global double 1.0, align 8
 
 
define void @main(){
store double 6.0, double* @i, align 8
br label %for.cond.1
for.cond.1:
%1 = load double, double* @i, align 8
%2 = fcmp ogt double %1, 1.0
br i1 %2, label %for.body.1, label %for.exit.1
for.body.1:
store double 1.0, double* @j, align 8
br label %for.cond.2
for.cond.2:
%3 = load double, double* @j, align 8
%4 = load double, double* @i, align 8
%5 = fcmp olt double %3, %4
br i1 %5, label %for.body.2, label %for.exit.2
for.body.2:
%6 = call i32 (i8*, ...) @printf(i8* getelementptr inbounds ([2 x i8], [2 x i8]* @.str.0, i32 0, i32 0))
%7 = load double, double* @j, align 8
%8 = fadd double %7, 1.0
store double %8, double* @j, align 8
br label %for.cond.2
for.exit.2:
%9 = call i32 (i8*, ...) @printf(i8* getelementptr inbounds ([2 x i8], [2 x i8]* @.str.1, i32 0, i32 0))
%10 = load double, double* @i, align 8
%11 = fsub double %10, 1.0
store double %11, double* @i, align 8
br label %for.cond.1
for.exit.1:
ret void
}

Output running LLVM IR

*****
****
***
**
*

What has been implemented

Implementation list: impl_list

Download and installation

Following, you can find:

  • The compiler archive, containing:
    • Parser (parser.cup)
    • Scanner (scanner.jflex)
    • Makefile
    • skeleton.nested
  • Examples archives. Each example archive contains:
    • [Example_name].lua (Lua source code)
    • parser.cup
    • scanner.jflex
    • skeleton.nested
    • Makefile
    • Main.java
    • out.ll (LLVM IR output file generate by the command lli [Example_name].lua)
    • Possibly additional .lua file used as libraries.

Each example archive containes the Lua source code and the corresponding output in LLVM IR, generated by the compiler as well as all needed external files (such as Math.lua or stack.lua). In order to facilitate the compilation, each example archive contains the scanner, the parser, and all other files necessary to make the compiler work. Furthermore, the LLVM IR output file is also included. In a system supporting Make, it is enough to open the folder in a terminal and run Make. For a Windows system or for a full description, follow the guides below.

Compiler

Archive containing parser (parser.cup), scanner (scanner.jflex) and Makefile (as well as skeleton.nested used by the scanner) lua_compiler

Examples

External libraries

(must be used in another program or modified)

How to run it on Linux/Mac systems

NOTE:

jFlex v1.8.2 is needed in order to make it run. Lower versions do not work correctly (in this particular case).

  • If you want to check your version, open the terminal and run jflex -version.
  • If you have installed jFlex with the apt manager or you installed a lower version, you can run sudo apt remove jflex to uninstall it and follow the below guide to install the correct version.
  • If you have installed jFlex manually, you have to remove manually too (based on where files have been placed).
  1. Install, if you have not already done so:
    1. llvm package:
      1. Linux: sudo apt install llvm
        1. Mac: brew install -with-toolchain llvm
    2. Linux: jFlex v1.8.2 (just copy, paste and press enter in the terminal, even with comments)
      #remove old version is present
      sudo apt remove jflex
      #download jFlex archive 
      cd $HOME
      wget https://github.com/jflex-de/jflex/releases/download/v1.8.2/jflex-1.8.2.tar.gz
      #decompress it and move to the correct location
      sudo tar -C /usr/share -xvzf jflex-1.8.2.tar.gz
      #create a link in /usr/bin
      sudo ln -s /usr/share/jflex-1.8.2/bin/jflex /usr/bin/jflex
      #add correct user rights
      sudo chmod -R 755 /usr/share/jflex-1.8.2
      sudo chmod 755 /usr/bin/jflex
      #remove downloaded archive
      rm jflex-1.8.2.tar.gz
    3. jFlex v1.8.2 (only for macOS) and CUP - NOTE: For Linux set-up, skip the part regarding the installation of jFlex in the below guide.
      1. Install Linux Bash: How to download, install and configure Jflex, Java, and Cup in the Ubuntu Linux operating system with bash shell
      2. Install macOS: How to download, install and configure Jflex, Java, and Cup in the macOS operating system
  2. If you want to run one of the examples (substitute fibonacci_series with the name of the example you want to run, if different)
          # first two commands create a folder in the desktop to run the example: you can skip them if you want to download it somewhere else
          cd $HOME/Desktop
          mkdir -p run_example 
          cd run_example
          wget https://www.skenz.it/repository/compilers/ass/lua_to_LLVM/fibonacci_series.zip #substitute fibonacci_series with the example you want to run
          unzip fibonacci_series.zip #substitute fibonacci_series with the example you want to run
          rm fibonacci_series.zip
          cd fibonacci_series #substitute fibonacci_series with the example you want to run
          make
          make run
          lli out.ll
  3. If you want to run the lua compiler with an arbitrary (compatible with the compiler) .lua file
  4. Download the lua compiler and extract it
  5. Open the terminal, go to the folder where the compiler is extracted and run:
    1. make
    2. As alternative to using the Makefile, run the following commands from the same folder where lua compiler is:
      java java_cup.Main -parser parser parser.cup
      jflex -skel skeleton.nested scanner.jflex
      javac *.java
  1. Take a .lua file (from the examples or write a piece of code supported by the compiler)
  2. Run the compiler, passing the .lua file as input
    1. java Main [filename].lua
  3. Run the output file with the command lli: lli [filename].ll

NOTE: The procedure has been tested on Linux, not on macOS.

How to run it on WSL/WSL2 systems

The procedure is the same as for LINUX systems. It has been tested and works correctly on WSL2.

How to run it on Windows systems

On Windows it is possible to use both the scanner and compiler, but it is not possible to run LLVM IR code.

- Install, if you have not already done so:

  1. jFlex v1.8.2 and CUP
    1. Install Windows: How to download and install Jflex, Java, and Cup in the Windows operating system
  2. Download the lua compiler and extract it
  3. Open the terminal, go to the folder where the compiler is extracted and run:
    1. java java_cup.Main -parser parser parser.cup
    2. jflex -skel skeleton.nested scanner.jflex
    3. javac *.java
  4. Take a .lua file (from the examples or write a code supported by the compiler)
  5. Run the compiler, passing the .lua file as input
  6. java Main [filename].lua
  7. You can open the out.ll file with a text editor

If you found any error, or if you want to partecipate to the editing of this wiki, please contact: admin [at] skenz.it

You can reuse, distribute or modify the content of this page, but you must cite in any document (or webpage) this url: https://www.skenz.it/compilers/lua_to_llvm
/web/htdocs/www.skenz.it/home/data/pages/compilers/lua_to_llvm.txt · Last modified: 2021/07/13 14:20 by davide