From Matlab to LLVM
Background
This page describes the implementation of a compiler that recognizes part of the Matlab programming language and translates it into LLVM IR (more information about LLVM can be found here).
Implemented features
List of the implemented Matlab features:
Data types
- Basic data types
- int
- double
- arrays
- matrices
Operators
- arithmetic:
- addition (+)
- subtraction (-)
- multiplication (*)
- multiplication element by element (.*)
- division (/)
- division element by element (./)
- comparison:
- equality ( == )
- greater than (>)
- less than (<)
- greater than or equal to (>= or =>)
- less than or equal to (<= or =<)
- logical:
- AND (&)
- OR (|)
Code sub-blocks
- if instruction
- else instruction
- for loop
- while loop
Function details (partial implementation):
- int type parameters
- int return type
Output:
- disp/fprintf
- with variables and text (fprintf) and with only one variable (disp)
Compiler
The compiler consists of two parts: a scanner and a parser.
Scanner
The scanner recognizes tokens (terminal symbols) and passes them to the parser, each coupled with an object containing the value that the token represents. It identifies integers, doubles and ids (used for variables, function names, etc.), as well as significant Matlab keywords such as:
if
else
end
for
while
function
fprintf
disp
It also recognizes other syntax elements, such as punctuation and other symbols.
Snippet of Matlab Scanner
nl = \r|\n|\r\n
ws = [ \t]
id = [A-Za-z][A-Za-z0-9_]*
integer = ([1-9][0-9]*|0)
double = (([0-9]+\.[0-9]*) | ([0-9]*\.[0-9]+)) ((e|E)('+'|'-')?[0-9]+)?

%%

"(" {return symbol(sym.RO);}
")" {return symbol(sym.RC);}
"=" {return symbol(sym.EQ);}
"+" {return symbol(sym.PLUS);}
"-" {return symbol(sym.MINUS);}
"*" {return symbol(sym.STAR);}
".*" {return symbol(sym.DOTSTAR);}
"/" {return symbol(sym.DIV);}
"./" {return symbol(sym.DOTDIV);}
"<" {return symbol(sym.MIN);}
">" {return symbol(sym.MAJ);}
"<=" {return symbol(sym.MIN_EQ);}
"=<" {return symbol(sym.EQ_MIN);}
">=" {return symbol(sym.MAJ_EQ);}
"=>" {return symbol(sym.EQ_MAJ);}
"&" {return symbol(sym.AND);}
"|" {return symbol(sym.OR);}
"~" {return symbol(sym.NOT);}
"[" {return symbol(sym.SO);}
"]" {return symbol(sym.SC);}
"function" {return symbol(sym.FUNCT);}
"end" {return symbol(sym.END);}
"disp" {return symbol(sym.DISP);}
"fprintf" {return symbol(sym.PRINT);}
"if" {return symbol(sym.IF);}
"while" {return symbol(sym.WHILE);}
"for" {return symbol(sym.FOR);}
"else" {return symbol(sym.ELSE);}
";" {return symbol(sym.S);}
"," {return symbol(sym.CM);}
":" {return symbol(sym.C);}
{id} {return symbol(sym.ID, yytext());}
{integer} {return symbol(sym.INT, new Integer(yytext()));}
{double} {return symbol(sym.DOUBLE, new Double(yytext()));}
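To make the macro definitions easier to experiment with, the following minimal sketch mirrors the id, integer and double macros with java.util.regex patterns and classifies a single lexeme. The class LexemeClassifier and its method classify are hypothetical names introduced only for this illustration; they are not part of the compiler. The exponent of a double is read here as e or E followed by an optional sign and digits, which is an assumption about the intended meaning of the macro.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Pattern;

/** Hypothetical helper: classifies one whole lexeme using the scanner's macros. */
final class LexemeClassifier {
    // Java regex equivalents of the JFlex macros id, integer and double.
    private static final Pattern ID = Pattern.compile("[A-Za-z][A-Za-z0-9_]*");
    private static final Pattern INTEGER = Pattern.compile("[1-9][0-9]*|0");
    private static final Pattern DOUBLE =
        Pattern.compile("([0-9]+\\.[0-9]*|[0-9]*\\.[0-9]+)([eE][+-]?[0-9]+)?");

    // Keyword rules come before {id} in the scanner, so they take priority.
    private static final Map<String, String> KEYWORDS = new HashMap<>();
    static {
        KEYWORDS.put("function", "FUNCT");
        KEYWORDS.put("end", "END");
        KEYWORDS.put("disp", "DISP");
        KEYWORDS.put("fprintf", "PRINT");
        KEYWORDS.put("if", "IF");
        KEYWORDS.put("while", "WHILE");
        KEYWORDS.put("for", "FOR");
        KEYWORDS.put("else", "ELSE");
    }

    /** Returns the token name for a lexeme, or "UNKNOWN" if no pattern matches it entirely. */
    static String classify(String lexeme) {
        if (KEYWORDS.containsKey(lexeme)) return KEYWORDS.get(lexeme);
        if (ID.matcher(lexeme).matches()) return "ID";
        if (INTEGER.matcher(lexeme).matches()) return "INT";
        if (DOUBLE.matcher(lexeme).matches()) return "DOUBLE";
        return "UNKNOWN";
    }

    public static void main(String[] args) {
        for (String s : new String[] {"while", "count_1", "42", "3.14", "2.5e-3"}) {
            System.out.println(s + " -> " + classify(s));
        }
    }
}
```

Unlike a generated scanner, this sketch does not apply the longest-match rule over an input stream; it only checks whether a complete lexeme fits one of the patterns.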
Parser
The parser takes as input the tokens provided by the scanner and recognizes the main grammatical rules of the Matlab language. As a result, the LLVM IR code is produced.
Data structures
This snippet shows all the variables and classes that support the parser in creating the output program:
public HashMap <String, InfoVar> symbolTable;
public HashMap <String, InfoFun> functionTable;
public boolean isCorrect = true;
public StringBuffer stamentsBuff;
public ArrayList<String> stringStatements;
public int var_count = 0;
public int str_label = 0;
public int sub_label = 0;
public int else_label = 1;
public int tot_sub_label = 0;
public int cmp_count=0;
public boolean activate_while = false;
public boolean desctivate_while = false;
public boolean activate_for = false;
public boolean desctivate_for = false;
public String ret_id = "";
public BufferedWriter bwr;

public int genVarCount(){
    var_count++;
    return var_count;
};

public int genStrCount(){
    str_label++;
    return str_label;
};

public class InfoVar{
    public String reg_id;        //First label assigned to the variable
    public String load_to;       //Reg id of the one who loaded an existing variable (default self reg_id)
    public String type;          //i32, double
    public String value;         //The real value of the variable (ex: 1 or 1.0)
    public Integer align;        //alignment required: 4, 8...
    public Integer size1;        //If the variable is an array then this is its size, otherwise size1 = -1
    public Integer size2;        //If the variable is a matrix then this is its size, otherwise size2 = -1
    public boolean just_created; //It helps to know if an operation must use the load_to or the real value

    public InfoVar() {
        reg_id = Integer.toString(genVarCount());
        load_to = Integer.toString(var_count);
        size1 = size2 = -1;
    }

    InfoVar(Integer value, String type, Integer align) {
        this.just_created = true;
        this.value = Integer.toString(value);
        this.type = type;
        this.align = align;
    }

    InfoVar(Double value, String type, Integer align) {
        this.just_created = true;
        this.value = Double.toString(value);
        this.type = type;
        this.align = align;
    }
}

public class InfoFun{
    ArrayList<String> funParam;
    Integer numParam;
    String funRet;

    public InfoFun(ArrayList<String> funParam) {
        this.funParam = funParam;
        this.numParam = funParam.size();
        this.funRet = "i32";
    }
}
- Class InfoVar: class that represents a variable, array or matrix
  - reg_id: the register in which the variable is stored
  - load_to: the register into which the variable is going to be loaded
  - type: the type of the variable
  - value: the real value of the variable
  - align: the alignment of the variable
  - size1: size of the array (number of rows for a matrix)
  - size2: number of columns of a matrix, if needed
  - just_created: helps to know whether an operation must use load_to or the real value
- Class InfoFun: class used to represent function information
  - funParam: list of parameter types
  - numParam: number of parameters
  - funRet: return type
- HashMap<String, InfoVar> symbolTable: hashmap containing the correspondence between a variable ID and an InfoVar
- HashMap<String, InfoFun> functionTable: hashmap containing the correspondence between a function ID and an InfoFun
- StringBuffer stamentsBuff: buffer used to collect all the output before it is written to the output.ll file
- ArrayList<String> stringStatements: list of the string definitions in LLVM language, typically to be printed
- var_count: counter used for register names in LLVM IR
- str_label: counter for string label names
- sub_label: counter for sub-sections of code
- else_label: counter for the labels of else instructions
- cmp_count: counter of the cmp registers used in the LLVM language
- tot_sub_label: counter for the total number of sub-sections of code
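As an illustration of how the register counter and the symbol table cooperate, here is a reduced sketch. The class SymbolTableSketch, its nested class Var and the methods declare and load are hypothetical stand-ins for this page only; in particular, the alloca plus store sequence in declare is an assumption modelled on the way function parameters are handled by the parser, since the rule for scalar assignment is not shown here.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical, reduced model of symbolTable, var_count and stamentsBuff. */
final class SymbolTableSketch {
    /** Stand-in for InfoVar with only the fields needed by this sketch. */
    static final class Var {
        final String regId;
        final String type;
        final int align;

        Var(String regId, String type, int align) {
            this.regId = regId;
            this.type = type;
            this.align = align;
        }
    }

    private final Map<String, Var> symbolTable = new HashMap<>();
    private final StringBuilder out = new StringBuilder();
    private int varCount = 0;

    private int genVarCount() {
        return ++varCount;
    }

    /** Assumed scalar declaration: reserve a register, then emit alloca and store. */
    void declare(String id, String type, int align, String value) {
        Var v = new Var(Integer.toString(genVarCount()), type, align);
        out.append("%" + v.regId + " = alloca " + type + ", align " + align + "\n");
        out.append("store " + type + " " + value + ", " + type + "* %" + v.regId
            + ", align " + align + "\n");
        symbolTable.put(id, v);
    }

    /** Loads a declared scalar into a fresh register and returns its number. */
    int load(String id) {
        Var v = symbolTable.get(id);
        if (v == null) {
            throw new IllegalArgumentException("Variable " + id + " is not declared.");
        }
        int reg = genVarCount();
        out.append("%" + reg + " = load " + v.type + ", " + v.type + "* %" + v.regId
            + ", align " + v.align + "\n");
        return reg;
    }

    String ir() {
        return out.toString();
    }

    public static void main(String[] args) {
        SymbolTableSketch t = new SymbolTableSketch();
        t.declare("a", "i32", 4, "5");
        t.load("a");
        System.out.print(t.ir());
    }
}
```

The point of the sketch is that every new LLVM value takes the next number from a single counter, while the symbol table remembers which register holds the address of each Matlab variable.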
Grammar start
The grammar starts with the main symbol prog, whose actions write the contents of stamentsBuff to the output file output.ll. The non-terminal symbol function_defs is parsed first, so all the function definitions are emitted at the beginning, before @main. At the end of each function definition var_count is reset, so that the main function can number its registers from the start. The string declarations to be printed are placed between the functions and main.
prog ::= function_defs {:
        if(parser.isCorrect) {
            bwr.write("declare i32 @printf(i8*, ...)\n");
            bwr.write(stamentsBuff.toString());
        } else
            System.out.println("Program contains errors.");
        var_count = 0;
        stamentsBuff.setLength(0);
    :}
    statements {:
        if(parser.isCorrect) {
            for(String s : stringStatements) {
                bwr.write(s+"\n");
            }
            bwr.write("define void @main(){\n");
            bwr.write(stamentsBuff.toString());
            bwr.write("ret void\n}");
            bwr.flush();
            bwr.close();
        } else
            System.out.println("There are errors in the program");
    :}
;
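The resulting layout of output.ll can be summarized with a small sketch that concatenates the pieces in the same order as the two actions of prog. The class ModuleLayoutSketch and the method assemble are hypothetical names used only for this illustration, and the sample IR in main is invented for the demonstration.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper that reproduces the order in which prog writes output.ll. */
final class ModuleLayoutSketch {
    static String assemble(String functionDefs, List<String> stringConstants, String mainBody) {
        StringBuilder sb = new StringBuilder();
        // First action: printf declaration followed by the function definitions.
        sb.append("declare i32 @printf(i8*, ...)\n");
        sb.append(functionDefs);
        // Second action: string constants, then the body of main.
        for (String s : stringConstants) {
            sb.append(s).append("\n");
        }
        sb.append("define void @main(){\n");
        sb.append(mainBody);
        sb.append("ret void\n}");
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(assemble(
            "define i32 @f(i32) {\n  ret i32 0\n}\n",
            Arrays.asList("@.str.1 = private constant [4 x i8] c\"hi\\0A\\00\", align 1"),
            "  ; statements of the main program\n"));
    }
}
```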
Practical examples
Recognition of constants, variable, arrays and matrices ID
This example shows that, when a for or while construct is active, the corresponding labels are emitted before any register is loaded.
val ::= ID:x {:
        if(!parser.symbolTable.containsKey(x)) {
            pSemError("Error: Variable "+x+" is not declared.");
        }else{
            RESULT = parser.symbolTable.get(x);
            //To load the variables inside the "while" block
            if(activate_while){
                tot_sub_label++;
                sub_label = tot_sub_label;
                stamentsBuff.append("br label %while_cond." + sub_label+"\n");
                stamentsBuff.append("while_cond." + sub_label + ":"+"\n");
                activate_while = false;
                desctivate_while = true;
            }
            //To load the variables inside the "for" block
            if(activate_for){
                tot_sub_label++;
                sub_label = tot_sub_label;
                stamentsBuff.append("br label %for_cond." + sub_label+"\n");
                stamentsBuff.append("for_cond." + sub_label + ":"+"\n");
                activate_for = false;
            }
            stamentsBuff.append("%"+genVarCount()+" = load "+RESULT.type+" , "+RESULT.type+"* %"+RESULT.reg_id+", align "+RESULT.align+"\n");
            RESULT.load_to = Integer.toString(var_count);
        }
    :}
    | ID:x RO arit_op:y RC {:
        if(!parser.symbolTable.containsKey(x)) {
            pSemError("Error: Variable "+x+" is not declared.");
        }else{
            RESULT = parser.symbolTable.get(x);
            if(!y.just_created)
                stamentsBuff.append("%"+genVarCount()+" = getelementptr inbounds ["+RESULT.size1+" x "+RESULT.type+"], ["
                    +RESULT.size1+" x "+RESULT.type+"]* %"+RESULT.reg_id+", "+RESULT.type+" 0, "+RESULT.type+" %"+y.load_to+"\n");
            else
                stamentsBuff.append("%"+genVarCount()+" = getelementptr inbounds ["+RESULT.size1+" x "+RESULT.type+"], ["
                    +RESULT.size1+" x "+RESULT.type+"]* %"+RESULT.reg_id+", "+RESULT.type+" 0, "+RESULT.type+" "+(Integer.parseInt(y.value)-1)+"\n");
            stamentsBuff.append("%"+genVarCount()+" = load "+RESULT.type+" , "+RESULT.type+"* %"+(var_count-1)+", align "+RESULT.align+"\n");
            RESULT.load_to = Integer.toString(var_count);
        }
    :}
    | ID:x RO arit_op:i CM arit_op:j RC {:
        if(!parser.symbolTable.containsKey(x)) {
            pSemError("Error: Variable "+x+" is not declared");
        }else{
            RESULT = parser.symbolTable.get(x);
            stamentsBuff.append("%"+genVarCount()+" = getelementptr inbounds ["+RESULT.size1+" x ["+RESULT.size2+" x "+RESULT.type+"]], ["
                +RESULT.size1+" x ["+RESULT.size2+" x "+RESULT.type+"]]* %"+RESULT.reg_id+", "+RESULT.type+" 0, "+RESULT.type+" "
                +(i.just_created?Integer.parseInt(i.value)-1:"%"+i.load_to)+"\n");
            stamentsBuff.append("%"+genVarCount()+" = getelementptr inbounds ["+RESULT.size2+" x "+RESULT.type+"], ["
                +RESULT.size2+" x "+RESULT.type+"]* %"+(var_count-1)+", "+RESULT.type+" 0, "+RESULT.type+" "
                +(j.just_created?Integer.parseInt(j.value)-1:"%"+j.load_to)+"\n");
            stamentsBuff.append("%"+genVarCount()+" = load "+RESULT.type+" , "+RESULT.type+"* %"+(var_count-1)+", align "+RESULT.align+"\n");
            RESULT.load_to = Integer.toString(var_count);
        }
    :}
    | INT:x {:
        RESULT = new InfoVar(x, "i32", new Integer(4));
    :}
    | DOUBLE:x {:
        RESULT = new InfoVar(x, "double", new Integer(8));
    :}
;

//Elements of vectors of a matrix
matrix_elements ::= matrix_elements:x S vect_elements:y {:
        x.add(y);
        RESULT = x;
    :}
    | vect_elements:x {:
        RESULT = new ArrayList<ArrayList<InfoVar>>();
        RESULT.add(x);
    :}
;

//Elements of variables or constants of a vector
vect_elements ::= vect_elements:x elem:y {:
        x.add(y);
        RESULT = x;
    :}
    | elem:x {:
        RESULT = new ArrayList<InfoVar>();
        RESULT.add(x);
    :}
;
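The second alternative of val can be read as two steps: compute the address of the element with getelementptr, then load from that address. A constant Matlab index is 1-based, so it is decremented before being used as a 0-based LLVM index. The sketch below isolates this for a constant index; the class ElementAccessSketch and the method loadElement are hypothetical names, and the index operands are always written as i32 in this sketch.

```java
/** Hypothetical helper: emits the IR that reads v(k) for a constant index k. */
final class ElementAccessSketch {
    private final StringBuilder out = new StringBuilder();
    private int varCount;

    /** lastUsedRegister plays the role of var_count before the access. */
    ElementAccessSketch(int lastUsedRegister) {
        this.varCount = lastUsedRegister;
    }

    /** Returns the number of the register that holds the loaded element. */
    int loadElement(String baseReg, int size, String type, int align, int matlabIndex) {
        String arrayType = "[" + size + " x " + type + "]";
        int ptr = ++varCount;
        // Matlab indices start at 1, LLVM array indices start at 0.
        out.append("%" + ptr + " = getelementptr inbounds " + arrayType + ", " + arrayType
            + "* %" + baseReg + ", i32 0, i32 " + (matlabIndex - 1) + "\n");
        int value = ++varCount;
        out.append("%" + value + " = load " + type + ", " + type + "* %" + ptr
            + ", align " + align + "\n");
        return value;
    }

    String ir() {
        return out.toString();
    }

    public static void main(String[] args) {
        ElementAccessSketch s = new ElementAccessSketch(1);
        s.loadElement("1", 3, "i32", 4, 2); // v(2) of a 3-element i32 vector stored at %1
        System.out.print(s.ir());
    }
}
```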
Matrix and array definition
A matrix definition reuses the array (vector) definition, since a matrix is just a list of vectors. In the same way, an array definition is a list of InfoVar objects.
//Vector
| ID:id EQ SO vect_elements:x SC {:
        InfoVar nInfoVar = new InfoVar();
        Integer vector_Register = Integer.parseInt(nInfoVar.reg_id);
        stamentsBuff.append("%"+vector_Register+" = alloca ["+x.size()+" x "+x.get(0).type+"], align "+x.get(0).align+"\n");
        for(int i = 0; i<x.size(); i++) {
            InfoVar xTy = x.get(i);
            stamentsBuff.append("%"+genVarCount()+" = getelementptr inbounds ["+x.size()+" x "+x.get(i).type+"], ["
                +x.size()+" x "+x.get(i).type+"]* %"+vector_Register+", "+x.get(i).type+" 0, "+x.get(i).type+" "+i+"\n");
            stamentsBuff.append("store "+xTy.type+" "+(x.get(i).just_created?x.get(i).value:"%"+x.get(i).load_to)+", "
                +xTy.type+"* %"+var_count+", align "+xTy.align+"\n");
        }
        nInfoVar.type = x.get(0).type;
        nInfoVar.align = x.get(0).align;
        nInfoVar.size1 = x.size();
        addSymbol(id, nInfoVar );
    :}
//Vector element assignment
| ID:id RO arit_op:x RC EQ arit_op:y {:
        InfoVar idVar = parser.symbolTable.get(id);
        if(!x.just_created)
            stamentsBuff.append("%"+genVarCount()+" = getelementptr inbounds ["+idVar.size1+" x "+idVar.type+"], ["
                +idVar.size1+" x "+idVar.type+"]* %"+idVar.reg_id+", "+idVar.type+" 0, "+idVar.type+" %"+x.load_to+"\n");
        else
            stamentsBuff.append("%"+genVarCount()+" = getelementptr inbounds ["+idVar.size1+" x "+idVar.type+"], ["
                +idVar.size1+" x "+idVar.type+"]* %"+idVar.reg_id+", "+idVar.type+" 0, "+idVar.type+" "+(Integer.parseInt(x.value)-1)+"\n");
        stamentsBuff.append("store "+idVar.type+" "+(y.just_created?y.value:"%"+y.load_to)+", "
            +idVar.type+"* %"+var_count+", align "+idVar.align+"\n");
    :}
//Matrix
| ID:id EQ SO matrix_elements:x SC {:
        InfoVar nInfoVar = new InfoVar();
        Integer matrix_Register = Integer.parseInt(nInfoVar.reg_id);
        stamentsBuff.append("%"+matrix_Register+" = alloca ["+x.size()+" x ["+x.get(0).size()+" x "+x.get(0).get(0).type+"]], align "
            +x.get(0).get(0).align+"\n");
        for(int i = 0; i<x.size(); i++) {
            for(int j = 0; j<x.get(i).size(); j++) {
                InfoVar xTy = x.get(i).get(j);
                stamentsBuff.append("%"+genVarCount()+" = getelementptr inbounds ["+x.size()+" x ["+x.get(i).size()+" x "+xTy.type+"]], ["
                    +x.size()+" x ["+x.get(i).size()+" x "+xTy.type+"]]* %"+matrix_Register+", "+xTy.type+" 0, "+xTy.type+" "+i+"\n");
                stamentsBuff.append("%"+genVarCount()+" = getelementptr inbounds ["+x.get(i).size()+" x "+xTy.type+"], ["
                    +x.get(i).size()+" x "+xTy.type+"]* %"+(var_count-1)+", "+xTy.type+" 0, "+xTy.type+" "+j+"\n");
                stamentsBuff.append("store "+xTy.type+" "+(xTy.just_created?xTy.value:"%"+xTy.load_to)+", "
                    +xTy.type+"* %"+var_count+", align "+xTy.align+"\n");
            }
        }
        nInfoVar.type = x.get(0).get(0).type;
        nInfoVar.align = x.get(0).get(0).align;
        nInfoVar.size1 = x.size();
        nInfoVar.size2 = x.get(0).size();
        addSymbol(id, nInfoVar );
    :}
//Matrix element assignment
| ID:id RO arit_op:i CM arit_op:j RC EQ arit_op:x {:
        InfoVar idVar = parser.symbolTable.get(id);
        stamentsBuff.append("%"+genVarCount()+" = getelementptr inbounds ["+idVar.size1+" x ["+idVar.size2+" x "+idVar.type+"]], ["
            +idVar.size1+" x ["+idVar.size2+" x "+idVar.type+"]]* %"+idVar.reg_id+", "+idVar.type+" 0, "+idVar.type+" "
            +(i.just_created?Integer.parseInt(i.value)-1:"%"+i.load_to)+"\n");
        stamentsBuff.append("%"+genVarCount()+" = getelementptr inbounds ["+idVar.size2+" x "+idVar.type+"], ["
            +idVar.size2+" x "+idVar.type+"]* %"+(var_count-1)+", "+idVar.type+" 0, "+idVar.type+" "
            +(j.just_created?Integer.parseInt(j.value)-1:"%"+j.load_to)+"\n");
        stamentsBuff.append("store "+idVar.type+" "+(x.just_created?x.value:"%"+x.load_to)+", "
            +idVar.type+"* %"+var_count+", align "+idVar.align+"\n");
    :}
Here is an example:
d = [1 2 4 ; 5 6 7]
And here is the LLVM transformation:
%7 = alloca [2 x [3 x i32]], align 4
%8 = getelementptr inbounds [2 x [3 x i32]], [2 x [3 x i32]]* %7, i32 0, i32 0
%9 = getelementptr inbounds [3 x i32], [3 x i32]* %8, i32 0, i32 0
store i32 1, i32* %9, align 4
%10 = getelementptr inbounds [2 x [3 x i32]], [2 x [3 x i32]]* %7, i32 0, i32 0
%11 = getelementptr inbounds [3 x i32], [3 x i32]* %10, i32 0, i32 1
store i32 2, i32* %11, align 4
%12 = getelementptr inbounds [2 x [3 x i32]], [2 x [3 x i32]]* %7, i32 0, i32 0
%13 = getelementptr inbounds [3 x i32], [3 x i32]* %12, i32 0, i32 2
store i32 4, i32* %13, align 4
%14 = getelementptr inbounds [2 x [3 x i32]], [2 x [3 x i32]]* %7, i32 0, i32 1
%15 = getelementptr inbounds [3 x i32], [3 x i32]* %14, i32 0, i32 0
store i32 5, i32* %15, align 4
%16 = getelementptr inbounds [2 x [3 x i32]], [2 x [3 x i32]]* %7, i32 0, i32 1
%17 = getelementptr inbounds [3 x i32], [3 x i32]* %16, i32 0, i32 1
store i32 6, i32* %17, align 4
%18 = getelementptr inbounds [2 x [3 x i32]], [2 x [3 x i32]]* %7, i32 0, i32 1
%19 = getelementptr inbounds [3 x i32], [3 x i32]* %18, i32 0, i32 2
store i32 7, i32* %19, align 4
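The pattern in this output (one alloca, then a row getelementptr, an element getelementptr and a store for each element) can be reproduced with a short stand-alone sketch. The class MatrixLiteralSketch and the method emit are hypothetical names; the sketch is restricted to constant i32 elements and takes the first free register as a parameter instead of using the parser's var_count.

```java
/** Hypothetical helper: emits the IR for a matrix literal of i32 constants. */
final class MatrixLiteralSketch {
    static String emit(int[][] rows, int firstRegister) {
        StringBuilder out = new StringBuilder();
        int reg = firstRegister;
        int base = reg; // register that holds the address of the whole matrix
        String rowType = "[" + rows[0].length + " x i32]";
        String matrixType = "[" + rows.length + " x " + rowType + "]";
        out.append("%" + base + " = alloca " + matrixType + ", align 4\n");
        for (int i = 0; i < rows.length; i++) {
            for (int j = 0; j < rows[i].length; j++) {
                // Address of row i inside the matrix.
                int rowPtr = ++reg;
                out.append("%" + rowPtr + " = getelementptr inbounds " + matrixType + ", "
                    + matrixType + "* %" + base + ", i32 0, i32 " + i + "\n");
                // Address of element j inside that row.
                int elemPtr = ++reg;
                out.append("%" + elemPtr + " = getelementptr inbounds " + rowType + ", "
                    + rowType + "* %" + rowPtr + ", i32 0, i32 " + j + "\n");
                out.append("store i32 " + rows[i][j] + ", i32* %" + elemPtr + ", align 4\n");
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // d = [1 2 4 ; 5 6 7], with %7 as the first free register
        System.out.print(emit(new int[][] {{1, 2, 4}, {5, 6, 7}}, 7));
    }
}
```

Called with the matrix d = [1 2 4 ; 5 6 7] and 7 as the first register, the sketch yields the same 19 lines as the listing above.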
Function implementation
The following code generates the LLVM IR for function definitions; functions only accept integer parameters and only return integers.
function_def ::= FUNCT ID:r EQ ID:f RO parameters:par {:
        stamentsBuff.append("define i32 @"+f+"(");
        for(int i = 0; i<par.size(); i++) {
            genVarCount();
            stamentsBuff.append("i32");
            if(i != (par.size()-1))
                stamentsBuff.append(", ");
            else
                stamentsBuff.append(") {"+"\n");
        }
        Integer currentReg;
        for(int i = 0; i<par.size(); i++) {
            currentReg = genVarCount() ;
            stamentsBuff.append("%"+currentReg+" = alloca i32, align 4"+"\n");
            stamentsBuff.append("store i32 %"+i+", i32* %"+currentReg+"\n");
            InfoVar newParam = new InfoVar();
            var_count--;
            newParam.reg_id = Integer.toString(currentReg);
            newParam.type = "i32";
            newParam.align = 4;
            addSymbol(par.get(i), newParam);
        }
        ArrayList<String> parametersType= new ArrayList<String>();
        for(int i = 0; i<par.size(); i++) {
            parametersType.add("i32");
        }
        InfoFun funct = new InfoFun(parametersType);
        functionTable.put(f,funct);
        ret_id = r;
    :}
    RC statements END {:
        stamentsBuff.append("}"+"\n");
        var_count = 0;
        symbolTable.clear();
    :}
;

param ::= ID:x {: RESULT = x; :}
    |
;

parameters ::= parameters:l CM param:x {:
        l.add(x);
        RESULT = l;
    :}
    | param:x {:
        RESULT = new ArrayList<String>();
        RESULT.add(x);
    :}
;
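The register numbering in this rule follows LLVM's convention for unnamed values: the n unnamed arguments are %0 to %(n-1), the entry block implicitly takes %n, and the first instruction result is therefore %(n+1). The sketch below reproduces only the function header and the copy of each argument into a stack slot; FunctionPrologueSketch and emit are hypothetical names, and the function name sum in main is just an example.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper: emits the header and argument spills of an i32 function. */
final class FunctionPrologueSketch {
    static String emit(String name, List<String> params) {
        StringBuilder out = new StringBuilder("define i32 @" + name + "(");
        for (int i = 0; i < params.size(); i++) {
            out.append("i32");
            if (i != params.size() - 1) {
                out.append(", ");
            }
        }
        out.append(") {\n");
        // Arguments are %0..%(n-1) and the entry block is %n,
        // so the first stack slot is numbered n+1.
        int reg = params.size();
        for (int i = 0; i < params.size(); i++) {
            reg++;
            out.append("%" + reg + " = alloca i32, align 4\n");
            out.append("store i32 %" + i + ", i32* %" + reg + "\n");
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.print(emit("sum", Arrays.asList("a", "b")));
    }
}
```

For a function with two parameters, the stack slots are %3 and %4, which matches the values that currentReg takes in the parser action.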
Print instruction
Two print instructions are implemented. The first one, “disp”, displays either a string (through the function ManageString) or a variable (the ID of a simple variable, array or matrix, through the function ManageStringID); if the ID to be printed refers to a vector or matrix, the whole structure is printed. The Matlab instruction “fprintf” instead allows (in this implementation) printing a string together with references to variables (simple variables only).
//Print instruction
print_instr ::= DISP RO STRING:x RC {:
        ManageString(x);
    :}
    | DISP RO ID:x RC {:
        ManageStringID(x);
    :}
    | PRINT RO STRING:s CM id_list:x RC {:
        ManageString(s,x);
    :}
    | print_keyw error {: pSynWarning("Error in print instruction."); :}
;

id_list ::= id_list:x CM ID:i {:
        x.add(i);
        RESULT = x;
    :}
    | ID:x {:
        RESULT = new ArrayList<String>();
        RESULT.add(x);
    :}
;
Here are the three ManageString functions:
public void ManageString(String x){
    int label = genStrCount();
    String s = x;
    s = s.replace("\"","");
    s = s + "\\0A\\00";
    //"\0A" and "\00" are written with three characters each but occupy one byte each
    Integer length = s.length()-4;
    parser.stringStatements.add("@.str." + label + " = private constant [" + length + " x i8] c\"" + s + "\", align 1");
    stamentsBuff.append(("%" + genVarCount() + " = call i32 (i8*, ...) @printf(i8* getelementptr inbounds ([" + length + " x i8], ["
        + length + " x i8]* @.str." + label + ", i32 0, i32 0))\n"));
}

public void ManageStringID(String x){
    InfoVar infoVar = parser.symbolTable.get(x);
    if(!parser.symbolTable.containsKey(x)) {
        pSemError("Variable "+x+" not declared.");
    }else{
        if(infoVar.size1==-1){
            //Simple variable
            int label = genStrCount();
            String s = "%"+(infoVar.type.equals("i32")?"d":"f")+"\\0A\\00";
            Integer length = s.length()-4;
            stamentsBuff.append("%"+genVarCount()+" = load "+infoVar.type+", "+infoVar.type+"* %"+infoVar.reg_id+", align "+infoVar.align+"\n");
            infoVar.load_to = var_count+"";
            parser.stringStatements.add("@.str." + label + " = private constant [" + length + " x i8] c\"" + s + "\", align 1");
            stamentsBuff.append(("%" + genVarCount() + " = call i32 (i8*, ...) @printf(i8* getelementptr inbounds ([" + length + " x i8], ["
                + length + " x i8]* @.str." + label + ", i32 0, i32 0), "+infoVar.type+ " %"+infoVar.load_to+")\n"));
        }else if(infoVar.size1!=-1 && infoVar.size2==-1){
            //Vector: size1 is set and size2 still has its default value -1
            int label = genStrCount();
            String s = "";
            ArrayList<Integer> loads_reg = new ArrayList<>();
            for(int i = 0;i < infoVar.size1-1; i++){
                s = s+" %"+(infoVar.type.equals("i32")?"d":"f");
                stamentsBuff.append("%"+genVarCount()+" = getelementptr inbounds ["+infoVar.size1+" x "+infoVar.type+"], ["
                    +infoVar.size1+" x "+infoVar.type+"]* %"+infoVar.reg_id+", "+infoVar.type+" 0, "+infoVar.type+" "+i+"\n");
                stamentsBuff.append("%"+genVarCount()+" = load "+infoVar.type+" , "+infoVar.type+"* %"+(var_count-1)+", align "+infoVar.align+"\n");
                loads_reg.add(var_count);
            }
            s = s+" %"+(infoVar.type.equals("i32")?"d":"f") + "\\0A\\00";
            stamentsBuff.append("%"+genVarCount()+" = getelementptr inbounds ["+infoVar.size1+" x "+infoVar.type+"], ["
                +infoVar.size1+" x "+infoVar.type+"]* %"+infoVar.reg_id+", "+infoVar.type+" 0, "+infoVar.type+" "+(infoVar.size1-1)+"\n");
            stamentsBuff.append("%"+genVarCount()+" = load "+infoVar.type+" , "+infoVar.type+"* %"+(var_count-1)+", align "+infoVar.align+"\n");
            loads_reg.add(var_count);
            Integer length = s.length()-4;
            parser.stringStatements.add("@.str." + label + " = private constant [" + length + " x i8] c\"" + s + "\", align 1");
            stamentsBuff.append(("%" + genVarCount() + " = call i32 (i8*, ...) @printf(i8* getelementptr inbounds ([" + length + " x i8], ["
                + length + " x i8]* @.str." + label + ", i32 0, i32 0)"));
            stamentsBuff.append(", ");
            for (int i = 0; i < loads_reg.size(); i ++) {
                if(i==0)
                    stamentsBuff.append(infoVar.type+" %"+loads_reg.get(i));
                else
                    stamentsBuff.append(", "+infoVar.type+" %"+loads_reg.get(i));
            }
            stamentsBuff.append(")"+"\n");
        }else{
            //Matrix: one printf call per row
            for(int i = 0;i < infoVar.size1; i++){
                int label = genStrCount();
                String s = "";
                ArrayList<Integer> loads_reg = new ArrayList<>();
                for(int j = 0;j < infoVar.size2; j++){
                    s = s+" %"+(infoVar.type.equals("i32")?"d":"f");
                    if(j== infoVar.size2-1)
                        s = s+"\\0A\\00";
                    stamentsBuff.append("%"+genVarCount()+" = getelementptr inbounds ["+infoVar.size1+" x ["+infoVar.size2+" x "+infoVar.type+"]], ["
                        +infoVar.size1+" x ["+infoVar.size2+" x "+infoVar.type+"]]* %"+infoVar.reg_id+", "+infoVar.type+" 0, "+infoVar.type+" "+i+"\n");
                    stamentsBuff.append("%"+genVarCount()+" = getelementptr inbounds ["+infoVar.size2+" x "+infoVar.type+"], ["
                        +infoVar.size2+" x "+infoVar.type+"]* %"+(var_count-1)+", "+infoVar.type+" 0, "+infoVar.type+" "+j+"\n");
                    stamentsBuff.append("%"+genVarCount()+" = load "+infoVar.type+" , "+infoVar.type+"* %"+(var_count-1)+", align "+infoVar.align+"\n");
                    loads_reg.add(var_count);
                }
                Integer length = s.length()-4;
                parser.stringStatements.add("@.str." + label + " = private constant [" + length + " x i8] c\"" + s + "\", align 1");
                stamentsBuff.append(("%" + genVarCount() + " = call i32 (i8*, ...) @printf(i8* getelementptr inbounds ([" + length + " x i8], ["
                    + length + " x i8]* @.str." + label + ", i32 0, i32 0)"));
                stamentsBuff.append(", ");
                for (int j = 0; j < loads_reg.size(); j ++) {
                    if(j==0)
                        stamentsBuff.append(infoVar.type+" %"+loads_reg.get(j));
                    else
                        stamentsBuff.append(", "+infoVar.type+" %"+loads_reg.get(j));
                }
                stamentsBuff.append(")"+"\n");
            }
        }
    }
}

public void ManageString(String x, ArrayList<String> variables) {
    ArrayList <InfoVar> regList = new ArrayList<InfoVar>();
    int label = genStrCount();
    InfoVar t = null;
    String s = x;
    s = s.replace("\"", "");
    s = s.replace("%i", "%d");
    for(String var : variables) {
        t = parser.symbolTable.get(var);
        if(!parser.symbolTable.containsKey(var)) {
            pSemError("Variable "+var+" not declared.");
        }else if(parser.symbolTable.get(var).size1==-1){
            stamentsBuff.append("%"+genVarCount()+" = load "+t.type+", "+t.type+"* %"+t.reg_id+", align "+t.align+"\n");
            t.load_to = var_count+"";
            regList.add(t);
        }
    }
    s = s + "\\0A\\00";
    Integer length = s.length()-4;
    parser.stringStatements.add("@.str." + label + " = private constant [" + length + " x i8] c\"" + s + "\", align 1");
    stamentsBuff.append(("%" + genVarCount() + " = call i32 (i8*, ...) @printf(i8* getelementptr inbounds ([" + length + " x i8], ["
        + length + " x i8]* @.str." + label + ", i32 0, i32 0)"));
    stamentsBuff.append(", ");
    for (int i = 0; i < regList.size(); i ++) {
        InfoVar infoVar = regList.get(i);
        if(i==0)
            stamentsBuff.append(infoVar.type+" %"+infoVar.load_to);
        else
            stamentsBuff.append(", "+infoVar.type+" %"+infoVar.load_to);
    }
    stamentsBuff.append(")"+"\n");
}
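The length computation shared by these functions is worth isolating. The escapes \0A (newline) and \00 (terminator) are each written with three characters in the LLVM source but occupy a single byte, so the array length is the length of the written string minus 4. The following sketch shows just this step; StringConstantSketch and declare are hypothetical names, and the sketch assumes plain ASCII text without further escape sequences.

```java
/** Hypothetical helper: builds the LLVM declaration of a printable string constant. */
final class StringConstantSketch {
    static String declare(int label, String matlabString) {
        // Drop the quotes of the Matlab literal, then append newline and terminator.
        String s = matlabString.replace("\"", "") + "\\0A\\00";
        // Two escapes, three written characters each, one byte each: subtract 2 * 2.
        int length = s.length() - 4;
        return "@.str." + label + " = private constant [" + length + " x i8] c\"" + s
            + "\", align 1";
    }

    public static void main(String[] args) {
        System.out.println(declare(1, "\"Hello\""));
    }
}
```

For the literal "Hello", the text has 5 characters and the constant therefore has 7 bytes.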
Error handling
The compiler is able to recognize the following kinds of errors:
- Variable not declared
- Variable is not an array
- Function not defined
- Generic error in assignment
- Missing ] in array definition
- General error in while condition
- Missing ) in while condition
- General error in if condition
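The grammar start rule only writes output.ll when the flag isCorrect is still true, so a semantic error has to clear that flag. The body of pSemError is not shown on this page; the sketch below is therefore only an assumption about how such a reporter could interact with the flag. ErrorFlagSketch, emit, reportSemanticError and finish are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical model of an error flag that suppresses the generated output. */
final class ErrorFlagSketch {
    private boolean isCorrect = true;
    private final List<String> messages = new ArrayList<>();
    private final StringBuilder ir = new StringBuilder();

    /** Collects one line of generated IR. */
    void emit(String line) {
        ir.append(line).append("\n");
    }

    /** Assumed behaviour: remember the message and mark the program as incorrect. */
    void reportSemanticError(String message) {
        isCorrect = false;
        messages.add(message);
    }

    /** Returns the IR only if no error was reported, as the prog rule does. */
    String finish() {
        return isCorrect ? ir.toString() : "Program contains errors.";
    }

    List<String> messages() {
        return messages;
    }

    public static void main(String[] args) {
        ErrorFlagSketch c = new ErrorFlagSketch();
        c.emit("%1 = alloca i32, align 4");
        c.reportSemanticError("Error: Variable x is not declared.");
        System.out.println(c.finish());
    }
}
```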
Missing functionalities, partial implementations
- The disp() function only displays strings or a single variable, not strings with variables nor multiple variables
- fprintf() only prints IDs of simple variables, not IDs of arrays or matrices, whereas the disp() function does
- Functions only return and accept parameters of type i32 (integers)
- if and while conditions only allow AND conditions; OR conditions are not generated properly
- No support for global variables
- Strings cannot be assigned to a variable
- Due to a shift/reduce conflict between a reference to a matrix element and a function call (both with the syntax ID(arit_op, arit_op)), function calls have an additional “()” so that function_call can be recognized properly
Download and Parser
Compiler: matlab_compiler.zip
Examples
- Function example and arithmetic operations: function.zip
- Array and matrix operations: matrix_operations.zip
- Bubble sort: bubble_sort.zip
How to run it
- Install the llvm package
sudo apt install llvm
- Download the matlab_compiler and unzip it
- Start a new terminal inside the source folder and run the following commands:
jflex matlab_scanner.jflex
java java_cup.Main -expect 3 matlab_parser.cup
javac *.java
java Main source.mlx
- This will produce an output.ll file
- Run the output.ll file with:
lli output.ll
References
- Templates and code structure were taken from this section.
You can reuse, distribute or modify the content of this page, but you must cite in any document (or webpage) this url: https://www.skenz.it/compilers/matlab_to_llvm?do=