 +====== From Haskell to LLVM ======
  
 +===== Introduction =====
 +
 +In this presentation, we illustrate the LHC (Little Haskell Compiler), which recognizes a strict subset of the Haskell Programming Language, compiles it and produces LLVM code as output. \\ In the [[haskell_to_llvm#Installation and usage|last section]], you can find how to install and use the tool.
 +
 +=== The High-level Language ===
 +
 +[[https://www.haskell.org/|Haskell]] is an advanced, Purely Functional Programming Language. Born in 1990, it has become one of the most popular Functional Programming Languages. Functional Languages have typically been confined to the academic context, as developing programs with them is more complex than with an Imperative Language (C, Java, Python). Nonetheless, Haskell is increasingly used in the industrial domain because of its useful properties, e.g. it is much easier to prove Correctness.
 +
 +Being a Pure Functional Language, Haskell has many interesting properties:
 +
 +  * Referential Transparency
 +  * Single Assignment
 +  * Non-Strictness
 +  * Type Inference
 +  * Functions as first-class entities
 +  * Lexical Scoping
 +  * ...
 +
 +LHC tries to follow in the footsteps of this Language by implementing some of its peculiar features (Referential Transparency, Single Assignment, Lexical Scoping).
 +
 +=== The Core ===
 +
 +LHC is based on:
 +
 +  * a [[haskell_to_llvm#Scanner|Scanner]], built using a Scanner Generator: [[https://www.jflex.de/|JFlex]]
 +  * an LALR(1) [[haskell_to_llvm#Parser|Parser]], designed using a Parser Generator: [[http://www2.cs.tum.edu/projects/cup/|Cup]]
 +  * additional classes for [[haskell_to_llvm#Semantic Analysis|Semantic Analysis]] and [[haskell_to_llvm#IR Generation|IR Generation]], written in Java
 +
 +=== The Target Language ===
 +
 +The Target Language of LHC is [[https://llvm.org/|LLVM]], a Programming Language mainly used as an Intermediate Representation in compilers. 
 +
 +Born as a research project at the University of Illinois, it has developed into an umbrella project consisting of several subprojects, including the LLVM Language itself and a C++ API to build and optimize LLVM code at a higher level of abstraction.
 +
 +==== Features ====
 +
 +Currently, LHC supports the following subset of Haskell:
 +
 +  * Supported Types:
 +    * Basic Types: Int, Double, Char, Bool  
 +    * Aggregate Types: Lists (not Recursive), Strings
 +    * Other Types: Functions (not as first-class entities)
 +  * Supported operations:
 +    * Addition (+)
 +    * Subtraction (-)
 +    * Multiplication (<nowiki>*</nowiki>)
 +    * Division (/)
 +    * Integer Division (div)
 +    * Remainder (rem)
 +    * Logical operators:
 +      * Logical and (&&)
 +      * Logical or (||)
 +      * Logical not (not)
 +    * Relational operators:
 +      * Greater than (>)
 +      * Greater or equal to (<nowiki>>=</nowiki>)
 +      * Less than (<)
 +      * Less or equal to (<nowiki><=</nowiki>)
 +      * Equal to (<nowiki>==</nowiki>)
 +      * Not equal to (<nowiki>/=</nowiki>)
 +    * Element extraction from List/String (!!)
 +    * Function call
 +  * Constructs:
 +    * If-Then-Else (the only possible Control Flow Statement: loops do not exist in Haskell)
 +    * Let Bindings (both in the Functional and Imperative Part)
 +    * Do Block
 +  * I/O:
 +    * Print Function (accepts Int, Double, Char, String)
 +  * Language Properties:
 +    * Indentation-based Parsing
 +    * Referential Transparency
 +    * Single Assignment ("Everything is a constant")
 +    * Lexical Scoping
 +
 +=== Partial/Missing Features ===
 +
 +  * Functions cannot return Lists/Strings
 +  * Global Lists/Strings are not supported
 +
 +===== The Compiler =====
 +
 +**Note**: debug code is never shown; we use "[...]" as a placeholder for meaningful code that is covered in other sections, so as to focus on what is most important.
 +
 +==== Scanner ====
 +
 +In this section, we illustrate how Tokens are scanned in LHC using a Scanner written in JFlex.
 +
 +=== General Scanning Algorithm ===
 +
 +The Scanning Algorithm implemented in LHC uses a FIFO ''tokenQueue'' where all tokens are inserted once scanned. All Tokens are sent to the Parser as Symbols to provide additional information for Semantic Analysis.
 +In particular, every time a Token is requested by the Parser:
 +  - if the ''tokenQueue'' is not empty, return the next element of the queue (in the pre-scanning method ''next_token_custom()'')
 +  - otherwise:
 +    - look for the next token(s) (launches the default scanning method ''next_token()'', called by the Parser by default)
 +    - manage the token(s) (in the post-scanning method ''manageToken()'')
 +    - return the first next token to the Parser (in ''manageToken()'', too)
 +Managing basic Tokens, i.e. all those not related to indentation, only consists of inserting them into the ''tokenQueue''. For indentation-related token management, read [[haskell_to_llvm#Indentation-based Parsing|this section]].
 +
 +**About code**: when a method is reported, it assumes a form like ''method_name()'', __regardless__ of the number of arguments.
 +
 +Snippet of ''next_token_custom()'':
 +<code java>
 +public Symbol next_token_custom() throws IOException {
 +    // if the token queue is not empty, extract the next item
 +    if (this.tokenQueue.size() > 0) {
 +        return this.tokenQueue.remove();
 +    }
 +    [...] // Additional code for Indentation-based Parsing, we will cover this later
 +    return this.next_token();
 +}
 +</code>
 +
 +Snippet of ''manageToken()'':
 +<code java>
 +private Symbol manageToken(Symbol... symbols) {
 +    [...] // Additional code for Indentation-based Parsing
 +    for (Symbol sym : symbols) {
 +        tokenQueue.add(sym);
 +    }
 +    return this.tokenQueue.remove();
 +}
 +</code>
 +
 +It is always up to the Parser to ask the Scanner for a new token; by default, the Parser calls the ''next_token()'' method of the Scanner and eventually receives a new Token. In this case, there is a pre-scanning method to be executed, so the ''scan with'' code section of the Parser needs to be tuned accordingly:
 +
 +<code java>
 +scan with {:
 +    if (true) { // Required because of a bug in this enhanced implementation of Cup
 +        Scanner scanner = (Scanner) this.getScanner();
 +        Symbol s = scanner.next_token_custom();
 +        [...] // Additional code for tracking current line and column and for drawing the Parse Tree
 +        return s;
 +    }
 +:}
 +</code>
 +
 +=== Basic Token management ===
 +
 +The Scanner can recognize several keywords and regular expressions that identify specific terminals of the Grammar. \\
 +Here is a snippet of the Scanner code for identifying those Tokens:
 +
 +<code>
 +int = 0|[1-9][0-9]* //unsigned int
 +rational = {int}\.[0-9]+ //unsigned rational
 +expform = {rational}[eE]-?{int}
 +double = {rational}|{expform}
 +bool = True|False
 +char = \'.\'
 +string = \"~\"
 +id = [a-z][a-zA-Z0-9_\']*
 +nl = \r|\n|\r\n
 +ws = [ \t]
 +
 +%%
 +
 +"=" {return manageToken(createSymbol(sym.eq));}
 +"::" {return manageToken(createSymbol(sym.clns));}
 +"," {return manageToken(createSymbol(sym.cm));}
 +"(" {return manageToken(createSymbol(sym.ro));}
 +")" {return manageToken(createSymbol(sym.rc));}
 +"[" {return manageToken(createSymbol(sym.bo));}
 +"]" {return manageToken(createSymbol(sym.bc));}
 +"->" {return manageToken(createSymbol(sym.arrow));}
 +"+" {return manageToken(createSymbol(sym.plus));}
 +"-" {return manageToken(createSymbol(sym.minus));}
 +"*" {return manageToken(createSymbol(sym.times));}
 +"/" {return manageToken(createSymbol(sym.div));}
 +div {return manageToken(createSymbol(sym.intdiv));}
 +rem {return manageToken(createSymbol(sym.rem));}
 +"&&" {return manageToken(createSymbol(sym.and));}
 +"||" {return manageToken(createSymbol(sym.or));}
 +not {return manageToken(createSymbol(sym.not));}
 +"/=" {return manageToken(createSymbol(sym.relnoteq));}
 +"==" {return manageToken(createSymbol(sym.releq));}
 +">=" {return manageToken(createSymbol(sym.relge));}
 +">" {return manageToken(createSymbol(sym.relgt));}
 +"<=" {return manageToken(createSymbol(sym.relle));}
 +"<" {return manageToken(createSymbol(sym.rellt));}
 +";" {return manageToken(createSymbol(sym.sep));}
 +in {[...]
 + return manageToken(createSymbol(sym.in));}
 +main {return manageToken(createSymbol(sym.main));}
 +do {[...]
 + return manageToken(createSymbol(sym.do_begin));}
 +"!!" {return manageToken(createSymbol(sym.index));}
 +if {return manageToken(createSymbol(sym.if_begin));}
 +then {return manageToken(createSymbol(sym.then));}
 +else {return manageToken(createSymbol(sym.else_begin));}
 +let {[...]
 + return manageToken(createSymbol(sym.let));}
 +print {return manageToken(createSymbol(sym.print));}
 +Int {return manageToken(createSymbol(sym.type_int));}
 +Double {return manageToken(createSymbol(sym.type_double));}
 +Bool {return manageToken(createSymbol(sym.type_bool));}
 +Char {return manageToken(createSymbol(sym.type_char));}
 +String {return manageToken(createSymbol(sym.type_string));}
 +{int} {return manageToken(createSymbol(sym.val_int, Integer.valueOf(yytext())));}
 +{double} {return manageToken(createSymbol(sym.val_double, Double.valueOf(yytext())));}
 +{bool} {return manageToken(createSymbol(sym.val_bool, Boolean.valueOf(yytext())));}
 +{char} {return manageToken(createSymbol(sym.val_char, yytext().charAt(1)));}
 +{string} {return manageToken(createSymbol(sym.val_string, yytext().substring(1, yytext().length()-1)));}
 +// LLVM does not allow the single quote as a valid character for identifiers, while dots are allowed: for this reason, every single quote is replaced with a dot
 +{id} {return manageToken(createSymbol(sym.id, yytext().replace('\'', '.')));}
 +{ws} {;}
 +{nl} {[...]}
 +
 +/* Single-line Comment */
 +"--" ~ {nl} {[...]}
 +</code>
 +
 +=== Indentation-based Parsing ===
 +
 +For nested blocks, Haskell supports both a coding style that uses brackets and an indentation-based one; LHC supports the latter. 
 +However, Haskell applies the [[wp>Off-side_rule|off-side rule]] only to certain Blocks, like Let Bindings and Do Blocks, while others, such as the If-Then-Else block, are free-form. 
 +
 +How can we recognize Indentation? \\
 +A possible approach, used in Python, consists of introducing dedicated tokens: Indent, Dedent and Separator; Indent starts a block, Dedent terminates it, and Separator divides statements inside the same block. \\
 +We also define additional variables and data structures:
 +  * an ''indentStack'', where each entry contains the column index of a block
 +  * an ''indentEnable'' flag, raised when an indentation-sensitive block is recognized, i.e. a "let" or "do" token is scanned
 +  * a ''scanIndent'' flag, raised when ''indentEnable'' is active, so that the next Token will define the column index for the block
 +  * a ''foundNewline'' flag, raised when a newline is recognized
 +  * a ''dedentForce'' flag, to force a Dedent in the very peculiar case in which we scan an "in" Token.
 +The concept is the following: when an indentation-sensitive block is recognized, i.e. its keyword is scanned ("let", "do"), an Indent is scanned; ''indentEnable'' is raised. Before scanning the following Token, ''scanIndent'' is raised, too: the Token will define the column index for the block, which is stored in the ''indentStack''. \\ 
 +__Beware__: nested blocks can only be more to the right than the current one.  
 +
 +In case we are in an indentation-sensitive block:
 +  * if a newline is recognized and the following line is more to the left than the current block, a Dedent is scanned and the block is closed; actually, this is a loop: we may scan multiple Dedents at once, and the loop halts when a block is aligned at or to the left of the line. In the particular case where a Dedent is scanned because ''dedentForce'' was raised, only one Dedent is emitted, i.e. the Let Binding corresponding to the "in" is closed, nothing else
 +  * if the following and the current line are aligned the same, then a Separator is scanned
 +  * if the following line is more to the right than the block, no additional symbol is scanned: we are still in that block
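The three cases above can be condensed into a small, self-contained simulation. This is a hypothetical sketch, not part of the actual LHC scanner: it collapses the ''indentEnable''/''scanIndent'' machinery into a plain comparison between the column of each new line and the column on top of the indent stack.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Off-side-rule sketch: given the starting column of each logical line,
// emit INDENT/DEDENT/SEP the way the rules above describe.
// Hypothetical class, independent of the real LHC sources.
public class OffsideSketch {
    public static List<String> tokensFor(int[] lineColumns) {
        Deque<Integer> indentStack = new ArrayDeque<>();
        List<String> out = new ArrayList<>();
        for (int col : lineColumns) {
            if (indentStack.isEmpty() || col > indentStack.peek()) {
                // line is further right: open a new block
                indentStack.push(col);
                out.add("INDENT");
            } else {
                // close every block lying further right than this line
                while (!indentStack.isEmpty() && indentStack.peek() > col) {
                    indentStack.pop();
                    out.add("DEDENT");
                }
                // aligned with the enclosing block: separator only
                if (!indentStack.isEmpty() && indentStack.peek() == col) {
                    out.add("SEP");
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // e.g. a block opening at column 5, a second statement at the same
        // column, then a line back at column 1
        System.out.println(tokensFor(new int[]{5, 5, 1})); // [INDENT, SEP, DEDENT]
    }
}
```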
 +
 +Having defined the principle, we can illustrate the code.
 +
 +Here are the additional instructions for indentation control, for specific Tokens:
 +
 +<code>
 +in {dedentForce = true;
 + return manageToken(createSymbol(sym.in));}
 +[...]
 +do {indentEnable = true;
 + return manageToken(createSymbol(sym.do_begin));}
 +[...]
 +let {indentEnable = true;
 + return manageToken(createSymbol(sym.let));}
 +[...]
 +{nl} {foundNewline = true;}
 +
 +/* Single-line Comment */
 +"--" ~ {nl} {foundNewline = true;}
 +</code>
 +
 +Code snippet to enable ''scanIndent'' (in ''next_token_custom()''):
 +<code java>
 +if (indentEnable) {
 + indentEnable = false;
 + scanIndent = true;
 +}
 +</code>
 +
 +Code snippet to scan a new Indent, define the column level of the nested block and store it (in ''manageToken()''):
 +
 +<code java>
 +if (!endOfCode && scanIndent) {
 +    indentColumn = yycolumn + 1;
 +    if (indentStack.size() == 0 || indentColumn > indentStack.peek()) {
 +        indentStack.push(indentColumn);
 +        tokenQueue.add(createSymbol(sym.indent));
 +    }
 +    else {
 +        report_error("Nested block is not indented further in than the enclosing expression");
 +        tokenQueue.add(createSymbol(sym.error));
 +    }
 +    scanIndent = false;
 +    foundNewline = false;
 +}
 +</code>
 +
 +Code snippet to scan one or more Dedents (in ''manageToken()''):
 +
 +<code java>
 +if (!endOfCode && !indentStack.empty() && (foundNewline || dedentForce)) {
 +    dedentColumn = yycolumn + 1;
 +    boolean thereIsDedentToInsert = true;
 +    while (thereIsDedentToInsert && !indentStack.empty()) {
 +        indentColumn = indentStack.peek(); // take the column of the innermost block
 +        if (indentColumn > dedentColumn || dedentForce) { // check if we are out of the innermost block
 +            dedentForce = false;
 +            indentStack.pop();
 +            [...]
 +            tokenQueue.add(createSymbol(sym.dedent));
 +        }
 +        else {
 +            thereIsDedentToInsert = false;
 +            if (indentColumn.equals(dedentColumn)) {
 +                // the following statement is not outside of the block: add only a Separator
 +                tokenQueue.add(createSymbol(sym.sep));
 +            }
 +        }
 +    }
 +}
 +</code>
 +
 +=== Error Handling ===
 +
 +If no Token can be recognized, a Lexical Error is notified, calling the ''report_error()'' method:
 +<code java>
 +public void report_error(String msg) {
 +    System.err.print("ERROR: Lexical error");
 +    System.err.print(" (line " + (yyline+1) + ", column " + (yycolumn+1) + "): " + msg);
 +    System.err.println(" (Token \"" + yytext() + "\")");
 +}
 +</code>
 +
 +Text that matches no other rule falls through to the following catch-all regular expression, defined at the end of the specification:
 +
 +<code>
 +. {report_error("Not recognized token");
 + return this.manageToken(createSymbol(sym.error));}
 +</code>
 +
 +==== Parser ====
 +
 +The Parser, written in Cup, recognizes the syntax of the program and handles possible occurring errors. \\
 +Here is the full Grammar of the Language recognized by LHC (Semantic Rules are explored in [[haskell_to_llvm#Semantic Analysis|this Section]]).
 +
 +<code>
 +PROGRAM ::= /* empty program */ 
 +   | indent FUNCT_PART IMPER_PART dedent
 +
 +/// ////////////////// ///
 +///  IMPERATIVE PART   ///
 +/// ////////////////// ///
 +
 +IMPER_PART ::= main eq IO_ACTION
 +
 +IO_ACTION ::= PRINT
 +            | DO_BLOCK
 +     | IF_BLOCK_IMPER
 +
 +IO_ACTIONS ::= IO_ACTIONS sep IO_ACTION
 +      | IO_ACTIONS sep LET_BLOCK_IMPER
 +      | IO_ACTION
 +      | LET_BLOCK_IMPER
 +
 +PRINT ::= print ACTARG
 +
 +DO_BLOCK ::= do_begin indent IO_ACTIONS dedent
 +
 +IF_BLOCK_IMPER ::= if_begin COND then IO_ACTION else_begin IO_ACTION
 +
 +LET_BLOCK_IMPER ::= let indent LET_STMTS dedent
 +
 +LET_STMTS ::= LET_STMTS sep DECL_TYPE
 +     | LET_STMTS sep DEF_VALUE
 +     | DECL_TYPE
 +     | DEF_VALUE
 +
 +/// ////////////////// ///
 +///  FUNCTIONAL PART   ///
 +/// ////////////////// ///
 + 
 +FUNCT_PART ::= GLOBAL_STMTS
 + 
 +GLOBAL_STMTS ::= /* empty List of Statements section */
 +        | GLOBAL_STMTS STMT sep
 + 
 +///  STATEMENTS  ///
 +
 +STMT ::= DECL_TYPE
 +       | DEF_VALUE
 +       | DEF_FUNCT
 +
 +DECL_TYPE ::= id cm DECL_TYPE
 +     | id clns TYPE 
 +
 +DEF_VALUE ::= id eq EXPR
 +
 +/* no nullary functions */
 +DEF_FUNCT ::= id LFORMARG eq EXPR
 +
 +LFORMARG ::= LFORMARG id
 +           | id
 +
 +/* special expression management for boolean conditions */
 +COND ::= EXPR
 +
 +///   EXPRESSIONS   ///
 +
 +EXPR ::=     EXPR plus EXPR
 +           | EXPR minus EXPR
 +           | EXPR times EXPR
 +    | EXPR div EXPR
 +    | EXPR intdiv EXPR
 +    | EXPR rem EXPR
 +    | EXPR and EXPR
 +    | EXPR or EXPR
 +    | EXPR relnoteq EXPR
 +    | EXPR releq EXPR
 +    | EXPR relgt EXPR
 +    | EXPR relge EXPR
 +    | EXPR rellt EXPR
 +    | EXPR relle EXPR
 +    | EXPR index EXPR
 +    | ro EXPR rc
 +    | not EXPR
 +    | minus EXPR %prec uminus 
 +    | LET_BLOCK_FUNC
 +    | IF_BLOCK_FUNC
 +    | FUNCT_CALL
 +    | VALUE
 +    | EXPR_LIST
 +
 +FUNCT_CALL ::= id LACTARG
 +
 +LET_BLOCK_FUNC ::= let indent LET_STMTS dedent in EXPR
 +
 +IF_BLOCK_FUNC ::= if_begin COND then EXPR else_begin EXPR 
 +
 +///    args for function call   ///
 +
 +ACTARG ::= id
 +         | VALUE
 + | ro EXPR rc
 +
 +// list of input arguments for function call
 +LACTARG ::= /* empty list of args: we call a Value */
 +   | LACTARG ACTARG
 +
 +///   VALUES   ///
 +
 +// Represents constants
 +VALUE ::= VALUE_BASIC
 +
 +// VALUE_BASIC represents constants
 +VALUE_BASIC ::= 
 +     val_int
 +   | val_double   
 +   | val_bool
 +   | val_char
 +   | val_string
 +
 +EXPR_LIST ::= bo LEXPR bc
 +
 +LEXPR ::= EXPR
 + | LEXPR cm EXPR
 +
 +///   TYPES   ///
 +
 +TYPE ::= TYPE_VALUE
 +       | TYPE_FUNC
 +
 +// both basic types and compound
 +TYPE_VALUE ::= TYPE_LIST
 +      | TYPE_BASIC
 +
 +TYPE_BASIC ::= type_int 
 +             | type_double
 +             | type_bool
 +      | type_char
 +
 +TYPE_LIST ::= bo TYPE_BASIC bc 
 +            | type_string
 +
 +// no higher order functions
 +TYPE_FUNC ::= TYPE_VALUE arrow TYPE_FUNC
 +     | TYPE_VALUE arrow TYPE_VALUE
 +
 +</code>
 +
 +To avoid Shift-Reduce conflicts, operators are given precedences, shown below:
 +
 +<code>
 +/// ///////////////////////// ///
 +///    PRECEDENCE RULES       ///
 +/// ///////////////////////// ///
 +
 +precedence left or;
 +precedence left and;
 +precedence nonassoc releq, relnoteq, relgt, relge, rellt, relle;
 +precedence left plus, minus;
 +precedence left times, div, intdiv, rem;
 +precedence nonassoc index;
 +precedence nonassoc not, uminus;
 +</code>
 +
 +We can notice that Haskell has some peculiarities, especially in comparison with typical imperative languages, like Java.
 +
 +Any Haskell program can be split into two parts: a functional part and an imperative one. The latter corresponds to the main and is the only region of the program where it is possible to access I/O devices via IO Actions; in general, it is where we can modify the state of the machine. All functions in the former part, instead, are stateless: this property is fundamental for making Haskell a Purely Functional programming language and is also known as [[wp>Referential_transparency|referential transparency]].
 +
 +To declare local variables, the language defines Let Bindings that can be placed inside a function body; here, they are represented by ''LET_BLOCK_FUNC'' and ''LET_BLOCK_IMPER''.
 +
 +Perhaps surprisingly, the If-Then-Else construct and the Let Bindings defined in the functional part are Expressions. In particular, If-Then-Else is an expression whose return value corresponds to that of the Then/Else branch; since an expression always has exactly one type, the two branches must return values of the same type. Let Bindings, instead, have the same type and return value as the expression following the ''in'' token (that expression is part of the construct).
 +
 +In Haskell, the If-Then-Else construct requires both branches to be defined; this is due to a particular rule of the language:
 +>> Expressions must always return something
 +
 +The If-Then-Else construct is an expression, so it must always return something __regardless of the result of the condition__: every If must define an expression for both branches.
 +
 +Calling a value is like calling a nullary function; indeed, that is how Haskell actually interprets values.
 +
 +An n-ary function, i.e. a function taking n arguments, is equivalent to a sequence of unary functions taking one argument at a time. This concept is known as [[wp>Currying|currying]] and is extensively used in Haskell for defining higher-order functions.
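The currying idea can be mimicked even in Java terms. The following is purely illustrative (hypothetical names; LHC itself does not support higher-order functions): a binary function is rewritten as a unary function returning another unary function.

```java
import java.util.function.Function;

// Currying sketch: a binary function as a chain of unary functions.
// Hypothetical example, not part of the LHC sources.
public class CurrySketch {
    // uncurried form: (x, y) -> x + y
    static int addUncurried(int x, int y) { return x + y; }

    // curried form: x -> (y -> x + y)
    static Function<Integer, Function<Integer, Integer>> add = x -> y -> x + y;

    public static void main(String[] args) {
        Function<Integer, Integer> addFive = add.apply(5); // partial application
        System.out.println(addFive.apply(3));              // 8
        System.out.println(addUncurried(5, 3));            // 8
    }
}
```

Partial application, shown by ''addFive'', is exactly what makes the curried view convenient for type trees: applying one argument strips one level off the function type.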
 +
 +Finally, strings are lists of chars: there exists no Array type. 
 +
 +=== Error Handling ===
 +
 +The Parser is also able to recognize syntax errors and, in some cases, recover from them. For this purpose, Cup provides an ''error'' token for error recovery.
 +
 +Every time an error is detected, the Parser launches the ''syntax_error()'' method by default, which forwards the error to ''report_error()'', whose body is reported here:
 +
 +<code java>
 +public void report_error(Object info) {
 +    noCompileErrors = false;
 +    System.err.print("ERROR: Syntax error");
 +    if (info instanceof Symbol)
 +        if (((Symbol) info).left != -1) {
 +            int line = ((Symbol) info).left;
 +            int column = ((Symbol) info).right;
 +            System.err.print(" (line " + line + ", column " + column + "): ");
 +        } else System.err.print(": ");
 +    else System.err.print(": ");
 +}
 +</code> 
 +
 +The ''noCompileErrors'' flag is initially true and is set to ''false'' when the first error of any kind occurs.
 +
 +We can observe that ''report_error()'' builds an incomplete error message: no information about the kind of error has been provided, since the error has not been recovered yet! \\
 +To complete the message, we need to wait until the Parser recovers from an error, i.e. reduces a rule containing the ''error'' token. At that point, ''report_syntax_error()'' is called and the error is diagnosed.
 +
 +These are the additional rules, included in the compiler, to handle some syntax errors: 
 +<code>
 +
 +IMPER_PART ::= main eq error
 +
 +IO_ACTIONS ::= IO_ACTIONS:io_actions sep error
 +
 +DO_BLOCK ::= do_begin indent error dedent 
 +
 +IF_BLOCK_IMPER ::= 
 +            if_begin error then IO_ACTION else_begin IO_ACTION
 +   | if_begin COND then error else_begin IO_ACTION
 +   | if_begin COND error else_begin IO_ACTION
 +
 +GLOBAL_STMTS ::= GLOBAL_STMTS error sep
 +
 +LET_BLOCK_FUNC ::= let error in EXPR
 +
 +IF_BLOCK_FUNC ::= if_begin error else_begin EXPR
 + | if_begin COND then error else_begin EXPR
 + | if_begin COND error else_begin EXPR
 +</code>
 +
 +This is the list of possible syntax errors that LHC can recognize:
 +  * Error in Main
 +  * Error in Imperative Part Statement
 +  * Error in Do Block
 +  * Error in Condition of If-Then-Else block
 +  * Error in Then Block of If-Then-Else block
 +  * Missing 'then' in If-Then-Else block
 +  * Error in Functional Statement
 +  * Error in Statement inside a Let Block
 +
 +==== Semantic Analysis ====
 +
 +Together with lexicon and syntax, semantics is one of the main components of a program. Thanks to semantics, it is possible to define context-sensitive information, like variable scoping and typing, empowering the language.
 +
 +**Note**:
 +
 +In Cup, it is possible to define intermediate actions, like:
 +
 +<code>
 +A ::= x { something } y
 +</code>
 +
 +which is equivalent to
 +
 +<code>
 +A ::= M y
 +
 +M ::= x { something }
 +</code>
 +
 +In particular, Cup can define implicit rules for the markers and translate the grammar from the first representation to the second one; however, this may raise conflicts if the parser cannot be sure that it will reduce exactly that rule.
 +
 +=== The typing system ===
 +
 +Like any other programming language, Haskell has a typing system. It is very complex and uses very high-level concepts, such as typeclasses, polymorphic data structures, type variables, the kind system, and so on. \\
 +In LHC, only fairly simple types are supported: Int, Double, Bool, Char, non-recursive Lists, Strings and Functions.
 +Each of these types has a standard representation as a type tree. Basic types are represented as a simple tree node, whose label corresponds to the name of the type itself. For aggregate types, it is different: Lists are polymorphic, so they are represented as a root node with one child node holding the type of the elements of the list; Strings are simply Lists of Chars. For Functions, it is even more interesting: thanks to [[wp>Currying|currying]], we can consider functions as a sequence of unary functions. \\ 
 +So, we can represent them as a binary tree where:
 +  * the root node is labelled as "Function"
 +  * the left child holds the type of the first argument
 +  * the right child holds the type of the current function without the first argument: the tree depth is equal to the number of arguments
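As a sketch of this encoding (hypothetical names, not the actual ''Type'' class), the type Int -> Int -> Double becomes the tree Function(Int, Function(Int, Double)); the return type is the rightmost leaf, and the arity is the depth of the chain of "Function" nodes:

```java
import java.util.List;

// Hypothetical type-tree sketch, independent of the real LHC Type class.
public class TypeTreeSketch {
    final String name;                 // label of the node, e.g. "Function", "Int"
    final List<TypeTreeSketch> params; // child nodes (empty for basic types)

    TypeTreeSketch(String name, TypeTreeSketch... params) {
        this.name = name;
        this.params = List.of(params);
    }

    static TypeTreeSketch function(TypeTreeSketch arg, TypeTreeSketch rest) {
        return new TypeTreeSketch("Function", arg, rest);
    }

    // return type = rightmost leaf of the tree
    String returnType() {
        TypeTreeSketch t = this;
        while (t.name.equals("Function")) t = t.params.get(1);
        return t.name;
    }

    // arity = depth of the chain of "Function" nodes
    int arity() {
        int n = 0;
        for (TypeTreeSketch t = this; t.name.equals("Function"); t = t.params.get(1)) n++;
        return n;
    }

    public static void main(String[] args) {
        TypeTreeSketch intT = new TypeTreeSketch("Int");
        TypeTreeSketch dblT = new TypeTreeSketch("Double");
        // Int -> Int -> Double
        TypeTreeSketch f = function(intT, function(intT, dblT));
        System.out.println(f.returnType() + " " + f.arity()); // Double 2
    }
}
```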
 +
 +In this compiler, type trees are represented by the ''Type'' class, composed of the following methods and fields:
 +  * constructors, setters, getters;
 +  * ''typeName'' (the label of the root node);
 +  * ''typeParams'' (the child nodes of the tree);
 +  * ''typeMap'': the type map contains all the "standard" representations of Types, the same ones described above; it is used as a sort of lookup table for type trees;
 +  * ''widen()'': method for type widening, i.e. it widens Int to Double if required
 +  * ''isSubtype()'': a method for checking subtypes; it takes the name of a certain type as argument and checks if the current type is a subtype of the argument
 +  * ''isEquivalent()'': method for type equivalence checking; it works as ''isSubtype()'', but checks equivalence
 +  * ''returnType()'': given the type tree of a function, returns its return type
 +  * ''hasArity()'': a method for arity checking, i.e. verifies whether a function takes a given number of arguments, provided as argument of the method
 +  * ''createTypeMap()'': creates the type map during the parser initialization
 +  * ''addType()'': adds a type to the type map
 +  * ''typeExists()'': checks if a type is defined in the type map, i.e. if it is recognized by the language
 +  * ''getType()'': gets the type tree of a given type from the type map
 +  * ''dumpTypeMap()'', ''dumpType()'': used for debugging purposes 
 +
 +In the type map, two additional types are used:
 +  * ''Any'', for polymorphic types like functions and lists
 +  * ''error'', for error recovery
 +
 +=== The symbol table stack ===
 +
 +During semantic analysis, one of the most important data structures is the symbol table: it contains all the information about declared values and functions, providing all those context-sensitive functionalities that empower the language.
 +
 +Every symbol table is a hashmap containing many symbol table entries. Each symbol table entry has a name, a type and a flag to check whether it has been already assigned.
 +
 +In this compiler, we do not use a unique symbol table; instead, we use a symbol table stack: in LHC, it is possible to define nested blocks with different contexts, e.g. nested let bindings.
 +
 +The symbol table stack is implemented via the ''SymTableStack'' class, which defines an inner ''SymTableEntry'' class for the symbol table entries.
 +
 +The ''SymTableEntry'' class defines objects with the following information:
 +  * ''type'': the type of the value/function, as a type tree;
 +  * ''isAssigned'': boolean flag for checking assignment; initially, it is false
 +
 +Thanks to this flag, values and functions can be assigned only once. Single assignment is a peculiar characteristic of functional languages, where values and functions are understood in the mathematical sense. If you define cos(x) as a certain function, why should you define it again?
 +
 +Where is the entry name? Since a symbol table is a hashmap, we use the name of the entry as a key and a ''SymTableEntry'' instance as its value.
 +
 +This is the information contained in a ''SymTableStack'' instance:
 +  * ''symTableStack'': a __LinkedList__ containing the stack of symbol tables itself; it is not defined as a stack on purpose: there are cases where it is necessary to look in the symbol tables below (as explained for ''getEntry()'');
 +  * ''pushSymTable()'': push a new symbol table in the stack;
 +  * ''popSymTable()'': pop a symbol table from the stack;
 +  * ''peekSymTable()'': get the symbol table on top of the stack, without popping it;
 +  * ''putEntry()'': insert an entry in the symbol table on top of the stack;
 +  * ''containsEntry()'': check if a value/function is present in the whole stack; search starts at the uppermost symbol table and ends at the bottom;
 +  * ''getEntry()'': looks for an entry like ''containsEntry()'' and returns it, if present
 +  * ''getLevel()'': looks for an entry like ''containsEntry()'' and returns its level, where the level is a non-negative integer and 0 corresponds to the bottom
 +  * ''isGlobal()'': checks whether a value/function is global, i.e. declared in the bottom symbol table
 +  * ''isLocallyDeclared()'': checks whether a value/function has been declared in the top-level symbol table
 +  * ''isAssigned()'': checks if a variable has already been assigned; used for assignment uniqueness check
 +  * ''isDeclared()'': equivalent of ''containsEntry()''
 +  * additional fields and methods for debug and IR Generation (explored [[haskell_to_llvm#IR Generation|here]])
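This design can be condensed into a minimal runnable sketch (hypothetical names; not the actual ''SymTableStack''/''SymTableEntry'' code): a linked list of hashmaps, entry names as keys, lookups walking from the innermost scope outwards, and the ''isAssigned'' flag enforcing single assignment.

```java
import java.util.HashMap;
import java.util.LinkedList;
import java.util.Map;

// Symbol-table-stack sketch, independent of the real LHC classes.
public class SymStackSketch {
    static class Entry {
        String type;
        boolean isAssigned = false; // enforces single assignment
        Entry(String type) { this.type = type; }
    }

    // head of the list = top of the stack; a LinkedList so that lookups
    // can also walk the tables below the top one
    private final LinkedList<Map<String, Entry>> stack = new LinkedList<>();

    void pushSymTable()               { stack.addFirst(new HashMap<>()); }
    void popSymTable()                { stack.removeFirst(); }
    void putEntry(String n, String t) { stack.getFirst().put(n, new Entry(t)); }

    // search from the innermost scope down to the global one
    Entry getEntry(String name) {
        for (Map<String, Entry> table : stack)
            if (table.containsKey(name)) return table.get(name);
        return null;
    }

    // single-assignment check: only the first assignment succeeds
    boolean assign(String name) {
        Entry e = getEntry(name);
        if (e == null || e.isAssigned) return false;
        e.isAssigned = true;
        return true;
    }

    public static void main(String[] args) {
        SymStackSketch s = new SymStackSketch();
        s.pushSymTable();            // global scope
        s.putEntry("x", "Int");
        s.pushSymTable();            // nested let binding
        s.putEntry("x", "Double");   // shadows the global x
        System.out.println(s.getEntry("x").type); // Double
        System.out.println(s.assign("x"));        // true
        System.out.println(s.assign("x"));        // false: already assigned
        s.popSymTable();
        System.out.println(s.getEntry("x").type); // Int
    }
}
```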
 +
 +In the Parser, the symbol table stack is allocated here:
 +<code java>
 +action code {:
 + /* The Symbol Table Stack to be used */
 + private SymTableStack symTableStack = new SymTableStack();
 +:};
 +</code>
 +
 +=== The AST ===
 +
 +The [[wp>Abstract_syntax_tree|abstract syntax tree]] is a tree representation of the abstract syntactic structure of the source code. Each node of the tree denotes a construct occurring in the text. If we define the programming language in Backus-Naur form, as Cup allows, then each symbol of the grammar represents a node of the tree; in particular, the node is a leaf if the symbol is a terminal. \\
 +The AST is very useful for top-down IR generation, as reported [[haskell_to_llvm#IR Generation|here]].
 +
 +In this compiler, the AST nodes are defined in the wrapper ''AST'' class, which contains all of them; in practice, there is one node class per non-terminal symbol:
 +  * ''BasicNode'': represents any node of the AST; any other node is subtype of ''BasicNode''; this class defines several methods and fields for code generation, as shown later;
 +  * ''Program'': the root of all the syntax tree, represents the whole program
 +  * ''FunctPart'': represents the functional part of the program
 +  * ''ImperPart'': represents the imperative part of the program
 +  * ''DoStmt'': abstract class, parent class of all accepted statements inside a do block
 +  * ''IOAction'': abstract class, parent class of all nodes that are also IO actions
 +  * ''DoBlock'': represents a do block
 +  * ''IfBlockImper'': represents an if-then-else in the imperative part
 +  * ''LetBlockImper'': represents a let binding in the imperative part
 +  * ''Stmt'': abstract class, parent of all statements, i.e. declarations and definitions
 +  * ''DeclType'': represents a type declaration
 +  * ''DefValue'': represents a value definition
 +  * ''DefFunct'': represents function definition
 +  * ''BasicExpr'': abstract class, parent class of all sorts of expressions; it also holds a type
 +  * ''Expr'': class for almost all expressions; basically, all operations are represented by this node
 +  * ''LetBlockFunc'': expression node for a let binding in the functional part
 +  * ''IfBlockFunc'': expression node for an if-then-else in the functional part
 +  * ''FunctCall'': expression node for function calls
 +  * ''Value'': expression node for constants
 +  * ''ExprList'': expression node for a list of expressions
 +  * ''LFormArg'': node containing the list of all formal arguments of a function
 +
 +Now that we have defined the AST nodes, we may ask how they are interconnected. It is rather simple: they are connected according to the production rules of the corresponding non-terminal. For example, a ''Program'' node always has two children, a ''FunctPart'' and an ''ImperPart'' node, as we can see in the production rule:
 +<code>
 +PROGRAM ::= indent FUNCT_PART IMPER_PART dedent
 +</code>
 +
 +Here is the relation between non-terminals and AST nodes:
 +
 +<code>
 +non terminal AST.Program PROGRAM;
 +non terminal AST.FunctPart FUNCT_PART;
 +non terminal AST.ImperPart IMPER_PART;
 +non terminal LinkedList<AST.DoStmt> IO_ACTIONS;
 +non terminal AST.IOAction IO_ACTION;
 +non terminal AST.Print PRINT;
 +non terminal AST.Expr COND, EXPR;
 +non terminal AST.ExprList EXPR_LIST, LEXPR;
 +non terminal AST.FunctCall FUNCT_CALL;
 +non terminal AST.Expr ACTARG;
 +non terminal LinkedList<AST.Expr> LACTARG;
 +non terminal AST.LFormArg LFORMARG;
 +non terminal LinkedList<AST.Stmt> LET_STMTS, GLOBAL_STMTS;
 +non terminal AST.DoBlock DO_BLOCK;
 +non terminal AST.LetBlockFunc LET_BLOCK_FUNC;
 +non terminal AST.LetBlockImper LET_BLOCK_IMPER;
 +non terminal AST.IfBlockFunc IF_BLOCK_FUNC;
 +non terminal AST.IfBlockImper IF_BLOCK_IMPER;
 +non terminal AST.Stmt STMT;
 +non terminal AST.Stmt DEF_VALUE;
 +non terminal AST.DefFunct DEF_FUNCT;
 +non terminal AST.DeclType DECL_TYPE;
 +non terminal Type TYPE, TYPE_VALUE, TYPE_FUNC, TYPE_LIST, TYPE_BASIC;
 +non terminal AST.Value VALUE, VALUE_BASIC;
 +</code>
 +
 +=== Semantic rules ===
 +
 +Having seen the additional classes for semantic analysis, we can now examine the semantic rules, at least the most interesting ones. In particular, we focus extensively on how the symbol table stack is used and much less on building the AST, which is rather straightforward. Error detection and recovery are covered in the later section on error handling, so we omit them here.
 +
 +== The start and the end of parsing ==
 +
 +The ''PROGRAM'' non-terminal is the start symbol of the grammar. When parsing begins, we need a symbol table to store all the global variables and functions; conversely, when parsing terminates, we pop that symbol table, as we do not need it anymore. There is one special case, the empty program: no symbol table is pushed or popped, and the compiler accepts the program and terminates. Finally, we need to inform the user whether parsing was successful. \\
 +This is a code snippet of the semantic rules related to the ''PROGRAM'' symbol. There are still some lines about code generation, which are explored [[haskell_to_llvm#IR Generation|here]].
 +
 +<code java>
 +PROGRAM ::= /* empty program */ 
 + {:
 + if (noCompileErrors) {
 + System.out.println("CODE COMPILED SUCCESSFULLY");
 + }
 + RESULT = new AST.Program(new AST.FunctPart(), new AST.ImperPart(null));
 + [...] // Code generation
 + :}
 + | indent
 + {: // push the top-level symtable
 + symTableStack.pushSymTable();
 + :}
 +    FUNCT_PART:funct_part IMPER_PART:imper_part 
 +	{: // pop the top-level symtable
 +		symTableStack.popSymTable();
 +	:}
 +	dedent 
 +	{:
 + if (noCompileErrors) {
 + System.out.println("CODE COMPILED SUCCESSFULLY");
 + RESULT = new AST.Program(funct_part, imper_part);
 + [...] // Code generation
 + }
 + else {
 + System.out.println("CODE NOT COMPILED: FAILED");
 + RESULT = new AST.Program(null, null);
 + }
 + :}
 +;
 +</code>
 +
 +== The Print function ==
 +
 +The ''print()'' function is the only action allowed to interact with I/O; it accepts numbers, chars and strings and prints them with a trailing newline. Since the only semantic constraint on this function is the type of the argument, this is also the only check to perform in the semantic rule. Its code is reported below.
 +
 +<code java>
 +PRINT ::= print ACTARG:actarg
 + {:
 + Type argType = actarg.getType();
 + if (argType.isSubtype("Double") || argType.isSubtype("String") || argType.isSubtype("Char")) {
 + RESULT = new AST.Print(actarg);
 + }
 + else {
 +                        [...] // Error handling
 +                }
 + :}
 +;
 +</code>
 +
 +== Do blocks ==
 +
 +A do block consists of a sequence of IO actions; this construct also defines its own context, i.e. its own symbol table. As with all other constructs that create symbol tables, the semantic rules both push and pop the symbol table, so that contexts are well defined.
 +Here is the code:
 +
 +<code java>
 +DO_BLOCK ::=
 + do_begin indent 
 + {:
 + symTableStack.pushSymTable();
 + :}
 + IO_ACTIONS:io_actions dedent
 + {:
 + symTableStack.popSymTable();
 + RESULT = new AST.DoBlock(io_actions);
 + :}
 +   | [...] // additional rule for detecting syntax errors 
 +;
 +</code>
 +
 +== Declarations and definitions ==
 +
 +Declarations and definitions are among the most important constructs: they allow the programmer to declare and instantiate variables. There is a difference between the two concepts:
 +  * Declarations only state that a value/function exists, but do not allocate it
 +  * Definitions actually allocate it
 +
 +There are three constructs for this purpose:
 +  * Type Declaration
 +  * Value definition
 +  * Function definition
 +
 +Type declarations are responsible for inserting new entries in the current symbol table; given an id and a type, the rule checks that the id has not already been declared in the current symbol table and that, if it denotes a function, the function does not return a list, since that is not possible in this language. Since the type is on the right of the rule, it is easy to propagate the type to multiple declarations using a synthesized attribute. \\
 +Here is the code:
 +<code java>
 +DECL_TYPE ::= id:id cm DECL_TYPE:decl_type
 + {:
 + Type type = decl_type.getType();
 + if (!symTableStack.isLocallyDeclared(id) && !(type.isSubtype("Function") && type.returnType().isSubtype("List"))) {
 + symTableStack.putEntry(id, symTableStack.new SymTableEntry(type));
 + }
 + else [...] //Error handling 
 + RESULT = new AST.DeclType(new Type(type));
 + :}
 +      | id:id clns TYPE:type
 + {:
 + if (!symTableStack.isLocallyDeclared(id) && !(type.isSubtype("Function") && type.returnType().isSubtype("List"))) {
 + symTableStack.putEntry(id, symTableStack.new SymTableEntry(type));
 + }
 + else [...] // Error handling
 + RESULT = new AST.DeclType(type);
 + :}
 +
 +</code>
 +
 +Value definitions allow defining local and global values. Once we recognize that we are parsing a value definition, an intermediate action marks the value as assigned, allowing recursion on the value, which is possible in Haskell. Before this, the action checks that the value is locally declared and not assigned yet. When the rule is finally reduced, the first check verifies that the type of the expression matches the type of the value; then, we check whether the value is global or local. If it is global, the value is treated as a nullary function, since global values are intended here as functions with no arguments (we will see this during IR generation); otherwise, it is treated as a value. \\
 +Here is the code:
 +
 +<code java>
 +DEF_VALUE ::= id:id 
 + {:
 + if (symTableStack.isDeclared(id) && symTableStack.isLocallyDeclared(id) && !symTableStack.isAssigned(id)) {
 + symTableStack.getEntry(id).setIsAssigned(true);
 + }
 + else [...] //Error handling
 + :}
 + eq EXPR:expr
 + {:
 + if (symTableStack.isDeclared(id) && symTableStack.isLocallyDeclared(id) && symTableStack.isAssigned(id) && !expr.type.isEquivalent(symTableStack.getEntry(id).getType().getTypeName())) {
 +                                        [...] //Error handling
 + }
 + if (symTableStack.isGlobal(id)) {
 + // Global Values are actually Nullary Functions
 + ArrayList<String> argNames = new ArrayList<>();
 + ArrayList<Type> argTypes = new ArrayList<>();
 + AST.LFormArg lformarg = new AST.LFormArg(argNames, argTypes, expr.getType());
 + RESULT = new AST.DefFunct(id, lformarg, expr);
 + }
 + else {
 + RESULT = new AST.DefValue(symTableStack.createUniqueId(id), expr);
 + }
 + :}
 +;
 +</code>
 +
 +Finally, function definitions allow defining functions. Once we recognize that we are parsing a function definition, an intermediate action marks the function as assigned, allowing recursion.
 +Before that, the action checks that the function is locally declared and not assigned yet. Functions define their own context, i.e. their own symbol table, so we need to push a symbol table before parsing the rest of the function. Finally, the symbol table is popped and the AST node for the function definition is created. 
 +
 +<code java>
 +DEF_FUNCT ::= id:id 
 +            {: 
 + if (symTableStack.isDeclared(id) && symTableStack.isLocallyDeclared(id) && !symTableStack.isAssigned(id)) {
 + symTableStack.getEntry(id).setIsAssigned(true);
 + }
 + else [...] // Error handling
 + symTableStack.pushSymTable();
 +     :}
 +       LFORMARG:lformarg eq EXPR:expr 
 +            {:
 + [...] // Error handling 
 + symTableStack.popSymTable();
 + RESULT = new AST.DefFunct(symTableStack.createUniqueId(id), lformarg, expr);
 +     :}
 +</code>
 +
 +To handle formal arguments, represented by the ''LFORMARG'' symbol, we propagate the argument types along the function type tree, assigning each formal argument its type and entering it into the symbol table. In the meantime, the parser checks that the number of arguments is correct and that no two arguments share the same name. \\
 +Here is the code:
 +<code java>
 +LFORMARG ::= LFORMARG:lformarg id:id 
 +     {:
 + Type propType = lformarg.getPropType();
 + if (!symTableStack.isLocallyDeclared(id) && propType != null && propType.isSubtype("Function")) {
 + Type argTree = new Type(propType.getTypeParam(0));
 + symTableStack.putEntry(id, symTableStack.new SymTableEntry(argTree, true));
 + lformarg.getArgNames().add(symTableStack.createUniqueId(id));
 + lformarg.getArgTypes().add(argTree);
 + RESULT = new AST.LFormArg(lformarg.getArgNames(), lformarg.getArgTypes(), propType.getTypeParam(1));
 + }
 + else [...] // Error handling
 +             :}
 + | id:id 
 +     {:
 + String funcName = (String) parser.stack(-2);
 + if (symTableStack.isDeclared(funcName)) {
 + Type type = symTableStack.getEntry(funcName).getType();
 + if (!symTableStack.isLocallyDeclared(id) && type != null && type.isSubtype("Function")) {
 + Type argTree = new Type(type.getTypeParam(0));
 + symTableStack.putEntry(id, symTableStack.new SymTableEntry(argTree, true));
 + ArrayList<String> argNames = new ArrayList<>();
 + ArrayList<Type> argTypes = new ArrayList<>();
 + argNames.add(symTableStack.createUniqueId(id));
 + argTypes.add(argTree);
 + RESULT = new AST.LFormArg(argNames, argTypes, type.getTypeParam(1));
 + }
 + else [...] //Error handling
 + }
 + else [...] //Error handling
 + :}
 +;
 +</code>
 +
 +== Expressions and conditions ==
 +
 +Expressions are fundamental for performing computation, as they represent operations which return some value. In this compiler, we define two types of expressions: operations and special constructs. Operations are basic expressions like addition, subtraction and so on, while special constructs represent expressions that need special management, like if-then-else blocks and let bindings.
 +In particular, special constructs are:
 +  * let bindings;
 +  * if-then-else blocks;
 +  * function calls;
 +  * constants;
 +  * lists of expressions
 +
 +All these constructs are handled by dedicated rules. \\
 +Basic expressions, instead, are all handled in the same manner: we check the types of the operands and then produce the type of the result. In this case, the type is the only information we can use for semantic analysis.
 +
 +Conditions are special expressions, which must be of boolean type.
 +
 +Example of operation: addition.
 +
 +<code java>
 +EXPR ::= EXPR:expr1 plus EXPR:expr2
 + {:
 + String opType = "+";
 + AST.BasicExpr[] subExpressions = { expr1, expr2 };
 + if (expr1.type.isSubtype("Double") && expr2.type.isSubtype("Double") && expr1.type.isEquivalent(expr2.type.getTypeName())) {
 + RESULT = new AST.Expr(new Type(expr1.type), AST.ExprKind.PLUS, subExpressions);
 + }
 + else [...] //Error handling 
 + :}
 +</code>
 +
 +== Function calls ==
 +
 +In this compiler, function calls refer both to functions and to values, as in Haskell. 
 +The semantic rule proceeds as follows: first, we check that the function/value is declared and assigned; then, we look at the number of actual arguments and distinguish two cases: if there are no arguments, we refer to a value, otherwise to a function. If we refer to a value, we check that the id is not a function and finally build the AST node. Otherwise, we check that the types of the actual arguments match those of the formal arguments, then we build the AST node.
 +
 +<code java>
 +FUNCT_CALL ::= id:id LACTARG:lactarg 
 +    {:
 + int nArgs = lactarg.size();
 + Type funcType = null;
 + boolean failedArgsTypeChecking = false;
 + if(symTableStack.isDeclared(id) && symTableStack.isAssigned(id)) {
 + funcType = symTableStack.getEntry(id).getType();
 + // This is a Value Call
 + if (nArgs == 0) {
 + if (funcType != null && !funcType.isSubtype("Function")) { // This is not a Function Call, but a reference to a Value
 + RESULT = new AST.FunctCall(new Type(funcType), symTableStack.createUniqueId(id), lactarg); 
 + }
 + else [...] // Error handling
 + }
 + // This is a Function Call
 + else if (nArgs > 0) {
 + if (funcType != null && funcType.isSubtype("Function") && funcType.hasArity(nArgs)) {
 + for (AST.Expr actarg : lactarg) {
 + if (!actarg.type.isEquivalent(funcType.getTypeParam(0)))
 + failedArgsTypeChecking = true;
 + funcType = funcType.getTypeParam(1); //extract the right child of the funcType Tree
 + }
 + if (!failedArgsTypeChecking) {
 + RESULT = new AST.FunctCall(new Type(funcType), symTableStack.createUniqueId(id), lactarg);
 + }
 + else [...] // Error handling
 + }
 + else [...] //Error handling
 + }
 + }
 + else [...] //Error handling
 + }
 +    :}
 +;
 +</code>
 +
 +An important point: the ''symTableStack.getEntry()'' method looks for the symbol table entry of the value/function starting from the topmost symbol table and, if the entry is not there, searches in the symbol tables below. This implements lexical scoping: every value/function is visible in all the blocks nested inside the one where it has been declared.
 +This means that the following Haskell code works:
 +
 +<code haskell>
 +value :: Int 
 +value = let x :: Int; x = 3
 +        in 
 +          let y :: Int; y = 4
 +          in 
 +            x + y
 +</code>
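 +The lookup just described can be sketched as follows; this is a hypothetical, minimal version of ''SymTableStack'' (the real class stores richer ''SymTableEntry'' objects rather than plain type names):
 +
 +<code java>
 +import java.util.ArrayDeque;
 +import java.util.Deque;
 +import java.util.HashMap;
 +import java.util.Map;
 +
 +// Minimal sketch: each symbol table maps an id to its type name;
 +// getEntry() walks the stack from the innermost table outwards,
 +// which is exactly what gives us lexical scoping.
 +public class MiniSymTableStack {
 +    private final Deque<Map<String, String>> stack = new ArrayDeque<>();
 +
 +    public void pushSymTable() { stack.push(new HashMap<>()); }
 +    public void popSymTable()  { stack.pop(); }
 +    public void putEntry(String id, String type) { stack.peek().put(id, type); }
 +
 +    // Search the innermost table first, then the enclosing ones.
 +    public String getEntry(String id) {
 +        for (Map<String, String> table : stack) { // iterates top -> bottom
 +            if (table.containsKey(id)) return table.get(id);
 +        }
 +        return null; // not declared in any enclosing scope
 +    }
 +
 +    public static void main(String[] args) {
 +        MiniSymTableStack s = new MiniSymTableStack();
 +        s.pushSymTable();               // outer let scope
 +        s.putEntry("x", "Int");
 +        s.pushSymTable();               // inner let scope
 +        s.putEntry("y", "Int");
 +        System.out.println(s.getEntry("x")); // found in the outer table
 +        System.out.println(s.getEntry("y")); // found in the inner table
 +    }
 +}
 +</code>
 +
 +In the nested-let example above, both ''x'' and ''y'' are found this way when evaluating ''x + y''.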
 +
 +== Let bindings (functional part) ==
 +
 +Let bindings exist for one purpose: declaring and defining new variables in a nested context. For this reason, the semantic rules only create and destroy the symbol table that stores the new bindings.
 +
 +<code java>
 +LET_BLOCK_FUNC ::= let indent 
 + {:
 + symTableStack.pushSymTable();
 + :} 
 +    LET_STMTS:let_stmts dedent in EXPR:expr 
 +                        {:
 + symTableStack.popSymTable();
 + RESULT = new AST.LetBlockFunc(new Type(expr.type), let_stmts, expr);
 + :}
 + | [...] //Additional rule for syntax errors
 +;
 +</code>
 +
 +== The if-then-else block (functional part) ==
 +
 +As with let bindings, if-then-else blocks are rather simple to analyze: the parser only needs to check that the two branches return values of the same type, otherwise it is an error.
 +
 +<code java>
 +IF_BLOCK_FUNC ::= if_begin COND:cond then EXPR:thenBody else_begin EXPR:elseBody 
 + {:
 + if(thenBody.type.isEquivalent(elseBody.type)) {
 + RESULT = new AST.IfBlockFunc(thenBody.type, cond, thenBody, elseBody);
 + }
 + else [...] // Error handling
 + :}
 +                 | [...] //Additional rules for syntax errors
 +;
 +</code>
 +
 +== Building types ==
 +
 +How are type trees built? By assembling the subtrees according to the production rules, as the code below shows.
 +
 +<code java>
 +TYPE ::= TYPE_VALUE:type_value {: RESULT = type_value; :}
 +       | TYPE_FUNC:type_func {: RESULT = type_func; :}
 +;
 +
 +// both basic types and compound
 +TYPE_VALUE ::= TYPE_LIST:type_list {: RESULT = type_list; :}
 +      | TYPE_BASIC:type_basic {: RESULT = type_basic; :}
 +;
 +
 +TYPE_BASIC ::= type_int {: RESULT = Type.getType("Int"); :}
 +      | type_double {: RESULT = Type.getType("Double"); :}
 +      | type_bool {: RESULT = Type.getType("Bool"); :}
 +      | type_char {: RESULT = Type.getType("Char"); :}
 +;
 +
 +TYPE_LIST ::= bo TYPE_BASIC:type_basic bc 
 + {: 
 +    Type type = Type.getType("List");
 +    type.setTypeParam(0, type_basic);
 +    RESULT = type;
 + :}
 +     | type_string {: RESULT = Type.getType("String"); :}
 +;
 +
 +// no higher order functions
 +TYPE_FUNC ::= TYPE_VALUE:type_value arrow TYPE_FUNC:type_func
 + {:
 + Type type = Type.getType("Function");
 + type.setTypeParam(0, type_value);
 + type.setTypeParam(1, type_func);
 + RESULT = type;
 + :}
 +     | TYPE_VALUE:type_valueL arrow TYPE_VALUE:type_valueR
 + {:
 + Type type = Type.getType("Function");
 + type.setTypeParam(0, type_valueL);
 + type.setTypeParam(1, type_valueR);
 + RESULT = type;
 + :}
 +;
 +</code>
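 +To make the tree shape concrete, here is a toy version of such a type tree (a hypothetical simplification: the real ''Type'' class also supports ''isSubtype()'' and friends), built for ''Int -> Double -> Bool'' exactly as the ''TYPE_FUNC'' rules right-nest ''Function'' nodes:
 +
 +<code java>
 +// Toy type tree: a node with a name and up to two type parameters,
 +// mirroring how TYPE_FUNC right-nests Function nodes.
 +public class TypeTree {
 +    final String name;
 +    final TypeTree[] params = new TypeTree[2];
 +
 +    TypeTree(String name) { this.name = name; }
 +
 +    void setTypeParam(int i, TypeTree t) { params[i] = t; }
 +    TypeTree getTypeParam(int i) { return params[i]; }
 +
 +    // Build the tree for: Int -> Double -> Bool
 +    static TypeTree example() {
 +        TypeTree inner = new TypeTree("Function");     // Double -> Bool
 +        inner.setTypeParam(0, new TypeTree("Double"));
 +        inner.setTypeParam(1, new TypeTree("Bool"));
 +        TypeTree outer = new TypeTree("Function");     // Int -> (Double -> Bool)
 +        outer.setTypeParam(0, new TypeTree("Int"));
 +        outer.setTypeParam(1, inner);
 +        return outer;
 +    }
 +
 +    public static void main(String[] args) {
 +        TypeTree t = example();
 +        // Peeling one argument off the function type:
 +        System.out.println(t.getTypeParam(0).name);                 // Int
 +        System.out.println(t.getTypeParam(1).getTypeParam(1).name); // Bool
 +    }
 +}
 +</code>
 +
 +Peeling arguments off with ''getTypeParam(1)'' is the same traversal the ''FUNCT_CALL'' rule performs when checking actual arguments.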
 +
 +=== Error handling ===
 +
 +Just as the lexicon and the syntax of a program can be wrong, so can its semantics: these errors must be handled too.
 +
 +Semantic errors are recognized inside the semantic rules, so their detection and recovery are defined inside the parser.
 +
 +Every time an error is detected, the parser calls the ''report_semantic_error()'' method, reported below:
 +<code java>
 +public void report_semantic_error(String msg) {
 + noCompileErrors = false;
 + System.err.print("ERROR: Semantic error");
 +        System.err.println(" (line "+currLine+", column "+currColumn+"): " + msg);
 +}
 +</code>
 +
 +Then, the parser creates a valid AST node for the production rule: in this way it recovers from the error and can move forward.
 +
 +Example: semantic error in a functional if-then-else block:
 +
 +<code java>
 +IF_BLOCK_FUNC ::= if_begin COND:cond then EXPR:thenBody else_begin EXPR:elseBody 
 + {:
 + if(thenBody.type.isEquivalent(elseBody.type)) {
 + RESULT = new AST.IfBlockFunc(thenBody.type, cond, thenBody, elseBody);
 + }
 + else {
 + report_semantic_error("Multiple Return Types for If-Then-Else");
 + RESULT = new AST.IfBlockFunc(Type.getType("error"), cond, thenBody, elseBody); 
 + }
 + :}
 +                 | [...] //Additional rules for syntax errors
 +</code>
 +
 +This is the list of all the possible semantic errors that LHC can recognize:
 +  * Print does not support the argument type
 +  * No support for local functions
 +  * Global lists not supported
 +  * Lists as a return value are not supported
 +  * Missing value declaration
 +  * Missing function declaration
 +  * Multiple value declaration
 +  * Value is not locally declared
 +  * Function is not locally declared 
 +  * Multiple value assignment
 +  * Multiple function assignment
 +  * Mismatching type on assignment
 +  * Not a function (but you expect one)
 +  * Not a value (but you expect one)
 +  * Mismatching arity in function (wrong number of arguments)
 +  * Return type is not equivalent to the expression type (in Function Definition)
 +  * Multiple argument declaration
 +  * Wrong type in condition (Expected type: Bool)
 +  * Wrong type in LHS/RHS of [expression_kind] expression (expression_kind represents "addition", "subtraction" and so on, typically by using the operator symbol)
 +  * Mismatching operand type on [expression_kind] Expression
 +  * Wrong type in arguments applied to function
 +  * Value declared but not assigned (when using a not initialized variable)
 +  * Multiple return types for if-then-else
 +  * Wrong type in list expression (subexpressions in a list expression must have the same type)
 +
 +==== IR Generation ====
 +
 +After lexical, syntactic and semantic analysis, we can assume that the program is correct, so we can proceed with the next step: generating the intermediate representation.
 +For this purpose, we need additional fields and methods, especially in the AST. 
 +
 +We said that the ''BasicNode'' class defines fields and methods for code generation. They are:
 +  * ''code'': the LLVM code for the node;
 +  * ''basicBlock'': records which basic block we are in; used for if-then-else code generation;
 +  * ''codegen()'': the method where code is generated; the method returns the identifier of the return value, if the block refers to an expression
 +  * ''mountCode()'': merges the code of two AST nodes
 +
 +The ''LLVM'' class supports the ''codegen()'' method by providing static methods that build any LLVM instruction required, like ''createAlloca()''.
 +The ''LLVM'' class also provides two counters, ''counterSSA'' and ''counterIf'', to handle static single assignment and label uniqueness across the different if-then-else blocks.
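 +To give an idea of what these helpers look like, here is a minimal, hypothetical sketch of a few of them; the real helpers take ''Type'' objects rather than plain strings and may format the instructions differently:
 +
 +<code java>
 +// Sketch of the LLVM helper class: an SSA counter plus
 +// string-building methods that each emit one LLVM instruction.
 +public class MiniLLVM {
 +    private static int counterSSA = 0;
 +
 +    // Each call hands out the next SSA index.
 +    static int getCounterSSA() { return ++counterSSA; }
 +
 +    static void resetCounterSSA() { counterSSA = 0; }
 +
 +    // A fresh temporary such as "%1", "%2", ...
 +    static String createVariable(int n) { return "%" + n; }
 +
 +    // A named register for a source-level identifier, e.g. "%x"
 +    static String createRegister(String id) { return "%" + id; }
 +
 +    // e.g. "  %x = alloca i64"
 +    static String createAlloca(String reg, String llvmType) {
 +        return "  " + reg + " = alloca " + llvmType;
 +    }
 +
 +    // e.g. "  store i64 3, i64* %x"
 +    static String createStore(String reg, String value, String llvmType) {
 +        return "  store " + llvmType + " " + value + ", " + llvmType + "* " + reg;
 +    }
 +
 +    public static void main(String[] args) {
 +        resetCounterSSA();
 +        String x = createRegister("x");
 +        System.out.println(createAlloca(x, "i64"));
 +        System.out.println(createStore(x, "3", "i64"));
 +        System.out.println(createVariable(getCounterSSA())); // "%1"
 +    }
 +}
 +</code>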
 +
 +In this section, we revisit the constructs seen in the previous section, this time from the point of view of IR generation.
 +
 +== The start and the end of parsing ==
 +
 +The ''Program'' node has a very simple ''codegen()'' method for building the output program:
 +
 +<code java>
 +public static class Program extends BasicNode {
 + private FunctPart functPart;
 + private ImperPart imperPart;
 +
 + Program(FunctPart functPart, ImperPart imperPart) {
 + this.functPart = functPart;
 + this.imperPart = imperPart;
 + }
 +
 + @Override
 + String codegen() {
 + functPart.codegen();
 + mountCode(functPart);
 + imperPart.codegen();
 + mountCode(imperPart);
 + return null;
 + }
 +}
 +</code>
 +
 +== The Print function ==
 +
 +The ''print()'' function, like any other function call, requires its argument to be evaluated. After that, we can call the function.
 +
 +<code java>
 +public static class Print extends IOAction {
 + private Expr actarg;
 +
 + Print(Expr actarg) {
 + this.actarg = actarg;
 + }
 +
 + @Override
 + String codegen() { 
 + String actargIndex = actarg.codegen();
 + mountCode(actarg);
 + String result = LLVM.createVariable(LLVM.getCounterSSA());
 + code.add(LLVM.createPrintCall(result, actarg.getType(), actargIndex));
 + return null;
 + }
 +}
 +</code>
 +
 +== Do blocks ==
 +
 +Do blocks are lists of statements, so it is sufficient to generate the code of all the statements composing the block.
 +
 +<code java>
 +public static class DoBlock extends IOAction {
 + private LinkedList<DoStmt> ioActions;
 +
 + DoBlock(LinkedList<DoStmt> ioActions) {
 + this.ioActions = ioActions;
 + }
 +
 + @Override
 + String codegen() { 
 + for (DoStmt ioAction: ioActions) {
 + ioAction.codegen();
 + mountCode(ioAction);
 + }
 + return null;
 + }
 +}
 +</code>
 +
 +== Declarations and definitions ==
 +
 +Type declarations do not generate code: since in Haskell we cannot use a value/function until it is assigned, allocation and assignment happen together at the definition.
 +
 +Value definitions are rather simple: first, we evaluate the expression to assign; then we allocate the value and store the result in it. Values here are intended as __local__: global values are treated as functions and use a ''DefFunct'' node.
 +
 +<code java>
 +public static class DefValue extends Stmt {
 + private String id; // must be a "unique" id
 + private Expr expr;
 +
 + DefValue(String id, Expr expr) {
 + this.id = id;
 + this.expr = expr;
 + }
 +
 + @Override
 + String codegen() { 
 + String exprIndex = expr.codegen();
 + mountCode(expr);
 + code.add(LLVM.createAlloca(LLVM.createRegister(id), expr.getType()));
 + code.add(LLVM.createStore(LLVM.createRegister(id), exprIndex, expr.getType()));
 + return null;
 + }
 +}
 +</code>
 +
 +Function definitions define global functions. To generate a function we need to:
 +  * create its definition (return type, name and arguments);
 +  * open the function; the first block of the function is always called ''entry'', so we call ''setBasicBlock'' to update the current basic block;
 +  * store all the argument values and give them their names as in the original Haskell code;
 +  * generate the code of the body;
 +  * create the return statement;
 +  * close the function.
 +
 +All these steps can be recognized in the code: 
 +
 +<code java>
 +public static class DefFunct extends Stmt {
 + private String id; // must be a "unique" id
 + private LFormArg lformarg;
 + private Expr expr;
 +
 + DefFunct(String id, LFormArg lformarg, Expr expr) {
 + this.id = id;
 + this.lformarg = lformarg;
 + this.expr = expr;
 + }
 +
 + @Override
 + String codegen() { 
 + ArrayList<String> argNames = lformarg.getArgNames();
 + ArrayList<Type> argTypes = lformarg.getArgTypes();
 + LLVM.resetCounterSSA();
 + code.add(LLVM.createFunctionDefinition(id, expr.getType(), lformarg));
 + code.addAll(LLVM.openFunction());
 + setBasicBlock("entry");
 + for (int i = 0; i < argNames.size(); i++) { 
 + code.add(LLVM.createAlloca(LLVM.createRegister(argNames.get(i)), argTypes.get(i)));
 + code.add(LLVM.createStore(LLVM.createRegister(argNames.get(i)), LLVM.createRegister(i), argTypes.get(i)));
 + LLVM.getCounterSSA(); //update the register index
 + }
 + String result = expr.codegen();
 + mountCode(expr);
 + code.add(LLVM.createReturn(result, expr.getType()));
 + code.add(LLVM.closeFunction());
 + return null;
 + }
 +}
 +</code>
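 +As an illustration, for a definition like ''inc :: Int -> Int'' with ''inc x = x + 1'', the generated IR would plausibly have the following shape (the ''i64'' mapping for ''Int'', the register names and the ''$''-suffixed argument name are assumptions, not the compiler's exact output):
 +
 +<code llvm>
 +define i64 @inc(i64 %0) {
 +entry:
 +  %x$1 = alloca i64
 +  store i64 %0, i64* %x$1
 +  %1 = load i64, i64* %x$1
 +  %2 = add i64 %1, 1
 +  ret i64 %2
 +}
 +</code>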
 +
 +== Expressions and conditions ==
 +
 +To evaluate an expression, it is necessary to evaluate the subexpressions first, which are stored in an array. Then, we can build the code for the expression itself: first, we create a new register for the result; then, we create the operation instruction and finally use the register of the return value as the return value of the ''codegen()'' method itself; in this way, the register is available to the rest of the program, which can use it as operand for other instructions.
 +The same distinction between operations and special constructs still holds during code generation: while operations are built from LLVM arithmetic and comparison instructions, special constructs are managed on their own.\\
 +Conditions are boolean expressions, so they are treated as expressions.
 +
 +<code java>
 +public enum ExprKind {
 + PLUS, MINUS, TIMES, DIV, INTDIV, REM, AND, OR,
 + RELNOTEQ, RELEQ, RELGE, RELGT, RELLE, RELLT,
 + INDEX, NOT, UMINUS, LET_BLOCK_FUNC,
 + IF_BLOCK_FUNC, FUNCT_CALL, VALUE, EXPR_LIST
 +}
 +
 +public static class Expr extends BasicExpr {
 +
 + private ExprKind exprKind;
 + private BasicExpr[] subExpressions; //the subexpressions
 +
 + Expr (Type type, ExprKind exprKind, BasicExpr[] subExpressions) {
 + this.type = type;
 + this.exprKind = exprKind;
 + this.subExpressions = subExpressions;
 + }
 +
 + @Override
 + String codegen() {
 + String[] subExpResults = new String[subExpressions.length];
 + String result = null; 
 + for (int i = 0; i < subExpressions.length; i++) {
 + subExpResults[i] = subExpressions[i].codegen();
 + mountCode(subExpressions[i]);
 + }
 + switch (exprKind) {
 + case PLUS: 
 + result = LLVM.createVariable(LLVM.getCounterSSA());
 + code.add(LLVM.createAdd(result, subExpResults, subExpressions[0].getType()));
 + break;
 + case MINUS: 
 + result = LLVM.createVariable(LLVM.getCounterSSA());
 + code.add(LLVM.createSub(result, subExpResults, subExpressions[0].getType()));
 + break;
 + case TIMES: 
 + result = LLVM.createVariable(LLVM.getCounterSSA());
 + code.add(LLVM.createMult(result, subExpResults, subExpressions[0].getType()));
 + break;
 + case DIV: 
 + String[] oldExpResults = new String[subExpressions.length];
 + for (int i = 0; i < subExpressions.length; i++) {
 + oldExpResults[i] = subExpResults[i];
 + if (subExpressions[i].getType().isEquivalent("Int")) {
 + subExpResults[i] = LLVM.createVariable(LLVM.getCounterSSA());
 + code.add(LLVM.createSItoFP(subExpResults[i], oldExpResults[i], type, subExpressions[i].getType()));
 + }
 + }
 + result = LLVM.createVariable(LLVM.getCounterSSA());
 + code.add(LLVM.createDiv(result, subExpResults, type));
 + break;
 + case INTDIV: 
 + result = LLVM.createVariable(LLVM.getCounterSSA());
 + code.add(LLVM.createIntDiv(result, subExpResults, subExpressions[0].getType()));
 + break;
 + case REM: 
 + result = LLVM.createVariable(LLVM.getCounterSSA());
 + code.add(LLVM.createRem(result, subExpResults, subExpressions[0].getType()));
 + break;
 + case AND: 
 + result = LLVM.createVariable(LLVM.getCounterSSA());
 + code.add(LLVM.createAnd(result, subExpResults, subExpressions[0].getType()));
 + break;
 + case OR: 
 + result = LLVM.createVariable(LLVM.getCounterSSA());
 + code.add(LLVM.createOr(result, subExpResults, subExpressions[0].getType()));
 + break;
 + case RELNOTEQ: 
 + result = LLVM.createVariable(LLVM.getCounterSSA());
 + code.add(LLVM.createRelNotEQ(result, subExpResults, subExpressions[0].getType()));
 + break;
 + case RELEQ: 
 + result = LLVM.createVariable(LLVM.getCounterSSA());
 + code.add(LLVM.createRelEQ(result, subExpResults, subExpressions[0].getType()));
 + break;
 + case RELGE: 
 + result = LLVM.createVariable(LLVM.getCounterSSA());
 + code.add(LLVM.createRelGE(result, subExpResults, subExpressions[0].getType()));
 + break;
 + case RELGT: 
 + result = LLVM.createVariable(LLVM.getCounterSSA());
 + code.add(LLVM.createRelGT(result, subExpResults, subExpressions[0].getType()));
 + break;
 + case RELLE: 
 + result = LLVM.createVariable(LLVM.getCounterSSA());
 + code.add(LLVM.createRelLE(result, subExpResults, subExpressions[0].getType()));
 + break;
 + case RELLT: 
 + result = LLVM.createVariable(LLVM.getCounterSSA());
 + code.add(LLVM.createRelLT(result, subExpResults, subExpressions[0].getType()));
 + break;
 + case INDEX: 
 + String elemPtr = LLVM.createVariable(LLVM.getCounterSSA());
 + code.add(LLVM.createGEP(elemPtr, subExpResults[0], subExpressions[0].getType(), subExpResults[1]));
 + result = LLVM.createVariable(LLVM.getCounterSSA());
 + code.add(LLVM.createLoad(result, type, elemPtr));
 + break;
 + case NOT: 
 + result = LLVM.createVariable(LLVM.getCounterSSA());
 + code.add(LLVM.createNot(result, subExpResults, subExpressions[0].getType()));
 + break;
 + case UMINUS: 
 + result = LLVM.createVariable(LLVM.getCounterSSA());
 + code.add(LLVM.createUMinus(result, subExpResults, subExpressions[0].getType()));
 + break;
 +			case LET_BLOCK_FUNC: 
 +			case IF_BLOCK_FUNC: 
 +			case FUNCT_CALL: 
 +			case VALUE: 
 +			case EXPR_LIST: 
 +				// special constructs generate their own code: just forward their result
 +				result = subExpResults[0];
 +				break;
 + }
 + return result;
 + }
 +}
 +</code>
 +
 +== Function calls ==
 +
 +To perform a function call, we first need to evaluate all its arguments. If the callee is a value, we create a register into which to load its content and return that register. Otherwise, we create a register for the result, emit a call instruction using the function name and all its arguments in order, and finally return the result register.
 +
 +<code java>
 +public static class FunctCall extends BasicExpr {
 + private String id; // must be a unique id
 + private LinkedList<Expr> actargs;
 +
 + FunctCall(Type type, String id, LinkedList<Expr> actargs) {
 + this.type = type;
 + this.id = id;
 + this.actargs = actargs;
 + }
 +
 + @Override
 + String codegen() {
 + ArrayList<String> argIds = new ArrayList<>(); // list containing all the input argument id for the function
 + ArrayList<Type> argTypes = new ArrayList<>();
 + String result;
 + for (Expr actarg: actargs) {
 + argIds.add(actarg.codegen());
 + argTypes.add(actarg.type);
 + mountCode(actarg);
 + }
 + if (id.contains("$")) { // only local values have a '$' appended to their name
 + result = LLVM.createRegister(LLVM.getCounterSSA());
 + code.add(LLVM.createLoad(result, this.type, LLVM.createRegister(id)));
 + }
 + else {
 + result = LLVM.createRegister(LLVM.getCounterSSA());
 + code.add(LLVM.createFunctionCall(result, this.type, id, argIds, argTypes));
 + }
 + return result;
 + }
 +}
 +</code>
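A rough illustration of the two cases above (hand-written IR in the style of LLVM 10, not actual LHC output; names and register numbers are made up):

```llvm
; the callee name contains '$': it is a local value, so it is simply loaded
%1 = load i32, i32* %f$1$2
; the callee is a real function: a call instruction is emitted
%2 = call i32 @fact(i32 %n)
```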
 +
 +__About lexical scoping__: if two values share the same name, how can we refer to the right one? We could distinguish them by looking at the symbol table in which each value is defined, but symbol tables are destroyed when parsing terminates. So, we append to the name of a value a symbol table identifier that preserves this kind of information. \\
 +In particular, ''SymTableStack'' defines additional fields and methods for this purpose:
 +  * ''symTableStackId'': an ArrayList that contains a numeric id for each level of the stack and is used specifically for intermediate code generation. In brief, every time a new symbol table is pushed onto the stack at a given level, the id corresponding to that level is incremented; the id of a given symbol table is the concatenation of all the ids, from the bottom level up to the level of the symbol table we are considering, separated by "$". For example, suppose we have two symbol tables, one on top of the other, with ids "1" for the bottom one and "1$1" for the top one. If we pop a symbol table, only the bottom one remains, with id "1"; if we then push another symbol table, its id will be "1$2". We can safely use "$" as a separator because it is not a valid character in Haskell identifiers, so there is no risk of homonymy.
 +  * ''updateSymTableStackId()'': updates ''symTableStackId'' when a symbol table is pushed
 +  * ''getSymTableIds()'': gets the id of a symbol table, following the scheme described above
 +  * ''createUniqueId()'': creates a unique id for a variable (for IR Generation)
 +
 +When we reduce a function call, such a unique id is stored in the AST node of the expression, thus storing the information and preserving name uniqueness. In this way, the unique identifier can be used during code generation.
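The id scheme can be sketched as follows. This is a simplified, hypothetical model of the behaviour described above, not the actual ''SymTableStack'' code: each stack level keeps a counter that is incremented on every push at that level, and the id of the current table is the "$"-joined concatenation of the counters of the active levels.

```java
import java.util.ArrayList;
import java.util.stream.Collectors;

// Hypothetical sketch of the symbol table id scheme (names are made up).
public class ScopeIds {
    private final ArrayList<Integer> counters = new ArrayList<>(); // one counter per stack level
    private int depth = 0; // number of symbol tables currently on the stack

    // Push a new symbol table: increment the counter of its level.
    public void push() {
        if (depth == counters.size()) counters.add(0);
        counters.set(depth, counters.get(depth) + 1);
        depth++;
    }

    // Pop a symbol table: the counter of its level is kept, so the next
    // sibling pushed at the same level gets a fresh id.
    public void pop() { depth--; }

    // Id of the symbol table currently on top of the stack, e.g. "1$2".
    public String currentId() {
        return counters.subList(0, depth).stream()
                       .map(String::valueOf)
                       .collect(Collectors.joining("$"));
    }

    public static void main(String[] args) {
        ScopeIds ids = new ScopeIds();
        ids.push();                          // bottom table
        System.out.println(ids.currentId()); // 1
        ids.push();                          // nested table
        System.out.println(ids.currentId()); // 1$1
        ids.pop();                           // leave the nested scope
        ids.push();                          // a sibling scope at the same level
        System.out.println(ids.currentId()); // 1$2
    }
}
```

A value defined in that last scope would then receive a unique name built from its own name plus the id (for instance, something like ''x$1$2''), which survives after the symbol tables themselves have been destroyed.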
 +
 +== Let bindings (functional part) ==
 +
 +A let binding consists of a sequence of statements followed by a final expression. To generate its code, we first generate code for all the statements and then evaluate the expression.
 +
 +<code java>
 +public static class LetBlockFunc extends BasicExpr {
 + private LinkedList<Stmt> letStmts;
 + private Expr expr;
 +
 + LetBlockFunc(Type type, LinkedList<Stmt> letStmts, Expr expr) {
 + this.type = type;
 + this.letStmts = letStmts;
 + this.expr = expr;
 + }
 +
 + @Override
 + String codegen() { 
 + for (Stmt letStmt: letStmts) {
 + letStmt.codegen();
 + mountCode(letStmt);
 + }
 + String result = expr.codegen();
 + mountCode(expr);
 + return result;
 + }
 +}
 +</code>
 +
 +== If-then-else block (functional part) ==
 +
 +The if-then-else block is one of the most peculiar constructs in terms of IR generation because it exploits the concept of basic block, which LLVM uses to define the control flow graph. \\
 +Before generating code, we assign an index to the if-then-else block: every if-then-else gets a different index, so that homonymy conflicts among labels are impossible.
 +To generate an if-then-else block, we first evaluate its condition and create a conditional branch that depends on the condition result. After this, the body of the then branch can be generated. Since we define a new label, we start a new basic block; for this reason, we call ''setBasicBlock()'' to record this information. After generating the body of the then branch, it is essential to read back the current basic block: nested if-then-else blocks may have changed it.
 +Then, we create an unconditional branch towards the end of the block, to skip the else expression. At this point, we can generate the else expression and branch to the exit of the block: this latter step is mandatory because in LLVM every basic block must end with a terminator, i.e. either a branch or a return statement. \\
 +Finally, we create a register for the result and define a phi node: the phi node is a sort of merge operator which takes the results of the two branches and selects which one to use depending on which branch was actually executed. For phi nodes to work correctly, it is mandatory to specify the exact basic block in which the result of each branch is defined, which is why ''getBasicBlock()'' must be called in both branches. In the end, the register containing the result is returned, so that other parts of the program can use it.
 +
 +<code java>
 +public static class IfBlockFunc extends BasicExpr {
 + private Expr cond, thenBody, elseBody;
 +
 + IfBlockFunc(Type type, Expr cond, Expr thenBody, Expr elseBody) {
 + this.type = type;
 + this.cond = cond;
 + this.thenBody = thenBody;
 + this.elseBody = elseBody;
 + }
 +
 + @Override
 + String codegen() { 
 + final int ifIndex;
 + String condIndex, thenIndex, elseIndex, thenLabel, elseLabel, exitLabel, thenBB, elseBB, result;
 + ifIndex = LLVM.getCounterIf();
 + condIndex = cond.codegen();
 + mountCode(cond);
 + code.add(LLVM.createBranchCond(condIndex, "if.then$" + ifIndex, "if.else$" + ifIndex));
 + thenLabel = "if.then$" + ifIndex;
 + code.add(LLVM.createLabel("if.then$" + ifIndex));
 + setBasicBlock(thenLabel);
 + thenIndex = thenBody.codegen();
 + mountCode(thenBody);
 + thenBB = getBasicBlock();
 + code.add(LLVM.createBranchNotCond("if.exit$" + ifIndex));
 + elseLabel = "if.else$" + ifIndex;
 + code.add(LLVM.createLabel("if.else$" + ifIndex));
 + setBasicBlock(elseLabel);
 + elseIndex = elseBody.codegen();
 + mountCode(elseBody);
 + elseBB = getBasicBlock();
 + code.add(LLVM.createBranchNotCond("if.exit$" + ifIndex));
 + exitLabel = "if.exit$" + ifIndex;
 + code.add(LLVM.createLabel("if.exit$" + ifIndex));
 + setBasicBlock(exitLabel);
 + result = LLVM.createVariable(LLVM.getCounterSSA());
 + code.add(LLVM.createPHINode(result, this.type, thenBB, thenIndex, elseBB, elseIndex));
 + return result;
 + }
 +}
 +</code>
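As an illustration, the code generated for an if-then-else with index 0 would roughly have the following shape (a hand-written sketch in the style of LLVM 10, not actual LHC output; registers and operations are made up):

```llvm
  br i1 %cond, label %if.then$0, label %if.else$0
if.then$0:
  %1 = add i32 %x, 1
  br label %if.exit$0
if.else$0:
  %2 = sub i32 %x, 1
  br label %if.exit$0
if.exit$0:
  %3 = phi i32 [ %1, %if.then$0 ], [ %2, %if.else$0 ]
```

The ''phi'' instruction yields ''%1'' when control arrives from ''if.then$0'' and ''%2'' when it arrives from ''if.else$0''.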
 +
 +To conclude, the parser needs some additional fields and methods, too: in particular, it defines a ''code'' field to store the generated code, plus a setter for that field. It is up to the parser to start IR generation, since it knows when parsing has finished: we need the full AST before generating the code. But how and when does code generation start? At the last reduction: the parser calls the ''codegen()'' method of the root node of the AST when the production rule for the start symbol is reduced.
 +
 +<code java>
 +PROGRAM ::= /* empty program */ 
 + {:
 + if (noCompileErrors) {
 + System.out.println("CODE COMPILED SUCCESSFULLY");
 + }
 + RESULT = new AST.Program(new AST.FunctPart(), new AST.ImperPart(null));
 + // Code generation
 + RESULT.codegen();
 + // Retrieve the code
 + setCode(RESULT.getCode());
 + :}
 +   | indent
 + {: // push the top-level symtable
 + symTableStack.pushSymTable();
 + :}
 +     FUNCT_PART:funct_part IMPER_PART:imper_part 
 + {: // pop the top-level symtable
 + symTableStack.popSymTable();
 + :}
 + dedent 
 + {:
 + if (noCompileErrors) {
 + System.out.println("CODE COMPILED SUCCESSFULLY");
 + RESULT = new AST.Program(funct_part, imper_part);
 + // Code generation
 + RESULT.codegen();
 + // Retrieve the code
 + setCode(RESULT.getCode());
 + }
 + else {
 + System.out.println("CODE NOT COMPILED: FAILED");
 + RESULT = new AST.Program(null, null);
 + }
 + :}
 +;
 +</code>
 +
 +
 +
 +===== Installation and usage =====
 +
 +LHC is compatible with:
 +
 +  * Ubuntu 20.04 LTS (but it should work with less recent versions or other Linux distributions, too).
 +
 +=== Prerequisites ===
 +
 +  * JDK (Java Development Kit) - successfully tested with JDK 16, but it should work with less recent releases, too
 +  * [[https://www.jflex.de/|JFlex]]
 +  * [[install_linux_bash#CUP|Cup]] - This is __not__ the original implementation of Cup, but an enhanced one that displays a graphical representation of the Parse Tree
 +  * LLVM (use  ''sudo apt install llvm'' to install the package) - Version 10 or above (less recent versions //may// work, too)
 +
 +=== Download and Commands ===
 +
 +Download link: [[https://www.skenz.it/repository/compilers/ass/haskell_to_LLVM/LHC.zip|LHC]] \\
 +After downloading and accessing the main folder, you can use the following commands to play with the compiler:
 +
 +  * **make install**: install everything
 +  * **make uninstall**: uninstall everything
 +  * **make %.ll**: build the corresponding %.hs file and produce the resulting LLVM code 
 +  * **lli %.ll**: run the output code
 +
 +In the //**./programs**// folder, you can already find some examples of buildable and working Haskell programs.
 +
 +== Example ==
 +
 +Let us suppose that we have just downloaded and unzipped the compiler and that we want to compile the GCD program stored in //**./programs/gcd.hs**//. \\
 +First, go to the main folder (LHC).  \\
 +Then, run the following commands to install the compiler, build the code and run it:
 +  - to install everything: //make install//
 +  - to build the code: //make ./programs/gcd.ll// (notice that the target is an LLVM file)
 +  - to run it: //lli ./programs/gcd.ll// (the output code is in the same folder as the input one)
 +  - finally, to uninstall everything: //make uninstall//
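
For reference, a recursive GCD in plain Haskell might look like this (a sketch only: the actual ''gcd.hs'' shipped with LHC may differ, and LHC accepts just a subset of the language):

```haskell
-- Hypothetical sketch of a GCD program; not the actual ./programs/gcd.hs.
gcd' :: Int -> Int -> Int
gcd' a b = if b == 0 then a else gcd' b (mod a b)

main :: IO ()
main = print (gcd' 48 18)  -- prints 6
```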
 +
 +=== Other links ===
 +
 +GitHub link to the original repo: https://github.com/palmleon/LHC
