====== Ply ======
PLY is a parsing tool written purely in Python. It is a re-implementation of Lex and Yacc, which were originally written in the C language. PLY uses the same LALR parsing technique as Lex and Yacc, and it includes support for empty productions, precedence rules, error recovery, and ambiguous grammars.

PLY has two main modules, both part of the ply package; these modules are used to extend the classes used to develop the Laboratories (myLexer.py and myParser.py), as we'll see later. The modules are:

ply.lex - A re-implementation of Lex for lexical analysis

ply.yacc - A re-implementation of Yacc for parser creation

The two tools are meant to work together. Specifically, ''ply.lex'' provides an external interface in the form of a token() function that returns the next valid token on the input stream, and ''ply.yacc'' calls this function repeatedly to retrieve tokens and invoke the grammar rules. Finally, the PLY parser generates an LALR(1) parsing table.
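
As a minimal, self-contained sketch of this interface (a toy one-token lexer, not one of the laboratory files), the lexer can be driven by hand exactly the way the parser drives it:

<code python>
import ply.lex as lex

tokens = ('NUMBER',)

t_ignore = ' \t'

def t_NUMBER(t):
    r'\d+'
    t.value = int(t.value)
    return t

def t_error(t):
    print("Illegal character '%s'" % t.value[0])
    t.lexer.skip(1)

lexer = lex.lex()
lexer.input("12 34")

# token() is the same call that ply.yacc uses to pull tokens one at a time
while True:
    tok = lexer.token()
    if not tok:
        break  # None marks the end of the input
    print(tok.type, tok.value)
</code>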
+ | |||
+ | In summary, PLY consists of two separate modules: '' | ||
+ | |||
+ | In the rest of this page you can find a description of the main functionalities of '' | ||
+ | |||
+ | **MANY EXAMPLES ABOUT PLY:** [[https:// | ||
+ | |||
===== Execution =====

To execute all the examples developed, you have to run the following command from the command line:

<code bash>
python exercise_name.py myFile.txt
</code>

Remember to ALWAYS pass the input file (''myFile.txt'') as the first command-line argument, since its content is given to the lexer.

When some output files have to be generated, they are created inside the current directory of the exercise itself.
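
Since every exercise reads its input file from ''sys.argv[1]'', a small guard at the top of the main file avoids a confusing traceback when the argument is missing (a sketch, not part of the original exercises):

<code python>
import sys

# check that the input file was passed on the command line
if len(sys.argv) < 2:
    print("usage: python exercise_name.py myFile.txt")
    sys.exit(1)

# this content will be handed to the lexer (and possibly the parser)
data = open(sys.argv[1]).read()
</code>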
+ | |||
+ | ====== Lexer ====== | ||
+ | |||
+ | === Solution 1 === | ||
+ | |||
+ | For simple projects or problems, it is possible to define the lexer (and its components) inside the main function itself. | ||
+ | |||
+ | <file python Laboratories/ | ||
+ | import ply.lex as lex | ||
+ | import ply.yacc as yacc | ||
+ | import sys | ||
+ | |||
+ | # list of TOKENS | ||
+ | |||
+ | tokens = [ | ||
+ | |||
+ | ' | ||
+ | ' | ||
+ | ' | ||
+ | ' | ||
+ | ' | ||
+ | |||
+ | ] | ||
+ | |||
+ | # tokens DEFINITION | ||
+ | |||
+ | t_nl = r' | ||
+ | |||
+ | t_letter = r' | ||
+ | t_digit | ||
+ | t_id = r' | ||
+ | |||
+ | t_path = r' | ||
+ | |||
+ | t_ignore = r' | ||
+ | |||
+ | def t_error(t): | ||
+ | print(" | ||
+ | t.lexer.skip(1) | ||
+ | |||
+ | # reading INPUT FILE | ||
+ | |||
+ | myFile = open(sys.argv[1]) | ||
+ | |||
+ | lexer = lex.lex() | ||
+ | |||
+ | with myFile as fp: | ||
+ | for line in fp: | ||
+ | try: | ||
+ | lexer.input(line) | ||
+ | |||
+ | for token in lexer: | ||
+ | if token.type == ' | ||
+ | print(" | ||
+ | |||
+ | except EOFError: | ||
+ | break | ||
+ | </ | ||
+ | |||
+ | **Text File to execute the code above:** [[https:// | ||
+ | |||
+ | For execution: | ||
+ | <code bash> | ||
+ | python ex1.py myFile.txt | ||
+ | </ | ||
+ | |||
+ | As you can see the code above is kind of simple and it doesn' | ||
+ | |||
=== Solution 2 ===

If you have to develop more complex solutions, putting the lexer (and also the parser)
inside the main function isn't a smart solution.

The module option can be used to define lexers from instances of a class.
For example:

<file python Practices/example.py>
import ply.lex as lex
import ply.yacc as yacc
import sys

from myLexer import *

# create object MY LEXER
myLex = MyLexer()

lexer = myLex.lexer

# reading INPUT FILE

myFile = open(sys.argv[1])

with myFile as fp:
    for line in fp:
        try:
            lexer.input(line)

            for token in lexer:
                pass

        except EOFError:
            break
</file>

<file python Practices/myLexer.py>
import ply.lex as lex
import ply.yacc as yacc
import sys
from ply.lex import TOKEN

class MyLexer():

    # CONSTRUCTOR
    def __init__(self):
        print('Lexer constructor called.')
        self.lexer = lex.lex(module=self)

    # DESTRUCTOR
    def __del__(self):
        print('Lexer destructor called.')

    # list of TOKENS
    tokens = [

        'NUMBER',
        'PLUS', 'MINUS',
        'STAR', 'DIV',
        'OPEN_BRACKET', 'CLOSE_BRACKET',
        'EQUAL',

    ]

    # tokens DEFINITION

    def t_NUMBER(self, t):
        r'\d+'
        print("NUMBER token: " + str(t.value))
        return t

    def t_PLUS(self, t):
        r'\+'
        print("PLUS token")
        return t

    def t_MINUS(self, t):
        r'-'
        print("MINUS token")
        return t

    def t_STAR(self, t):
        r'\*'
        print("STAR token")
        return t

    def t_DIV(self, t):
        r'/'
        print("DIV token")
        return t

    def t_OPEN_BRACKET(self, t):
        r'\('
        print("OPEN_BRACKET token")
        return t

    def t_CLOSE_BRACKET(self, t):
        r'\)'
        print("CLOSE_BRACKET token")
        return t

    def t_EQUAL(self, t):
        r'='
        print("EQUAL token")
        return t

    def t_nl(self, t):
        r'\n+'
        pass

    t_ignore = ' \t'

    # every symbol that doesn't match the previous tokens is considered an error
    def t_error(self, t):
        print("ERROR token: " + str(t.value[0]))
        t.lexer.skip(1)
</file>
+ | |||
+ | **Text File to execute the code above:** [[https:// | ||
+ | |||
+ | For execution: | ||
+ | <code bash> | ||
+ | python example.py myFile.txt | ||
+ | </ | ||
+ | |||
+ | As you can see the exercise was divided in two different files: the example.py which only istantiate the lexer object and pass to it the input file; the '' | ||
+ | |||
+ | In the first solution, to instantiate the lexer, you can simply use the following instruction | ||
+ | |||
+ | <code python> | ||
+ | lexer = lex.lex() | ||
+ | </ | ||
+ | |||
+ | In the second solution the initialization of the lexer happens inside the class constructor | ||
+ | |||
+ | <code python> | ||
+ | def __init__(self): | ||
+ | print(' | ||
+ | self.lexer = lex.lex(module=self) | ||
+ | </ | ||
+ | |||
+ | In most laboratories the '' | ||
+ | anyway, with more complex problems, it is a good practice to create a Lexer (and, if necessary, a Parser) class in order to extend the modules given by the PLY library because, as we'll see later during the Laboratories, | ||
+ | |||
Inside the lexer class you can find:

== Token Array ==

<code python>
# It contains all the tokens defined inside the class; it is especially important to let
# the parser recognize all the tokens used in the grammar rules.
# Tokens are usually given names to indicate what they are. For example:

'NUMBER', 'PLUS', 'MINUS',
'TIMES', 'DIVIDE'
</code>

== List of Tokens ==
<code python>
# The identification of tokens is typically done by writing a series of regular expression rules.
# A token can be defined in two different ways:

# 1. Regular expression rules for simple tokens

t_PLUS    = r'\+'
t_MINUS   = r'-'
t_TIMES   = r'\*'
t_DIVIDE  = r'/'
t_LPAREN  = r'\('
t_RPAREN  = r'\)'

# 2. A regular expression rule with some action code

def t_NUMBER(t):
    r'\d+'
    t.value = int(t.value)
    return t

</code>

== List of states ==
<code python>
# a list of the states that can be matched by the lexer

# list of STATES -> used only the one to catch comments
states = (
    ('COMMENT', 'exclusive'),
)
</code>

== Rule to track line numbers ==
<code python>
def t_newline(t):
    r'\n+'
    t.lexer.lineno += len(t.value)
</code>
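
PLY only tracks line numbers; if you also need the column of a token, it can be computed from ''lexpos'' and the raw input, as in this helper taken from the PLY documentation:

<code python>
def find_column(input, token):
    # look backwards from the token position for the last newline
    line_start = input.rfind('\n', 0, token.lexpos) + 1
    return (token.lexpos - line_start) + 1
</code>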
+ | |||
+ | == Ignored Characters == | ||
+ | <code python> | ||
+ | t_ignore | ||
+ | </ | ||
+ | |||
+ | == Error handling Rule == | ||
+ | <code python> | ||
+ | def t_error(t): | ||
+ | | ||
+ | | ||
+ | </ | ||
+ | |||
+ | To use the lexer, you first need to give to him input text using its input() method. | ||
+ | After that, repeated calls to token () produce tokens; for example: | ||
+ | |||
+ | <code python> | ||
+ | # reading INPUT FILE | ||
+ | |||
+ | ... | ||
+ | |||
+ | myFile = open(sys.argv[1]) | ||
+ | |||
+ | with myFile as fp: | ||
+ | for line in fp: | ||
+ | try: | ||
+ | lexer.input(line) | ||
+ | |||
+ | for token in lexer: | ||
+ | pass | ||
+ | |||
+ | except EOFError: | ||
+ | break | ||
+ | |||
+ | ... | ||
+ | |||
+ | </ | ||
+ | |||
+ | It is important to remember that all lexers must provide a list " | ||
+ | of the possible token names that can be produced by the lexer. The tokens list is also | ||
+ | used by the '' | ||
+ | |||
+ | For example, the list of tokens to be defined can be written as follows: | ||
+ | <code python> | ||
+ | tokens = ( | ||
+ | ' | ||
+ | ' | ||
+ | ' | ||
+ | ' | ||
+ | ' | ||
+ | ' | ||
+ | ' | ||
+ | ) | ||
+ | </ | ||
===== Tokens =====

As we said before, each token is specified by writing a regular expression rule
compatible with Python's ''re'' module. Each of these rules is defined by making a declaration
with the prefix ''t_'', to indicate that it defines a token.

I remind you that simple tokens can be specified as strings, as follows:

<code python>
t_PLUS = r'\+'
</code>

If some kind of action needs to be performed, a token rule can be specified as a function.
For example:

<code python>
def t_NUMBER(t):
    r'\d+'
    t.value = int(t.value)
    return t
</code>

This rule matches numbers and converts the string into a Python integer.

When a function is used, the regular expression rule is specified in the
function documentation string. The function always takes a single argument which
is an instance of LexToken. This object has different attributes:

- t.type which is the token type (as a string)
- t.value which is the lexeme (the actual text matched)
- t.lineno which is the current line number
- t.lexpos which is the position of the token relative to the beginning of the input text

By default, t.type is set to the name following the ''t_'' prefix.

It is important to remember that all tokens defined by functions are added in the same
order as they appear in the lexer file.
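
Running a tiny lexer over a short string makes these attributes visible (a self-contained sketch, not one of the laboratory files):

<code python>
import ply.lex as lex

tokens = ('NUMBER', 'PLUS')

t_PLUS = r'\+'
t_ignore = ' '

def t_NUMBER(t):
    r'\d+'
    t.value = int(t.value)
    return t

def t_error(t):
    t.lexer.skip(1)

lexer = lex.lex()
lexer.input("3 + 41")
for tok in lexer:
    print(tok.type, tok.value, tok.lineno, tok.lexpos)

# output:
# NUMBER 3 1 0
# PLUS + 1 2
# NUMBER 41 1 4
</code>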
+ | |||
+ | In some applications, | ||
+ | complex regular expression rules. In this case is important to use the '' | ||
+ | which can be used by importing the token module: | ||
+ | |||
+ | <code python> | ||
+ | from ply.lex import TOKEN | ||
+ | </ | ||
+ | |||
+ | In the following example (from the exercise #2 of the Laboratory #2) we are using a '' | ||
+ | |||
+ | <code python> | ||
+ | data_table = [' | ||
+ | @TOKEN("" | ||
+ | def t_table(self, | ||
+ | self.numTables = self.numTables + 1 | ||
+ | self.writeOut(t.value) | ||
+ | </ | ||
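
The same technique appears in the PLY documentation for building an identifier pattern out of smaller pieces:

<code python>
from ply.lex import TOKEN

digit      = r'([0-9])'
nondigit   = r'([_A-Za-z])'
identifier = r'(' + nondigit + r'(' + digit + r'|' + nondigit + r')*)'

@TOKEN(identifier)
def t_ID(t):
    # t.type is 'ID'; the regular expression is attached by the decorator
    return t
</code>
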
===== States =====

In advanced parsing applications, it may be useful to have different lexing states.
For instance, you may want the occurrence of a certain token to trigger a
different kind of lexing. PLY supports a feature that allows the underlying lexer
to be put into a series of different states. Each state can have its own tokens,
lexing rules, etc.

To define a new lexing state, it must first be declared inside the "states" variable
in your lex file or class. For example:

<code python>
states = (
    ('foo', 'exclusive'),
    ('bar', 'inclusive'),
)
</code>

This declaration declares two states, 'foo' and 'bar'.
States may be of two types: 'exclusive' and 'inclusive'.

== Exclusive ==

An ''exclusive'' state completely overrides the default behavior of the lexer.
That is, lex will only return tokens and apply rules defined specifically for that state.

== Inclusive ==

An ''inclusive'' state adds additional tokens and rules to the default set of rules.
Thus, lex will return both the tokens defined by default and those defined for the inclusive state.

By default, lexing operates in the ''INITIAL'' state. This state includes all of
the normally defined tokens. For users who aren't using different states,
this fact is completely transparent. If, during lexing or parsing, you want
to change the lexing state, use the begin() method. For example:

<code python>
def t_begin_foo(t):
    r'start_foo'
    t.lexer.begin('foo')  # starts the 'foo' state
</code>

To get out of a state, you use begin() to switch back to the initial state.
For example:

<code python>
def t_foo_end(t):
    r'end_foo'
    t.lexer.begin('INITIAL')  # back to the initial state
</code>

The "COMMENT" state is used in the following lexer (taken from the Practices) to recognize and discard comments:

<file python Practices/myLexer.py>
import ply.lex as lex
import ply.yacc as yacc
import sys
from ply.lex import TOKEN

class MyLexer():

    # CONSTRUCTOR
    def __init__(self):
        print('Lexer constructor called.')
        self.lexer = lex.lex(module=self)
        self.lexer.begin('INITIAL')

    # DESTRUCTOR
    def __del__(self):
        print('Lexer destructor called.')

    # list of TOKENS
    tokens = [

        'ID', 'EURO', 'INT',
        'CM', 'S', 'C', 'SEP',

    ]

    # list of STATES -> used only the one to catch comments
    states = (
        ('COMMENT', 'exclusive'),
    )

    # tokens DEFINITION
    # (the regular expressions below are illustrative)

    def t_ID(self, t):
        r'[a-zA-Z_][a-zA-Z0-9_]*'
        return t

    def t_EURO(self, t):
        r'\d+\.\d+€'
        return t

    def t_INT(self, t):
        r'\d+'
        return t

    def t_CM(self, t):
        r','
        return t

    def t_S(self, t):
        r';'
        return t

    def t_C(self, t):
        r':'
        return t

    def t_SEP(self, t):
        r'%'
        return t

    def t_nl(self, t):
        r'\n+'
        pass

    t_ignore = ' \t'

    # every symbol that doesn't match the previous tokens is considered an error
    def t_error(self, t):
        print("ERROR token: " + str(t.value[0]))
        t.lexer.skip(1)

    # COMMENT STATE

    def t_INITIAL_comm(self, t):
        r'/\*'
        self.lexer.begin('COMMENT')   # enter the COMMENT state

    def t_COMMENT_end(self, t):
        r'\*/'
        self.lexer.begin('INITIAL')   # back to the default state

    def t_COMMENT_body(self, t):
        r'.'
        pass                          # discard everything inside the comment

    def t_COMMENT_nl(self, t):
        r'\n+'
        pass

    t_COMMENT_ignore = ''

    def t_COMMENT_error(self, t):
        print("ERROR in COMMENT state: " + str(t.value[0]))
        t.lexer.skip(1)
</file>
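
A quick usage sketch (the /* ... */ comment delimiters are the assumption made in the reconstruction above): anything between the delimiters produces no tokens.

<code python>
myLex = MyLexer()
lexer = myLex.lexer

lexer.input("alpha /* this text is discarded */ beta")
for tok in lexer:
    print(tok.type, tok.value)

# output:
# ID alpha
# ID beta
</code>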
+ | |||
+ | ====== Parser ====== | ||
+ | '' | ||
+ | |||
+ | It represents the class used to recognize the grammar rules defined to develop | ||
+ | the laboratories. | ||
+ | |||
+ | Inside the class you can find: | ||
+ | |||
+ | == List of grammar rules == | ||
+ | <code python> | ||
+ | # Each grammar rule is defined by a Python function where the docstring to that function | ||
+ | # contains the appropriate context-free grammar specification. | ||
+ | # The statements that make up the function body implement the semantic actions of the rule. | ||
+ | # Each function accepts a single argument p that is a sequence containing the values | ||
+ | # of each grammar symbol in the corresponding rule. | ||
+ | |||
+ | # The values of p[i] are mapped to grammar symbols as shown here: | ||
+ | |||
+ | def p_expression_plus(p): | ||
+ | ' | ||
+ | # | ||
+ | # p[0] | ||
+ | p[0] = p[1] + p[3] | ||
+ | |||
+ | </ | ||
+ | |||
+ | == List of precedence rules == | ||
+ | |||
+ | <code python> | ||
+ | # For your grammar rules in some case you have to define precedence rules in order to avoid conflicts; | ||
+ | # to reach this purpose you have to define a list of precedences. For example: | ||
+ | |||
+ | precedence = ( | ||
+ | (' | ||
+ | (' | ||
+ | (' | ||
+ | (' | ||
+ | ) | ||
+ | </ | ||
+ | |||
+ | == Error handling grammar rule == | ||
+ | |||
+ | <code python> | ||
+ | def p_error(p): | ||
+ | if p: | ||
+ | print(" | ||
+ | # Just discard the token and tell the parser it's okay. | ||
+ | parser.errok() | ||
+ | else: | ||
+ | print(" | ||
+ | </ | ||
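
Besides ''errok()'', yacc also supports resynchronization directly in the grammar through the special ''error'' symbol (example adapted from the PLY documentation; the SEMI token is an assumption):

<code python>
def p_statement_error(p):
    'statement : error SEMI'
    # everything up to the next ';' is discarded
    print("Syntax error in statement. Bad statement discarded.")
    p[0] = None
</code>
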
===== Grammar Rules =====

Every grammar rule must begin with the prefix ''p_'', following the same
mechanism used for the lexer to recognize the tokens, whose names start with the prefix ''t_''.
We take into consideration the parser class definition of one of the Practices:

<file python Practices/myParser.py>
from myLexer import *
import ply.yacc as yacc
import sys


class MyParser:

    # CONSTRUCTOR
    def __init__(self, lexer):
        print("Parser constructor called.")
        self.parser = yacc.yacc(module=self)
        self.lexer = lexer

    # DESTRUCTOR
    def __del__(self):
        print('Parser destructor called.')

    # the token list is taken from the lexer
    tokens = MyLexer.tokens


    # GRAMMAR START
    def p_expr_list(self, p):
        '''
        expr_list : expr_list expr
                  | expr
        '''

    def p_expr(self, p):
        '''
        expr : e EQUAL
        '''

        print("CORRECT EXPRESSION")

    def p_e(self, p):
        '''
        e : e PLUS t
          | e MINUS t
          | t
        '''

    def p_t(self, p):
        '''
        t : OBRACKET e CBRACKET
          | NUMBER
        '''

    def p_error(self, p):
        print("SYNTAX ERROR!")
</file>
+ | |||
+ | The first rule defined in the yacc specification determines the starting grammar | ||
+ | symbol (in this case, the '' | ||
+ | Whenever the starting rule is reduced by the parser and no more input is | ||
+ | available, parsing stops and the final value is returned | ||
+ | (this value will be whatever the top-most rule placed in p[0]). | ||
+ | |||
+ | For tokens, the " | ||
+ | attribute assigned in the lexer module. For non-terminals, | ||
+ | determined by whatever is placed in p[0] when rules are reduced. | ||
+ | This value can be anything at all. | ||
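
The following self-contained sketch (not one of the laboratory parsers) shows values flowing bottom-up through p[0]:

<code python>
import ply.lex as lex
import ply.yacc as yacc

tokens = ('NUMBER', 'PLUS')

t_PLUS = r'\+'
t_ignore = ' '

def t_NUMBER(t):
    r'\d+'
    t.value = int(t.value)  # this becomes the token's p[i] value
    return t

def t_error(t):
    t.lexer.skip(1)

def p_expr_plus(p):
    'expr : expr PLUS term'
    p[0] = p[1] + p[3]      # non-terminal value built from sub-values

def p_expr_term(p):
    'expr : term'
    p[0] = p[1]

def p_term_num(p):
    'term : NUMBER'
    p[0] = p[1]

def p_error(p):
    print("Syntax error")

lexer = lex.lex()
parser = yacc.yacc()
print(parser.parse("1 + 2 + 3"))  # prints 6
</code>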
+ | |||
+ | '' | ||
+ | |||
+ | <code python> | ||
+ | def p_empty(p): | ||
+ | 'empty :' | ||
+ | pass | ||
+ | </ | ||
+ | |||
+ | Now to use the empty production, simply use ' | ||
+ | |||
+ | <code python> | ||
+ | def p_optitem(p): | ||
+ | ' | ||
+ | ' | ||
+ | ... | ||
+ | </ | ||
+ | |||
+ | ===== Precedence rules ===== | ||
+ | |||
+ | In many situations, it is extremely difficult or awkward to write grammars | ||
+ | without conflicts. | ||
+ | |||
+ | When an ambiguous grammar is given to '' | ||
+ | '' | ||
+ | A shift/ | ||
+ | whether or not to reduce a rule or shift a symbol on the parsing stack. | ||
+ | |||
+ | By default, all shift/ | ||
+ | |||
+ | To resolve ambiguity, especially in expression grammars, '' | ||
+ | individual tokens to be assigned a precedence level and associativity. | ||
+ | This is done by adding a variable precedence to the grammar file of the parser like this: | ||
+ | |||
+ | <code python> | ||
+ | precedence = ( | ||
+ | (' | ||
+ | (' | ||
+ | (' | ||
+ | ) | ||
+ | </ | ||
+ | |||
+ | This declaration specifies that PLUS/MINUS have the same precedence level and | ||
+ | are left-associative and that TIMES/ | ||
+ | left-associative. Within the precedence declaration, | ||
+ | lowest to highest precedence. Thus, this declaration specifies that TIMES/ | ||
+ | have higher precedence than PLUS/MINUS (since they appear later in the | ||
+ | precedence specification). | ||
+ | |||
+ | Now, in the grammar file, we can write our unary minus rule like this: | ||
+ | |||
+ | <code python> | ||
+ | def p_expr_uminus(p): | ||
+ | ' | ||
+ | p[0] = -p[2] | ||
+ | </ | ||
+ | |||
+ | In this case, '' | ||
+ | it to that of '' | ||
+ | |||
+ | In the following example taken from the '' | ||
+ | precedence rules and also the usage of the keyword " | ||
+ | |||
+ | <code python> | ||
+ | from myLexer import * | ||
+ | import ply.yacc as yacc | ||
+ | |||
+ | class MyParser: | ||
+ | |||
+ | # CONSTRUCTOR | ||
+ | def __init__(self): | ||
+ | print(" | ||
+ | self.parser = yacc.yacc(module=self) | ||
+ | |||
+ | # DESTRUCTOR | ||
+ | def __del__(self): | ||
+ | print(' | ||
+ | | ||
+ | tokens = MyLexer.tokens | ||
+ | |||
+ | precedence = ( | ||
+ | (' | ||
+ | (' | ||
+ | (' | ||
+ | (' | ||
+ | (' | ||
+ | ) | ||
+ | |||
+ | # GRAMMAR START | ||
+ | |||
+ | ... | ||
+ | |||
+ | def p_if_stmt(self, | ||
+ | ''' | ||
+ | if_stmt : IF RBOPEN cond RBCLOSED stmt %prec IFX | ||
+ | | IF RBOPEN cond RBCLOSED stmt ELSE stmt | ||
+ | ''' | ||
+ | | ||
+ | </ | ||
+ | |||
+ | ====== Compilation ====== | ||
+ | |||
+ | To compile correctly your programs you have to create the lexer and parser objects (when they | ||
+ | are used, if they don't everything is inside the main function so you don't have | ||
+ | to instantiate anything); for example: | ||
+ | |||
+ | <code python> | ||
+ | # create objects MY LEXER and MY PARSER | ||
+ | myLex = MyLexer() | ||
+ | myPars = MyParser(myLex) | ||
+ | </ | ||
+ | |||
+ | In the example above after creating it, the lexer is passed in the parser | ||
+ | constructor (this is important to let the parser see through the lexer tokens). | ||
+ | |||
+ | To build the lexer, the function '' | ||
+ | |||
+ | This function uses Python reflection (or introspection) to read the | ||
+ | regular expression rules out of the calling context and build the lexer. | ||
+ | |||
+ | <code python> | ||
+ | |||
+ | # create object MY LEXER inside the main function | ||
+ | myLex = MyLexer() | ||
+ | |||
+ | # build the lexer by calling the lex.lex() method inside the myLexer class | ||
+ | def __init__(self): | ||
+ | print(' | ||
+ | |||
+ | self.lexer = lex.lex(module=self) | ||
+ | # initialize all parameters | ||
+ | self.file = open(' | ||
+ | |||
+ | </ | ||
+ | |||
+ | Once the lexer has been built, two methods can be used to control the lexer. | ||
+ | |||
+ | <code python> | ||
+ | lexer.input(data) # Reset the lexer and store a new input string. | ||
+ | |||
+ | lexer.token() # Return the next token. Returns a special LexToken instance | ||
+ | # on success or None if the end of the input text has been reached. Sometimes | ||
+ | # is used indirectly in the for cycle "for token in lexer:" | ||
+ | </ | ||
+ | |||
+ | The following example shows how the main function looks like using only the lexer: | ||
+ | |||
+ | <file python Laboratories/ | ||
+ | import ply.lex as lex | ||
+ | import ply.yacc as yacc | ||
+ | import sys | ||
+ | |||
+ | from myLexer import * | ||
+ | |||
+ | # create object MY LEXER | ||
+ | |||
+ | myLex = MyLexer() | ||
+ | |||
+ | # reading INPUT FILE | ||
+ | |||
+ | myFile = open(sys.argv[1]) | ||
+ | |||
+ | lexer = myLex.lexer | ||
+ | |||
+ | with myFile as fp: | ||
+ | for line in fp: | ||
+ | try: | ||
+ | lexer.input(line) | ||
+ | |||
+ | for token in lexer: | ||
+ | pass | ||
+ | |||
+ | except EOFError: | ||
+ | break | ||
+ | </ | ||
+ | |||
+ | To build the parser, call the '' | ||
+ | |||
+ | <code python> | ||
+ | # create object MY PARSER inside the main function | ||
+ | myLex = MyLexer() | ||
+ | myPars = MyParser() | ||
+ | |||
+ | # build the parser by calling the yacc.yacc() method inside the myParser class | ||
+ | def __init__(self): | ||
+ | print(" | ||
+ | self.parser = yacc.yacc(module=self) | ||
+ | </ | ||
+ | |||
+ | This function looks at the module and attempts to construct all of the LR | ||
+ | parsing tables for the grammar you have specified. The first time '' | ||
+ | is invoked, you will get a message such as this: | ||
+ | |||
+ | < | ||
+ | Generating LALR tables | ||
+ | calc > | ||
+ | </ | ||
+ | |||
+ | Since table construction is relatively expensive (especially for large grammars), | ||
+ | the resulting parsing table is written to a file called '' | ||
+ | In addition, a debugging file called '' | ||
+ | On subsequent executions, yacc will reload the table from '' | ||
+ | it has detected a change in the underlying grammar (in which case the tables | ||
+ | and parsetab.py file are regenerated). Both of these files are written to the | ||
+ | same directory as the module in which the parser is specified. The name of the | ||
+ | parsetab module can be changed using the tabmodule keyword argument to '' | ||
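
For instance, a class-based parser can redirect its tables with keyword arguments that are part of the standard ''yacc.yacc()'' signature (the module name here is just an example):

<code python>
self.parser = yacc.yacc(module=self,
                        tabmodule='my_parsetab',  # name of the generated table module
                        debug=True)               # also write the parser.out debug file
</code>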
+ | |||
+ | Once the parser has been built you can use the parse method in order to control | ||
+ | the parser. | ||
+ | <code python> | ||
+ | parser.parse(data) | ||
+ | </ | ||
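
When the lexer is a custom object, as in the class-based solutions above, it can be passed explicitly; ''parse()'' accepts a ''lexer'' keyword argument:

<code python>
result = myPars.parser.parse(data, lexer=myLex.lexer)
</code>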
+ | |||
+ | The following example shows how the main function looks like using both lexer and parser: | ||
+ | |||
+ | <file python Laboratories/ | ||
+ | import ply.lex as lex | ||
+ | import ply.yacc as yacc | ||
+ | import sys | ||
+ | |||
+ | from myLexer import * | ||
+ | from myParser import * | ||
+ | |||
+ | # create objects MY LEXER and MY PARSER | ||
+ | myLex = MyLexer() | ||
+ | myPars = MyParser() | ||
+ | |||
+ | lex = myLex.lexer | ||
+ | parser = myPars.parser | ||
+ | |||
+ | # reading INPUT FILE | ||
+ | |||
+ | myFile = open(sys.argv[1]) | ||
+ | parser.parse(myFile.read()) | ||
+ | |||
+ | </ |