18-04-2011, 10:16 AM
[attachment=12268]
Using JavaCC
Automating Lexical Analysis: Overall Picture
Building Faster Scanners from the DFA
Table-driven recognizers waste a lot of effort
• Read (& classify) the next character
• Find the next state
• Assign to the state variable
• Branch back to the top
We can do better
• Encode state & actions in the code
• Do transition tests locally
• Generate ugly, spaghetti-like code
(it is OK, this is automatically generated code)
• Takes (many) fewer operations per input character
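The contrast can be sketched in Java. Below is a minimal, hypothetical recognizer for unsigned integer literals, written the way a direct-coded generator might emit it (it is not output from any actual tool): the DFA's transitions become local tests and loops instead of a generic read/classify/table-lookup/branch cycle.

```java
// Sketch: a "direct-coded" recognizer for unsigned integer literals.
// Each DFA state is a spot in the code; transitions are local branches,
// so there is no per-character table lookup or state-variable assignment.
public class DirectCodedIntScanner {
    // Returns the length of the longest integer-literal prefix of s, or 0.
    static int scanInt(String s) {
        int i = 0;
        // State S0: must see at least one digit.
        if (i >= s.length() || !Character.isDigit(s.charAt(i))) {
            return 0; // reject: input does not start with a digit
        }
        i++;
        // State S1 (accepting): consume further digits with a tight local loop.
        while (i < s.length() && Character.isDigit(s.charAt(i))) {
            i++;
        }
        return i; // longest match
    }

    public static void main(String[] args) {
        System.out.println(scanInt("123+x")); // 3
        System.out.println(scanInt("abc"));   // 0
    }
}
```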
Inside lexical analyzer generator
• How does a lexical analyzer work?
– Get token definitions from the user, written in a form equivalent to a regular grammar
– Turn the regular grammar into an NFA
– Convert the NFA into a DFA
– Generate the code that simulates the DFA
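The NFA-to-DFA step can be illustrated with a small Java sketch. The NFA below is a hand-coded, hypothetical example for the regular expression (a|b)*ab, with epsilon moves already folded away; the simulation tracks the set of reachable NFA states per input character, which is the same subset idea a generator applies ahead of time to tabulate the DFA's transitions.

```java
import java.util.*;

// Sketch: the subset idea behind NFA-to-DFA conversion. A "DFA state"
// is a set of NFA states; here we compute those sets lazily while reading
// the input, rather than building the full transition table up front.
public class SubsetConstruction {
    // Hand-coded NFA for (a|b)*ab:
    // state 0 loops on a and b; 0 -a-> 1; 1 -b-> 2 (accepting).
    static Map<Integer, Map<Character, Set<Integer>>> nfa = new HashMap<>();
    static {
        nfa.put(0, Map.of('a', Set.of(0, 1), 'b', Set.of(0)));
        nfa.put(1, Map.of('b', Set.of(2)));
        nfa.put(2, Map.of());
    }

    static boolean accepts(String input) {
        Set<Integer> current = Set.of(0); // start set
        for (char c : input.toCharArray()) {
            Set<Integer> next = new HashSet<>();
            for (int s : current) {
                next.addAll(nfa.get(s).getOrDefault(c, Set.of()));
            }
            current = next; // the subset reached after reading c
        }
        return current.contains(2); // accept iff NFA accept state is reachable
    }

    public static void main(String[] args) {
        System.out.println(accepts("abab")); // true
        System.out.println(accepts("aba"));  // false
    }
}
```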
Flow for Using JavaCC
Structure of a JavaCC File
• A JavaCC file is composed of 3 portions:
– Options
– Class declaration
– Specification for lexical analysis (tokens), and specification for syntax analysis.
• For the very first example of JavaCC, let's recognize two tokens: "+" and numerals.
• Use an editor to type in the specification and save it under the file name numeral.jj
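One plausible shape for numeral.jj is sketched below; the class name Numeral, the production name start, and the token names are illustrative, and the version distributed with the slides may differ.

```
PARSER_BEGIN(Numeral)
public class Numeral {
  public static void main(String[] args) throws ParseException {
    Numeral parser = new Numeral(System.in);
    parser.start();
  }
}
PARSER_END(Numeral)

SKIP : { " " | "\t" | "\n" | "\r" }

TOKEN :
{
  < PLUS : "+" >
| < NUMBER : (["0"-"9"])+ >
}

void start() :
{}
{
  ( <NUMBER> | <PLUS> )* <EOF>
}
```

Running javacc on this file generates the parser class plus a token manager; compiling the generated .java files gives a program that accepts any sequence of numerals and "+" signs.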
Using JavaCC for lexical analysis
• JavaCC is a "top-down" parser generator.
• Some parser generators (such as yacc, bison, and JavaCUP) need a separate lexical-analyzer generator.
• With JavaCC, you can specify the tokens within the parser generator.
Example File
Options
• The options portion is optional and is omitted in the previous example.
• STATIC is a boolean option whose default value is true. If true, all methods and class variables are specified as static in the generated parser and token manager.
– This allows only one parser object to be present, but it improves the performance of the parser.
– To perform multiple parses during one run of your Java program, you will have to call the ReInit() method to reinitialize your parser if it is static.
– If the parser is non-static, you may use the "new" operator to construct as many parsers as you wish. These can all be used simultaneously from different threads.
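As a sketch, the options section that makes the generated parser non-static might look like this (the parser name MyParser is illustrative):

```
options {
  STATIC = false;   // generated methods become instance methods
}

PARSER_BEGIN(MyParser)
public class MyParser { }
PARSER_END(MyParser)
```

With STATIC = false you can construct a fresh parser per parse with new MyParser(reader); with the default STATIC = true, you instead call the generated ReInit method between parses.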
Start
Compilation
JavaCC specification of a lexer
A Full Example
See the sample file
Dealing with errors
• Error reporting: what should the scanner do with input such as 123e+q?
• It could treat it as a single invalid token (a lexical error), or
• return a sequence of valid tokens
– 123, e, +, q
– and let the parser deal with the error.
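The second option falls out naturally when the token definitions already cover single letters and operators. With a sketch like the following (token names illustrative), 123e+q is scanned as NUMBER(123), ID(e), PLUS, ID(q), and the parser then reports the syntax error:

```
TOKEN :
{
  < NUMBER : (["0"-"9"])+ >
| < ID : (["a"-"z","A"-"Z"])+ >
| < PLUS : "+" >
}
```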
Lexical error correction?
• Sometimes interaction between the Scanner and parser can help
– especially in a top-down (predictive) parse
– The parser, when it calls the scanner, can pass as an argument the set of allowable tokens.
– Suppose the scanner sees calss in a context where only a top-level definition is allowed; it can then repair the misspelling to the keyword class.
Same symbol, different meaning.
• How can the scanner distinguish between binary minus and unary minus?
– x = -a; vs. x = 3 - a;
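One common answer is that the scanner does not distinguish them at all: it returns a single MINUS token, and the parser tells the two apart by grammatical position. A JavaCC-style sketch (nonterminal names illustrative):

```
TOKEN : { < MINUS : "-" > }

void expr() : {} { term() ( <MINUS> term() )* }   // binary: between operands
void term() : {} { ( <MINUS> )? primary() }       // unary: prefix position
```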
Scanner “troublemakers”
• Unclosed strings
• Unclosed comments
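For comments, JavaCC-style lexical states make the problem concrete. The sketch below (modeled on the classic block-comment example, but not verified against any particular JavaCC version) skips /* ... */ comments; if the input ends while the token manager is still in the IN_COMMENT state, no token can be completed — the "unclosed comment" troublemaker — and the generated token manager reports a lexical error:

```
SKIP : { "/*" : IN_COMMENT }

<IN_COMMENT> SKIP :
{
  "*/" : DEFAULT   // end of comment: return to the normal state
| < ~[] >          // any other character: stay inside the comment
}
```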
JavaCC as a Parsing Tool
Javacc Overview
• Generates a top down parser.
It could be used, for example, to generate a parser for Prolog, whose grammar is LL.
• Generates a parser in Java.
Hence it can be integrated with any Java-based Prolog compiler/interpreter, continuing the example.
• Token specification and grammar specification structures are in the same file => easier to debug.