19. Lexical Analysis
• It is the first phase of a compiler.
• Its main task is to read input characters and produce tokens.
• “get next token” is a command sent from the parser to the lexical analyzer.
• On receipt of the command, the lexical analyzer scans the input until it
determines the next token, and returns it.
• It skips comments and whitespace while creating these tokens.
• If an error is found, the lexical analyzer correlates it with the source file
and line number.
[Figure: Source Program → Lexical Analyzer → Parser → Rest of compiler. The Parser sends "get next token" to the Lexical Analyzer, which returns a token; both use the Symbol Table.]
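The interaction described on this slide can be sketched as a small pull-style lexer. This is a minimal illustration, not a definitive implementation: the names `Lexer`, `get_next_token`, `LexicalError`, the token patterns, and the Pascal-style `{ ... }` comment syntax are all assumptions chosen for the example.

```python
import re

# Hypothetical token patterns for a tiny Pascal-like language (illustrative only).
TOKEN_SPEC = [
    ("NUM", r"\d+(?:\.\d+)?"),
    ("ID", r"[A-Za-z][A-Za-z0-9]*"),
    ("OP", r"[=+\-*/;]"),
]
KEYWORDS = {"const", "if", "while"}       # assumed keyword set
SKIP = re.compile(r"(?:\s+|\{[^}]*\})+")  # whitespace and { ... } comments
MASTER = re.compile("|".join(f"(?P<{n}>{p})" for n, p in TOKEN_SPEC))


class LexicalError(Exception):
    pass


class Lexer:
    def __init__(self, text, filename="<input>"):
        self.text = text
        self.filename = filename
        self.pos = 0
        self.line = 1

    def _advance(self, end):
        # Keep the line counter in step with the characters consumed.
        self.line += self.text.count("\n", self.pos, end)
        self.pos = end

    def get_next_token(self):
        """Called by the parser; returns (token, lexeme), or None at end of input."""
        skipped = SKIP.match(self.text, self.pos)
        if skipped:
            self._advance(skipped.end())
        if self.pos >= len(self.text):
            return None
        m = MASTER.match(self.text, self.pos)
        if m is None:
            # Correlate the error with the source file and line number.
            bad = self.text[self.pos]
            raise LexicalError(
                f"{self.filename}:{self.line}: unexpected character {bad!r}"
            )
        self._advance(m.end())
        kind, lexeme = m.lastgroup, m.group()
        if kind == "ID" and lexeme in KEYWORDS:
            kind = "KEYWORD"
        return (kind, lexeme)
```

A parser would call `get_next_token()` repeatedly until it returns `None`; the lexer never scans further ahead than the token it has been asked for.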
22. Issues in Lexical Analysis
• Reasons for separating lexical analysis from parsing:
– Simpler design
– Improved compiler efficiency
– Enhanced compiler portability
27. Tokens, patterns, and lexemes
• A TOKEN is a set of strings over the source
alphabet.
• A PATTERN is a rule that describes that set.
• A LEXEME is a sequence of characters
matching that pattern.
• E.g. in the Pascal statement
const pi = 3.1416;
the substring pi is a lexeme for the token
identifier.
28. Example tokens, lexemes, patterns
Token      Sample lexemes              Informal description of pattern
if         if                          if
while      while                       while
Relation   <, <=, =, <>, >, >=         < or <= or = or <> or > or >=
Id         count, sun, i, j, pi, D2    Letter followed by letters and digits
Num        0, 12, 3.1416, 6.02E23      Any numeric constant
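The informal patterns in this table can be written down as regular expressions. The sketch below is one possible rendering, not the only one: the regexes, the lowercase token names, and the helper `token_of` are assumptions made for illustration. Keywords are listed before `id` so that `if` and `while` are not classified as identifiers.

```python
import re

# Assumed regex renderings of the informal patterns in the table.
# Order matters: keywords are tried before the general identifier pattern.
PATTERNS = {
    "if": r"if",
    "while": r"while",
    "relation": r"<=|<>|>=|<|=|>",
    "id": r"[A-Za-z][A-Za-z0-9]*",
    "num": r"\d+(\.\d+)?(E[+-]?\d+)?",
}


def token_of(lexeme):
    """Return the first token whose pattern matches the whole lexeme, else None."""
    for token, pattern in PATTERNS.items():
        if re.fullmatch(pattern, lexeme):
            return token
    return None


for lexeme in ["if", "pi", "D2", "<>", "6.02E23", "2D"]:
    print(lexeme, "->", token_of(lexeme))
```

Here the token is the name, the pattern is the regex, and each string passed to `token_of` is a candidate lexeme; `2D` matches no pattern because an identifier must start with a letter.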
29. Tokens
• Together, the complete set of tokens forms the set of terminal
symbols used in the grammar for the parser.
• In most languages, the tokens fall into these categories:
– Keywords
– Operators
– Identifiers
– Constants
– Literal strings
– Punctuation
30. Token attributes
• If there is more than one lexeme for a token,
we have to save additional information about
the token.
• Example: the token number matches lexemes
10 and 20.
• Code generation needs the actual number, not
just the token.
• With each token, we associate ATTRIBUTES, normally
just a pointer into the symbol table.
32. Example attributes
• For the Fortran statement
E = M * C ** 2
• We have token/attribute pairs
<ID, ptr to symbol table entry for E>
<Assign_op, null>
<ID, ptr to symbol table entry for M>
<Mult_op, null>
<ID, ptr to symbol table entry for C>
<exp_op, null>
<num, integer value 2>
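As a rough sketch of how such pairs could be produced, the code below tokenizes the statement with the exponent operator written as `**` and attaches an attribute to each token. The function `tokenize`, the regexes, and the use of a plain list as the symbol table (with the list index standing in for the pointer) are assumptions for illustration only.

```python
import re

# Token names follow the slide; the regexes are assumed for this sketch.
TOKEN_SPEC = [
    ("num", r"\d+"),
    ("ID", r"[A-Za-z][A-Za-z0-9]*"),
    ("exp_op", r"\*\*"),   # tried before Mult_op so that ** is not split
    ("Mult_op", r"\*"),
    ("Assign_op", r"="),
]
MASTER = re.compile("|".join(f"(?P<{n}>{p})" for n, p in TOKEN_SPEC))


def tokenize(source, symbol_table):
    """Return (token, attribute) pairs; identifiers are entered in symbol_table."""
    pairs = []
    # finditer silently skips characters that match no pattern (here: blanks).
    for m in MASTER.finditer(source):
        kind, lexeme = m.lastgroup, m.group()
        if kind == "ID":
            if lexeme not in symbol_table:
                symbol_table.append(lexeme)
            attr = symbol_table.index(lexeme)  # stands in for a pointer
        elif kind == "num":
            attr = int(lexeme)                 # the actual value
        else:
            attr = None                        # operators need no attribute
        pairs.append((kind, attr))
    return pairs


table = []
for pair in tokenize("E = M * C ** 2", table):
    print(pair)
```

After the call, `table` holds the identifiers in order of first appearance, so the attribute of each `ID` token is enough to recover its lexeme.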
33. Lexical errors
• When errors occur, we could just crash.
• It is better to print an error message and then continue.
• The simplest recovery strategy is panic mode: delete successive
characters until a well-formed token is found.
• Other possible error-recovery actions are:
– Delete an extraneous character
– Insert a missing character
– Replace an incorrect character by a correct character
– Transpose two adjacent characters
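To make the four single-character repairs concrete, the sketch below enumerates every string reachable from a lexeme by one deletion, insertion, replacement, or transposition, and checks whether any of them is a keyword. This is only an illustration of the repair actions, not a description of how a particular compiler recovers; the names `single_edit_repairs`, `suggest_keyword`, and the keyword set are hypothetical.

```python
def single_edit_repairs(lexeme, alphabet="abcdefghijklmnopqrstuvwxyz"):
    """All strings reachable from lexeme by exactly one repair action."""
    splits = [(lexeme[:i], lexeme[i:]) for i in range(len(lexeme) + 1)]
    # Delete a character.
    deletes = {a + b[1:] for a, b in splits if b}
    # Insert a missing character.
    inserts = {a + c + b for a, b in splits for c in alphabet}
    # Replace an incorrect character by another one.
    replaces = {a + c + b[1:] for a, b in splits if b for c in alphabet}
    # Transpose two adjacent characters.
    transposes = {a + b[1] + b[0] + b[2:] for a, b in splits if len(b) > 1}
    return deletes | inserts | replaces | transposes


KEYWORDS = {"if", "while", "const"}  # assumed keyword set


def suggest_keyword(lexeme):
    """Keywords that a single repair action would turn the lexeme into."""
    return sorted(single_edit_repairs(lexeme) & KEYWORDS)


for bad in ["iff", "cnst", "whi1e", "whlie"]:
    print(bad, "->", suggest_keyword(bad))
```

Each of the four sample inputs exercises a different action: deletion, insertion, replacement, and transposition respectively.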