Role of Lexical Analysis

Lexical Analysis
• It is the first phase of a compiler.
• Its main task is to read input characters and produce tokens.
• “get next token” is a command sent from the parser to the lexical analyzer.
• On receipt of the command, the lexical analyzer scans the input until it
determines the next token, and returns it.
• It skips comments and whitespace while creating these tokens.
• When it detects an error, the lexical analyzer (LA) correlates the error message with the source file and line number, e.g. by counting the newline characters it has read.
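[Figure: Source Program → Lexical Analyzer → Parser → Rest of compiler. The Parser sends “get next token” to the Lexical Analyzer, which returns a token; the Lexical Analyzer and the Parser both use the Symbol Table.]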
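The parser/lexer interaction described above can be sketched in Python as a class whose `next_token()` method answers each "get next token" request. This is a minimal sketch under stated assumptions: the `Lexer` and `Token` names, the handful of token patterns, and the Pascal-style `{ ... }` comment syntax are illustrative, not part of the original slides. It shows whitespace and comments being skipped, and a line counter being kept so that errors can be reported by line number.

```python
import re
from typing import NamedTuple


class Token(NamedTuple):
    name: str    # token name, e.g. "id" or "num"
    lexeme: str  # the characters that matched the pattern
    line: int    # source line of the lexeme, kept for error messages


# Illustrative token patterns (an assumption; a real language needs many more).
# Whitespace and Pascal-style { ... } comments are matched but never returned.
SPEC = re.compile(r"""
    (?P<ws>[ \t\n]+)
  | (?P<comment>\{[^}]*\})
  | (?P<num>\d+(?:\.\d+)?)
  | (?P<id>[A-Za-z][A-Za-z0-9]*)
  | (?P<punct>[=;+*()])
""", re.VERBOSE)


class Lexer:
    def __init__(self, source: str) -> None:
        self.source = source
        self.pos = 0
        self.line = 1

    def next_token(self) -> Token:
        """Answer the parser's "get next token" request."""
        while self.pos < len(self.source):
            m = SPEC.match(self.source, self.pos)
            if m is None:
                # Report the error together with its line number.
                raise SyntaxError(f"line {self.line}: unexpected character "
                                  f"{self.source[self.pos]!r}")
            start_line = self.line
            self.line += m.group().count("\n")  # comments may span lines too
            self.pos = m.end()
            if m.lastgroup in ("ws", "comment"):
                continue                        # skipped, never seen by the parser
            return Token(m.lastgroup, m.group(), start_line)
        return Token("eof", "", self.line)      # end of input


# The parser side: keep asking for tokens until end of input.
lexer = Lexer("x = 1; { set x }\ny = x * 2;")
tok = lexer.next_token()
while tok.name != "eof":
    print(tok)
    tok = lexer.next_token()
```

Running it prints one `Token` per lexeme; `y` is reported on line 2 because the newline after the comment was counted. A real parser would call `next_token()` only when it needs the next terminal, rather than in a loop like this one.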

Issues in Lexical Analysis
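Reasons for separating lexical analysis from parsing:
• Simpler design: removing whitespace and comments in the lexical analyzer keeps the parser's grammar simpler.
• Compiler efficiency is improved: specialized buffering techniques for reading input characters can speed up the compiler.
• Compiler portability is enhanced: peculiarities of the input alphabet and other device-specific details can be confined to the lexical analyzer.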

Tokens, patterns, and lexemes
• A TOKEN is a set of strings over the source alphabet.
• A PATTERN is a rule that describes that set.
• A LEXEME is a sequence of characters matching that pattern.
• E.g. in Pascal, for the statement
const pi = 3.1416;
• The substring pi is a lexeme for the token identifier.
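To make the three terms concrete, the sketch below pairs each token name with a regular expression (its pattern) and reports the lexeme each pattern matches in the Pascal statement above. The token names `const`, `relation`, and `semi`, and all of the regular expressions, are assumptions chosen for illustration.

```python
import re

# Hypothetical (token, pattern) pairs; the keyword precedes "id" so that
# "const" is not read as an identifier.
TOKEN_PATTERNS = [
    ("const",    re.compile(r"const\b")),
    ("id",       re.compile(r"[A-Za-z][A-Za-z0-9]*")),
    ("num",      re.compile(r"\d+(?:\.\d+)?")),
    ("relation", re.compile(r"=")),
    ("semi",     re.compile(r";")),
]


def lexemes(statement: str) -> list[tuple[str, str]]:
    """Return the (token, lexeme) pairs found in statement, skipping blanks."""
    pairs = []
    pos = 0
    while pos < len(statement):
        if statement[pos].isspace():
            pos += 1
            continue
        for token, pattern in TOKEN_PATTERNS:
            m = pattern.match(statement, pos)
            if m:
                pairs.append((token, m.group()))
                pos = m.end()
                break
        else:
            raise ValueError(f"no pattern matches at position {pos}")
    return pairs


print(lexemes("const pi = 3.1416;"))
# [('const', 'const'), ('id', 'pi'), ('relation', '='), ('num', '3.1416'), ('semi', ';')]
```

Here `pi` is the lexeme, `id` is the token, and `[A-Za-z][A-Za-z0-9]*` is the pattern describing every string the token covers.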

Example tokens, lexemes, patterns
Token     | Sample lexemes            | Informal description of pattern
if        | if                        | if
while     | while                     | while
relation  | <, <=, =, <>, >, >=       | < or <= or = or <> or > or >=
id        | count, sun, i, j, pi, D2  | letter followed by letters and digits
num       | 0, 12, 3.1416, 6.02E23    | any numeric constant
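The informal patterns in the table can be written as regular expressions. In this hypothetical Python sketch, `token_for` returns the first token whose pattern matches an entire lexeme; listing the keywords `if` and `while` before `id` is what keeps them from being classified as identifiers. The numeric pattern is a narrowing assumption (unsigned decimals with an optional fraction and exponent), not a full definition of "any numeric constant".

```python
import re

# The table's patterns as regular expressions (an illustrative translation).
# Dicts keep insertion order, so keywords are tried before "id".
PATTERNS = {
    "if":       r"if",
    "while":    r"while",
    "relation": r"<=|<>|>=|<|=|>",
    "id":       r"[A-Za-z][A-Za-z0-9]*",
    "num":      r"\d+(?:\.\d+)?(?:E[+-]?\d+)?",
}


def token_for(lexeme: str) -> str:
    """Return the first token whose pattern matches the whole lexeme."""
    for token, pattern in PATTERNS.items():
        if re.fullmatch(pattern, lexeme):
            return token
    raise ValueError(f"{lexeme!r} matches no pattern")


for lx in ["if", "while", "<>", ">=", "count", "D2", "12", "3.1416", "6.02E23"]:
    print(lx, "->", token_for(lx))
```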

Tokens
• Together, the complete set of tokens forms the set of terminal
symbols used in the grammar for the parser.
• In most languages, the tokens fall into these categories:
– Keywords
– Operators
– Identifiers
– Constants
– Literal strings
– Punctuation
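One way to model these categories is an enumeration plus a classifier. The sketch assumes a small C-like vocabulary: the `KEYWORDS`, `OPERATORS`, and `PUNCTUATION` sets are illustrative, not taken from any particular language standard. Keywords are checked before identifiers because they share the identifier pattern.

```python
import re
from enum import Enum, auto


class Category(Enum):
    KEYWORD = auto()
    OPERATOR = auto()
    IDENTIFIER = auto()
    CONSTANT = auto()
    LITERAL_STRING = auto()
    PUNCTUATION = auto()


# Hypothetical C-like lexical vocabulary, just enough to show each category.
KEYWORDS = {"if", "else", "while", "return", "int"}
OPERATORS = {"+", "-", "*", "/", "=", "==", "<", "<=", ">", ">="}
PUNCTUATION = {";", ",", "(", ")", "{", "}"}


def categorize(lexeme: str) -> Category:
    if lexeme in KEYWORDS:                       # reserved words look like
        return Category.KEYWORD                  # identifiers, so test first
    if re.fullmatch(r"[A-Za-z_]\w*", lexeme):
        return Category.IDENTIFIER
    if re.fullmatch(r"\d+(\.\d+)?", lexeme):
        return Category.CONSTANT
    if re.fullmatch(r'"[^"]*"', lexeme):
        return Category.LITERAL_STRING
    if lexeme in OPERATORS:
        return Category.OPERATOR
    if lexeme in PUNCTUATION:
        return Category.PUNCTUATION
    raise ValueError(f"unknown lexeme {lexeme!r}")


print([categorize(x).name for x in ["while", "(", "count", "<=", "10", ")", '"done"', ";"]])
# ['KEYWORD', 'PUNCTUATION', 'IDENTIFIER', 'OPERATOR', 'CONSTANT', 'PUNCTUATION', 'LITERAL_STRING', 'PUNCTUATION']
```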

Token attributes
• If there is more than one lexeme for a token,
we have to save additional information about
the token.
• Example: the token number matches lexemes
10 and 20.
• Code generation needs the actual number, not
just the token.
• With each token we associate ATTRIBUTES. For an identifier this is
normally just a pointer to its symbol-table entry; for a number it can be the numeric value itself.
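A minimal sketch of token attributes, assuming a hypothetical `SymbolTable` class whose entry index stands in for a pointer: the lexemes 10 and 20 both yield the token `num` but carry different values, while repeated occurrences of an identifier share one symbol-table entry.

```python
from dataclasses import dataclass, field


@dataclass
class SymbolTable:
    """Hypothetical minimal symbol table: one entry per distinct identifier."""
    entries: list[dict] = field(default_factory=list)
    index: dict[str, int] = field(default_factory=dict)

    def install(self, lexeme: str) -> int:
        """Return the entry's position, adding a new entry the first time."""
        if lexeme not in self.index:
            self.index[lexeme] = len(self.entries)
            self.entries.append({"lexeme": lexeme})
        return self.index[lexeme]


def with_attributes(lexemes: list[str], symtab: SymbolTable) -> list[tuple[str, int]]:
    """Pair each token with its attribute value."""
    pairs = []
    for lx in lexemes:
        if lx.isdigit():
            pairs.append(("num", int(lx)))            # attribute: the value itself
        else:
            pairs.append(("id", symtab.install(lx)))  # attribute: symbol-table entry
    return pairs


symtab = SymbolTable()
print(with_attributes(["10", "20", "count", "count"], symtab))
# [('num', 10), ('num', 20), ('id', 0), ('id', 0)]
```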

Example attributes
• For the Fortran statement
E = M * C ** 2
• We have token/attribute pairs
<ID, ptr to symbol table entry for E>
<Assign_op, null>
<ID, ptr to symbol table entry for M>
<Mult_op, null>
<ID, ptr to symbol table entry for C>
<exp_op, null>
<num, integer value 2>
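The pairs above can be reproduced by a small regex-based scanner. This is a sketch: the regular expressions, and the use of a Python list index in place of a symbol-table pointer, are assumptions. Note that `**` must be tried before `*` so that the exponentiation operator is read as one token rather than as two multiplication operators.

```python
import re

# Token names follow the slide; the regular expressions are assumptions.
SPEC = [
    ("exp_op",    r"\*\*"),  # listed before Mult_op so "**" becomes one token
    ("Mult_op",   r"\*"),
    ("Assign_op", r"="),
    ("num",       r"\d+"),
    ("ID",        r"[A-Za-z][A-Za-z0-9]*"),
    ("skip",      r"\s+"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in SPEC))


def token_pairs(source: str) -> tuple[list[tuple[str, object]], list[str]]:
    """Return the <token, attribute> pairs and the resulting symbol table."""
    symtab: list[str] = []  # an entry's index stands in for a pointer to it
    pairs: list[tuple[str, object]] = []
    pos = 0
    while pos < len(source):
        m = MASTER.match(source, pos)
        if m is None:
            raise ValueError(f"unexpected character {source[pos]!r}")
        pos = m.end()
        kind, text = m.lastgroup, m.group()
        if kind == "skip":
            continue
        if kind == "ID":
            if text not in symtab:
                symtab.append(text)
            pairs.append((kind, symtab.index(text)))
        elif kind == "num":
            pairs.append((kind, int(text)))  # attribute: the integer value
        else:
            pairs.append((kind, None))       # operators carry no attribute
    return pairs, symtab


print(token_pairs("E = M * C ** 2"))
# ([('ID', 0), ('Assign_op', None), ('ID', 1), ('Mult_op', None), ('ID', 2), ('exp_op', None), ('num', 2)], ['E', 'M', 'C'])
```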

Lexical errors
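• When an error occurs, the lexical analyzer could simply halt.
• It is better to print an error message and then continue.
• The simplest recovery strategy is panic mode: delete successive characters from the remaining input until a well-formed token is found.
• Other possible error-recovery actions are: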
– Delete a character
– Insert a missing character
– Replace an incorrect character by a correct character
– Transpose adjacent characters
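Panic-mode recovery can be sketched as follows: when no pattern matches, the scanner deletes successive characters until some pattern can match again, records an error message with the current line number, and then continues scanning. `TOKEN_RE` and `scan` are hypothetical names, and the patterns are illustrative; the other repair actions (insert, replace, transpose) are not shown.

```python
import re

# Illustrative patterns: whitespace, identifiers, integers, and a few symbols.
TOKEN_RE = re.compile(r"\s+|[A-Za-z][A-Za-z0-9]*|\d+|[=;+*]")


def scan(source: str) -> tuple[list[str], list[str]]:
    """Return (lexemes, error messages), recovering from errors in panic mode."""
    lexemes: list[str] = []
    errors: list[str] = []
    pos, line = 0, 1
    while pos < len(source):
        m = TOKEN_RE.match(source, pos)
        if m:
            text = m.group()
            if not text.isspace():
                lexemes.append(text)
            line += text.count("\n")
            pos = m.end()
            continue
        # Panic mode: delete characters until some pattern (a token or
        # whitespace) can match again, then resume normal scanning.
        start = pos
        while pos < len(source) and TOKEN_RE.match(source, pos) is None:
            pos += 1
        errors.append(f"line {line}: skipped {source[start:pos]!r}")
    return lexemes, errors


print(scan("x = 1;\ny = @#2;"))
# (['x', '=', '1', ';', 'y', '=', '2', ';'], ["line 2: skipped '@#'"])
```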
