Final application on regular expression

APPLICATION OF
REGULAR EXPRESSION
Ankit G – 014
Gagan – 034
Nikhil R.K- 060
Parashuram - 065

INTRODUCTION
What is a Regular Expression?
• A regular expression (regex) describes a pattern that can match many input strings.
• Regular expressions descend from a fundamental concept in computer science
called finite automata theory.
• Regular expressions are pervasive in Unix.
• Some utilities/programs that use them:
– vi, ed, sed, and emacs
– awk, Tcl, Perl, and Python
– grep, egrep, fgrep
– compilers
• The simplest regular expression is a string of literal characters to match.
• A string matches such a regular expression if it contains that literal as a substring.
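The literal-substring case can be sketched in a few lines of Python. This is only an illustration: the pattern "grep" and the helper name contains_match are assumptions made for this example, not part of the original slides.

```python
import re

# A literal regular expression: the characters g, r, e, p in that order.
pattern = re.compile(r"grep")

def contains_match(text: str) -> bool:
    """Return True if some substring of text matches the pattern."""
    return pattern.search(text) is not None

print(contains_match("egrep and fgrep"))  # True
print(contains_match("sed and awk"))      # False
```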

Application in Linux
The “egrep” Tool

Copyright © 2007 by Adam Webber
Text File Search
• Unix tool: egrep
• Searches a text file for lines that contain
a substring matching a specified pattern
• Echoes all such lines to standard output
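A rough sketch of this behaviour in Python follows. It is not how egrep is implemented; the function name grep_lines and the sample lines are hypothetical, and Python's re syntax differs in places from the POSIX extended syntax that egrep accepts.

```python
import re
import sys
from typing import Iterable, Iterator

def grep_lines(pattern: str, lines: Iterable[str]) -> Iterator[str]:
    """Yield every line that contains a substring matching pattern."""
    regex = re.compile(pattern)
    for line in lines:
        if regex.search(line):
            yield line

# Hypothetical input standing in for the lines of a text file.
sample = ["alpha beta\n", "gamma\n", "beta gamma\n"]

# Echo the matching lines to standard output, as egrep does.
for line in grep_lines(r"beta|delta", sample):
    sys.stdout.write(line)
```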

In the Linux operating system:
• Regular expressions are used by several different
Unix commands, including ed, sed, awk,
grep, and, to a more limited extent, vi.
• Sed also understands addresses.
An address is either a particular location in a file or
a range of lines to which an editing command
should be applied. When sed is given no
address, it performs its operations on every line
in the file.
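The effect of an address can be imitated in Python. The sketch below is an approximation under stated assumptions: the function name substitute is hypothetical, an address is modelled only as an inclusive (first, last) pair of line numbers, and every match on a line is replaced, roughly like the sed command 2,3s/cat/dog/g.

```python
import re
from typing import Iterable, Iterator, Optional, Tuple

def substitute(lines: Iterable[str], pattern: str, repl: str,
               address: Optional[Tuple[int, int]] = None) -> Iterator[str]:
    """Replace matches of pattern, but only on lines inside the address range.

    With address=None the substitution is applied to every line.
    """
    regex = re.compile(pattern)
    for number, line in enumerate(lines, start=1):
        in_range = address is None or address[0] <= number <= address[1]
        yield regex.sub(repl, line) if in_range else line

lines = ["cat", "cat cat", "cat", "cat"]
print(list(substitute(lines, "cat", "dog", address=(2, 3))))
print(list(substitute(lines, "cat", "dog")))
```

The input list is left untouched and the results are produced as a new stream.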

• Sed, short for "stream editor", is a stream-oriented
editor that was created for executing editing
scripts. All the input you feed into it passes
through and goes to STDOUT; the input file
is not changed.
• Oracle's regular expression implementation is an
extension of the POSIX
(Portable Operating System Interface) standard.

Editing Commands
COMMANDS   ACTION
Insert
i, a       Insert text before, after cursor
I, A       Insert text before beginning, after end of line
o, O       Open new line for text below, above cursor

Editing Commands
COMMANDS   ACTION
Change
r          Replace character
cw         Change word
cc         Change current line
cmotion    Change text between the cursor and the target of motion
C          Change to end of line
R          Type over (overwrite) characters
s          Substitute: delete character and insert new text
S          Substitute: delete current line and insert new text

Application in Search Engine
• One use of regular expressions that used to be very
common was in web search engines.
• Archie, one of the first search engines, used regular
expressions exclusively to search through a database
of filenames on servers.
• Regular expressions were chosen for these early
search engines because of both their power and their
ease of implementation.

• In the case of a search engine, the strings input to
the regular expression would be either whole web
pages or a pre-computed index of a web page that
holds only the most important information from
that web page.
• A query such as "regular expression" could be
translated into the following regular expression:
(Σ* regular Σ* expression Σ*)* ∪
(Σ* expression Σ* regular Σ*)*
• Here Σ is the set of all characters in the character
encoding used by the search engine.
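The translated query can be tried out in Python. In this sketch "." with the DOTALL flag stands in for Σ, and the outer Kleene star is omitted, which as far as I can tell only excludes the empty page from matching. The names QUERY_REGEX and page_matches are made up for the example.

```python
import re

# "." with DOTALL plays the role of Sigma: any single character, including newlines.
QUERY_REGEX = re.compile(
    r".*regular.*expression.*|.*expression.*regular.*",
    re.DOTALL,
)

def page_matches(page: str) -> bool:
    """Return True if the whole page contains both words, in either order."""
    return QUERY_REGEX.fullmatch(page) is not None

print(page_matches("a regular expression engine"))  # True
print(page_matches("an expression that is regular"))  # True
print(page_matches("regular languages only"))  # False
```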

• Regular expressions are no longer used in the
large web search engines because, as the web grew,
matching them against every page became impractically
slow. They are, however, still used in many
smaller search tools, such as the find/replace feature of
a text editor or tools such as grep.

In web applications, regular expressions are used for string matching.

Regular Expressions in Lexical Analysis
• To perform lexical analysis, two components are
required: a scanner and a tokenizer.
• The purpose of tokenization is to categorize the
lexemes found in a string to sort them by meaning.
• The process can be considered a sub-task of parsing
input.

• For example, the C programming language could
contain tokens such as numbers, string constants,
characters, identifiers (variable names), keywords, or
operators.
• For each token type, we can define a regular expression
that matches exactly the valid lexemes belonging to
that token type. Applying these expressions to the input
is the process of scanning.
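A minimal scanner along these lines can be sketched in Python. The token names, the small keyword list, and the patterns below are simplified assumptions chosen for illustration; they cover only a fragment of C and are not taken from the slides.

```python
import re

# One regular expression per token type (simplified, illustrative patterns).
TOKEN_SPEC = [
    ("NUMBER", r"\d+"),
    ("STRING", r'"[^"\n]*"'),
    ("KEYWORD", r"\b(?:int|return|if|else|while)\b"),
    ("IDENTIFIER", r"[A-Za-z_][A-Za-z_0-9]*"),
    ("OPERATOR", r"[-+*/=<>!]=?"),
    ("SKIP", r"\s+"),  # whitespace is matched but not reported
]

# Combine the patterns into one regex with a named group per token type.
MASTER = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))

def tokenize(source: str) -> list:
    """Split source into (token_type, lexeme) pairs."""
    tokens = []
    pos = 0
    while pos < len(source):
        match = MASTER.match(source, pos)
        if match is None:
            raise ValueError(f"unexpected character {source[pos]!r} at position {pos}")
        if match.lastgroup != "SKIP":
            tokens.append((match.lastgroup, match.group()))
        pos = match.end()
    return tokens

print(tokenize("int count = count + 42"))
```

Listing KEYWORD before IDENTIFIER matters here, because the alternatives are tried in order and a keyword would otherwise be reported as an identifier.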

• This process can be quite complex and may require
more than one pass to complete.
• Another option is to use a process known as
backtracking.
• For example, to determine if a lexeme is a valid
identifier in C, we could use the following regular
expression: [a-zA-Z_][a-zA-Z_0-9]*
• This regular expression says that identifiers must begin with a
Roman letter or an underscore and may be followed
by any number of letters, underscores, or digits.
CONCLUSION
• Both regular expressions and finite-state automata
represent regular languages.
• The basic regular expression operations are:
concatenation, union/disjunction, and Kleene closure.
• The regular expression language is a powerful pattern-
matching tool.
• Any regular expression can be automatically compiled
into an NFA, converted to a DFA, and then reduced to a
unique minimum-state DFA.
• An FSA can use any set of symbols for its alphabet,
including letters and words.
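To illustrate the first point, here is a hand-built DFA for the C identifier language from the lexical analysis slide, compared with the equivalent regular expression. The state names and helper functions are assumptions made for this sketch; the automaton was written by hand, not produced by an automatic regex-to-DFA construction.

```python
import re
import string

LETTERS = set(string.ascii_letters + "_")
DIGITS = set(string.digits)

def char_class(ch: str) -> str:
    """Map a character to the input symbol the DFA reads."""
    if ch in LETTERS:
        return "letter"
    if ch in DIGITS:
        return "digit"
    return "other"

# Transition table; any missing entry means the DFA enters a dead state.
TRANSITIONS = {
    ("start", "letter"): "ident",
    ("ident", "letter"): "ident",
    ("ident", "digit"): "ident",
}
ACCEPTING = {"ident"}

def dfa_accepts(text: str) -> bool:
    """Run the DFA over text and report whether it ends in an accepting state."""
    state = "start"
    for ch in text:
        state = TRANSITIONS.get((state, char_class(ch)))
        if state is None:
            return False
    return state in ACCEPTING

IDENTIFIER = re.compile(r"[a-zA-Z_][a-zA-Z_0-9]*")

# The automaton and the regular expression agree on these samples.
for sample in ["count", "_tmp1", "2fast", "my-var", ""]:
    print(repr(sample), dfa_accepts(sample), IDENTIFIER.fullmatch(sample) is not None)
```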
