
Final application on regular expression



Applications of regular expressions in Linux, search engines, and compiler design (lexical analysis).

Published in: Education


  1. 1. APPLICATION OF REGULAR EXPRESSION Ankit G - 014, Gagan - 034, Nikhil R.K - 060, Parashuram - 065
  2. 2. INTRODUCTION: What is a Regular Expression? • A regular expression (regex) describes a pattern that can match many input strings. • Regular expressions descend from finite automata theory, a fundamental concept in computer science. • Regular expressions are pervasive in Unix. • Some utilities and programs that use them: vi, ed, sed, and emacs; awk, Tcl, Perl, and Python; grep, egrep, and fgrep; compilers. • The simplest regular expression is a string of literal characters to match. • A string matches such an expression if it contains that literal string as a substring.
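The literal-pattern case can be sketched in a few lines of Python. This is only an illustration using Python's `re` module; the pattern `regex` and the helper name `contains_match` are chosen here for the example and are not from the slides.

```python
import re

# A literal pattern: no metacharacters, just the characters to find.
literal = re.compile("regex")

def contains_match(text: str) -> bool:
    """Return True when some substring of `text` matches the literal pattern."""
    # search() scans the whole string, so a match anywhere counts.
    return literal.search(text) is not None

found = contains_match("a regex describes a pattern")
missing = contains_match("finite automata")
```

Here `search` is used rather than `fullmatch` because the slide defines a match as "contains the substring", not "equals the pattern".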
  3. 3. Application in Linux: The “egrep” Tool
  4. 4. Text File Search • Unix tool: egrep • Searches a text file for lines that contain a substring matching a specified pattern • Echoes all such lines to standard output (Copyright © 2007 by Adam Webber)
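The behaviour described for egrep can be imitated with a small Python sketch. It is not egrep itself: it uses Python's `re` syntax, which differs in places from the POSIX extended syntax egrep accepts, and the function name `grep_lines` and the sample lines are made up for illustration.

```python
import re
from typing import Iterable, Iterator

def grep_lines(pattern: str, lines: Iterable[str]) -> Iterator[str]:
    """Yield every line that contains a substring matching `pattern`."""
    compiled = re.compile(pattern)
    for line in lines:
        # search() succeeds if the pattern matches anywhere in the line,
        # mirroring the "line contains a match" rule described above.
        if compiled.search(line):
            yield line

# Hypothetical input lines standing in for a text file.
sample = [
    "alpha beta",
    "gamma delta",
    "beta-carotene",
]

# Keep lines that start with "beta" or end with "delta".
matches = list(grep_lines(r"^beta|delta$", sample))
```

To search a real file, the same function can be given an open file object, since iterating a file yields its lines.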
  5. 5. In the Linux operating system: • Regular expressions are used by several Unix commands, including ed, sed, awk, grep, and, to a more limited extent, vi. • sed also understands addresses. An address is either a particular location in a file or a range of lines to which an editing command should be applied. When sed is given no address, it performs its operations on every line in the file.
  6. 6. • sed stands for "stream editor": a stream-oriented editor created for executing editing scripts. All input fed into it passes through and goes to standard output (STDOUT), and the input file itself is not changed. • Oracle's implementation of regular expressions is an extension of the POSIX (Portable Operating System Interface) standard.
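The two sed properties above (optional addresses, and an input that is never modified) can be sketched in Python. This is an imitation of a sed substitution restricted to a line range, not sed's actual implementation; the function `substitute` and its `address` parameter are hypothetical names introduced for the example.

```python
import re
from typing import Iterable, List, Optional, Tuple

def substitute(lines: Iterable[str], pattern: str, replacement: str,
               address: Optional[Tuple[int, int]] = None) -> List[str]:
    """Replace the first match of `pattern` on each addressed line.

    `address` is a hypothetical (first, last) pair of 1-based line numbers.
    None means "no address", so every line is edited.
    """
    compiled = re.compile(pattern)
    result = []
    for number, line in enumerate(lines, start=1):
        in_range = address is None or address[0] <= number <= address[1]
        # count=1 replaces only the first match on the line.
        edited = compiled.sub(replacement, line, count=1) if in_range else line
        result.append(edited)
    return result

text = ["old old", "old", "old"]
everywhere = substitute(text, r"old", "new")
only_second = substitute(text, r"old", "new", address=(2, 2))
```

The function returns a new list and leaves `text` untouched, which parallels sed writing its result to standard output instead of editing the input file.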
  7. 7. Editing Commands: Insert
       COMMANDS   ACTION
       i, a       Insert text before, after cursor
       I, A       Insert text before beginning, after end of line
       o, O       Open new line for text below, above cursor
  8. 8. Editing Commands: Change
       COMMANDS   ACTION
       r          Replace character
       cw         Change word
       cc         Change current line
       cmotion    Change text between the cursor and the target of motion
       C          Change to end of line
       R          Type over (overwrite) characters
       s          Substitute: delete character and insert new text
       S          Substitute: delete current line and insert new text
  9. 9. Application in Search Engines • One use of regular expressions that used to be very common was in web search engines. • Archie, one of the first search engines, used regular expressions exclusively to search through a database of filenames on servers. • Regular expressions were chosen for these early search engines because of both their power and their ease of implementation.
  10. 10. • In the case of a search engine, the strings input to the regular expression would be either whole web pages or a pre-computed index of a web page that holds only the most important information from that page. • A query such as "regular expression" could be translated into the following regular expression: (Σ* regular Σ* expression Σ*)* ∪ (Σ* expression Σ* regular Σ*)* Here Σ is the set of all characters in the character encoding used by the search engine.
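The query translation above can be tried out in Python. In this sketch `.` with the `re.DOTALL` flag stands in for Σ, and the outer closure from the slide is left out, since a starred group would also accept an empty page. The function name `both_words_pattern` and the sample pages are assumptions made for illustration.

```python
import re

def both_words_pattern(first: str, second: str) -> re.Pattern:
    """Build a pattern that accepts text containing both words, in either order."""
    a, b = re.escape(first), re.escape(second)
    # ".*" with DOTALL plays the role of Sigma* (any characters, newlines included).
    return re.compile(f".*{a}.*{b}.*|.*{b}.*{a}.*", re.DOTALL)

query = both_words_pattern("regular", "expression")

# Hypothetical page contents.
page_in_order = "A regular expression describes a pattern."
page_reversed = "An expression is called regular in this sense."
page_one_word = "Only regular languages are discussed here."

in_order_hit = query.fullmatch(page_in_order) is not None
reversed_hit = query.fullmatch(page_reversed) is not None
one_word_hit = query.fullmatch(page_one_word) is not None
```

`fullmatch` is used because the formula describes the whole page, with Σ* absorbing everything around the two words.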
  11. 11. • Large web search engines no longer use regular expressions: as the web grew, matching a regular expression against every page became far too slow. They are, however, still used in many smaller-scale search tools, such as the find/replace feature of a text editor or tools such as grep.
  12. 12. In web applications, string matching is used.
  13. 13. Regular Expressions in Lexical Analysis • To perform lexical analysis, two components are required: a scanner and a tokenizer. • The purpose of tokenization is to categorize the lexemes found in a string and sort them by meaning. • The process can be considered a sub-task of parsing input.
  14. 14. • For example, the C programming language could contain tokens such as numbers, string constants, characters, identifiers (variable names), keywords, or operators. • We can simply define a set of regular expressions, one per token type, each matching the set of valid lexemes that belong to that token type. Matching the input against these expressions is the process of scanning.
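The idea of one regular expression per token type can be sketched as a small Python scanner. This is a simplified illustration, not a C lexer: the patterns cover only a fraction of C, punctuation is lumped in with operators, and every name here (`TOKEN_SPEC`, `KEYWORDS`, `tokenize`) is an assumption made for the example.

```python
import re
from typing import Iterator, Tuple

# A tiny, incomplete keyword list, for illustration only.
KEYWORDS = {"int", "return", "if", "else", "while"}

# One regular expression per token type; order matters, first match wins.
TOKEN_SPEC = [
    ("NUMBER", r"\d+(?:\.\d+)?"),
    ("STRING", r'"(?:[^"\\\n]|\\.)*"'),
    ("CHAR", r"'(?:[^'\\\n]|\\.)'"),
    ("IDENTIFIER", r"[A-Za-z_][A-Za-z_0-9]*"),
    ("OPERATOR", r"==|!=|<=|>=|[-+*/=<>;(){}]"),  # operators and punctuation
    ("SKIP", r"\s+"),
    ("MISMATCH", r"."),
]
MASTER = re.compile("|".join(f"(?P<{name}>{regex})" for name, regex in TOKEN_SPEC))

def tokenize(source: str) -> Iterator[Tuple[str, str]]:
    """Yield (token type, lexeme) pairs for the given source text."""
    for match in MASTER.finditer(source):
        kind, lexeme = match.lastgroup, match.group()
        if kind == "SKIP":
            continue  # whitespace separates tokens but is not one itself
        if kind == "MISMATCH":
            raise ValueError(f"unexpected character {lexeme!r}")
        if kind == "IDENTIFIER" and lexeme in KEYWORDS:
            kind = "KEYWORD"  # keywords look like identifiers, so reclassify
        yield kind, lexeme

tokens = list(tokenize("int x = 42;"))
```

Keywords are recognised by a lookup after the identifier pattern matches, which avoids needing a separate regular expression for each keyword.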
  15. 15. • This process can be quite complex and may require more than one pass to complete. Another option is to use a process known as backtracking. • For example, to determine if a lexeme is a valid identifier in C, we could use the following regular expression: [a-zA-Z_][a-zA-Z_0-9]* This regular expression says that identifiers must begin with a Roman letter or an underscore and may be followed by any number of letters, underscores, or digits.
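The identifier rule can be checked directly in Python. The sketch below writes the pattern with the underscore in both character classes, as the slide's description requires; the helper name `is_identifier` is made up for the example, and the check does not exclude C keywords.

```python
import re

# A letter or underscore first, then any number of letters, underscores, or digits.
IDENTIFIER = re.compile(r"[a-zA-Z_][a-zA-Z_0-9]*")

def is_identifier(lexeme: str) -> bool:
    """Return True when the whole lexeme fits the identifier pattern."""
    # fullmatch() requires the entire lexeme to match, not just a prefix.
    return IDENTIFIER.fullmatch(lexeme) is not None

valid = is_identifier("_count1")
starts_with_digit = is_identifier("9lives")
has_hyphen = is_identifier("total-sum")
```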
  16. 16. CONCLUSION • Both regular expressions and finite-state automata represent regular languages. • The basic regular expression operations are concatenation, union/disjunction, and Kleene closure. • The regular expression language is a powerful pattern-matching tool. • Any regular expression can be automatically compiled into an NFA, then into a DFA, and then into a unique minimum-state DFA. • An FSA can use any set of symbols for its alphabet, including letters and words.
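The link between regular expressions and finite automata can be illustrated with a hand-written DFA for the C identifier pattern from the lexical-analysis slides. This is a sketch built by hand, not the output of an automatic regex-to-DFA compiler; the state names and helper functions are assumptions made for the example.

```python
import re
import string

# Three states: nothing read yet, inside a valid identifier, and a dead state.
START, IN_IDENT, DEAD = 0, 1, 2
ACCEPTING = {IN_IDENT}

LETTERS = set(string.ascii_letters + "_")
DIGITS = set(string.digits)

def step(state: int, ch: str) -> int:
    """Transition function: next state for one input character."""
    if state == START:
        return IN_IDENT if ch in LETTERS else DEAD
    if state == IN_IDENT:
        return IN_IDENT if ch in LETTERS or ch in DIGITS else DEAD
    return DEAD  # once dead, always dead

def dfa_accepts(text: str) -> bool:
    """Run the DFA over `text` and report whether it ends in an accepting state."""
    state = START
    for ch in text:
        state = step(state, ch)
    return state in ACCEPTING

# The same language written as a regular expression, for comparison.
REGEX = re.compile(r"[a-zA-Z_][a-zA-Z_0-9]*")
samples = ["x", "_tmp9", "9x", "", "a-b"]
agreement = all(dfa_accepts(s) == (REGEX.fullmatch(s) is not None) for s in samples)
```

On these samples the automaton and the regular expression give the same answers, which is what the equivalence stated in the conclusion predicts.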
  17. 17. THANK YOU