SlideShare a Scribd company logo
1 of 14
PYTHON REGULAR EXPRESSIONS
John Zhang
Tuesday, December 11, 2012
Regular Expressions
• Regular expressions are a powerful string
manipulation tool
• All modern languages have similar library
packages for regular expressions
• Use regular expressions to:
– Search a string (search and match)
– Replace parts of a string (sub)
– Break stings into smaller pieces (split)
Regular Expression Python Syntax
• regular match:
Example: the regular expression “test” only
matches the string ‘test’
• [x] matches any one of a list of characters
Example: “*abc+” matches ‘a’,‘b’,or ‘c’
• [^x] matches any one character that is not
included in x
“*^abc+” matches any single character except
‘a’,’b’,or ‘c’
Regular Expressions Syntax
• “.” matches any single character
• Parentheses can be used for grouping by ()
Example: “(abc)+” matches ’abc’, ‘abcabc’,
‘abcabcabc’, etc.
• x|y matches x or y
Example: “this|that” matches ‘this’ and ‘that’,
but not ‘thisthat’.
Regular Expression Syntax
• x* matches zero or more x’s
“a*” matches ’’, ’a’, ’aa’, etc.
• x+ matches one or more x’s
“a+” matches ’a’,’aa’,’aaa’, etc.
• x? matches zero or one x’s
“a?” matches ’’ or ’a’ .
• x{m, n} matches i x‘s, where m<i< n
“a,2,3-” matches ’aa’ or ’aaa’
Regular Expression Syntax
• “d” matches any digit; “D” matches any non-digit
• “s” matches any whitespace character; “S”
matches any non-whitespace character
• “w” matches any alphanumeric character; “W”
matches any non-alphanumeric character
• “^” matches the beginning of the string; “$”
matches the end of the string
• “b” matches a word boundary; “B” matches
position that is not a word boundary
Search and Match
• The two basic functions are re.search and re.match
– Search looks for a pattern anywhere in a string
– Match looks for a match staring at the beginning
• Both return None if the pattern is not found (logical false)
and a “match object” if it is
pat = "a*b"
import re
matchObj = re.search(pat,"fooaaabcde")
if matchObj:
print “match successfully at %s” % matchObj.group(0)
Q: What’s a match object?
• A: an instance of the match class with the details of the match
result
pat = "a*b"
>>> r1 = re.search(pat,"fooaaabcde")
>>> r1.group() # group returns string matched
'aaab'
>>> r1.start() # index of the match start
3
>>> r1.end() # index of the match end
7
>>> r1.span() # tuple of (start, end)
(3, 7)
What got matched?
• Here’s a pattern to match simple email addresses
w+@(w+.)+(com|org|net|edu)
>>> pat1 = "w+@(w+.)+(com|org|net|edu)"
>>> r1 = re.match(pat1,“qzhang@pku.cn.edu")
>>> r1.group()
'qzhang@pku.cn.edu’

• We might want to extract the pattern parts, like the
email name and host
What got matched?
• We can put parentheses around groups we want to be
able to reference
>>> pat2 = "(w+)@((w+.)+(com|org|net|edu))"
>>> r2 = re.match(pat2,"qzhang@pku.cn.edu")
>>> r2.group(1)
‘qzhang'
>>> r2.group(2)
‘pku.cn.edu'
>>> r2.groups()
r2.groups()
(‘qzhang', ' pku.cn.edu ', ‘cn.', 'edu’)

• Note that the ‘groups’ are numbered in a preorder
traversal of the forest
What got matched?
• We can ‘label’ the groups as well…
>>> pat3 ="(?P<name>w+)@(?P<host>(w+.)+(com|org|net|edu))"
>>> r3 = re.match(pat3,"qzhang@pku.cn.edu")
>>> r3.group('name')
‘qzhang'
>>> r3.group('host')
‘pku.cn.edu’

• And reference the matching parts by the labels
More re functions
• re.split() is like split but can use patterns
>>> re.split("W+", “This... is a test, short and sweet, of split().”)
*'This', 'is', 'a', 'test', 'short’, 'and', 'sweet', 'of', 'split’, ‘’+

• re.sub substitutes one string for a pattern
>>> re.sub('(blue|white|red)', 'black', 'blue socks and red shoes')
'black socks and black shoes’

• re.findall() finds al matches
>>> re.findall("d+”,"12 dogs,11 cats, 1 egg")
*'12', '11', ’1’+
Compiling regular expressions
• If you plan to use a re pattern more than once,
compile it to a re object
• Python produces a special data structure that
speeds up matching
>>> capt3 = re.compile(pat3)
>>> cpat3
<_sre.SRE_Pattern object at 0x2d9c0>
>>> r3 = cpat3.search("qzhang@pku.cn.edu")
>>> r3
<_sre.SRE_Match object at 0x895a0>
>>> r3.group()
'qzhang@pku.cn.edu'
Pattern object methods
• There are methods defined for a pattern object that
parallel the regular expression functions, e.g.,
– match
– search
– split
– findall
– sub

More Related Content

What's hot

Java: Regular Expression
Java: Regular ExpressionJava: Regular Expression
Java: Regular ExpressionMasudul Haque
 
16 Java Regex
16 Java Regex16 Java Regex
16 Java Regexwayn
 
Python- Regular expression
Python- Regular expressionPython- Regular expression
Python- Regular expressionMegha V
 
Regular Expression
Regular ExpressionRegular Expression
Regular ExpressionBharat17485
 
Php String And Regular Expressions
Php String  And Regular ExpressionsPhp String  And Regular Expressions
Php String And Regular Expressionsmussawir20
 
Strings in Python
Strings in PythonStrings in Python
Strings in Pythonnitamhaske
 
Regular expression
Regular expressionRegular expression
Regular expressionLarry Nung
 
Regular Expression
Regular ExpressionRegular Expression
Regular ExpressionLambert Lum
 
Regex Presentation
Regex PresentationRegex Presentation
Regex Presentationarnolambert
 
Regular Expressions in Java
Regular Expressions in JavaRegular Expressions in Java
Regular Expressions in JavaOblivionWalker
 
Finaal application on regular expression
Finaal application on regular expressionFinaal application on regular expression
Finaal application on regular expressionGagan019
 
Textpad and Regular Expressions
Textpad and Regular ExpressionsTextpad and Regular Expressions
Textpad and Regular ExpressionsOCSI
 
Regular Expressions 101 Introduction to Regular Expressions
Regular Expressions 101 Introduction to Regular ExpressionsRegular Expressions 101 Introduction to Regular Expressions
Regular Expressions 101 Introduction to Regular ExpressionsDanny Bryant
 
Regular expressions
Regular expressionsRegular expressions
Regular expressionsBrij Kishore
 

What's hot (20)

Java: Regular Expression
Java: Regular ExpressionJava: Regular Expression
Java: Regular Expression
 
Regular Expressions
Regular ExpressionsRegular Expressions
Regular Expressions
 
16 Java Regex
16 Java Regex16 Java Regex
16 Java Regex
 
Regular expressions
Regular expressionsRegular expressions
Regular expressions
 
Python- Regular expression
Python- Regular expressionPython- Regular expression
Python- Regular expression
 
Regular Expression
Regular ExpressionRegular Expression
Regular Expression
 
Regular Expression
Regular ExpressionRegular Expression
Regular Expression
 
Php String And Regular Expressions
Php String  And Regular ExpressionsPhp String  And Regular Expressions
Php String And Regular Expressions
 
Strings in Python
Strings in PythonStrings in Python
Strings in Python
 
Regular expression
Regular expressionRegular expression
Regular expression
 
Regular Expression
Regular ExpressionRegular Expression
Regular Expression
 
Regex Presentation
Regex PresentationRegex Presentation
Regex Presentation
 
Regular Expressions in Java
Regular Expressions in JavaRegular Expressions in Java
Regular Expressions in Java
 
Finaal application on regular expression
Finaal application on regular expressionFinaal application on regular expression
Finaal application on regular expression
 
Textpad and Regular Expressions
Textpad and Regular ExpressionsTextpad and Regular Expressions
Textpad and Regular Expressions
 
Regular Expressions 101 Introduction to Regular Expressions
Regular Expressions 101 Introduction to Regular ExpressionsRegular Expressions 101 Introduction to Regular Expressions
Regular Expressions 101 Introduction to Regular Expressions
 
Strings in python
Strings in pythonStrings in python
Strings in python
 
Python strings
Python stringsPython strings
Python strings
 
Bioinformatics p2-p3-perl-regexes v2014
Bioinformatics p2-p3-perl-regexes v2014Bioinformatics p2-p3-perl-regexes v2014
Bioinformatics p2-p3-perl-regexes v2014
 
Regular expressions
Regular expressionsRegular expressions
Regular expressions
 

Similar to Python advanced 2. regular expression in python

FUNDAMENTALS OF REGULAR EXPRESSION (RegEX).pdf
FUNDAMENTALS OF REGULAR EXPRESSION (RegEX).pdfFUNDAMENTALS OF REGULAR EXPRESSION (RegEX).pdf
FUNDAMENTALS OF REGULAR EXPRESSION (RegEX).pdfBryan Alejos
 
unit-4 regular expression.pptx
unit-4 regular expression.pptxunit-4 regular expression.pptx
unit-4 regular expression.pptxPadreBhoj
 
Introduction to Regular Expressions
Introduction to Regular ExpressionsIntroduction to Regular Expressions
Introduction to Regular ExpressionsJesse Anderson
 
Learning notes of r for python programmer (Temp1)
Learning notes of r for python programmer (Temp1)Learning notes of r for python programmer (Temp1)
Learning notes of r for python programmer (Temp1)Chia-Chi Chang
 
Class 5 - PHP Strings
Class 5 - PHP StringsClass 5 - PHP Strings
Class 5 - PHP StringsAhmed Swilam
 
Perl 6 in Context
Perl 6 in ContextPerl 6 in Context
Perl 6 in Contextlichtkind
 
Week-2: Theory & Practice of Data Cleaning: Regular Expressions in Practice
Week-2: Theory & Practice of Data Cleaning: Regular Expressions in PracticeWeek-2: Theory & Practice of Data Cleaning: Regular Expressions in Practice
Week-2: Theory & Practice of Data Cleaning: Regular Expressions in PracticeBertram Ludäscher
 
Using Regular Expressions and Staying Sane
Using Regular Expressions and Staying SaneUsing Regular Expressions and Staying Sane
Using Regular Expressions and Staying SaneCarl Brown
 
Slides chapter3part1 ruby-forjavaprogrammers
Slides chapter3part1 ruby-forjavaprogrammersSlides chapter3part1 ruby-forjavaprogrammers
Slides chapter3part1 ruby-forjavaprogrammersGiovanni924
 
Switching from java to groovy
Switching from java to groovySwitching from java to groovy
Switching from java to groovyPaul Woods
 
07. Java Array, Set and Maps
07.  Java Array, Set and Maps07.  Java Array, Set and Maps
07. Java Array, Set and MapsIntro C# Book
 
Fazendo mágica com ElasticSearch
Fazendo mágica com ElasticSearchFazendo mágica com ElasticSearch
Fazendo mágica com ElasticSearchPedro Franceschi
 

Similar to Python advanced 2. regular expression in python (20)

P3 2018 python_regexes
P3 2018 python_regexesP3 2018 python_regexes
P3 2018 python_regexes
 
P3 2017 python_regexes
P3 2017 python_regexesP3 2017 python_regexes
P3 2017 python_regexes
 
Regular Expressions
Regular ExpressionsRegular Expressions
Regular Expressions
 
FUNDAMENTALS OF REGULAR EXPRESSION (RegEX).pdf
FUNDAMENTALS OF REGULAR EXPRESSION (RegEX).pdfFUNDAMENTALS OF REGULAR EXPRESSION (RegEX).pdf
FUNDAMENTALS OF REGULAR EXPRESSION (RegEX).pdf
 
Regular Expressions
Regular ExpressionsRegular Expressions
Regular Expressions
 
Regex Basics
Regex BasicsRegex Basics
Regex Basics
 
unit-4 regular expression.pptx
unit-4 regular expression.pptxunit-4 regular expression.pptx
unit-4 regular expression.pptx
 
Introduction to Regular Expressions
Introduction to Regular ExpressionsIntroduction to Regular Expressions
Introduction to Regular Expressions
 
Learning notes of r for python programmer (Temp1)
Learning notes of r for python programmer (Temp1)Learning notes of r for python programmer (Temp1)
Learning notes of r for python programmer (Temp1)
 
Regex 101
Regex 101Regex 101
Regex 101
 
Class 5 - PHP Strings
Class 5 - PHP StringsClass 5 - PHP Strings
Class 5 - PHP Strings
 
1 the ruby way
1   the ruby way1   the ruby way
1 the ruby way
 
Perl 6 in Context
Perl 6 in ContextPerl 6 in Context
Perl 6 in Context
 
Week-2: Theory & Practice of Data Cleaning: Regular Expressions in Practice
Week-2: Theory & Practice of Data Cleaning: Regular Expressions in PracticeWeek-2: Theory & Practice of Data Cleaning: Regular Expressions in Practice
Week-2: Theory & Practice of Data Cleaning: Regular Expressions in Practice
 
Using Regular Expressions and Staying Sane
Using Regular Expressions and Staying SaneUsing Regular Expressions and Staying Sane
Using Regular Expressions and Staying Sane
 
Slides chapter3part1 ruby-forjavaprogrammers
Slides chapter3part1 ruby-forjavaprogrammersSlides chapter3part1 ruby-forjavaprogrammers
Slides chapter3part1 ruby-forjavaprogrammers
 
Switching from java to groovy
Switching from java to groovySwitching from java to groovy
Switching from java to groovy
 
4.1 PHP Arrays
4.1 PHP Arrays4.1 PHP Arrays
4.1 PHP Arrays
 
07. Java Array, Set and Maps
07.  Java Array, Set and Maps07.  Java Array, Set and Maps
07. Java Array, Set and Maps
 
Fazendo mágica com ElasticSearch
Fazendo mágica com ElasticSearchFazendo mágica com ElasticSearch
Fazendo mágica com ElasticSearch
 

More from John(Qiang) Zhang

A useful tools in windows py2exe(optional)
A useful tools in windows py2exe(optional)A useful tools in windows py2exe(optional)
A useful tools in windows py2exe(optional)John(Qiang) Zhang
 
Python advanced 3.the python std lib by example –data structures
Python advanced 3.the python std lib by example –data structuresPython advanced 3.the python std lib by example –data structures
Python advanced 3.the python std lib by example –data structuresJohn(Qiang) Zhang
 
Python advanced 3.the python std lib by example – system related modules
Python advanced 3.the python std lib by example – system related modulesPython advanced 3.the python std lib by example – system related modules
Python advanced 3.the python std lib by example – system related modulesJohn(Qiang) Zhang
 
Python advanced 3.the python std lib by example – application building blocks
Python advanced 3.the python std lib by example – application building blocksPython advanced 3.the python std lib by example – application building blocks
Python advanced 3.the python std lib by example – application building blocksJohn(Qiang) Zhang
 
Python advanced 1.handle error, generator, decorator and decriptor
Python advanced 1.handle error, generator, decorator and decriptor Python advanced 1.handle error, generator, decorator and decriptor
Python advanced 1.handle error, generator, decorator and decriptor John(Qiang) Zhang
 
Python advanced 3.the python std lib by example – algorithm
Python advanced 3.the python std lib by example – algorithmPython advanced 3.the python std lib by example – algorithm
Python advanced 3.the python std lib by example – algorithmJohn(Qiang) Zhang
 

More from John(Qiang) Zhang (11)

Git and github introduction
Git and github introductionGit and github introduction
Git and github introduction
 
Python testing
Python  testingPython  testing
Python testing
 
Profiling in python
Profiling in pythonProfiling in python
Profiling in python
 
Introduction to jython
Introduction to jythonIntroduction to jython
Introduction to jython
 
Introduction to cython
Introduction to cythonIntroduction to cython
Introduction to cython
 
A useful tools in windows py2exe(optional)
A useful tools in windows py2exe(optional)A useful tools in windows py2exe(optional)
A useful tools in windows py2exe(optional)
 
Python advanced 3.the python std lib by example –data structures
Python advanced 3.the python std lib by example –data structuresPython advanced 3.the python std lib by example –data structures
Python advanced 3.the python std lib by example –data structures
 
Python advanced 3.the python std lib by example – system related modules
Python advanced 3.the python std lib by example – system related modulesPython advanced 3.the python std lib by example – system related modules
Python advanced 3.the python std lib by example – system related modules
 
Python advanced 3.the python std lib by example – application building blocks
Python advanced 3.the python std lib by example – application building blocksPython advanced 3.the python std lib by example – application building blocks
Python advanced 3.the python std lib by example – application building blocks
 
Python advanced 1.handle error, generator, decorator and decriptor
Python advanced 1.handle error, generator, decorator and decriptor Python advanced 1.handle error, generator, decorator and decriptor
Python advanced 1.handle error, generator, decorator and decriptor
 
Python advanced 3.the python std lib by example – algorithm
Python advanced 3.the python std lib by example – algorithmPython advanced 3.the python std lib by example – algorithm
Python advanced 3.the python std lib by example – algorithm
 

Recently uploaded

Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxUse of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxLoriGlavin3
 
Advanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionAdvanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionDilum Bandara
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebUiPathCommunity
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Mark Simos
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsPixlogix Infotech
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brandgvaughan
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek SchlawackFwdays
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .Alan Dix
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr BaganFwdays
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfAddepto
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024Stephanie Beckett
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024Lorenzo Miniero
 
unit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptxunit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptxBkGupta21
 
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024BookNet Canada
 
Time Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directionsTime Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directionsNathaniel Shimoni
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsSergiu Bodiu
 
Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Manik S Magar
 
Training state-of-the-art general text embedding
Training state-of-the-art general text embeddingTraining state-of-the-art general text embedding
Training state-of-the-art general text embeddingZilliz
 
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxDigital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxLoriGlavin3
 
Generative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersGenerative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersRaghuram Pandurangan
 

Recently uploaded (20)

Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxUse of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
 
Advanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionAdvanced Computer Architecture – An Introduction
Advanced Computer Architecture – An Introduction
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio Web
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and Cons
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brand
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdf
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024
 
unit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptxunit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptx
 
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
 
Time Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directionsTime Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directions
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platforms
 
Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!
 
Training state-of-the-art general text embedding
Training state-of-the-art general text embeddingTraining state-of-the-art general text embedding
Training state-of-the-art general text embedding
 
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxDigital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
 
Generative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersGenerative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information Developers
 

Python advanced 2. regular expression in python

  • 1. PYTHON REGULAR EXPRESSIONS John Zhang Tuesday, December 11, 2012
  • 2. Regular Expressions • Regular expressions are a powerful string manipulation tool • All modern languages have similar library packages for regular expressions • Use regular expressions to: – Search a string (search and match) – Replace parts of a string (sub) – Break stings into smaller pieces (split)
  • 3. Regular Expression Python Syntax • regular match: Example: the regular expression “test” only matches the string ‘test’ • [x] matches any one of a list of characters Example: “*abc+” matches ‘a’,‘b’,or ‘c’ • [^x] matches any one character that is not included in x “*^abc+” matches any single character except ‘a’,’b’,or ‘c’
  • 4. Regular Expressions Syntax • “.” matches any single character • Parentheses can be used for grouping by () Example: “(abc)+” matches ’abc’, ‘abcabc’, ‘abcabcabc’, etc. • x|y matches x or y Example: “this|that” matches ‘this’ and ‘that’, but not ‘thisthat’.
  • 5. Regular Expression Syntax • x* matches zero or more x’s “a*” matches ’’, ’a’, ’aa’, etc. • x+ matches one or more x’s “a+” matches ’a’,’aa’,’aaa’, etc. • x? matches zero or one x’s “a?” matches ’’ or ’a’ . • x{m, n} matches i x‘s, where m<i< n “a,2,3-” matches ’aa’ or ’aaa’
  • 6. Regular Expression Syntax • “d” matches any digit; “D” matches any non-digit • “s” matches any whitespace character; “S” matches any non-whitespace character • “w” matches any alphanumeric character; “W” matches any non-alphanumeric character • “^” matches the beginning of the string; “$” matches the end of the string • “b” matches a word boundary; “B” matches position that is not a word boundary
  • 7. Search and Match • The two basic functions are re.search and re.match – Search looks for a pattern anywhere in a string – Match looks for a match staring at the beginning • Both return None if the pattern is not found (logical false) and a “match object” if it is pat = "a*b" import re matchObj = re.search(pat,"fooaaabcde") if matchObj: print “match successfully at %s” % matchObj.group(0)
  • 8. Q: What’s a match object? • A: an instance of the match class with the details of the match result pat = "a*b" >>> r1 = re.search(pat,"fooaaabcde") >>> r1.group() # group returns string matched 'aaab' >>> r1.start() # index of the match start 3 >>> r1.end() # index of the match end 7 >>> r1.span() # tuple of (start, end) (3, 7)
  • 9. What got matched? • Here’s a pattern to match simple email addresses w+@(w+.)+(com|org|net|edu) >>> pat1 = "w+@(w+.)+(com|org|net|edu)" >>> r1 = re.match(pat1,“qzhang@pku.cn.edu") >>> r1.group() 'qzhang@pku.cn.edu’ • We might want to extract the pattern parts, like the email name and host
  • 10. What got matched? • We can put parentheses around groups we want to be able to reference >>> pat2 = "(w+)@((w+.)+(com|org|net|edu))" >>> r2 = re.match(pat2,"qzhang@pku.cn.edu") >>> r2.group(1) ‘qzhang' >>> r2.group(2) ‘pku.cn.edu' >>> r2.groups() r2.groups() (‘qzhang', ' pku.cn.edu ', ‘cn.', 'edu’) • Note that the ‘groups’ are numbered in a preorder traversal of the forest
  • 11. What got matched? • We can ‘label’ the groups as well… >>> pat3 ="(?P<name>w+)@(?P<host>(w+.)+(com|org|net|edu))" >>> r3 = re.match(pat3,"qzhang@pku.cn.edu") >>> r3.group('name') ‘qzhang' >>> r3.group('host') ‘pku.cn.edu’ • And reference the matching parts by the labels
  • 12. More re functions • re.split() is like split but can use patterns >>> re.split("W+", “This... is a test, short and sweet, of split().”) *'This', 'is', 'a', 'test', 'short’, 'and', 'sweet', 'of', 'split’, ‘’+ • re.sub substitutes one string for a pattern >>> re.sub('(blue|white|red)', 'black', 'blue socks and red shoes') 'black socks and black shoes’ • re.findall() finds al matches >>> re.findall("d+”,"12 dogs,11 cats, 1 egg") *'12', '11', ’1’+
  • 13. Compiling regular expressions • If you plan to use a re pattern more than once, compile it to a re object • Python produces a special data structure that speeds up matching >>> capt3 = re.compile(pat3) >>> cpat3 <_sre.SRE_Pattern object at 0x2d9c0> >>> r3 = cpat3.search("qzhang@pku.cn.edu") >>> r3 <_sre.SRE_Match object at 0x895a0> >>> r3.group() 'qzhang@pku.cn.edu'
  • 14. Pattern object methods • There are methods defined for a pattern object that parallel the regular expression functions, e.g., – match – search – split – findall – sub