Text manipulation with/without parsec
October 11, 2011 Vancouver Haskell UnMeetup

Tatsuhiro Ujihisa

Tuesday, October 11, 2011
• Tatsuhiro Ujihisa
               • @ujm
               • HootSuite Media inc
               • Osaka, Japan
               • Vim: 14
               • Haskell: 5
Topics
               • text manipulation functions with/without parsec
               • parsec library
               • texts in Haskell
               • attoparsec library


Haskell for work
               • Something academic
               • Something mathematical
               • Web app
               • Better shell scripting
               • (Improve yourself)

Text manipulation
               • The concept of text
               • String is [Char]
                • lazy
                • Pattern matching
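
A minimal sketch of the two points above, using only the Prelude. The names firstWord and lazyPrefix are hypothetical and do not come from the slides.

```haskell
-- A String is a list of Char, so list patterns apply to it directly.
-- firstWord is a hypothetical helper: it returns the text before the first space.
firstWord :: String -> String
firstWord "" = ""
firstWord (' ' : _) = ""
firstWord (c : rest) = c : firstWord rest

-- Laziness: only the demanded prefix of an infinite String is ever built.
lazyPrefix :: String
lazyPrefix = take 6 (cycle "ab")
```

For example, firstWord "hello world" gives "hello", and lazyPrefix is "ababab" even though cycle "ab" never ends.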


Example: split
               • Ruby/Python example
                • 'aaa<>bb<>c<><>d'.split('<>')
                            ['aaa', 'bb', 'c', '', 'd']
               • Vim script example
                • split('aaa<>bb<>c<><>d', '<>')


split in Haskell
               • split :: String -> String -> [String]
                • split "aaa<>bb<>c<><>d" "<>"
                            ["aaa", "bb", "c", "", "d"]
                    • "aaa<>bb<>c<><>d" `split` "<>"



Design of split
               • split "aaa<>bb<>c<><>d" "<>"
               • "aaa" : split "bb<>c<><>d" "<>"
               • "aaa" : "bb" : split "c<><>d" "<>"
               • "aaa" : "bb" : "c" : split "<>d" "<>"
               • "aaa" : "bb" : "c" : "" : split "d" "<>"
               • "aaa" : "bb" : "c" : "" : "d" : split "" "<>"
               • "aaa" : "bb" : "c" : "" : "d" : []
Design of split
               • split "aaa<>bb<>c<><>d" "<>"
               • split' "aaa<>bb<>c<><>d" "<>" ""
               • split' "aa<>bb<>c<><>d" "<>" "a"
               • split' "a<>bb<>c<><>d" "<>" "aa"
               • split' "<>bb<>c<><>d" "<>" "aaa"
               • "aaa" : split "bb<>c<><>d" "<>"
               • split "aaa<>bb<>c<><>d" "<>"
               • split' "aaa<>bb<>c<><>d" "<>" ""
               • split' "aa<>bb<>c<><>d" "<>" "a"
               • split' "a<>bb<>c<><>d" "<>" "aa"
               • split' "<>bb<>c<><>d" "<>" "aaa"
               • "aaa" : split "bb<>c<><>d" "<>"

    split :: String -> String -> [String]
    str `split` pat = split' str pat ""

    -- memo collects the current piece in reverse order
    split' :: String -> String -> String -> [String]
    split' "" _ memo = [reverse memo]
    split' str pat memo = let (a, b) = splitAt (length pat) str in
                          if a == pat
                             then (reverse memo) : (b `split` pat)
                             else split' (tail str) pat (head str : memo)
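
A possible variation on the same recursion, sketched with Data.List.stripPrefix instead of splitAt and a reversed accumulator. The name splitOnPat and the treatment of an empty separator are assumptions, not part of the slides.

```haskell
import Data.List (stripPrefix)

-- splitOnPat is a hypothetical name; the argument order follows the
-- slides' split (string first, then separator).
splitOnPat :: String -> String -> [String]
splitOnPat str "" = [str]  -- assumption: an empty separator splits nothing
splitOnPat str pat = go str
  where
    go s = case stripPrefix pat s of
      -- The separator starts here: close the current piece.
      Just rest -> "" : go rest
      Nothing -> case s of
        -- End of input closes the last piece.
        "" -> [""]
        -- Otherwise prepend the character to the piece being built.
        (c : cs) -> case go cs of
          (piece : pieces) -> (c : piece) : pieces
          [] -> [[c]]  -- not reached: go never returns an empty list
```

With this sketch, splitOnPat "aaa<>bb<>c<><>d" "<>" gives ["aaa", "bb", "c", "", "d"], the same result as the slides' example.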



Another approach
               • Text.Parsec: v3
               • Text.ParserCombinators.Parsec: v2
               • Real World Haskell Parsec chapter
                • csv parser

Design of split
               • split "aaa<>bb<>c<><>d" "<>"
               • many of
                • any char, stopping before the string "<>"
               • separated by "<>", up to the end of the string



    import qualified Text.Parsec as P

    split :: String -> String -> [String]
    str `split` pat = case P.parse (split' (P.string pat)) "split" str of
      Right x -> x
      Left err -> error (show err)

    split' :: P.Parsec String () String -> P.Parsec String () [String]
    split' pat = P.anyChar `P.manyTill` (P.eof P.<|> (P.try (P.lookAhead pat) >> return ())) `P.sepBy` pat




               • P.anyChar: any char
               • P.eof P.<|> (P.try (P.lookAhead pat) >> return ()): the end of the
                     string or the separator pattern, checked without consuming text



    import qualified Text.Parsec as P

    main :: IO ()
    main = do
      print $ abc1 "abc"  -- True
      print $ abc1 "abcd" -- False
      print $ abc2 "abc"  -- True
      print $ abc2 "abcd" -- False

    -- Plain equality check.
    abc1 :: String -> Bool
    abc1 str = str == "abc"

    -- The same check as a parser: "abc" followed by end of input.
    abc2 :: String -> Bool
    abc2 str = case P.parse (P.string "abc" >> P.eof) "abc" str of
      Right _ -> True
      Left _ -> False


    import qualified Text.Parsec as P

    main :: IO ()
    main = do
      print $ parenthMatch1 "(a (b c))" -- True
      print $ parenthMatch1 "(a (b c)"  -- False
      print $ parenthMatch1 ")(a (b c)" -- False
      print $ parenthMatch2 "(a (b c))" -- True
      print $ parenthMatch2 "(a (b c)"  -- False
      print $ parenthMatch2 ")(a (b c)" -- False

    -- Without parsec: count the currently open parentheses.
    parenthMatch1 :: String -> Bool
    parenthMatch1 str = f str 0
      where
        f "" 0 = True
        f "" _ = False
        f ('(':xs) n = f xs (n + 1)
        f (')':xs) 0 = False
        f (')':xs) n = f xs (n - 1)
        f (_:xs) n = f xs n

    -- With parsec: f is any run of non-parenthesis chars or nested groups g.
    parenthMatch2 :: String -> Bool
    parenthMatch2 str =
      case P.parse (f >> P.eof) "parenthMatch" str of
        Right _ -> True
        Left _ -> False
      where
        f = P.many (P.noneOf "()" P.<|> g)
        g = do
          P.char '('
          f
          P.char ')'

Parsec API
               • anyChar
               • char 'a'
               • string "abc"
                     == string ['a', 'b', 'c']
                     ~ char 'a' >> char 'b' >> char 'c'
                       (consumes the same input, but returns 'c' instead of "abc")
               • oneOf ['a', 'b', 'c']
               • noneOf "abc"
               • eof
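
A small sketch of the primitives listed above, applied to sample inputs. The helper run and the list primitiveResults are hypothetical names introduced only for illustration.

```haskell
import qualified Text.Parsec as P
import Text.Parsec.String (Parser)

-- run is a hypothetical helper: apply a parser to a String and drop the
-- error details.
run :: Parser a -> String -> Maybe a
run p input = either (const Nothing) Just (P.parse p "example" input)

-- Each result is turned into a String so the entries share one type.
primitiveResults :: [Maybe String]
primitiveResults =
  [ fmap (: []) (run P.anyChar "xyz")         -- Just "x"
  , fmap (: []) (run (P.char 'a') "abc")      -- Just "a"
  , run (P.string "abc") "abcd"               -- Just "abc"
  , fmap (: []) (run (P.oneOf "abc") "cab")   -- Just "c"
  , fmap (: []) (run (P.noneOf "abc") "abc")  -- Nothing: 'a' is excluded
  , fmap (const "") (run P.eof "")            -- Just "": eof succeeds on empty input
  ]
```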
Parsec API (combinator)
               • >>, >>=, return, and fail
               • <|>
               • many p
               • p1 `manyTill` p2
               • p1 `sepBy` p2
               • p1 `chainl1` op (chainl also takes a default value: chainl p op x)
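
A hedged sketch of three of these combinators on small inputs. The parser names number, numberList, difference and commentBody are made up for this example, and chainl1 is used rather than chainl so that no default value is needed.

```haskell
import qualified Text.Parsec as P
import Text.Parsec.String (Parser)

-- One or more digits read as an Int.
number :: Parser Int
number = read <$> P.many1 P.digit

-- sepBy: numbers separated by commas, e.g. "1,2,3".
numberList :: Parser [Int]
numberList = number `P.sepBy` P.char ','

-- chainl1: left-associative subtraction, so "10-3-2" is (10-3)-2.
difference :: Parser Int
difference = number `P.chainl1` (P.char '-' >> return (-))

-- manyTill: everything after "<!--" up to a closing "-->", which is consumed.
commentBody :: Parser String
commentBody = P.string "<!--" >> P.anyChar `P.manyTill` P.try (P.string "-->")
```

Parsing "1,2,3" with numberList gives [1,2,3], "10-3-2" with difference gives 5, and "<!-- hi -->rest" with commentBody gives " hi ".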
Parsec API (etc)
               • try
               • lookAhead p
               • notFollowedBy p
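
A minimal sketch of what these three do, assuming the usual Parsec behaviour that string consumes input before failing on a partial match. All parser names here are hypothetical.

```haskell
import qualified Text.Parsec as P
import Text.Parsec.String (Parser)

-- Without try: on "ac", string "ab" consumes 'a' before failing, so <|>
-- never reaches the second alternative.
noTry :: Parser String
noTry = P.string "ab" P.<|> P.string "ac"

-- With try: the failed attempt is rewound, so "ac" can still match.
withTry :: Parser String
withTry = P.try (P.string "ab") P.<|> P.string "ac"

-- lookAhead: check that "<>" comes next without consuming it, then
-- return the untouched remaining input.
peekSeparator :: Parser String
peekSeparator = P.lookAhead (P.string "<>") >> P.getInput

-- notFollowedBy: accept the keyword "let" only when no letter follows,
-- so "letter" is rejected.
letKeyword :: Parser String
letKeyword = P.try (P.string "let" <* P.notFollowedBy P.letter)
```

On the input "ac", noTry fails while withTry returns "ac". On "<>d", peekSeparator returns "<>d", showing that nothing was consumed.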



texts in Haskell



three types of text
               • String
               • ByteString
               • Text



String
               • [Char]
               • Char: a Unicode code point (not a UTF-8 byte sequence)
               • "aaa" is String
               • List is lazy and slow


ByteString
               • import Data.ByteString
                • Base64
                • Char8
                • UTF8
                • Lazy (Char8, UTF8)
               • Fast. The default of snap
ByteString (cont'd)
    {-# LANGUAGE OverloadedStrings #-}
    import Data.ByteString.Char8 ()
    import Data.ByteString (ByteString)

    main = print ("hello" :: ByteString)

               • OverloadedStrings with Char8
               • Give the type explicitly or use with ByteString functions

ByteString (cont'd)

    import Data.ByteString.UTF8 ()
    import qualified Data.ByteString as B
    import Codec.Binary.UTF8.String (encode)

    main = B.putStrLn (B.pack $ encode "       " :: B.ByteString)




Text
               • import Data.Text
               • import Data.Text.IO
               • always Unicode (stored as UTF-16 internally before text-2.0, UTF-8 since)
               • import Data.Text.Lazy
               • Fast

Text (cont'd)
    {-# LANGUAGE OverloadedStrings #-}
    import Data.Text (Text)
    import qualified Data.Text.IO as T

    main = T.putStrLn ("         " :: Text)



               • UTF-8 friendly
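
A short sketch of how the three text types relate, assuming the text and bytestring packages. The sample word and all names below are illustrative and not taken from the slides.

```haskell
import qualified Data.ByteString as B
import qualified Data.Text as T
import qualified Data.Text.Encoding as TE

-- "naïve" written with an escape: five characters, and 'ï' (code point 239)
-- needs two bytes in UTF-8.
sample :: String
sample = "na\239ve"

-- String -> Text keeps the five characters.
asText :: T.Text
asText = T.pack sample

-- Text -> ByteString makes the encoding explicit: six bytes.
asBytes :: B.ByteString
asBytes = TE.encodeUtf8 asText

-- Decoding the bytes and unpacking gives back the original String.
roundTrip :: String
roundTrip = T.unpack (TE.decodeUtf8 asBytes)
```

Here length sample and T.length asText are both 5, while B.length asBytes is 6, which is the difference between counting characters and counting bytes.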
Parsec supports
               • String
               • ByteString




Attoparsec supports
               • ByteString
               • Text




Attoparsec
               • cabal install attoparsec
                • attoparsec-text
                • attoparsec-enumerator
                • attoparsec-iteratee
                • attoparsec-text-enumerator

Attoparsec pros/cons
               • Pros
                • fast
                • text support
                • enumerator/iteratee
               • Cons
                • no lookAhead/notFollowedBy
Parsec and Attoparsec
Parsec:

    import qualified Text.Parsec as P

    main = print $ abc "abc"

    abc str = case P.parse f "abc" str of
              Right _ -> True
              Left _ -> False
    f = P.string "abc"

Attoparsec:

    {-# LANGUAGE OverloadedStrings #-}
    import qualified Data.Attoparsec.Text as P

    main = print $ abc "abc"

    abc str = case P.parseOnly f str of
              Right _ -> True
              Left _ -> False
    f = P.string "abc"




return ()



Practice
               • args "f(x, g())"
                     -- ["x", "g()"]
               • args "f(, aa(), bb(c))"
                     -- ["", "aa()", "bb(c)"]
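
One possible approach to this exercise, sketched with Parsec. It is not the author's solution. The parsers argument and call are hypothetical names, and returning [] on a parse error is an assumption.

```haskell
import qualified Text.Parsec as P
import Text.Parsec.String (Parser)

-- One argument: any mix of plain characters and balanced "(...)" groups.
-- It may be empty, which yields "" for input such as "f(, x)".
argument :: Parser String
argument = concat <$> P.many (plain P.<|> group)
  where
    plain = P.many1 (P.noneOf "(),")
    -- A group keeps its parentheses and may contain commas and nested groups.
    group = do
      _ <- P.char '('
      inner <- P.many (P.many1 (P.noneOf "()") P.<|> group)
      _ <- P.char ')'
      return ("(" ++ concat inner ++ ")")

-- A call: a name, then comma-separated arguments in parentheses.
call :: Parser [String]
call = do
  _ <- P.many1 (P.noneOf "(),")
  _ <- P.char '('
  xs <- argument `P.sepBy` (P.char ',' >> P.spaces)
  _ <- P.char ')'
  P.eof
  return xs

-- args returns [] on a parse error; that choice is an assumption.
args :: String -> [String]
args = either (const []) id . P.parse call "args"
```

With this sketch, args "f(x, g())" gives ["x", "g()"] and args "f(, aa(), bb(c))" gives ["", "aa()", "bb(c)"].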




🐬 The future of MySQL is Postgres 🐘
 
Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 
HTML Injection Attacks: Impact and Mitigation Strategies
HTML Injection Attacks: Impact and Mitigation StrategiesHTML Injection Attacks: Impact and Mitigation Strategies
HTML Injection Attacks: Impact and Mitigation Strategies
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 

Text Manipulation with/without Parsec

  • 1. Text manipulation with/without parsec
      October 11, 2011, Vancouver Haskell UnMeetup
      Tatsuhiro Ujihisa
      Tuesday, October 11, 2011
  • 2.
      • Tatsuhiro Ujihisa
      • @ujm
      • HootSuite Media inc
      • Osaka, Japan
      • Vim: 14
      • Haskell: 5
  • 3. Topics
      • text manipulation functions with/without parsec
      • parsec library
      • texts in Haskell
      • attoparsec library
  • 4. Haskell for work
      • Something academic
      • Something mathematical
      • Web app
      • Better shell scripting
      • (Improve yourself)
  • 5. Text manipulation
      • The concept of text
      • String is [Char]
          • lazy
          • Pattern matching
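Because String is just [Char], ordinary list patterns and lazy list functions apply to text directly. The following is a minimal sketch of both points; the names capitalize and firstFive are illustrative and do not come from the talk.

```haskell
import Data.Char (toUpper)

-- Pattern matching: a String is a list, so (c : cs) splits off the first Char.
capitalize :: String -> String
capitalize ""       = ""
capitalize (c : cs) = toUpper c : cs

-- Laziness: cycle "ab" is an infinite String, but only the first
-- five characters are ever constructed.
firstFive :: String
firstFive = take 5 (cycle "ab")
```

In GHCi, capitalize "haskell" evaluates to "Haskell" and firstFive to "ababa".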
  • 6. Example: split
      • Ruby/Python example
          • 'aaa<>bb<>c<><>d'.split('<>')
            ['aaa', 'bb', 'c', '', 'd']
      • Vim script example
          • split('aaa<>bb<>c<><>d', '<>')
  • 7. split in Haskell
      • split :: String -> String -> [String]
          • split "aaa<>bb<>c<><>d" "<>"
            ["aaa", "bb", "c", "", "d"]
          • "aaa<>bb<>c<><>d" `split` "<>"
  • 8. Design of split
      • split "aaa<>bb<>c<><>d" "<>"
      • "aaa" : split "bb<>c<><>d" "<>"
      • "aaa" : "bb" : split "c<><>d" "<>"
      • "aaa" : "bb" : "c" : split "<>d" "<>"
      • "aaa" : "bb" : "c" : "" : split "d" "<>"
      • "aaa" : "bb" : "c" : "" : "d" : split "" "<>"
      • "aaa" : "bb" : "c" : "" : "d" : []
  • 9. Design of split
      • split "aaa<>bb<>c<><>d" "<>"
      • "aaa" : split "bb<>c<><>d" "<>"
  • 10. Design of split (split' takes the string, the pattern, and the characters collected so far)
      • split "aaa<>bb<>c<><>d" "<>"
      • split' "aaa<>bb<>c<><>d" "<>" ""
      • split' "aa<>bb<>c<><>d" "<>" "a"
      • split' "a<>bb<>c<><>d" "<>" "aa"
      • split' "<>bb<>c<><>d" "<>" "aaa"
      • "aaa" : split "bb<>c<><>d" "<>"
  • 11. split "aaa<>bb<>c<><>d" "<>"
      • split' "aaa<>bb<>c<><>d" "<>" ""
      • split' "aa<>bb<>c<><>d" "<>" "a"
      • split' "a<>bb<>c<><>d" "<>" "aa"
      • split' "<>bb<>c<><>d" "<>" "aaa"
      • "aaa" : split "bb<>c<><>d" "<>"

        split :: String -> String -> [String]
        str `split` pat = split' str pat ""

        -- memo holds the characters of the current field, in reverse order
        split' :: String -> String -> String -> [String]
        split' "" _ memo = [reverse memo]
        split' str pat memo =
          let (a, b) = splitAt (length pat) str
          in if a == pat
               then reverse memo : (b `split` pat)
               else split' (tail str) pat (head str : memo)
  • 12. Another approach
      • Text.Parsec: v3
      • Text.ParserCombinators.Parsec: v2
      • Real World Haskell Parsec chapter
          • csv parser
  • 13. Design of split
      • split "aaa<>bb<>c<><>d" "<>"
      • many of
          • any char, up to (but not including) the string "<>"
          • separated by "<>" or ended by the end of the string
  • 14.
        import qualified Text.Parsec as P

        split :: String -> String -> [String]
        str `split` pat = case P.parse (split' (P.string pat)) "split" str of
          Right x -> x
          Left err -> error (show err)

        -- one field: any chars up to the separator or the end of input
        split' :: P.Parsec String () String -> P.Parsec String () [String]
        split' pat = P.anyChar `P.manyTill` (P.eof P.<|> (P.try (P.lookAhead pat) >> return ())) `P.sepBy` pat
  • 15. The same code, annotated
      • P.anyChar: any char
      • the second argument of P.manyTill: except for the end of the string or the pattern to separate (P.lookAhead checks for the pattern without consuming text)
  • 16.
        import qualified Text.Parsec as P

        main = do
          print $ abc1 "abc"  -- True
          print $ abc1 "abcd" -- False
          print $ abc2 "abc"  -- True
          print $ abc2 "abcd" -- False

        -- plain equality
        abc1 :: String -> Bool
        abc1 str = str == "abc"

        -- the same check as a parser: "abc" followed by the end of input
        abc2 :: String -> Bool
        abc2 str = case P.parse (P.string "abc" >> P.eof) "abc" str of
          Right _ -> True
          Left _ -> False
  • 17.
        import qualified Text.Parsec as P

        main = do
          print $ parenthMatch1 "(a (b c))" -- True
          print $ parenthMatch1 "(a (b c)"  -- False
          print $ parenthMatch1 ")(a (b c)" -- False
          print $ parenthMatch2 "(a (b c))" -- True
          print $ parenthMatch2 "(a (b c)"  -- False
          print $ parenthMatch2 ")(a (b c)" -- False

        -- without parsec: count the currently open parentheses
        parenthMatch1 :: String -> Bool
        parenthMatch1 str = f str 0
          where
            f "" 0 = True
            f "" _ = False
            f ('(':xs) n = f xs (n + 1)
            f (')':xs) 0 = False
            f (')':xs) n = f xs (n - 1)
            f (_:xs) n = f xs n

        -- with parsec: f and g are mutually recursive parsers
        parenthMatch2 :: String -> Bool
        parenthMatch2 str =
          case P.parse (f >> P.eof) "parenthMatch" str of
            Right _ -> True
            Left _ -> False
          where
            f = P.many (P.noneOf "()" P.<|> g)
            g = do
              P.char '('
              f
              P.char ')'
  • 18. Parsec API
      • anyChar
      • char 'a'
      • string "abc" is string ['a', 'b', 'c'], and accepts the same input as char 'a' >> char 'b' >> char 'c'
      • oneOf ['a', 'b', 'c']
      • noneOf "abc"
      • eof
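The primitives listed on this slide can be tried out with a small harness. This is only a sketch: accepts and examples are hypothetical names introduced here, while anyChar, char, string, oneOf, noneOf, eof and parse are the Text.Parsec functions.

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

-- Hypothetical helper (not part of Parsec): True when the parser
-- accepts the entire input.
accepts :: Parser a -> String -> Bool
accepts p input =
  case parse (p <* eof) "example" input of
    Right _ -> True
    Left _  -> False

-- Every entry is expected to be True.
examples :: [Bool]
examples =
  [ accepts anyChar "x"
  , accepts (char 'a') "a"
  , not (accepts (char 'a') "b")
  , accepts (string "abc") "abc"
    -- same input accepted, but the result is the last Char, not a String
  , accepts (char 'a' >> char 'b' >> char 'c') "abc"
  , accepts (oneOf "abc") "b"
  , accepts (noneOf "abc") "z"
  , not (accepts (noneOf "abc") "b")
  , accepts eof ""
  ]
```

Evaluating and examples in GHCi checks all of them at once.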
  • 19. Parsec API (combinator)
      • >>, >>=, return, and fail
      • <|>
      • many p
      • p1 `manyTill` p2
      • p1 `sepBy` p2
      • p `chainl1` op (chainl p op x takes an extra default value x)
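A short sketch of how these combinators compose. The parser names (number, numbers, comment, difference, sign) and the helper run are made up for illustration; many1 is the one-or-more variant of many.

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

-- Hypothetical helper: run a parser and drop the error message.
run :: Parser a -> String -> Maybe a
run p input = either (const Nothing) Just (parse p "example" input)

-- many1: one or more digits, converted to an Int
number :: Parser Int
number = read <$> many1 digit

-- sepBy: numbers separated by commas
numbers :: Parser [Int]
numbers = number `sepBy` char ','

-- manyTill: everything between "{-" and the closing "-}"
comment :: Parser String
comment = string "{-" >> (anyChar `manyTill` try (string "-}"))

-- chainl1: left-associative subtraction, so "10-3-2" means (10-3)-2
difference :: Parser Int
difference = number `chainl1` (char '-' >> return (-))

-- <|>: try the left parser, then the right one
sign :: Parser Char
sign = char '+' <|> char '-'
```

For example, run numbers "1,2,3" gives Just [1,2,3], run comment "{-hi-}" gives Just "hi", and run difference "10-3-2" gives Just 5.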
  • 20. Parsec API (etc)
      • try
      • lookAhead p
      • notFollowedBy p
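These three combinators all deal with input that is inspected but not (or not permanently) consumed. A minimal sketch, with illustrative names (run, noTry, withTry, peekChar, keywordLet) that are not from the talk:

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

-- Hypothetical helper: run a parser and drop the error message.
run :: Parser a -> String -> Maybe a
run p input = either (const Nothing) Just (parse p "example" input)

-- Without try: on "ac" the first branch consumes 'a' before failing,
-- so <|> does not attempt the second branch.
noTry :: Parser Char
noTry = (char 'a' >> char 'b') <|> (char 'a' >> char 'c')

-- With try: a failing first branch rewinds the input.
withTry :: Parser Char
withTry = try (char 'a' >> char 'b') <|> (char 'a' >> char 'c')

-- lookAhead: inspect the next character without consuming it.
peekChar :: Parser (Char, String)
peekChar = do
  c    <- lookAhead anyChar
  rest <- many anyChar
  return (c, rest)

-- notFollowedBy: accept "let" only when no letter or digit follows.
keywordLet :: Parser String
keywordLet = try (string "let" <* notFollowedBy alphaNum)
```

Here run noTry "ac" is Nothing while run withTry "ac" is Just 'c'; run keywordLet "let x" is Just "let" and run keywordLet "letter" is Nothing.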
  • 21. texts in Haskell
  • 22. three types of text
      • String
      • ByteString
      • Text
  • 23. String
      • [Char]
      • Char: a Unicode code point (not a UTF-8 byte)
      • "aaa" is String
      • List is lazy and slow
  • 24. ByteString
      • import Data.ByteString
      • Base64
      • Char8
      • UTF8
      • Lazy (Char8, UTF8)
      • Fast. The default string type in Snap
  • 25. ByteString (cont'd)

        {-# LANGUAGE OverloadedStrings #-}
        import Data.ByteString.Char8 ()
        import Data.ByteString (ByteString)

        main = print ("hello" :: ByteString)

      • OverloadedStrings with Char8
      • Give the type explicitly or use it with ByteString functions
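  • 26. ByteString (cont'd)

        import Data.ByteString.UTF8 ()
        import qualified Data.ByteString as B
        import qualified Data.ByteString.Char8 as BC
        import Codec.Binary.UTF8.String (encode)

        -- encode turns a String into UTF-8 bytes ([Word8]).
        -- Data.ByteString.putStrLn is deprecated; Char8.putStrLn writes the same bytes.
        main = BC.putStrLn (B.pack $ encode " " :: B.ByteString)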
  • 27. Text
      • import Data.Text
      • import Data.Text.IO
      • always Unicode (stored as UTF-16 before text-2.0, as UTF-8 since)
      • import Data.Text.Lazy
      • Fast
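For Text, the split function designed earlier in the talk does not need to be hand-written: Data.Text provides splitOn. A minimal sketch follows; splitText is a hypothetical wrapper that only swaps the argument order to match the split signature from the slides.

```haskell
{-# LANGUAGE OverloadedStrings #-}
import Data.Text (Text)
import qualified Data.Text as T

-- T.splitOn takes the separator first, so flip the arguments to get
-- the "string, then pattern" order used by split on the slides.
-- Note: T.splitOn raises an error when the separator is empty.
splitText :: Text -> Text -> [Text]
splitText str pat = T.splitOn pat str
```

With OverloadedStrings enabled, splitText "aaa<>bb<>c<><>d" "<>" evaluates to ["aaa","bb","c","","d"].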
  • 28. Text (cont'd)

        {-# LANGUAGE OverloadedStrings #-}
        import Data.Text (Text)
        import qualified Data.Text.IO as T

        main = T.putStrLn (" " :: Text)

      • UTF-8 friendly
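The three text types from slide 22 can be converted into one another. The sketch below uses pack/unpack from Data.Text and encodeUtf8/decodeUtf8 from Data.Text.Encoding; the wrapper names (toText, fromText, toUtf8, fromUtf8, sample) are illustrative only.

```haskell
import qualified Data.ByteString as B
import qualified Data.Text as T
import qualified Data.Text.Encoding as TE

-- String <-> Text
toText :: String -> T.Text
toText = T.pack

fromText :: T.Text -> String
fromText = T.unpack

-- Text <-> UTF-8 encoded bytes.
-- decodeUtf8 throws an exception on invalid UTF-8 input.
toUtf8 :: T.Text -> B.ByteString
toUtf8 = TE.encodeUtf8

fromUtf8 :: B.ByteString -> T.Text
fromUtf8 = TE.decodeUtf8

-- "caf\233" is "cafe" with an acute accent on the e (U+00E9);
-- the escape keeps this source ASCII-only.
sample :: String
sample = "caf\233"
```

The sample has 4 characters as a String and as a Text, but B.length (toUtf8 (toText sample)) is 5, because U+00E9 takes two bytes in UTF-8.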
  • 29. Parsec supports
      • String
      • ByteString
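As a sketch of the ByteString support mentioned here, the abc check from slide 16 can be run over a strict ByteString by using the Parser type from Text.Parsec.ByteString. The names abcBytes and matchesAbc are illustrative; Parsec reads the bytes as Char8 characters, so this assumes ASCII input.

```haskell
import qualified Data.ByteString.Char8 as BC
import Text.Parsec
import Text.Parsec.ByteString (Parser)

-- Parser here is Parsec ByteString (), so the input stream is a
-- strict ByteString instead of a String.
abcBytes :: Parser String
abcBytes = string "abc" <* eof

matchesAbc :: BC.ByteString -> Bool
matchesAbc input =
  case parse abcBytes "abc" input of
    Right _ -> True
    Left _  -> False
```

matchesAbc (BC.pack "abc") is True, and matchesAbc (BC.pack "abcd") is False.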
  • 30. Attoparsec supports
      • ByteString
      • Text
  • 31. Attoparsec
      • cabal install attoparsec
          • attoparsec-text
          • attoparsec-enumerator
          • attoparsec-iteratee
          • attoparsec-text-enumerator
  • 32. Attoparsec pros/cons
      • Pros
          • fast
          • text support
          • enumerator/iteratee
      • Cons
          • no lookAhead/notFollowedBy
  • 33. Parsec and Attoparsec
      • Parsec

        import qualified Text.Parsec as P

        main = print $ abc "abc"

        abc str = case P.parse f "abc" str of
          Right _ -> True
          Left _ -> False
        f = P.string "abc"

      • Attoparsec

        {-# LANGUAGE OverloadedStrings #-}
        import qualified Data.Attoparsec.Text as P

        main = print $ abc "abc"

        abc str = case P.parseOnly f str of
          Right _ -> True
          Left _ -> False
        f = P.string "abc"
  • 35. Practice
      • args "f(x, g())" -- ["x", "g()"]
      • args "f(, aa(), bb(c))" -- ["", "aa()", "bb(c)"]
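One possible way to approach the practice problem with Parsec is sketched below. It is not the speaker's solution: the grammar (a name, then comma-separated arguments that may contain balanced parentheses) and the helpers call, arg and group are assumptions made for this example, and a parse failure is simply mapped to an empty list.

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

-- Returns the argument strings of a call; [] on a parse error (an
-- arbitrary choice for this sketch).
args :: String -> [String]
args input = either (const []) id (parse call "args" input)

-- name '(' arg (',' spaces arg)* ')'
call :: Parser [String]
call = do
  _  <- many (noneOf "(")                 -- function name, ignored
  _  <- char '('
  as <- arg `sepBy` (char ',' >> spaces)
  _  <- char ')'
  eof
  return as

-- An argument is a run of plain text and balanced parenthesis groups.
-- It may be empty, as in "f(, aa())".
arg :: Parser String
arg = concat <$> many (many1 (noneOf "(),") <|> group)

-- A balanced "( ... )" group; commas inside it do not split arguments.
group :: Parser String
group = do
  _     <- char '('
  inner <- many (many1 (noneOf "()") <|> group)
  _     <- char ')'
  return ("(" ++ concat inner ++ ")")
```

With these definitions, args "f(x, g())" gives ["x","g()"] and args "f(, aa(), bb(c))" gives ["","aa()","bb(c)"]. One consequence of allowing empty arguments is that args "f()" gives [""] rather than [].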