How to parse ‘go’
Natural Language Processing in Ruby
Tom Cartwright
@tomcartwrightuk
keepmebooked
giveaiddirect.com
Python, surely?
Yes. The NLTK is awesome.
But you have a Ruby-based app.
Extracting meaning from human input
Summarisation
Extracting entities
Tagging text
Sentiment analysis
Filtering text
From document level to word level
document > sentence > word
Chunking & segmenting
Breaking text into paragraphs, sentences and other zones
Start with a document/some text:
“The second nonabsolute number is the given time of arrival, which is now known to be one of the most bizarre of mathematical concepts, a recipriversexclusion, a number whose existence can only be defined as being anything other than itself…”
Punkt sentence tokenizer to the rescue…

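require "punkt-segmenter"

text = "The second nonabsolute number is the given time of arrival..."

# The constructor trains the tokenizer on the text it is given.
tokenizer = Punkt::SentenceTokenizer.new(text)

# Returns the sentences as an array of strings.
result = tokenizer.sentences_from_text(text, :output => :sentences_text)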
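For comparison, here is a minimal sketch of a naive splitter in plain Ruby. It is not how Punkt works; the method name naive_sentences and the sample text are made up for illustration. It shows the kind of mistake (breaking on abbreviations) that a trained tokenizer is meant to avoid.

```ruby
# Naive baseline: break after ".", "!" or "?" whenever whitespace follows.
# This is NOT the Punkt algorithm, only a point of comparison.
def naive_sentences(text)
  text.strip.split(/(?<=[.!?])\s+/)
end

sample = "Dr. Adams wrote it. It sold well."
p naive_sentences(sample)
# => ["Dr.", "Adams wrote it.", "It sold well."]
# The abbreviation "Dr." is wrongly treated as a sentence of its own.
```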
Training

trainer = Punkt::Trainer.new
trainer.train(bistromatic_text)
Tokenising
Breaking text into words, phrases and symbols.
"Time is an illusion. Lunchtime doubly so.".split(" ")
#=> ["Time", "is", "an", "illusion.", "Lunchtime", "doubly", "so."]
Tokenizer gem
Regexes and rules
class Tokenizer
  FS = Regexp.new('[[:blank:]]+')
  PAIR_PRE = ['(', '{', '[']
  SIMPLE_POST = ['!', '?', ',', ':', ';', '.']
  PAIR_POST = [')', '}', ']']
  PRE_N_POST = ['"', "'"]
  …
require "tokenizer"

tokenizer = Tokenizer::Tokenizer.new
tokenizer.tokenize("Time is an illusion. Lunchtime doubly so.")
#=> ["Time", "is", "an", "illusion", ".", "Lunchtime", "doubly", "so", "."]
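The same idea can be sketched without the gem. This is a rough approximation in plain Ruby, not the Tokenizer gem's implementation; the method name simple_tokenize and the single regex are assumptions chosen for brevity. It keeps runs of word characters together (allowing an inner apostrophe) and emits every other non-space character as its own token.

```ruby
# Rough regex tokenizer: words (with optional inner apostrophes) or
# single punctuation characters. Not the Tokenizer gem's rule set.
def simple_tokenize(text)
  text.scan(/\w+(?:'\w+)*|[^\w\s]/)
end

p simple_tokenize("Time is an illusion. Lunchtime doubly so.")
# => ["Time", "is", "an", "illusion", ".", "Lunchtime", "doubly", "so", "."]
```

Unlike split(" "), the full stops come out as separate tokens, so "illusion" and "illusion." are no longer treated as different words.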

Stemming
Jogging => Jog
"jogging".gsub(/.ing/, "")
#=> "jog"

"bring".gsub(/.ing/, "")
#=> "b"
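One way to avoid the "bring" problem is to strip the suffix only when what remains still looks like a word. The sketch below is a simplified fragment inspired by one rule of the Porter algorithm, not a full stemmer; the method name strip_ing and the vowel test are assumptions made for illustration.

```ruby
VOWEL = /[aeiou]/

# Strip "-ing" only if the remaining stem contains a vowel, then undo a
# doubled final consonant (jogg -> jog). Simplified; not a full Porter stemmer.
def strip_ing(word)
  return word unless word.end_with?("ing")

  stem = word[0...-3]
  return word unless stem.match?(VOWEL)

  # Keep doubled l, s and z (fall, miss, buzz), as Porter does.
  if stem.length >= 2 && stem[-1] == stem[-2] && !"aeioulsz".include?(stem[-1])
    stem = stem[0...-1]
  end
  stem
end

p strip_ing("jogging") # => "jog"
p strip_ing("bring")   # => "bring"
```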
1. Ruby-Stemmer: multi-language Porter stemmer
2. Text: Porter stemmer


require "lingua/stemmer"

stemmer = Lingua::Stemmer.new(:language => "en")
stemmer.stem("programming") #=> "program"
stemmer.stem("vimming") #=> "vim"
Parts-of-speech tagging
Tag   Meaning                     Examples
CC    conjunction                 and, but
DET   determiner                  this, some
IN    preposition / conjunction   above, about
JJ    adjective                   orange, tiny
NNP   proper noun                 Camden Pale Ale
A couple of methods

Regex tagger
/.*ing$/ => VBG
/.*ed$/ => VBD

Lookup on words
E.g.
calculating: { VBG: 6 }
orange: { JJ: 2, NN: 5 }
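The two methods can be combined in a few lines. This is a toy sketch, not how EngTagger or rb-brill-tagger work internally: the constant names, the tag_word method and the default tag of NN for unknown words are all assumptions. It looks the word up first, picks the most frequent tag, and falls back to the suffix rules.

```ruby
# Tag counts per word, as in the lookup example above.
LEXICON = {
  "calculating" => { "VBG" => 6 },
  "orange"      => { "JJ" => 2, "NN" => 5 }
}.freeze

# Suffix patterns tried in order when the word is not in the lexicon.
SUFFIX_RULES = [
  [/ing\z/, "VBG"],
  [/ed\z/,  "VBD"]
].freeze

def tag_word(word)
  counts = LEXICON[word.downcase]
  # Known word: choose the tag seen most often.
  return counts.max_by { |_tag, count| count }.first if counts

  rule = SUFFIX_RULES.find { |pattern, _tag| word.match?(pattern) }
  rule ? rule.last : "NN" # assumed default for unknown words
end

p %w[orange calculating jogging parsed towel].map { |w| [w, tag_word(w)] }
```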
A tale of two taggers
EngTagger
• Probabilistic (uses the lookup table from the previous slide)
• Pure Ruby

rb-brill-tagger
• Rule based
• C extensions
• Brown corpus trained
Treat gem
Bundles many of the gems shown
Wraps them in a DSL
s = sentence("A really good sentence.")
s.do(:chunk, :segment, :tokenize, :parse)

Stemming, tokenising, chunking, serialising, tagging, and text extraction from PDFs and HTML.
LRUG Sentiments
A tag: {NN}
Pass in regex => /({JJ}|{JJS})({NNS}|{NNP})/
And some tagged tokens
#=> [(Word @tag="JJ", @text="jolly"),
     (Word @tag="NN", @text="face")]
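One possible way to run a tag regex over tagged tokens is to join the tags into a string, match against that, and map each match back to the words. The sketch below is a guess at the idea, not the code behind the talk: the Word struct, match_tag_pattern and the ADJ_NOUN pattern (which also accepts plain NN) are hypothetical.

```ruby
Word = Struct.new(:text, :tag)

# Adjective followed by a noun; the braces are escaped to be safe.
ADJ_NOUN = /\{JJS?\}\{NN(?:S|P)?\}/

# Returns every run of words whose tags match the pattern.
def match_tag_pattern(words, pattern)
  tag_string = words.map { |w| "{#{w.tag}}" }.join
  matches = []
  tag_string.scan(pattern) do
    m = Regexp.last_match
    first  = tag_string[0...m.begin(0)].count("{") # tags before the match
    length = m[0].count("{")                       # tags inside the match
    matches << words[first, length]
  end
  matches
end

words = [Word.new("jolly", "JJ"), Word.new("face", "NN"), Word.new("smiled", "VBD")]
p match_tag_pattern(words, ADJ_NOUN).map { |run| run.map(&:text) }
# => [["jolly", "face"]]
```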
Sentimental value
 1.0       epic
 1.0       good
 0.21875   chance
 0.21875   brisk
-1.0       slanderous
-1.0       piteous
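Word scores like these can be turned into a score for a piece of text. The sketch below uses the six values from the slide and simply averages the scores of the words it recognises; the averaging rule, the SCORES constant and the sentiment method are assumptions, not necessarily what was used for the LRUG analysis.

```ruby
# Word scores taken from the slide above.
SCORES = {
  "epic" => 1.0, "good" => 1.0,
  "chance" => 0.21875, "brisk" => 0.21875,
  "slanderous" => -1.0, "piteous" => -1.0
}.freeze

# Average the scores of known words; unknown words are ignored.
def sentiment(text)
  words  = text.downcase.scan(/[a-z']+/)
  scored = words.map { |w| SCORES[w] }.compact
  return 0.0 if scored.empty?

  scored.sum / scored.length
end

p sentiment("An epic, good talk") # => 1.0
p sentiment("A piteous chance")   # => -0.390625
```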
Results

• Ruby
• Practical Object-Oriented Design in Ruby
• Doctors
• LRUG
• recruiters (!)

• dedicated servers
• PDFs
• Surrey

• unsolicited phone calls from r********s
• clients
• PayPal
• XML
• geeks
Gems
Text - Paul Battley’s box of tricks
Treat
Tokenizer
Punkt segmenter
Chronic - for extracting dates
Other things you can do/I didn’t talk about
Calculate text edit distance
Extract entities using the Stanford libraries via RJB (Ruby Java Bridge)
Extract topic words (LDA)
Keyword extraction - TF-IDF
JRuby
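As an illustration of the first item, text edit distance is commonly measured as Levenshtein distance: the number of single-character insertions, deletions and substitutions needed to turn one string into another. A minimal plain-Ruby sketch follows; the method name edit_distance is made up here, and in practice a gem would normally be used instead.

```ruby
# Levenshtein distance with two rolling rows of the dynamic-programming table.
def edit_distance(a, b)
  previous = (0..b.length).to_a
  a.each_char.with_index(1) do |char_a, i|
    current = [i]
    b.each_char.with_index(1) do |char_b, j|
      cost = char_a == char_b ? 0 : 1
      current << [
        previous[j] + 1,       # deletion
        current[j - 1] + 1,    # insertion
        previous[j - 1] + cost # substitution (or match)
      ].min
    end
    previous = current
  end
  previous.last
end

p edit_distance("kitten", "sitting") # => 3
```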
Thank you for processing.
Questions?
@tomcartwrightuk

Thanks to Tim Cowlishaw and the HT dev
team for specialised rubber duck support
