Natural Language Processing
An Introduction
Ashwin Ittoo
About Myself – Ashwin Ittoo
Associate Professor HEC Liège, ULiège
Research Associate, JAIST (Japan)
Associate Editor, Elsevier (Computers in Industry)
• 3 PhD, ULiège, Belgium
• Finance
• Marketing
• Medicine
• 1 PhD, JAIST, Japan (Aug. 2018)
Team
• Natural Language Processing (NLP)
• French: Traitement automatique des langues naturelles (TAL)
• Methods for “analysing” language
• Expressed in written form, text data
• Text data common in NLP
• Tweets
• Amazon/Yelp reviews
• Wikipedia
• Domain-specific articles (finance, medicine, …)
Introduction
• Variety of Analysis
• Document classification, e.g.
• Sentiment analysis
• Information extraction, e.g.
• Extracting facts from legal texts
• Machine translation
• Methods Evolution
• From formal logics, linguistics
• To machine learning, deep learning
Introduction (cont)
• Distinction in methods
• Pipeline organization
Methods & Pipeline
[Figure: pipeline. Text collection → low-level NLP tasks (pre-processing, feature engineering) → high-level NLP tasks (document classification, sentiment analysis, machine translation)]
• Clean the data
• Removing stopwords (“a”, “the”, …)
• Removing non-ASCII characters
• Straightforward
• No learning (machine/deep) involved
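The two cleaning steps above can be sketched in a few lines of Python. This is a minimal illustration, not the speaker's demo: the stopword list is a hypothetical, deliberately tiny one, and `preprocess` is a name chosen here.

```python
import re

# Hypothetical, deliberately tiny stopword list; real lists are much longer.
STOPWORDS = {"a", "an", "the", "is", "of", "and"}

def preprocess(text: str) -> list[str]:
    """Drop non-ASCII characters, lowercase, extract word tokens, remove stopwords."""
    ascii_only = text.encode("ascii", errors="ignore").decode("ascii")
    words = re.findall(r"[a-z0-9']+", ascii_only.lower())
    return [w for w in words if w not in STOPWORDS]

print(preprocess("The café is a nice place ☕ and the coffee is great"))
```

Note that dropping non-ASCII characters is lossy: "café" becomes "caf" here, which may or may not be acceptable depending on the language of the corpus.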
Low-Level: Pre-processing
• Text → number transformation
• Individual tokens from sentence
• Tokens: words, numbers, punctuation, …
• Tokens = features
• How to best represent features?
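As a rough sketch of the text-to-number step, the snippet below splits a sentence into tokens and turns it into a vector of token counts (a bag-of-words representation). The regular expression and the function names are assumptions made for this illustration; real tokenizers handle many more cases.

```python
import re
from collections import Counter

def tokenize(sentence: str) -> list[str]:
    """Split a sentence into word, number, and punctuation tokens."""
    return re.findall(r"\w+|[^\w\s]", sentence)

def bag_of_words(tokens, vocabulary):
    """Count how often each vocabulary term occurs in the token list."""
    counts = Counter(tokens)
    return [counts[term] for term in vocabulary]

tokens = tokenize("2 nice movies, 2 bad movies.")
vocabulary = sorted(set(tokens))      # each distinct token is one feature
vector = bag_of_words(tokens, vocabulary)
print(vocabulary)
print(vector)
```

Each position in `vector` corresponds to one token type, so the vector length grows with the vocabulary.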
Low-Level: Feature Engineering
• As-is
• Each token = 1 feature
• Eat, ate, eaten: 3 tokens, 3 distinct features
• Huge number of features
• Curse of dimensionality
• Morphology
• Replace token with lemma (root)
• Eat, ate, eaten → eat: 3 tokens, 1 feature
• Demo
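The difference between the as-is and the lemma-based representation can be illustrated as follows. The lookup table is a hypothetical stand-in for a real lemmatizer (which would rely on a dictionary and morphological rules); it is only meant to show the reduction in feature count.

```python
# Hypothetical lookup table standing in for a real lemmatizer.
LEMMAS = {"ate": "eat", "eaten": "eat", "eats": "eat", "movies": "movie"}

def lemmatize(token: str) -> str:
    """Map a token to its lemma; unknown tokens are returned lowercased."""
    token = token.lower()
    return LEMMAS.get(token, token)

tokens = ["Eat", "ate", "eaten"]
as_is_features = set(tokens)                      # one feature per surface form
lemma_features = {lemmatize(t) for t in tokens}   # one feature per lemma
print(len(as_is_features), len(lemma_features))   # 3 1
```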
Feature Representation
• Grammatical Information
• Use Part-of-Speech (POS)/POS-tagging
• Tag set defined in the Penn Treebank (UPenn)
• E.g. 2 nice movies → CD JJ NNS
• Several tools for POS-tagging
• Stanford NLP (Java)
• scikit-learn/NLTK (Python)
• Demo
Feature Representation (cont)
• Application of machine learning for NLP
• Large number of classes (each POS-tag)
• Temporal sequence of word occurrence
• Hidden Markov Model
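• $\hat{t}_{1}^{n} = \operatorname{argmax}_{t_{1}^{n}} P(t_{1}^{n} \mid w_{1}^{n}) \approx \operatorname{argmax}_{t_{1}^{n}} \prod_{i=1}^{n} P(w_i \mid t_i)\, P(t_i \mid t_{i-1})$
• $P(w_i \mid t_i)$: probability of word $w_i$ given POS tag $t_i$ (emission)
• $P(t_i \mid t_{i-1})$: probability of POS tag $t_i$ given the previous POS tag $t_{i-1}$ (transition)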
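A Hidden Markov Model tagger looks for the tag sequence that maximizes the product of emission probabilities P(word | tag) and transition probabilities P(tag | previous tag). The Viterbi algorithm finds that sequence efficiently. The sketch below uses toy probabilities invented for this illustration (they are not estimated from any corpus) and only three Penn Treebank tags.

```python
# Toy probabilities, invented for illustration only (not estimated from a corpus).
TAGS = ["CD", "JJ", "NNS"]
START = {"CD": 0.5, "JJ": 0.3, "NNS": 0.2}   # P(t_1 | start)
TRANS = {                                     # P(t_i | t_{i-1})
    "CD":  {"CD": 0.1, "JJ": 0.5, "NNS": 0.4},
    "JJ":  {"CD": 0.1, "JJ": 0.2, "NNS": 0.7},
    "NNS": {"CD": 0.3, "JJ": 0.3, "NNS": 0.4},
}
EMIT = {                                      # P(w_i | t_i)
    "CD":  {"2": 0.9, "nice": 0.05, "movies": 0.05},
    "JJ":  {"2": 0.05, "nice": 0.9, "movies": 0.05},
    "NNS": {"2": 0.05, "nice": 0.05, "movies": 0.9},
}

def viterbi(words):
    """Return the most probable tag sequence for a non-empty list of words."""
    # best[t] = (probability of the best path ending in tag t, that path)
    best = {t: (START[t] * EMIT[t].get(words[0], 1e-6), [t]) for t in TAGS}
    for word in words[1:]:
        step = {}
        for t in TAGS:
            # Pick the best previous tag p for reaching tag t.
            prob, path = max(
                ((best[p][0] * TRANS[p][t], best[p][1]) for p in TAGS),
                key=lambda pair: pair[0],
            )
            step[t] = (prob * EMIT[t].get(word, 1e-6), path + [t])
        best = step
    return max(best.values(), key=lambda pair: pair[0])[1]

print(viterbi(["2", "nice", "movies"]))  # ['CD', 'JJ', 'NNS']
```

Unknown words get a small fixed emission probability (`1e-6`), which is a simplification chosen here; real taggers use smoothing or suffix-based estimates.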
Part-of-Speech Tagging
• How to select best features?
• Intuitively: some words are more important than others
• E.g. “doping” → sports documents
• Tf-Idf
• Term frequency-Inverse document frequency
• Standard statistical tests
• Chi-square
• Mutual Information
• Demo
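To make the Tf-Idf idea concrete, here is a minimal sketch of one common, unsmoothed variant: term frequency (count divided by document length) multiplied by the log of the inverse document frequency. Libraries use slightly different formulas, and the three documents are made up for this example.

```python
import math
from collections import Counter

def tf_idf(documents):
    """Return one {term: weight} dict per tokenized document."""
    n_docs = len(documents)
    # Number of documents in which each term appears at least once.
    doc_freq = Counter(term for doc in documents for term in set(doc))
    weights = []
    for doc in documents:
        counts = Counter(doc)
        weights.append({
            term: (count / len(doc)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights

# Made-up documents for illustration.
docs = [
    "the athlete failed a doping test".split(),
    "the bank reported a loss".split(),
    "the team won the match".split(),
]
weights = tf_idf(docs)
print(weights[0])
```

In this toy corpus "the" occurs in every document and gets weight 0, while "doping" occurs in only one document and gets the highest weight among the words of the first document.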
Low-Level: Feature Engineering
• High-level tasks
• Features (low-level task) as input
• Sentiment Analysis
• Determine sentiment in customer reviews
• E.g. movie reviews, Amazon product reviews
• Classification Problem
• 2 classes/categories (3 if neutral is included)
• +, − (and optionally neutral)
• Supervised Learning
• Movie reviews, annotated with sentiment class, available
• Train classification algorithm
• Naïve-Bayes, SVM, Random Forests, Neural Networks
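As an illustration of the supervised setup, the sketch below trains a multinomial Naïve Bayes classifier with add-one smoothing on four hand-made reviews. The training data and function names are hypothetical; a real experiment would use an annotated corpus and usually a library implementation.

```python
import math
from collections import Counter, defaultdict

# Hypothetical, hand-made training reviews; real experiments use annotated corpora.
TRAIN = [
    ("a great and moving film", "+"),
    ("great acting , wonderful story", "+"),
    ("a boring and awful film", "-"),
    ("awful plot , boring acting", "-"),
]

def train(examples):
    """Collect class counts, per-class word counts, and the vocabulary."""
    class_counts = Counter(label for _, label in examples)
    word_counts = defaultdict(Counter)
    for text, label in examples:
        word_counts[label].update(text.split())
    vocab = {w for counts in word_counts.values() for w in counts}
    return class_counts, word_counts, vocab

def predict(text, model):
    """Return the class with the highest log-probability for the text."""
    class_counts, word_counts, vocab = model
    total = sum(class_counts.values())
    scores = {}
    for label in class_counts:
        n_words = sum(word_counts[label].values())
        score = math.log(class_counts[label] / total)   # log prior
        for word in text.split():
            if word in vocab:  # ignore words never seen in training
                # Laplace (add-one) smoothing of the word likelihood
                score += math.log((word_counts[label][word] + 1) / (n_words + len(vocab)))
        scores[label] = score
    return max(scores, key=scores.get)

model = train(TRAIN)
print(predict("a wonderful film", model), predict("boring story", model))  # + -
```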
High-Level: Sentiment Analysis
[Figure: features from low-level NLP tasks feed high-level NLP tasks (sentiment analysis, machine translation, …)]
• Confusion matrix
• True positive, false negative
• True negative, false positive
• Precision
• Fraction of reviews assigned to a class that truly belong to it: TP / (TP + FP)
• How precise is our model?
• Recall
• Fraction of reviews in a class (per the gold standard set) that the model finds: TP / (TP + FN)
• What is the coverage of the model?
• F1-score
• Harmonic mean of precision and recall
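These metrics follow directly from the confusion-matrix counts: precision is TP / (TP + FP), recall is TP / (TP + FN), and F1 is their harmonic mean. A small sketch, with made-up gold and predicted labels and a function name chosen for this example:

```python
def precision_recall_f1(gold, predicted, positive="+"):
    """Compute precision, recall and F1 for one class from two label lists."""
    pairs = list(zip(gold, predicted))
    tp = sum(g == positive and p == positive for g, p in pairs)  # true positives
    fp = sum(g != positive and p == positive for g, p in pairs)  # false positives
    fn = sum(g == positive and p != positive for g, p in pairs)  # false negatives
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

# Made-up labels: 2 true positives, 1 false positive, 1 false negative.
gold      = ["+", "+", "+", "-", "-", "-"]
predicted = ["+", "+", "-", "+", "-", "-"]
print(precision_recall_f1(gold, predicted))
```

Here precision, recall and F1 all come out at 2/3, since there is one false positive and one false negative against two true positives.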
High-Level: Evaluation Metrics
• Feature Engineering
• Core of machine learning, NLP but…
• Manual, time-consuming
• Bottleneck in machine learning, NLP
• Deep Learning
• Neural network with many hidden layers
• Supervised Learning Approach
• Trained on annotated data
• Movie reviews with sentiment class
• Input: word (vectors) from reviews
• Output: class label (+,-, neutral)
• Hidden layers learn feature representation
• No (or minimal) feature engineering
Deep Learning in NLP
• Different Deep Learning Architectures
• E.g. CNN for image processing
• RNN (Recurrent Neural Network)
• State of the art for text
• Considers temporal nature of tokens in sentence
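To show what "considering the temporal nature of tokens" means in practice, here is a minimal sketch of a simple (Elman-style) recurrent step in plain Python. The hidden state is updated token by token, so the final state depends on word order. The weights are hypothetical and untrained, and the 2-dimensional inputs stand in for word vectors.

```python
import math

def rnn_step(x, h_prev, W_xh, W_hh, b_h):
    """One recurrent step: h_t = tanh(W_xh x_t + W_hh h_{t-1} + b_h)."""
    h_new = []
    for j in range(len(h_prev)):
        total = b_h[j]
        total += sum(W_xh[j][k] * x[k] for k in range(len(x)))
        total += sum(W_hh[j][k] * h_prev[k] for k in range(len(h_prev)))
        h_new.append(math.tanh(total))
    return h_new

def encode(sequence, W_xh, W_hh, b_h):
    """Run the RNN over a sequence of input vectors and return the final state."""
    h = [0.0] * len(b_h)
    for x in sequence:
        h = rnn_step(x, h, W_xh, W_hh, b_h)
    return h

# Hypothetical, untrained weights: 2-dimensional inputs, 2 hidden units.
W_xh = [[0.5, -0.3], [0.1, 0.8]]
W_hh = [[0.2, 0.0], [0.0, 0.2]]
b_h = [0.0, 0.0]

seq_ab = encode([[1.0, 0.0], [0.0, 1.0]], W_xh, W_hh, b_h)
seq_ba = encode([[0.0, 1.0], [1.0, 0.0]], W_xh, W_hh, b_h)
print(seq_ab != seq_ba)  # True: the same inputs in a different order give a different state
```

In a sentiment classifier, the final hidden state would typically be passed to an output layer that predicts the class label; training the weights is omitted here.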
Deep Learning in NLP (cont)
RNN for Sentiment Analysis
• Sentiment Challenge
• Each clause can express a different sentiment
• Need to keep track of word sequences
• Need to compose individual sentiments for overall sentiment
- This movie doesn't care about cleverness, wit or any other kind of intelligent humor.
- Those who find ugly meanings in beautiful things are corrupt without being charming.
Language Processing/Sentiment Analysis (cont)
• Trained over sentiment treebank
• Phrases, clauses, sentences, e.g. “This isn’t a new idea”
• Annotated with respective sentiments (blue: +, red: -)
Java Demo (Stanford Libraries)
Unsupervised Learning/Word Embeddings
• Neural language models/word embeddings
• Word2Vec (shallow neural network, not deep learning)
• Predict context given centre word (skip-gram)
• E.g. given “bankrupt”, predict the surrounding words in “the bank went bankrupt last year”
• Words/contexts from Google News
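The skip-gram training data consists of (centre word, context word) pairs taken from a sliding window over the text. The sketch below only generates those pairs; the neural network that learns vectors from them is omitted, and the window size of 2 is an assumption made for this example.

```python
def skipgram_pairs(tokens, window=2):
    """Return (centre word, context word) training pairs within a fixed window."""
    pairs = []
    for i, centre in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((centre, tokens[j]))
    return pairs

sentence = "the bank went bankrupt last year".split()
pairs = skipgram_pairs(sentence, window=2)
bankrupt_contexts = [ctx for centre, ctx in pairs if centre == "bankrupt"]
print(bankrupt_contexts)  # ['bank', 'went', 'last', 'year']
```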
Towards Unsupervised Learning (cont)
• Word vector representations capture semantic properties
• Word meaning and geometry
• King − man + woman ≈ queen
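The commonly cited analogy king − man + woman ≈ queen can be reproduced with vector arithmetic and cosine similarity. The 2-dimensional vectors below are hypothetical and hand-picked so that the analogy works; learned embeddings have hundreds of dimensions and only approximately satisfy such relations.

```python
import math

# Hypothetical 2-d vectors: axis 0 ~ "royalty", axis 1 ~ "gender".
VECTORS = {
    "king":  [0.9, 0.9],
    "queen": [0.9, -0.9],
    "man":   [0.1, 0.9],
    "woman": [0.1, -0.9],
    "apple": [-0.8, 0.0],
}

def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.hypot(*u) * math.hypot(*v))

def analogy(a, b, c):
    """Return the word closest to vec(a) - vec(b) + vec(c), excluding a, b, c."""
    target = [x - y + z for x, y, z in zip(VECTORS[a], VECTORS[b], VECTORS[c])]
    candidates = (w for w in VECTORS if w not in {a, b, c})
    return max(candidates, key=lambda w: cosine(VECTORS[w], target))

print(analogy("king", "man", "woman"))  # queen
```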
THE END
Thank you for your attention

More Related Content

What's hot

TARGETED ADVERSARIAL EXAMPLES FOR BLACK BOX AUDIO SYSTEMS - Rohan Taori, Amog...
TARGETED ADVERSARIAL EXAMPLES FOR BLACK BOX AUDIO SYSTEMS - Rohan Taori, Amog...TARGETED ADVERSARIAL EXAMPLES FOR BLACK BOX AUDIO SYSTEMS - Rohan Taori, Amog...
TARGETED ADVERSARIAL EXAMPLES FOR BLACK BOX AUDIO SYSTEMS - Rohan Taori, Amog...GeekPwn Keen
 
Answer Selection and Validation for Arabic Questions
Answer Selection and Validation for Arabic QuestionsAnswer Selection and Validation for Arabic Questions
Answer Selection and Validation for Arabic QuestionsAhmed Magdy Ezzeldin, MSc.
 
Recent trends in natural language processing
Recent trends in natural language processingRecent trends in natural language processing
Recent trends in natural language processingBalayogi G
 
Frontiers of Natural Language Processing
Frontiers of Natural Language ProcessingFrontiers of Natural Language Processing
Frontiers of Natural Language ProcessingSebastian Ruder
 
Machine Learning in NLP
Machine Learning in NLPMachine Learning in NLP
Machine Learning in NLPVijay Ganti
 
natural language processing help at myassignmenthelp.net
natural language processing  help at myassignmenthelp.netnatural language processing  help at myassignmenthelp.net
natural language processing help at myassignmenthelp.netwww.myassignmenthelp.net
 
Natural Language Processing (NLP)
Natural Language Processing (NLP)Natural Language Processing (NLP)
Natural Language Processing (NLP)Yuriy Guts
 
Arabic Question Answering: Challenges, Tasks, Approaches, Test-sets, Tools, A...
Arabic Question Answering: Challenges, Tasks, Approaches, Test-sets, Tools, A...Arabic Question Answering: Challenges, Tasks, Approaches, Test-sets, Tools, A...
Arabic Question Answering: Challenges, Tasks, Approaches, Test-sets, Tools, A...Ahmed Magdy Ezzeldin, MSc.
 
GATE : General Architecture for Text Engineering
GATE : General Architecture for Text EngineeringGATE : General Architecture for Text Engineering
GATE : General Architecture for Text EngineeringAhmed Magdy Ezzeldin, MSc.
 
Natural Language Processing Advancements By Deep Learning: A Survey
Natural Language Processing Advancements By Deep Learning: A SurveyNatural Language Processing Advancements By Deep Learning: A Survey
Natural Language Processing Advancements By Deep Learning: A SurveyRimzim Thube
 
Natural Language Processing for Games Research
Natural Language Processing for Games ResearchNatural Language Processing for Games Research
Natural Language Processing for Games ResearchJose Zagal
 
Natural Language Processing seminar review
Natural Language Processing seminar review Natural Language Processing seminar review
Natural Language Processing seminar review Jayneel Vora
 
Multi-modal Neural Machine Translation - Iacer Calixto
Multi-modal Neural Machine Translation - Iacer CalixtoMulti-modal Neural Machine Translation - Iacer Calixto
Multi-modal Neural Machine Translation - Iacer CalixtoSebastian Ruder
 
Natural language processing
Natural language processingNatural language processing
Natural language processingHansi Thenuwara
 
Deep learning for NLP and Transformer
 Deep learning for NLP  and Transformer Deep learning for NLP  and Transformer
Deep learning for NLP and TransformerArvind Devaraj
 

What's hot (20)

L1
L1L1
L1
 
TARGETED ADVERSARIAL EXAMPLES FOR BLACK BOX AUDIO SYSTEMS - Rohan Taori, Amog...
TARGETED ADVERSARIAL EXAMPLES FOR BLACK BOX AUDIO SYSTEMS - Rohan Taori, Amog...TARGETED ADVERSARIAL EXAMPLES FOR BLACK BOX AUDIO SYSTEMS - Rohan Taori, Amog...
TARGETED ADVERSARIAL EXAMPLES FOR BLACK BOX AUDIO SYSTEMS - Rohan Taori, Amog...
 
Answer Selection and Validation for Arabic Questions
Answer Selection and Validation for Arabic QuestionsAnswer Selection and Validation for Arabic Questions
Answer Selection and Validation for Arabic Questions
 
Recent trends in natural language processing
Recent trends in natural language processingRecent trends in natural language processing
Recent trends in natural language processing
 
Frontiers of Natural Language Processing
Frontiers of Natural Language ProcessingFrontiers of Natural Language Processing
Frontiers of Natural Language Processing
 
Machine Learning in NLP
Machine Learning in NLPMachine Learning in NLP
Machine Learning in NLP
 
Natural language processing
Natural language processingNatural language processing
Natural language processing
 
Sentiment Analysis
Sentiment AnalysisSentiment Analysis
Sentiment Analysis
 
Notesparadigms
NotesparadigmsNotesparadigms
Notesparadigms
 
natural language processing help at myassignmenthelp.net
natural language processing  help at myassignmenthelp.netnatural language processing  help at myassignmenthelp.net
natural language processing help at myassignmenthelp.net
 
Aspects of NLP Practice
Aspects of NLP PracticeAspects of NLP Practice
Aspects of NLP Practice
 
Natural Language Processing (NLP)
Natural Language Processing (NLP)Natural Language Processing (NLP)
Natural Language Processing (NLP)
 
Arabic Question Answering: Challenges, Tasks, Approaches, Test-sets, Tools, A...
Arabic Question Answering: Challenges, Tasks, Approaches, Test-sets, Tools, A...Arabic Question Answering: Challenges, Tasks, Approaches, Test-sets, Tools, A...
Arabic Question Answering: Challenges, Tasks, Approaches, Test-sets, Tools, A...
 
GATE : General Architecture for Text Engineering
GATE : General Architecture for Text EngineeringGATE : General Architecture for Text Engineering
GATE : General Architecture for Text Engineering
 
Natural Language Processing Advancements By Deep Learning: A Survey
Natural Language Processing Advancements By Deep Learning: A SurveyNatural Language Processing Advancements By Deep Learning: A Survey
Natural Language Processing Advancements By Deep Learning: A Survey
 
Natural Language Processing for Games Research
Natural Language Processing for Games ResearchNatural Language Processing for Games Research
Natural Language Processing for Games Research
 
Natural Language Processing seminar review
Natural Language Processing seminar review Natural Language Processing seminar review
Natural Language Processing seminar review
 
Multi-modal Neural Machine Translation - Iacer Calixto
Multi-modal Neural Machine Translation - Iacer CalixtoMulti-modal Neural Machine Translation - Iacer Calixto
Multi-modal Neural Machine Translation - Iacer Calixto
 
Natural language processing
Natural language processingNatural language processing
Natural language processing
 
Deep learning for NLP and Transformer
 Deep learning for NLP  and Transformer Deep learning for NLP  and Transformer
Deep learning for NLP and Transformer
 

Similar to Natural Language Processing

Introduction to Text Mining
Introduction to Text MiningIntroduction to Text Mining
Introduction to Text MiningMinha Hwang
 
Deep learning: the future of recommendations
Deep learning: the future of recommendationsDeep learning: the future of recommendations
Deep learning: the future of recommendationsBalázs Hidasi
 
Beyond the Symbols: A 30-minute Overview of NLP
Beyond the Symbols: A 30-minute Overview of NLPBeyond the Symbols: A 30-minute Overview of NLP
Beyond the Symbols: A 30-minute Overview of NLPMENGSAYLOEM1
 
Deep Learning, an interactive introduction for NLP-ers
Deep Learning, an interactive introduction for NLP-ersDeep Learning, an interactive introduction for NLP-ers
Deep Learning, an interactive introduction for NLP-ersRoelof Pieters
 
Natural language processing (NLP) introduction
Natural language processing (NLP) introductionNatural language processing (NLP) introduction
Natural language processing (NLP) introductionRobert Lujo
 
Big data 4 webmonday
Big data 4 webmondayBig data 4 webmonday
Big data 4 webmondayDaniel Koller
 
Sentiment Analysis
Sentiment AnalysisSentiment Analysis
Sentiment AnalysisSagar Ahire
 
Natural Language Processing
Natural Language ProcessingNatural Language Processing
Natural Language ProcessingVeenaSKumar2
 
Chris Brew - TR Discover: A Natural Language Interface for Exploring Linked D...
Chris Brew - TR Discover: A Natural Language Interface for Exploring Linked D...Chris Brew - TR Discover: A Natural Language Interface for Exploring Linked D...
Chris Brew - TR Discover: A Natural Language Interface for Exploring Linked D...Machine Learning Prague
 
NLP Tasks and Applications.ppt useful in
NLP Tasks and Applications.ppt useful inNLP Tasks and Applications.ppt useful in
NLP Tasks and Applications.ppt useful inKumari Naveen
 
lect36-tasks.ppt
lect36-tasks.pptlect36-tasks.ppt
lect36-tasks.pptHaHa501620
 
Knowledge base system appl. p 1,2-ver1
Knowledge base system appl.  p 1,2-ver1Knowledge base system appl.  p 1,2-ver1
Knowledge base system appl. p 1,2-ver1Taymoor Nazmy
 
Machine learning (ML) and natural language processing (NLP)
Machine learning (ML) and natural language processing (NLP)Machine learning (ML) and natural language processing (NLP)
Machine learning (ML) and natural language processing (NLP)Nikola Milosevic
 
Natural Language Processing Crash Course
Natural Language Processing Crash CourseNatural Language Processing Crash Course
Natural Language Processing Crash CourseCharlie Greenbacker
 
History of deep learning
History of deep learningHistory of deep learning
History of deep learningayatan2
 
NLP Bootcamp 2018 : Representation Learning of text for NLP
NLP Bootcamp 2018 : Representation Learning of text for NLPNLP Bootcamp 2018 : Representation Learning of text for NLP
NLP Bootcamp 2018 : Representation Learning of text for NLPAnuj Gupta
 

Similar to Natural Language Processing (20)

Introduction to Text Mining
Introduction to Text MiningIntroduction to Text Mining
Introduction to Text Mining
 
Deep learning: the future of recommendations
Deep learning: the future of recommendationsDeep learning: the future of recommendations
Deep learning: the future of recommendations
 
Beyond the Symbols: A 30-minute Overview of NLP
Beyond the Symbols: A 30-minute Overview of NLPBeyond the Symbols: A 30-minute Overview of NLP
Beyond the Symbols: A 30-minute Overview of NLP
 
Deep Learning, an interactive introduction for NLP-ers
Deep Learning, an interactive introduction for NLP-ersDeep Learning, an interactive introduction for NLP-ers
Deep Learning, an interactive introduction for NLP-ers
 
Natural language processing (NLP) introduction
Natural language processing (NLP) introductionNatural language processing (NLP) introduction
Natural language processing (NLP) introduction
 
Big data 4 webmonday
Big data 4 webmondayBig data 4 webmonday
Big data 4 webmonday
 
Sentiment Analysis
Sentiment AnalysisSentiment Analysis
Sentiment Analysis
 
Natural Language Processing
Natural Language ProcessingNatural Language Processing
Natural Language Processing
 
Chris Brew - TR Discover: A Natural Language Interface for Exploring Linked D...
Chris Brew - TR Discover: A Natural Language Interface for Exploring Linked D...Chris Brew - TR Discover: A Natural Language Interface for Exploring Linked D...
Chris Brew - TR Discover: A Natural Language Interface for Exploring Linked D...
 
NLP Tasks and Applications.ppt useful in
NLP Tasks and Applications.ppt useful inNLP Tasks and Applications.ppt useful in
NLP Tasks and Applications.ppt useful in
 
lect36-tasks.ppt
lect36-tasks.pptlect36-tasks.ppt
lect36-tasks.ppt
 
Knowledge base system appl. p 1,2-ver1
Knowledge base system appl.  p 1,2-ver1Knowledge base system appl.  p 1,2-ver1
Knowledge base system appl. p 1,2-ver1
 
Oop is not Dead
Oop is not DeadOop is not Dead
Oop is not Dead
 
Machine learning (ML) and natural language processing (NLP)
Machine learning (ML) and natural language processing (NLP)Machine learning (ML) and natural language processing (NLP)
Machine learning (ML) and natural language processing (NLP)
 
Natural Language Processing Crash Course
Natural Language Processing Crash CourseNatural Language Processing Crash Course
Natural Language Processing Crash Course
 
Sentiment Analysis
Sentiment AnalysisSentiment Analysis
Sentiment Analysis
 
History of deep learning
History of deep learningHistory of deep learning
History of deep learning
 
NLP Bootcamp
NLP BootcampNLP Bootcamp
NLP Bootcamp
 
NLP Bootcamp 2018 : Representation Learning of text for NLP
NLP Bootcamp 2018 : Representation Learning of text for NLPNLP Bootcamp 2018 : Representation Learning of text for NLP
NLP Bootcamp 2018 : Representation Learning of text for NLP
 
Nltk
NltkNltk
Nltk
 

More from Geeks Anonymes

Programmer sous Unreal Engine
Programmer sous Unreal EngineProgrammer sous Unreal Engine
Programmer sous Unreal EngineGeeks Anonymes
 
Implémentation efficace et durable de processus métiers complexes
Implémentation efficace et durable de processus métiers complexesImplémentation efficace et durable de processus métiers complexes
Implémentation efficace et durable de processus métiers complexesGeeks Anonymes
 
Managing Open Source Licenses (Geeks Anonymes)
Managing Open Source Licenses (Geeks Anonymes)Managing Open Source Licenses (Geeks Anonymes)
Managing Open Source Licenses (Geeks Anonymes)Geeks Anonymes
 
Reprendre le contrôle de ses données
Reprendre le contrôle de ses donnéesReprendre le contrôle de ses données
Reprendre le contrôle de ses donnéesGeeks Anonymes
 
Geeks Anonymes - Le langage Go
Geeks Anonymes - Le langage GoGeeks Anonymes - Le langage Go
Geeks Anonymes - Le langage GoGeeks Anonymes
 
Le rôle du testeur et le Blackbox testing
Le rôle du testeur et le Blackbox testingLe rôle du testeur et le Blackbox testing
Le rôle du testeur et le Blackbox testingGeeks Anonymes
 
Vulnérabilités au cœur des applications Web, menaces et contre-mesures
 Vulnérabilités au cœur des applications Web, menaces et contre-mesures Vulnérabilités au cœur des applications Web, menaces et contre-mesures
Vulnérabilités au cœur des applications Web, menaces et contre-mesuresGeeks Anonymes
 
191121 philippe teuwen cryptographie et attaques materielles
191121 philippe teuwen cryptographie et attaques materielles191121 philippe teuwen cryptographie et attaques materielles
191121 philippe teuwen cryptographie et attaques materiellesGeeks Anonymes
 
"Surfez couverts !" - Conseils de Cyber securité
"Surfez couverts !" - Conseils de Cyber securité "Surfez couverts !" - Conseils de Cyber securité
"Surfez couverts !" - Conseils de Cyber securité Geeks Anonymes
 
Introduction au développement mobile - développer une application iOS et Andr...
Introduction au développement mobile - développer une application iOS et Andr...Introduction au développement mobile - développer une application iOS et Andr...
Introduction au développement mobile - développer une application iOS et Andr...Geeks Anonymes
 
Intelligence artificielle et propriété intellectuelle
Intelligence artificielle et propriété intellectuelleIntelligence artificielle et propriété intellectuelle
Intelligence artificielle et propriété intellectuelleGeeks Anonymes
 
Pour une histoire plophonique du jeu video
Pour une histoire plophonique du jeu videoPour une histoire plophonique du jeu video
Pour une histoire plophonique du jeu videoGeeks Anonymes
 
Become Rick and famous, thanks to Open Source
Become Rick and famous, thanks to Open SourceBecome Rick and famous, thanks to Open Source
Become Rick and famous, thanks to Open SourceGeeks Anonymes
 
Reconnaissance vocale et création artistique
Reconnaissance vocale et création artistiqueReconnaissance vocale et création artistique
Reconnaissance vocale et création artistiqueGeeks Anonymes
 
Sécurité, GDPR : vos données ont de la valeur
Sécurité, GDPR : vos données ont de la valeur Sécurité, GDPR : vos données ont de la valeur
Sécurité, GDPR : vos données ont de la valeur Geeks Anonymes
 

More from Geeks Anonymes (20)

Programmer sous Unreal Engine
Programmer sous Unreal EngineProgrammer sous Unreal Engine
Programmer sous Unreal Engine
 
Implémentation efficace et durable de processus métiers complexes
Implémentation efficace et durable de processus métiers complexesImplémentation efficace et durable de processus métiers complexes
Implémentation efficace et durable de processus métiers complexes
 
Managing Open Source Licenses (Geeks Anonymes)
Managing Open Source Licenses (Geeks Anonymes)Managing Open Source Licenses (Geeks Anonymes)
Managing Open Source Licenses (Geeks Anonymes)
 
Reprendre le contrôle de ses données
Reprendre le contrôle de ses donnéesReprendre le contrôle de ses données
Reprendre le contrôle de ses données
 
Geeks Anonymes - Le langage Go
Geeks Anonymes - Le langage GoGeeks Anonymes - Le langage Go
Geeks Anonymes - Le langage Go
 
Le rôle du testeur et le Blackbox testing
Le rôle du testeur et le Blackbox testingLe rôle du testeur et le Blackbox testing
Le rôle du testeur et le Blackbox testing
 
Kubernetes
KubernetesKubernetes
Kubernetes
 
Vulnérabilités au cœur des applications Web, menaces et contre-mesures
 Vulnérabilités au cœur des applications Web, menaces et contre-mesures Vulnérabilités au cœur des applications Web, menaces et contre-mesures
Vulnérabilités au cœur des applications Web, menaces et contre-mesures
 
191121 philippe teuwen cryptographie et attaques materielles
191121 philippe teuwen cryptographie et attaques materielles191121 philippe teuwen cryptographie et attaques materielles
191121 philippe teuwen cryptographie et attaques materielles
 
"Surfez couverts !" - Conseils de Cyber securité
"Surfez couverts !" - Conseils de Cyber securité "Surfez couverts !" - Conseils de Cyber securité
"Surfez couverts !" - Conseils de Cyber securité
 
Introduction au développement mobile - développer une application iOS et Andr...
Introduction au développement mobile - développer une application iOS et Andr...Introduction au développement mobile - développer une application iOS et Andr...
Introduction au développement mobile - développer une application iOS et Andr...
 
Le langage rust
Le langage rustLe langage rust
Le langage rust
 
Test your code
Test your codeTest your code
Test your code
 
Intelligence artificielle et propriété intellectuelle
Intelligence artificielle et propriété intellectuelleIntelligence artificielle et propriété intellectuelle
Intelligence artificielle et propriété intellectuelle
 
Pour une histoire plophonique du jeu video
Pour une histoire plophonique du jeu videoPour une histoire plophonique du jeu video
Pour une histoire plophonique du jeu video
 
Become Rick and famous, thanks to Open Source
Become Rick and famous, thanks to Open SourceBecome Rick and famous, thanks to Open Source
Become Rick and famous, thanks to Open Source
 
Reconnaissance vocale et création artistique
Reconnaissance vocale et création artistiqueReconnaissance vocale et création artistique
Reconnaissance vocale et création artistique
 
Sécurité, GDPR : vos données ont de la valeur
Sécurité, GDPR : vos données ont de la valeur Sécurité, GDPR : vos données ont de la valeur
Sécurité, GDPR : vos données ont de la valeur
 
Modern sql
Modern sqlModern sql
Modern sql
 
Qt
QtQt
Qt
 

Recently uploaded

Recruitment Management Software Benefits (Infographic)
Recruitment Management Software Benefits (Infographic)Recruitment Management Software Benefits (Infographic)
Recruitment Management Software Benefits (Infographic)Hr365.us smith
 
SuccessFactors 1H 2024 Release - Sneak-Peek by Deloitte Germany
SuccessFactors 1H 2024 Release - Sneak-Peek by Deloitte GermanySuccessFactors 1H 2024 Release - Sneak-Peek by Deloitte Germany
SuccessFactors 1H 2024 Release - Sneak-Peek by Deloitte GermanyChristoph Pohl
 
BATTLEFIELD ORM: TIPS, TACTICS AND STRATEGIES FOR CONQUERING YOUR DATABASE
BATTLEFIELD ORM: TIPS, TACTICS AND STRATEGIES FOR CONQUERING YOUR DATABASEBATTLEFIELD ORM: TIPS, TACTICS AND STRATEGIES FOR CONQUERING YOUR DATABASE
BATTLEFIELD ORM: TIPS, TACTICS AND STRATEGIES FOR CONQUERING YOUR DATABASEOrtus Solutions, Corp
 
Call Us🔝>༒+91-9711147426⇛Call In girls karol bagh (Delhi)
Call Us🔝>༒+91-9711147426⇛Call In girls karol bagh (Delhi)Call Us🔝>༒+91-9711147426⇛Call In girls karol bagh (Delhi)
Call Us🔝>༒+91-9711147426⇛Call In girls karol bagh (Delhi)jennyeacort
 
Maximizing Efficiency and Profitability with OnePlan’s Professional Service A...
Maximizing Efficiency and Profitability with OnePlan’s Professional Service A...Maximizing Efficiency and Profitability with OnePlan’s Professional Service A...
Maximizing Efficiency and Profitability with OnePlan’s Professional Service A...OnePlan Solutions
 
Implementing Zero Trust strategy with Azure
Implementing Zero Trust strategy with AzureImplementing Zero Trust strategy with Azure
Implementing Zero Trust strategy with AzureDinusha Kumarasiri
 
What is Fashion PLM and Why Do You Need It
What is Fashion PLM and Why Do You Need ItWhat is Fashion PLM and Why Do You Need It
What is Fashion PLM and Why Do You Need ItWave PLM
 
Taming Distributed Systems: Key Insights from Wix's Large-Scale Experience - ...
Taming Distributed Systems: Key Insights from Wix's Large-Scale Experience - ...Taming Distributed Systems: Key Insights from Wix's Large-Scale Experience - ...
Taming Distributed Systems: Key Insights from Wix's Large-Scale Experience - ...Natan Silnitsky
 
Dealing with Cultural Dispersion — Stefano Lambiase — ICSE-SEIS 2024
Dealing with Cultural Dispersion — Stefano Lambiase — ICSE-SEIS 2024Dealing with Cultural Dispersion — Stefano Lambiase — ICSE-SEIS 2024
Dealing with Cultural Dispersion — Stefano Lambiase — ICSE-SEIS 2024StefanoLambiase
 
Best Web Development Agency- Idiosys USA.pdf
Best Web Development Agency- Idiosys USA.pdfBest Web Development Agency- Idiosys USA.pdf
Best Web Development Agency- Idiosys USA.pdfIdiosysTechnologies1
 
EY_Graph Database Powered Sustainability
EY_Graph Database Powered SustainabilityEY_Graph Database Powered Sustainability
EY_Graph Database Powered SustainabilityNeo4j
 
GOING AOT WITH GRAALVM – DEVOXX GREECE.pdf
GOING AOT WITH GRAALVM – DEVOXX GREECE.pdfGOING AOT WITH GRAALVM – DEVOXX GREECE.pdf
GOING AOT WITH GRAALVM – DEVOXX GREECE.pdfAlina Yurenko
 
Intelligent Home Wi-Fi Solutions | ThinkPalm
Intelligent Home Wi-Fi Solutions | ThinkPalmIntelligent Home Wi-Fi Solutions | ThinkPalm
Intelligent Home Wi-Fi Solutions | ThinkPalmSujith Sukumaran
 
Unveiling Design Patterns: A Visual Guide with UML Diagrams
Unveiling Design Patterns: A Visual Guide with UML DiagramsUnveiling Design Patterns: A Visual Guide with UML Diagrams
Unveiling Design Patterns: A Visual Guide with UML DiagramsAhmed Mohamed
 
Global Identity Enrolment and Verification Pro Solution - Cizo Technology Ser...
Global Identity Enrolment and Verification Pro Solution - Cizo Technology Ser...Global Identity Enrolment and Verification Pro Solution - Cizo Technology Ser...
Global Identity Enrolment and Verification Pro Solution - Cizo Technology Ser...Cizo Technology Services
 
英国UN学位证,北安普顿大学毕业证书1:1制作
英国UN学位证,北安普顿大学毕业证书1:1制作英国UN学位证,北安普顿大学毕业证书1:1制作
英国UN学位证,北安普顿大学毕业证书1:1制作qr0udbr0
 
Software Project Health Check: Best Practices and Techniques for Your Product...
Software Project Health Check: Best Practices and Techniques for Your Product...Software Project Health Check: Best Practices and Techniques for Your Product...
Software Project Health Check: Best Practices and Techniques for Your Product...Velvetech LLC
 
KnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptx
KnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptxKnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptx
KnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptxTier1 app
 
CRM Contender Series: HubSpot vs. Salesforce
CRM Contender Series: HubSpot vs. SalesforceCRM Contender Series: HubSpot vs. Salesforce
CRM Contender Series: HubSpot vs. SalesforceBrainSell Technologies
 

Recently uploaded (20)

Recruitment Management Software Benefits (Infographic)
Recruitment Management Software Benefits (Infographic)Recruitment Management Software Benefits (Infographic)
Recruitment Management Software Benefits (Infographic)
 
SuccessFactors 1H 2024 Release - Sneak-Peek by Deloitte Germany
SuccessFactors 1H 2024 Release - Sneak-Peek by Deloitte GermanySuccessFactors 1H 2024 Release - Sneak-Peek by Deloitte Germany
SuccessFactors 1H 2024 Release - Sneak-Peek by Deloitte Germany
 
BATTLEFIELD ORM: TIPS, TACTICS AND STRATEGIES FOR CONQUERING YOUR DATABASE
BATTLEFIELD ORM: TIPS, TACTICS AND STRATEGIES FOR CONQUERING YOUR DATABASEBATTLEFIELD ORM: TIPS, TACTICS AND STRATEGIES FOR CONQUERING YOUR DATABASE
BATTLEFIELD ORM: TIPS, TACTICS AND STRATEGIES FOR CONQUERING YOUR DATABASE
 
Call Us🔝>༒+91-9711147426⇛Call In girls karol bagh (Delhi)
Call Us🔝>༒+91-9711147426⇛Call In girls karol bagh (Delhi)Call Us🔝>༒+91-9711147426⇛Call In girls karol bagh (Delhi)
Call Us🔝>༒+91-9711147426⇛Call In girls karol bagh (Delhi)
 

Natural Language Processing

  • 1. Natural Language Processing: An Introduction (Ashwin Ittoo)
  • 2. About Myself – Ashwin Ittoo
      • Associate Professor, HEC Liège, ULiège
      • Research Associate, JAIST (Japan)
      • Associate Editor, Elsevier (Computers in Industry)
  • 3. Team
      • 3 PhD, ULiège, Belgium
          • Finance
          • Marketing
          • Medicine
      • 1 PhD, JAIST, Japan (Aug. 2018)
  • 4. Introduction
      • Natural Language Processing (NLP)
          • French: traitement automatique de langues naturelles (TAL)
          • Methods for “analysing” language
          • Language expressed in written form (text data)
      • Text data common in NLP
          • Tweets
          • Amazon/Yelp reviews
          • Wikipedia
          • Domain-specific articles (finance, medicine, …)
  • 5. Introduction (cont.)
      • Variety of analyses
          • Document classification, e.g. sentiment analysis
          • Information extraction, e.g. extracting facts from legal texts
          • Machine translation
      • Evolution of methods
          • From formal logic and linguistics
          • To machine learning and deep learning
  • 6. Methods & Pipeline
      • Distinction in methods
      • Pipeline organization
      • Pipeline (diagram): Text collection → Low-level NLP tasks (Pre-processing, Feature Engineering) → High-level NLP tasks (Document Classification, Sentiment Analysis, Machine Translation)
  • 7. Low-Level: Pre-processing
      • Clean the data
          • Removing stopwords (“a”, “the”, …)
          • Removing non-ASCII characters
      • Straightforward
      • No learning (machine/deep) involved
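The two cleaning steps on slide 7 can be sketched with the Python standard library alone. This is a minimal illustration, not the demo used in the talk: the `STOPWORDS` set and the `preprocess` helper are assumptions made up for this example, and real stopword lists are much longer.

```python
import re

# Tiny illustrative stopword list (an assumption for this sketch).
STOPWORDS = {"a", "an", "the", "of", "is"}

def preprocess(text: str) -> list[str]:
    """Drop non-ASCII characters, lowercase, split into words, remove stopwords."""
    # Encoding with errors="ignore" silently discards every non-ASCII character.
    ascii_only = text.encode("ascii", errors="ignore").decode("ascii")
    words = re.findall(r"[a-z0-9']+", ascii_only.lower())
    return [w for w in words if w not in STOPWORDS]

cleaned = preprocess("The café is a gem of the city ☕")
print(cleaned)  # ['caf', 'gem', 'city']
```

Note that "café" becomes "caf": blindly removing non-ASCII characters is simple and involves no learning, but it can damage legitimate words, so the step may need care for non-English text.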
  • 8. Low-Level: Feature Engineering
      • Text → number transformation
      • Individual tokens from sentence
          • Tokens: words, numbers, punctuation, …
      • Tokens = features
      • How to best represent features?
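One common way to realise the text-to-number transformation on slide 8 is a bag-of-words count vector, where every distinct token becomes one feature. The sketch below assumes a simple regular-expression tokenizer; `tokenize` and `bag_of_words` are hypothetical helper names, not functions from a specific library.

```python
import re
from collections import Counter

def tokenize(sentence: str) -> list[str]:
    # Runs of word characters are tokens; each punctuation mark is its own token.
    return re.findall(r"\w+|[^\w\s]", sentence.lower())

def bag_of_words(docs: list[str]) -> tuple[list[str], list[list[int]]]:
    """Return the sorted vocabulary and one count vector per document."""
    vocab = sorted({tok for doc in docs for tok in tokenize(doc)})
    vectors = []
    for doc in docs:
        counts = Counter(tokenize(doc))
        vectors.append([counts[tok] for tok in vocab])
    return vocab, vectors

vocab, vectors = bag_of_words(["2 nice movies!", "Nice, nice movies."])
print(vocab)
print(vectors)
```

Each column of the resulting vectors corresponds to one token of the vocabulary, which makes the growth in feature count with vocabulary size easy to see.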
  • 9. Feature Representation
      • As-is
          • Each token = 1 feature
          • Eat, ate, eaten: 3 tokens, 3 distinct features
          • Huge number of features
          • Curse of dimensionality
      • Morphology
          • Replace token with lemma (root)
          • Eat, ate, eaten → eat: 3 tokens, 1 feature
      • Demo
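The effect of lemmatization on the feature count from slide 9 can be illustrated as follows. The `LEMMAS` table is a hand-written, hypothetical stand-in for a real lemmatizer (which would rely on a dictionary or morphological analysis), so this is only a sketch of the idea.

```python
# Hypothetical hand-written lemma table, for illustration only.
LEMMAS = {"ate": "eat", "eaten": "eat", "eating": "eat"}

def lemmatize(token: str) -> str:
    """Map a token to its lemma, falling back to the lowercased token itself."""
    token = token.lower()
    return LEMMAS.get(token, token)

tokens = ["Eat", "ate", "eaten"]
as_is_features = {t.lower() for t in tokens}     # one feature per surface form
lemma_features = {lemmatize(t) for t in tokens}  # one feature per lemma

print(len(as_is_features), len(lemma_features))  # 3 1
```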
  • 10. Feature Representation (cont.)
      • Grammatical information
          • Use part-of-speech (POS) tags / POS tagging
          • Tag set defined in the Penn Treebank (UPenn)
          • E.g. 2 nice movies → CD JJ NNS
      • Several tools for POS tagging
          • Stanford NLP (Java)
          • scikit-learn/NLTK (Python)
      • Demo
  • 11. Part-of-Speech Tagging
      • Application of machine learning for NLP
          • Large number of classes (one per POS tag)
          • Temporal sequence of word occurrence
      • Hidden Markov Model
          • $\hat{t}_1^n = \operatorname{argmax}_{t_1^n} P(t_1^n \mid w_1^n) \approx \operatorname{argmax}_{t_1^n} \prod_{i=1}^{n} P(w_i \mid t_i)\, P(t_i \mid t_{i-1})$
          • $P(w_i \mid t_i)$: probability of word $w_i$ given POS tag $t_i$
          • $P(t_i \mid t_{i-1})$: probability of POS tag $t_i$ given the previous POS tag $t_{i-1}$
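The argmax over tag sequences in a bigram Hidden Markov Model is usually computed with the Viterbi algorithm. Below is a minimal sketch under stated assumptions: all probabilities are invented toy values (not estimated from the Penn Treebank), the first tag uses a start distribution in place of a transition, and `viterbi` is a helper written for this example.

```python
def viterbi(words, tags, start_p, trans_p, emit_p):
    """Return the most probable tag sequence under a bigram HMM."""
    # best[i][t] = (probability of the best path ending in tag t at position i, previous tag)
    best = [{t: (start_p[t] * emit_p[t].get(words[0], 0.0), None) for t in tags}]
    for i in range(1, len(words)):
        column = {}
        for t in tags:
            column[t] = max(
                (best[i - 1][p][0] * trans_p[p][t] * emit_p[t].get(words[i], 0.0), p)
                for p in tags
            )
        best.append(column)
    # Backtrack from the most probable final tag.
    path = [max(tags, key=lambda t: best[-1][t][0])]
    for i in range(len(words) - 1, 0, -1):
        path.append(best[i][path[-1]][1])
    return list(reversed(path))

# Toy parameters (assumptions for illustration only).
tags = ["CD", "JJ", "NNS"]
start_p = {"CD": 0.4, "JJ": 0.4, "NNS": 0.2}
trans_p = {  # trans_p[previous][current] = P(current tag | previous tag)
    "CD": {"CD": 0.1, "JJ": 0.5, "NNS": 0.4},
    "JJ": {"CD": 0.1, "JJ": 0.2, "NNS": 0.7},
    "NNS": {"CD": 0.2, "JJ": 0.3, "NNS": 0.5},
}
emit_p = {  # emit_p[tag][word] = P(word | tag); missing words have probability 0
    "CD": {"2": 0.9},
    "JJ": {"nice": 0.8, "2": 0.05},
    "NNS": {"movies": 0.7, "nice": 0.05},
}

tagged = viterbi(["2", "nice", "movies"], tags, start_p, trans_p, emit_p)
print(tagged)  # ['CD', 'JJ', 'NNS']
```

A practical tagger would work with log-probabilities to avoid numerical underflow on long sentences and would smooth the emission probabilities for unseen words.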
  • 12. Low-Level: Feature Engineering
      • How to select the best features?
          • Intuitively: some words are more important than others
          • E.g. “doping” → sports documents
      • Tf-idf
          • Term frequency–inverse document frequency
      • Standard statistical tests
          • Chi-square
          • Mutual information
      • Demo
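The tf-idf weighting from slide 12 can be written out in a few lines. This sketch uses one common unsmoothed variant, tf × log(N / df), with term frequency normalised by document length; libraries often apply different smoothing, so the exact numbers here are an assumption of this example, as are the three toy documents.

```python
import math
from collections import Counter

def tf_idf(docs):
    """docs: list of token lists. Returns one {term: weight} dict per document."""
    n_docs = len(docs)
    # Document frequency: in how many documents does each term occur?
    doc_freq = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        weights.append({
            term: (count / len(doc)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights

docs = [
    "the athlete failed the doping test".split(),
    "the bank raised the interest rate".split(),
    "the team won the match".split(),
]
weights = tf_idf(docs)
print(weights[0]["the"], weights[0]["doping"])
```

Because "the" occurs in every document its weight is zero, while "doping" occurs in only one document and receives a positive weight, matching the intuition that some words are more informative than others.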
  • 13. High-Level: Sentiment Analysis
      • High-level tasks
          • Features (from low-level tasks) as input
      • Sentiment analysis
          • Determine sentiment in customer reviews
          • E.g. movie reviews, Amazon product reviews
      • Classification problem
          • 2 (or 3) classes/categories
          • +, − (and neutral)
      • Supervised learning
          • Movie reviews annotated with sentiment class are available
          • Train a classification algorithm
          • Naïve Bayes, SVM, random forests, neural networks
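As an illustration of the supervised setup on slide 13, here is a minimal multinomial Naïve Bayes classifier with add-one smoothing, written from scratch. The four training reviews are invented for this sketch, and `train_naive_bayes` and `predict` are hypothetical helper names; a real experiment would use an annotated review corpus and a library implementation.

```python
import math
from collections import Counter, defaultdict

def train_naive_bayes(examples):
    """examples: list of (tokens, label). Returns the counts needed for prediction."""
    label_counts = Counter(label for _, label in examples)
    word_counts = defaultdict(Counter)
    for tokens, label in examples:
        word_counts[label].update(tokens)
    vocab = {tok for tokens, _ in examples for tok in tokens}
    return label_counts, word_counts, vocab

def predict(model, tokens):
    label_counts, word_counts, vocab = model
    total = sum(label_counts.values())
    scores = {}
    for label, n in label_counts.items():
        # Log prior plus log likelihoods with add-one (Laplace) smoothing.
        score = math.log(n / total)
        denom = sum(word_counts[label].values()) + len(vocab)
        for tok in tokens:
            if tok in vocab:  # words never seen in training are ignored
                score += math.log((word_counts[label][tok] + 1) / denom)
        scores[label] = score
    return max(scores, key=scores.get)

# Invented toy training set (an assumption for this sketch).
train = [
    ("a great and charming movie".split(), "+"),
    ("great acting , great story".split(), "+"),
    ("a boring and ugly movie".split(), "-"),
    ("boring plot , terrible acting".split(), "-"),
]
model = train_naive_bayes(train)
print(predict(model, "a charming story".split()))      # +
print(predict(model, "terrible boring plot".split()))  # -
```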
  • 14. High-Level: Evaluation Metrics
      • Confusion matrix
          • True positive, false negative
          • True negative, false positive
      • Precision
          • Fraction of reviews assigned to a class that truly belong to it: TP / (TP + FP)
          • How precise is our model?
      • Recall
          • Fraction of reviews of a class in the gold standard set that the model assigns to it: TP / (TP + FN)
          • What is the coverage of the model?
      • F1-score
          • Harmonic mean of precision and recall; balances the two
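The metrics on slide 14 follow directly from the confusion-matrix counts, using the standard definitions precision = TP / (TP + FP), recall = TP / (TP + FN), and F1 as their harmonic mean. The sketch below computes them for one class; the gold and predicted labels are invented for illustration.

```python
def precision_recall_f1(gold, predicted, positive="+"):
    """Compute precision, recall and F1 for one class from parallel label lists."""
    pairs = list(zip(gold, predicted))
    tp = sum(1 for g, p in pairs if p == positive and g == positive)
    fp = sum(1 for g, p in pairs if p == positive and g != positive)
    fn = sum(1 for g, p in pairs if p != positive and g == positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

# Invented labels: 4 positive and 2 negative reviews in the gold standard.
gold      = ["+", "+", "+", "+", "-", "-"]
predicted = ["+", "+", "-", "-", "+", "-"]

precision, recall, f1 = precision_recall_f1(gold, predicted)
print(precision, recall, f1)
```

Here TP = 2, FP = 1 and FN = 2, so precision is 2/3, recall is 1/2 and F1 is 4/7.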
  • 15. Deep Learning in NLP
      • Feature engineering
          • Core of machine learning and NLP, but…
          • Manual, time-consuming
          • Bottleneck in machine learning and NLP
      • Deep learning
          • Neural network with many hidden layers
      • Supervised learning approach
          • Trained on annotated data
          • Movie reviews with sentiment class
          • Input: word (vectors) from reviews
          • Output: class label (+, −, neutral)
          • Hidden layers learn feature representation
          • No (or minimal) feature engineering
  • 16. Deep Learning in NLP (cont.)
      • Different deep learning architectures
          • E.g. CNN for image processing
      • RNN (Recurrent Neural Network)
          • State of the art for text
          • Considers temporal nature of tokens in sentence
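To make the "temporal nature" point on slide 16 concrete, the sketch below runs the forward pass of a single-layer Elman-style RNN in plain Python. The weights and the one-hot word vectors are arbitrary values chosen for this example (nothing is trained), and `rnn_forward` is a hypothetical helper; the only point is that the final hidden state depends on word order, unlike a bag-of-words vector.

```python
import math

def rnn_forward(inputs, w_in, w_rec, bias):
    """Run a single-layer Elman RNN over a sequence; return the final hidden state."""
    hidden = [0.0] * len(bias)
    for x in inputs:
        # h_t = tanh(W_in x_t + W_rec h_{t-1} + b); the old state is read before reassignment.
        hidden = [
            math.tanh(
                sum(w_in[j][k] * x[k] for k in range(len(x)))
                + sum(w_rec[j][k] * hidden[k] for k in range(len(hidden)))
                + bias[j]
            )
            for j in range(len(bias))
        ]
    return hidden

# One-hot "word vectors" and arbitrary untrained weights (assumptions for illustration).
NOT, GOOD = [1.0, 0.0], [0.0, 1.0]
w_in = [[1.0, -1.0], [0.5, 0.5]]
w_rec = [[0.5, -0.5], [0.3, 0.8]]
bias = [0.0, 0.0]

state_a = rnn_forward([NOT, GOOD], w_in, w_rec, bias)  # "not good"
state_b = rnn_forward([GOOD, NOT], w_in, w_rec, bias)  # "good not"
print(state_a, state_b)
```

The two sequences contain the same words, so their bag-of-words vectors would be identical, yet the recurrent hidden states differ.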
  • 17. RNN for Sentiment Analysis
      • Sentiment challenge
          • Each clause can express a different sentiment
          • Need to keep track of word sequences
          • Need to compose individual sentiments for overall sentiment
      • Examples
          • “This movie doesn't care about cleverness, wit or any other kind of intelligent humor.”
          • “Those who find ugly meanings in beautiful things are corrupt without being charming.”
  • 18. Language Processing/Sentiment Analysis (cont.)
      • Trained over sentiment treebank
          • Phrases, clauses, sentences, e.g. “This isn’t a new idea”
          • Annotated with respective sentiments (blue: +, red: −)
      • Java demo (Stanford libraries)
  • 19. Unsupervised Learning/Word Embeddings
      • Neural language models/word embeddings
      • Word2Vec (shallow neural network, not deep learning)
          • Predict context given centre word (skip-gram)
          • E.g. given “bankrupt”, predict the surrounding words in “the bank went bankrupt last year”
          • Words/contexts from Google News
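The skip-gram objective on slide 19 is trained on (centre word, context word) pairs extracted with a sliding window. The sketch below only generates those pairs; it does not train a model. The window size of 2 and the `skipgram_pairs` helper are assumptions made for this example.

```python
def skipgram_pairs(tokens, window=2):
    """Return (centre, context) pairs for every token and its neighbours within the window."""
    pairs = []
    for i, centre in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((centre, tokens[j]))
    return pairs

sentence = "the bank went bankrupt last year".split()
pairs = skipgram_pairs(sentence, window=2)
bankrupt_contexts = [ctx for centre, ctx in pairs if centre == "bankrupt"]
print(bankrupt_contexts)  # ['bank', 'went', 'last', 'year']
```

A skip-gram model would then be trained to assign high probability to each context word given its centre word, and the learned input weights serve as the word vectors.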
  • 20. Towards Unsupervised Learning (cont.)
      • Word vector representations capture semantic properties
      • Word meaning and geometry
          • king − man + woman ≈ queen
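The geometric relation between word vectors is commonly stated as king − man + woman ≈ queen. The sketch below reproduces it with hand-made two-dimensional toy vectors, which are an assumption of this example; real Word2Vec embeddings have hundreds of dimensions and the relation holds only approximately. `cosine` and `analogy` are hypothetical helpers.

```python
import math

# Hand-made 2-D toy vectors (hypothetical): axis 0 ~ "royalty", axis 1 ~ "gender".
VECTORS = {
    "king":  [0.9, 0.9],
    "queen": [0.9, -0.9],
    "man":   [0.1, 0.9],
    "woman": [0.1, -0.9],
    "apple": [-0.8, 0.05],
}

def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.hypot(*u) * math.hypot(*v))

def analogy(a, b, c):
    """Return the word closest to vec(a) - vec(b) + vec(c), excluding the three inputs."""
    target = [x - y + z for x, y, z in zip(VECTORS[a], VECTORS[b], VECTORS[c])]
    candidates = [w for w in VECTORS if w not in {a, b, c}]
    return max(candidates, key=lambda w: cosine(target, VECTORS[w]))

result = analogy("king", "man", "woman")
print(result)  # queen
```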
  • 21. The End: Thank you for your attention