Practical Natural Language Processing From Theory to Industrial Applications
1. Practical Natural Language Processing
From Theory to Industrial Applications
Jaganadh G
http://jaganadhg.in
jaganadhg@gmail.com
Karpagam University
Coimbatore
19th March 2012
Jaganadh G Practical Natural Language Processing
2. About me !!
Working in Natural Language Processing, Machine
Learning, Data Mining etc...
Passionate about Free and Open source :-)
In free time, teaches Python, speaks about FOSS,
and blogs at
http://jaganadhg.in
I am a computational linguist / Linguist and Indologist,
Book reviewer
Software Engineer by Profession
3. Question ??
Have you ever used any Natural Language Processing based
tools/services?
9. What is Natural Language Processing (NLP) ?
Aim : To build intelligent systems that can interact with
human beings the way human beings do
A sub-field of Artificial Intelligence (AI)
Inter-disciplinary subject (Language + Linguistics +
Statistics + Computer Science + .. )
Natural Language
Refers to the languages spoken by people, e.g.
English, Japanese, Tamil, Malayalam, as opposed to artificial
languages like C++, Java, etc.
10. Definition
Natural Language Processing
Natural Language Processing is a theoretically motivated range
of computational techniques for analyzing and representing
naturally occurring texts/speech at one or more levels of
linguistic analysis for the purpose of achieving human-like
language processing for a range of tasks or applications.
Until some 10 to 20 years ago, NLP was considered a
purely academic discipline.
Now concepts from NLP are applied in a variety of
computing platforms and services
13. Practical NLP ?
Problem
Before going into the theory, can we try some fun
practical problems to solve ?
Picture Courtesy: http://twitpic.com/1y21qm/full
23. Practical NLP
Problem
Tweet-a-Toddy receives thousands of tweets per day
Tweets requesting home delivery
Tweets about quality of products
Tweets related to enquiries
They require the following things to be automated
Identify tweet category
Process home-delivery request
Evaluate quality-related tweets
How?
How to find a solution for Tweet-a-Toddy?
31. Solution
Any solutions ??
Some thoughts
Text Classification
Entity Identification
Information Extraction
Sentiment Analysis
Parsing, grammar ...
Regex (Regular Expressions)
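A minimal sketch of the regex idea applied to the Tweet-a-Toddy problem, assuming three hypothetical keyword patterns (the patterns, category names, and the function name categorize_tweet are illustrative and not from the talk). It only shows how far plain regular expressions can go for tweet categorization; a real system would likely need a trained text classifier.

```python
import re

# Hypothetical keyword patterns, one per tweet category from the problem slide.
# Dictionary order decides which category wins when several patterns match.
CATEGORY_PATTERNS = {
    "home_delivery": re.compile(r"\b(deliver(y|ed)?|send|order)\b", re.IGNORECASE),
    "quality": re.compile(r"\b(taste[sd]?|quality|fresh|stale|good|bad)\b", re.IGNORECASE),
    "enquiry": re.compile(r"\b(price|cost|open|timings?|available)\b|\?", re.IGNORECASE),
}


def categorize_tweet(text):
    """Return the first category whose pattern matches, else 'other'."""
    for category, pattern in CATEGORY_PATTERNS.items():
        if pattern.search(text):
            return category
    return "other"


if __name__ == "__main__":
    for tweet in ["Please deliver two bottles to my home",
                  "The toddy was stale today",
                  "What is the price?"]:
        print(tweet, "->", categorize_tweet(tweet))
```

Keyword rules like these are quick to write but brittle, which is why text classification and sentiment analysis appear in the same list of candidate solutions.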
35. Another Practical Question
Everybody might have used spell checker available in word
processing systems like OpenOffice.org or Microsoft Word
Any guess on how to develop a spell checker system ?
Solutions
Word List
Structure of words
Dynamic Programming (Edit Distance)
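The word list and edit distance ideas can be combined in a small sketch. The word list below is a hypothetical stand-in for a real dictionary, and the names edit_distance and suggest are illustrative; the distance itself is the standard Levenshtein distance computed with a dynamic-programming table.

```python
def edit_distance(source, target):
    """Levenshtein distance computed with a dynamic-programming table."""
    rows, cols = len(source) + 1, len(target) + 1
    dist = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        dist[i][0] = i          # delete all characters of source[:i]
    for j in range(cols):
        dist[0][j] = j          # insert all characters of target[:j]
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if source[i - 1] == target[j - 1] else 1
            dist[i][j] = min(
                dist[i - 1][j] + 1,         # deletion
                dist[i][j - 1] + 1,         # insertion
                dist[i - 1][j - 1] + cost,  # substitution or match
            )
    return dist[-1][-1]


# Hypothetical word list; a real spell checker would load a full dictionary.
WORD_LIST = ["language", "linguist", "processing", "natural", "speech"]


def suggest(word, word_list=WORD_LIST, max_distance=2):
    """Return the word itself if known, else nearby words sorted by distance."""
    if word in word_list:
        return [word]
    scored = sorted((edit_distance(word, candidate), candidate)
                    for candidate in word_list)
    return [candidate for distance, candidate in scored
            if distance <= max_distance]


if __name__ == "__main__":
    print(suggest("langauge"))
    print(suggest("speach"))
```

Comparing against every word in the list is fine for a sketch; larger dictionaries usually need an index or candidate generation to stay fast.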
39. Another Practical Question ...
Context Sensitive Spell-checking
Identifying and suggesting spelling of words based on context
How ??
Solutions
Statistical Models
Word category based suggestions
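One way to picture the statistical-model idea is a bigram count over a training text. The sketch below assumes a tiny made-up corpus and a single hypothetical confusion set ("piece" / "peace"); the names BIGRAMS, CONFUSION_SETS, and best_in_context are illustrative only. A real system would estimate counts from a large corpus and use smoothing.

```python
from collections import Counter

# Tiny made-up training text, already lowercased and tokenized by spaces.
CORPUS = (
    "i want a piece of cake . "
    "we signed a peace treaty . "
    "give me a piece of paper . "
    "they hope for peace and quiet ."
).split()

# Count how often each pair of adjacent words occurs.
BIGRAMS = Counter(zip(CORPUS, CORPUS[1:]))

# Hypothetical sets of words that are easily confused with each other.
CONFUSION_SETS = [{"piece", "peace"}]


def best_in_context(prev_word, word, next_word):
    """Pick the confusable word that fits best between its two neighbours."""
    for group in CONFUSION_SETS:
        if word in group:
            def score(candidate):
                return (BIGRAMS[(prev_word, candidate)]
                        + BIGRAMS[(candidate, next_word)])
            # sorted() makes tie-breaking deterministic.
            return max(sorted(group), key=score)
    return word


if __name__ == "__main__":
    print(best_in_context("a", "peace", "of"))
    print(best_in_context("for", "piece", "and"))
```

Both spellings are valid dictionary words, so a word list alone cannot catch the error; only the neighbouring words reveal which one was intended.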
45. Why NLP ?
Because "Information is Power !!!"
Every day a vast amount of text and speech data is being
produced
Internet == at least 40 Million pages
Picture Courtesy: http://soundsgood.in/wikipediafat print book/
51. History
Second World War !!!
Machine Translation
Now :
Most promising imperfect technology
Moves from Lab to Industry to Layman
53. Is NLP Really Hard to Achieve?
NLP deals with human languages
Human Language is dynamic and mysterious !!!
Communication in Human Language
54. NLP Really Hard to Achieve?
Levels of Knowledge encoding in Language Data
57. Tasks in NLP
Broad Areas
Text Processing
Speech Processing
65. Major tasks in Text Processing
Word Level Analysis
Morphological Synthesis
Part of Speech Tagging
Stemming
Lemmatization
Sentence Level Analysis - Syntactical Parsing
Discourse Analysis - Semantic Processing
73. Morphology
The branch of linguistics that studies word structures.
To a computer program a word is : ???
Morphological analysis can be explained as: the process of
analyzing words to identify their constituents
Computational Analysis of Morphology
Morphological Analysis
Morphological Generation
Stemming
Lemmatization
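To make "identifying the constituents of a word" concrete, here is a toy suffix-stripping analyzer for a few English inflections. The suffix table, the min_stem threshold, and the function name analyze are assumptions made for illustration; real morphological analyzers rely on lexicons and much richer rules, especially for morphologically rich languages.

```python
# Hypothetical suffix table: (suffix, what it usually marks in English).
SUFFIXES = [
    ("ing", "present participle"),
    ("ed", "past tense"),
    ("s", "plural or third person singular"),
]


def analyze(word, min_stem=3):
    """Split a word into (stem, suffix, label) using the suffix table.

    min_stem guards against stripping real word endings, e.g. 'sing' or 'bus'.
    """
    for suffix, label in SUFFIXES:
        stem = word[:-len(suffix)]
        if word.endswith(suffix) and len(stem) >= min_stem:
            return stem, suffix, label
    return word, "", "no suffix found"


if __name__ == "__main__":
    for word in ["walking", "played", "trees", "sing"]:
        print(word, "->", analyze(word))
```

This behaves like a crude stemmer: it chops endings without checking that the result is a dictionary word, which is the main difference from lemmatization.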
74. Practical Question from Morphology
Approximately how many word forms can be derived from
the word
"maram" ?
76. Parts of Speech Tagging
POS tagging is the process of marking up each word in a text
(corpus) as corresponding to a particular part of speech, based
on both its definition and its context.
Ram goes to school.
Ram/NNP goes/VBZ to/TO school/NN ./.
Words are ambiguous !!!!
e.g. book, cricket, bank
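The role of context in resolving an ambiguous word such as "book" can be sketched with a toy tagger. The lexicon and the two context rules below are hypothetical and cover only the example sentences; the tag names follow the Penn Treebank style used on the slide. Practical taggers learn such preferences statistically from annotated corpora.

```python
# Hypothetical lexicon: each word maps to its possible tags, most likely first.
LEXICON = {
    "ram": ["NNP"],
    "goes": ["VBZ"],
    "to": ["TO"],
    "school": ["NN"],
    "the": ["DT"],
    "a": ["DT"],
    "i": ["PRP"],
    "read": ["VBP"],
    "want": ["VBP"],
    "book": ["NN", "VB"],   # ambiguous: noun or verb
    "ticket": ["NN"],
    ".": ["."],
}


def tag(tokens):
    """Tag tokens using the lexicon plus two simple context rules."""
    tagged = []
    for token in tokens:
        options = LEXICON.get(token.lower(), ["NN"])  # unknown words: guess noun
        choice = options[0]
        if len(options) > 1 and tagged:
            previous_tag = tagged[-1][1]
            if previous_tag == "TO" and "VB" in options:
                choice = "VB"   # after "to", prefer the verb reading
            elif previous_tag == "DT" and "NN" in options:
                choice = "NN"   # after a determiner, prefer the noun reading
        tagged.append((token, choice))
    return tagged


if __name__ == "__main__":
    for sentence in ["Ram goes to school .",
                     "I want to book a ticket .",
                     "I read the book ."]:
        print(" ".join(f"{word}/{pos}" for word, pos in tag(sentence.split())))
```

The same surface word receives VB in "to book a ticket" and NN in "the book", which is the definition-plus-context behaviour described above.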
78. Syntactical Parsing
Parsing
In computer science and linguistics, parsing, or, more formally,
syntactic analysis, is the process of analyzing a text, made of a
sequence of tokens (for example, words), to determine its
grammatical structure with respect to a given (more or less)
formal grammar.
Sentences are ambiguous !!!!
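Sentence-level ambiguity can be made visible by counting how many parse trees a grammar allows. The sketch below uses the CKY algorithm over a tiny hypothetical grammar in Chomsky normal form; the grammar, the lexicon, and the example sentence are assumptions chosen for illustration, not material from the talk.

```python
from collections import defaultdict

# Hypothetical toy grammar in Chomsky normal form: (parent, left, right).
BINARY_RULES = [
    ("S", "NP", "VP"),
    ("VP", "V", "NP"),
    ("VP", "VP", "PP"),
    ("NP", "Det", "N"),
    ("NP", "NP", "PP"),
    ("PP", "P", "NP"),
]

# Hypothetical lexicon: word -> categories it can belong to.
LEXICON = {
    "i": ["NP"],
    "saw": ["V"],
    "the": ["Det"],
    "man": ["N"],
    "telescope": ["N"],
    "with": ["P"],
}


def count_parses(tokens, start="S"):
    """Count the distinct parse trees for tokens using CKY."""
    n = len(tokens)
    # chart[(i, j)][category] = number of trees of that category over tokens[i:j]
    chart = defaultdict(lambda: defaultdict(int))
    for i, token in enumerate(tokens):
        for category in LEXICON.get(token.lower(), []):
            chart[(i, i + 1)][category] += 1
    for width in range(2, n + 1):
        for i in range(n - width + 1):
            j = i + width
            for k in range(i + 1, j):          # every split point of the span
                for parent, left, right in BINARY_RULES:
                    chart[(i, j)][parent] += (chart[(i, k)][left]
                                              * chart[(k, j)][right])
    return chart[(0, n)][start]


if __name__ == "__main__":
    print(count_parses("I saw the man with the telescope".split()))
```

For the example sentence the grammar yields two trees: the prepositional phrase can attach to the verb (the seeing was done with the telescope) or to the noun (the man has the telescope).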
80. Semantics
Study of meaning and its structure
Word meaning is ambiguous !!!!
E.g. marriage
85. Where can I apply these techniques?
Machine Translation Systems
Search Engine
Spell-checker
Grammar Checker
..........
89. Other Interesting Tasks
Named Entity Identification
Information Extraction
Information Retrieval
Text Classification and Clustering
91. Speech Processing
Two Major Areas
Text to Speech
Speech Recognition
Practical Applications
IVR
Technology for Visually Challenged People
Mobile Phones
Speech Enabled Web
Vehicle Mounted GPS Navigator
100. Commercial NLP Applications
What Industry Looks For
Components of Word Processors
Machine Translation Systems
Custom Search Systems
Information Extraction
Entity Identification
Text Summarization
Speech Systems
Question Answering Systems
101. Future of NLP
Future!!!
Semantics oriented technologies
102. NLP in other domains
Bio-Medical
Legal
Forensic Science
Advertisement
Education
Politics
E-governance
Business Development
Marketing
and wherever we use language !!!
103. Natural Language Processing in India
Academic Institutions
IIT Kanpur, Kharagpur, Bombay
IIIT Hyderabad
IISc Bangalore
AU-KBC Chennai
Amrita University Ettimadai, Coimbatore
IIITMK, Trivandrum
Central University, Hyderabad
JNU, Delhi
Tamil University, Thanjavur
104. Natural Language Processing in India
Industry
Microsoft
Yahoo!
AOL
365Media Pvt. Ltd.
Inside View
Thaazza
AIAIO Labs
105. Questions ??
106. References
Daniel Jurafsky and James H. Martin, Speech and Language
Processing, 2nd Edition.
U.S. Tiwary and Tanveer Siddiqui, Natural Language
Processing and Information Retrieval.
107. Finally