Presented by: Ajay Ram K P
What is Text Analytics?
• Text analytics is the process of
  ◦ analyzing unstructured text,
  ◦ extracting relevant information,
  ◦ and transforming it into useful business intelligence.
• Text analytics can be performed manually, but the amount of text-based data available to companies today makes it increasingly important to use intelligent, automated solutions.
Why is Text Analytics important?
• Emails, online reviews, tweets, call center agent notes, and the vast array of other written feedback all hold insight into customer wants and needs, but only if you can unlock it.
• Text analytics is the way to extract meaning from this unstructured text and to uncover patterns and themes.
Text Analytics in R
• Text analytics in R is carried out with the help of the tm package.
• It is a framework for text mining applications within R.
• It contains functions for tasks such as content transformation, word removal, finding frequent terms, and a lot more.
The Case Study Data
• The data used is a collection of game reviews in an Excel sheet.
• Game reviews from 1000 gamers are recorded in the data set.
• The objective is to analyze these reviews, treating all of them as one text, and to find the most frequent words.
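Part 1
Reading the Data
• The reviews are read into a variable docs using the functions VectorSource() and Corpus().
• VectorSource() creates a source in which each element of a character vector is treated as one document.
• Corpus() builds a corpus, the collection of text documents that tm works on, from that source.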
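The reading step could look roughly like the sketch below. It is not the presenter's code: the three reviews are an inline stand-in for the Excel column, the file and column names in the comment are hypothetical, and VCorpus() is used in place of Corpus() so the result does not depend on which corpus class Corpus() selects.

```r
library(tm)

# Stand-in for the review column. In the case study this would come from the
# Excel sheet, e.g. readxl::read_excel("game_reviews.xlsx")$review
# (file name and column name are hypothetical).
reviews <- c(
  "Great game, loved the graphics!",
  "Too many bugs; crashed 3 times.",
  "The story is fun and the controls are smooth."
)

# Treat all reviews as one text, as the case study does.
all_text <- paste(reviews, collapse = " ")

# VectorSource() treats each vector element as one document;
# VCorpus() builds the corpus object from that source.
docs <- VCorpus(VectorSource(all_text))

length(docs)  # number of documents in the corpus
```

Passing `reviews` instead of `all_text` to VectorSource() would keep one document per review, which matters for later steps that compare documents with each other.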
Cleaning the Data
• Data cleansing is required, as most of the reviews contain punctuation, numbers, stop words, etc. that we don’t require for analysis.
• Depending on what you are trying to achieve with your analysis, you may want to do the data cleaning step differently.
• Data cleansing is done using the tm_map() function in R.
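A minimal cleaning sketch with tm_map() is shown below. The slides do not list the exact transformations used, so the choice and order of steps here are assumptions, and the one-sentence corpus is illustrative only.

```r
library(tm)

# Illustrative one-document corpus (not the case-study data).
docs <- VCorpus(VectorSource("Great game, loved the graphics! Crashed 3 times."))

# Each tm_map() call applies one transformation to every document.
docs <- tm_map(docs, content_transformer(tolower))       # lower-case everything
docs <- tm_map(docs, removePunctuation)                  # drop punctuation
docs <- tm_map(docs, removeNumbers)                      # drop digits
docs <- tm_map(docs, removeWords, stopwords("english"))  # drop English stop words
docs <- tm_map(docs, stripWhitespace)                    # collapse repeated spaces

cleaned <- content(docs[[1]])
cleaned
```

Lower-casing comes first because the stop word list is lower-case; a capitalised "The" would otherwise survive the removeWords() step.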
Finding the frequent terms and their frequency
• Converting the documents into a term-document matrix
• A document-term matrix or term-document matrix is a mathematical matrix that describes the frequency of terms that occur in a collection of documents. In a document-term matrix, rows correspond to documents in the collection and columns correspond to terms; a term-document matrix is its transpose, with terms as rows and documents as columns.
• The tm package stores these matrices as sparse matrices for efficiency. Since we only have 1000 reviews and one document, we can just convert our term-document matrix into a normal matrix, which is easier to work with.
Code: dtm <- TermDocumentMatrix(docs)  # terms in rows, documents in columns
m <- as.matrix(dtm)
• We then take the row sums of this matrix (one row per term), which gives us a named vector of term frequencies.
• Now we can sort this vector to see the most frequently used words.
Code: v <- sort(rowSums(m), decreasing = TRUE)
head(v)
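To see why row sums, not column sums, give the term frequencies, here is a toy sketch with two made-up reviews kept as separate documents. The data is purely illustrative.

```r
library(tm)

# Two tiny made-up reviews, one document each.
docs <- VCorpus(VectorSource(c("great game great graphics", "boring game")))

tdm <- TermDocumentMatrix(docs)
m <- as.matrix(tdm)

dim(m)       # rows are terms, columns are documents
rownames(m)  # the distinct terms

# Summing across each row adds a term's counts over all documents.
v <- sort(rowSums(m), decreasing = TRUE)
v
```

colSums(m) on the same matrix would instead give the number of counted terms in each document.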
Plotting the Word Cloud
• For plotting the word cloud, we use the wordcloud package.
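A word cloud can be drawn from a named frequency vector along these lines. The frequencies below are hypothetical placeholders for the sorted vector v, and the styling arguments are one reasonable choice, not the settings used in the slides.

```r
library(wordcloud)
library(RColorBrewer)

# Hypothetical term frequencies standing in for the sorted vector v.
v <- c(game = 120, fun = 85, graphics = 60, story = 42, bugs = 30, controls = 18)
d <- data.frame(word = names(v), freq = as.numeric(v))

set.seed(1234)  # word placement is random, so fix the seed for a repeatable layout
wordcloud(words = d$word, freq = d$freq,
          min.freq = 1,          # keep every term in this small example
          max.words = 100,       # cap on the number of words drawn
          random.order = FALSE,  # most frequent words placed first, near the centre
          rot.per = 0.35,        # share of words drawn rotated
          colors = brewer.pal(8, "Dark2"))
```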
And Voila!!!
Part 2
Creating the Network
• For network creation, we take the help of these packages:
  ◦ igraph
  ◦ sna
  ◦ network
• Finding the associations
  ◦ The findAssocs() function is used.
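A small findAssocs() sketch follows. findAssocs() reports terms whose counts correlate across documents with a given term, so this example assumes each review is kept as its own document rather than merged into one text; the four reviews and the 0.3 threshold are made up for illustration.

```r
library(tm)

# Made-up reviews, one document per review.
reviews <- c(
  "great game with great graphics",
  "boring game with poor graphics",
  "great story and great music",
  "poor story but fun game"
)
docs <- VCorpus(VectorSource(reviews))
tdm <- TermDocumentMatrix(docs)

# Terms whose correlation with "game" across documents is at least 0.3.
assocs <- findAssocs(tdm, terms = "game", corlimit = 0.3)

# The result is a list with one named numeric vector per queried term.
assocs$game
```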
• Plotting the graph
  ◦ Using the igraph package and its graph.data.frame() function
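The plotting step can be sketched as below. The edge table is hypothetical (in practice it might be reshaped from association results), and graph_from_data_frame() is used because it is the current igraph name for graph.data.frame().

```r
library(igraph)

# Hypothetical association table: one row per term pair, with a strength value.
edges <- data.frame(
  from   = c("game", "game", "story"),
  to     = c("graphics", "fun", "music"),
  weight = c(0.58, 0.33, 0.71)
)

# The first two columns define the edges; extra columns become edge attributes.
g <- graph_from_data_frame(edges, directed = FALSE)

# Thicker lines for stronger associations.
plot(g, edge.width = E(g)$weight * 5, vertex.label.color = "black")
```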
And there it is!!!
Another Graph…
• A graph in which the frequent terms are the nodes and frequencies set the strength of the interactions between them.
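One possible reading of this graph is a co-occurrence network: nodes are terms, and an edge's weight counts the reviews in which both terms appear. The sketch below follows that interpretation, which is an assumption, and uses a small hand-made matrix instead of the case-study data.

```r
library(igraph)

# Hypothetical binary term-document matrix: 1 if the term occurs in the review.
m <- matrix(
  c(1, 1, 1, 1,   # game
    1, 1, 0, 0,   # fun
    0, 1, 1, 0),  # graphics
  nrow = 3, byrow = TRUE,
  dimnames = list(c("game", "fun", "graphics"), paste0("review", 1:4))
)

# Term-by-term co-occurrence counts: entry [i, j] is the number of reviews
# that contain both term i and term j.
cooc <- m %*% t(m)

# diag = FALSE ignores each term's co-occurrence with itself.
g <- graph_from_adjacency_matrix(cooc, mode = "undirected",
                                 weighted = TRUE, diag = FALSE)

# Node size from term frequency, edge width from co-occurrence count.
plot(g, vertex.size = 10 * rowSums(m), edge.width = E(g)$weight)
```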
In case of large networks
• Say the network has more than 10K nodes. Such networks will be complicated.
• To quantify such networks, we look at statistical properties of the network.
• Random, scale-free, or hierarchical network models are a good fit in such cases.
[Figure: Random Network, Scale-free Network, Hierarchical Network]
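As a rough illustration of the statistical view, the sketch below generates a random and a scale-free network of 10,000 nodes with igraph and compares their degree statistics. The generator parameters are arbitrary choices for the example, and no hierarchical model is shown.

```r
library(igraph)

set.seed(42)
n <- 10000

# Random (Erdos-Renyi) network: every pair is linked with probability p,
# chosen here so the expected degree is about 4.
g_random <- sample_gnp(n, p = 4 / n)

# Scale-free network via preferential attachment: each new node adds
# m = 2 edges, favouring nodes that already have many links.
g_scalefree <- sample_pa(n, power = 1, m = 2, directed = FALSE)

# Similar average degree, very different extremes: the scale-free
# network has a few highly connected hubs.
summary(degree(g_random))
summary(degree(g_scalefree))
```

Comparing a real term network's degree distribution against such reference models is one way to describe it without drawing all of its nodes.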
Where else can network approaches be powerful?
• Biological science
• Economics
• Computer science
THANK YOU!!!