Popular Text Analytics Algorithms

This presentation introduces text analytics, its applications and various tools/algorithms used for this process. Given below are some of the important tools:

- Decision trees
- SVM
- Naive-Bayes
- K-nearest neighbours
- Artificial Neural Networks
- Fuzzy C-Means
- Latent Dirichlet Allocation


  1. Popular Text Analytics Algorithms
  2. What is text analytics? It is the process of deriving high-quality structured data for analysis from unstructured text.
  3. Why is text analytics used? It is used to measure customer opinions, product reviews, and feedback, and to provide search, sentiment analysis, and entity modeling in support of data-backed decision making.
  4. What are the primary steps in text analytics? Text acquisition and preparation; processing and analysis; reporting (visualization/presentation).
  5. For instance, social media chatter around a brand can have a rapidly spiraling impact (remember the post which showed a Kentucky man being violently removed from his United Airlines seat on an overbooked flight, and how it led to a social media disaster for the airline?).
  6. In addition to social media data, other examples include e-mail messages, call center notes, and customer records.
  8. What type of information can be extracted? Terms, named entities, concepts, and sentiment.
  9. Terms: These are extracted based on keywords (on your own site or a competitor's site).
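
As a rough illustration of keyword-based term extraction, the sketch below counts the most frequent non-stopword tokens in a piece of text. The stopword list, the sample sentence, and the function name `extract_terms` are assumptions made for this example; real pipelines typically use a fuller stopword list and proper tokenization.

```python
import re
from collections import Counter

# Tiny illustrative stopword list (an assumption, not a standard list).
STOPWORDS = {"the", "a", "an", "and", "of", "to", "is", "on", "for", "was", "in", "from"}

def extract_terms(text, top_n=3):
    """Return the top_n most frequent non-stopword tokens with their counts."""
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter(t for t in tokens if t not in STOPWORDS)
    return counts.most_common(top_n)

# Hypothetical page text used only for demonstration.
page = "The flight was overbooked. The airline removed a passenger from the flight."
print(extract_terms(page, top_n=1))
```

Here the most frequent remaining term is "flight", which appears twice.
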
  10. Named entities: These are extracted to answer the ‘who’, ‘what’, or ‘where’. Examples include a name, location, timestamp, or product.
  11. Concept: This is extracted to answer the ‘about’ of a piece of content. It describes the idea behind the content.
  12. Sentiment: This is extracted to gauge the overall feeling around a brand at a given moment. The United Airlines example above is (evidently) negative sentiment, denoting unhappy customers and potential business losses.
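
One simple way to gauge sentiment is to count words from positive and negative word lists. The sketch below is a minimal lexicon-based scorer; the two word lists and the name `sentiment_score` are hypothetical and chosen only for this example, and the slides do not say which sentiment method they have in mind.

```python
import re

# Hypothetical mini-lexicons; production systems use much larger ones.
POSITIVE = {"great", "good", "love", "helpful", "happy"}
NEGATIVE = {"bad", "terrible", "angry", "overbooked", "disaster", "unhappy"}

def sentiment_score(text):
    """Return a score in [-1, 1]: -1 is fully negative, +1 is fully positive."""
    tokens = re.findall(r"[a-z']+", text.lower())
    pos = sum(t in POSITIVE for t in tokens)
    neg = sum(t in NEGATIVE for t in tokens)
    total = pos + neg
    # No lexicon words found: treat the text as neutral.
    return 0.0 if total == 0 else (pos - neg) / total

print(sentiment_score("Terrible experience, the flight was overbooked"))
```

A score below zero would flag the post as negative, which matches the airline example on the slide.
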
  13. What type of tools/algorithms are used for text analytics? Decision tree, Naive-Bayes, Support Vector Machine, K-nearest neighbours, Artificial Neural Networks, Fuzzy C-Means, and LDA.
  14. Decision Trees: A decision tree repeatedly splits data into groups or classes. It comes in handy for tasks like classification or regression.
  15. Popular algorithms in Decision trees. ID3: Iterative Dichotomiser 3 builds a decision tree that splits data on the feature with the highest information gain (the largest reduction in entropy) until every group has homogeneous data. C4.5: This algorithm also uses information gain and entropy (normalized as gain ratio) to classify data. Unlike ID3, it accepts both continuous and discrete features and handles incomplete data. CART: Classification and Regression Tree works much like C4.5. One notable difference is that CART uses Gini impurity (to assess the ‘purity’ or homogeneity of a node) instead of the information gain/entropy used by C4.5.
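
The split criteria named above can be computed in a few lines. The sketch below shows entropy and information gain (used by ID3 and C4.5) and Gini impurity (used by CART) on a toy set of labels; the labels and function names are assumptions for illustration.

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy of a list of class labels, in bits."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def gini(labels):
    """Gini impurity: 0 for a pure node, larger for mixed nodes."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def information_gain(parent, children):
    """Entropy of the parent minus the size-weighted entropy of the children."""
    n = len(parent)
    weighted = sum(len(ch) / n * entropy(ch) for ch in children)
    return entropy(parent) - weighted

# Toy node with two documents per class, and a split that separates them perfectly.
parent = ["sport", "sport", "politics", "politics"]
split = [["sport", "sport"], ["politics", "politics"]]
print(entropy(parent), gini(parent), information_gain(parent, split))
```

A perfectly mixed two-class node has entropy 1 bit and Gini impurity 0.5; a split that yields pure children recovers the full 1 bit as information gain.
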
  16. Naive-Bayes: This is a popular technique for classifying text and documents into categories (for example, deciding whether a document is Sport or Political based on the occurrence of certain words). It is a simple way to assign class or category labels to instances or cases.
  17. Naive-Bayes: Rather than being a single distinct algorithm, it is a family of algorithms that share one underlying principle: “the value of a given feature is independent of the value of any other feature”, given the class label.
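
To make the Sport versus Political example concrete, here is a minimal multinomial Naive Bayes sketch with add-one (Laplace) smoothing. The four training sentences and the function names are invented for illustration; the independence assumption shows up as the simple sum of per-word log probabilities.

```python
from collections import Counter, defaultdict
from math import log

def train(docs):
    """docs: list of (text, label) pairs. Returns the counts needed for prediction."""
    class_docs = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, label in docs:
        class_docs[label] += 1
        for word in text.lower().split():
            word_counts[label][word] += 1
            vocab.add(word)
    return class_docs, word_counts, vocab

def predict(text, model):
    """Return the label with the highest log posterior for the given text."""
    class_docs, word_counts, vocab = model
    total_docs = sum(class_docs.values())
    best_label, best_score = None, float("-inf")
    for label in class_docs:
        score = log(class_docs[label] / total_docs)  # log prior
        total_words = sum(word_counts[label].values())
        for word in text.lower().split():
            if word in vocab:  # words never seen in training are ignored
                # Add-one smoothing avoids zero probabilities.
                score += log((word_counts[label][word] + 1) / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Hypothetical training data.
model = train([
    ("the team won the match", "Sport"),
    ("the striker scored a goal", "Sport"),
    ("the senate passed the bill", "Political"),
    ("the minister won the election", "Political"),
])
print(predict("the team scored", model))
```
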
  18. Support Vector Machines: This is a supervised machine learning algorithm. It can be applied to classification and regression problems. An essential component is the kernel trick, which uses a kernel function to implicitly map the features into a higher-dimensional space, so that data that is not linearly separable in the original space can be separated by a linear boundary there.
  19. Applications of SVM: It is used in hypertext categorization, classification of images, and facial recognition applications.
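
The kernel trick can be checked numerically on a small case. The sketch below uses the degree-2 polynomial kernel k(x, y) = (x · y)^2 on 2-D vectors and shows that it equals an ordinary dot product after an explicit mapping into 3-D. The choice of kernel, the vectors, and the function names are assumptions for illustration; this is not a full SVM.

```python
from math import sqrt, isclose

def dot(a, b):
    """Plain dot product of two equal-length vectors."""
    return sum(p * q for p, q in zip(a, b))

def poly_kernel(x, y):
    """Degree-2 polynomial kernel, computed in the original 2-D space."""
    return dot(x, y) ** 2

def phi(x):
    """Explicit feature map into 3-D that corresponds to poly_kernel."""
    x1, x2 = x
    return (x1 * x1, sqrt(2) * x1 * x2, x2 * x2)

x, y = (1.0, 2.0), (3.0, -1.0)
# Both values agree: the kernel gives the 3-D dot product without building phi.
print(poly_kernel(x, y), dot(phi(x), phi(y)))
```

An SVM only needs such dot products, so it can work in the higher-dimensional space without ever computing the mapped vectors.
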
  20. K Nearest Neighbors: k-NN is used in search tasks where you are looking for something similar. You determine similarity by creating a vector representation of the items and then comparing how similar or dissimilar they are using a distance metric such as Euclidean distance.
  21. Applications of k-NN: The best example of k-NN’s prowess is an e-commerce site’s product recommendation feature. You can also use k-NN for Concept Search (finding semantically similar documents).
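
A product recommendation lookup of this kind might be sketched as follows. The catalog, its two-dimensional feature vectors, and the function name `recommend` are hypothetical; the only technique used is ranking by Euclidean distance (`math.dist`, available from Python 3.8).

```python
from math import dist

# Hypothetical catalog: each product is represented by a small feature vector.
catalog = {
    "running shoes": (0.9, 0.1),
    "trail shoes": (0.8, 0.2),
    "blender": (0.1, 0.9),
    "toaster": (0.2, 0.8),
}

def recommend(target, items, k=2):
    """Return the k items closest to `target` by Euclidean distance."""
    others = [name for name in items if name != target]
    ranked = sorted(others, key=lambda name: dist(items[target], items[name]))
    return ranked[:k]

print(recommend("running shoes", catalog, k=1))
```

In practice the vectors would come from item attributes or text embeddings rather than being written by hand.
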
  22. Artificial Neural Networks: ANNs are primarily used for classification problems with non-linear decision boundaries. Loosely modeled on the working of the human brain, an ANN operates on hidden units (which correspond to the neurons in the brain).
  23. Algorithms to train ANN: Gradient Descent, Evolutionary Algorithms, Genetic Algorithms.
  24. Applications of ANN: Image compression, handwriting analysis, and stock exchange movement prediction are some areas where ANNs are useful.
  25. Fuzzy C-Means: This is a useful form of clustering that can add value when items can belong to more than one cluster. It works on the principle that, once clustering is complete, all items in a cluster are as similar as possible to each other.
  26. Steps in Fuzzy C-Means. Pick: Pick the number of clusters into which the items can be categorized. Assign: Assign each data point a coefficient for its membership in each cluster. Repeat: Repeat until the change in the coefficients between two iterations is no more than a pre-defined sensitivity threshold.
  27. Applications of Fuzzy C-Means: Disciplines like bioinformatics, healthcare, and economics make use of fuzzy c-means with great success.
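
The pick, assign, and repeat steps can be sketched for one-dimensional data as below. This is a minimal version under stated assumptions: the fuzzifier `m = 2`, the deterministic starting coefficients (random starting values are more common), the sample points, and the function name are all choices made for this example.

```python
def fuzzy_c_means(points, n_clusters=2, m=2.0, threshold=1e-6, max_iter=100):
    """Cluster 1-D points; returns (centers, membership coefficients)."""
    # Assign: starting coefficients, one row per point, each row summing to 1.
    u = [[0.6 if j == i % n_clusters else 0.4 / (n_clusters - 1)
          for j in range(n_clusters)] for i in range(len(points))]
    for _ in range(max_iter):
        # Each center is the mean of all points, weighted by coefficient ** m.
        centers = [
            sum(u[i][j] ** m * x for i, x in enumerate(points))
            / sum(u[i][j] ** m for i in range(len(points)))
            for j in range(n_clusters)
        ]
        new_u = []
        for x in points:
            # Small constant avoids division by zero when a point sits on a center.
            d = [abs(x - c) + 1e-12 for c in centers]
            new_u.append([
                1.0 / sum((d[j] / d[k]) ** (2.0 / (m - 1.0)) for k in range(n_clusters))
                for j in range(n_clusters)
            ])
        # Repeat: stop once no coefficient moved by more than the threshold.
        change = max(abs(a - b) for old, new in zip(u, new_u) for a, b in zip(old, new))
        u = new_u
        if change <= threshold:
            break
    return centers, u

# Pick: two clusters for two visibly separate groups of points.
points = [0.8, 1.0, 1.2, 8.8, 9.0, 9.2]
centers, memberships = fuzzy_c_means(points, n_clusters=2)
print(centers)
print(memberships[0])
```

Unlike hard clustering, every point keeps a coefficient for every cluster; points near a center simply have a coefficient close to 1 for that cluster.
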
  28. Latent Dirichlet Allocation (LDA): This is a topic model that treats each document as a mixture of topics and each topic as a distribution over words, which helps in discovering the themes that run through a collection of documents.
  29. Primary steps in LDA: Provide an estimate of the potential number of topics; the algorithm assigns each word to a topic; the algorithm then checks and updates the topic assignments in a loop. This helps in ensuring coherent topic clustering.
  30. An example of LDA: Suppose there are three separate sentences. 1. I eat chicken and vegetables. 2. Chickens are pets. 3. My dog loves to eat chicken. With LDA, topic clustering for these 3 sentences is done as follows: • Sentence 1 = 100% Topic B • Sentence 2 = 100% Topic A • Sentence 3 = 33% Topic A and 67% Topic B. We can then infer that there are two clusters for sentence classification: Pets (Topic A) and Food (Topic B).
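
The percentages in the example can be read as the share of a sentence's words assigned to each topic. The sketch below reproduces the figures for sentence 3 under an assumed word-to-topic assignment; the assignment itself is hypothetical, since the slides do not show it, and a real LDA implementation infers such assignments iteratively rather than taking them as input.

```python
from collections import Counter

def topic_mix(assignments):
    """Given {word: topic}, return each topic's share of the words, in percent."""
    counts = Counter(assignments.values())
    total = sum(counts.values())
    return {topic: round(100 * n / total) for topic, n in counts.items()}

# Hypothetical assignment for the content words of "My dog loves to eat chicken".
sentence_3 = {"dog": "A", "eat": "B", "chicken": "B"}
print(topic_mix(sentence_3))
```

One word out of three falls under Topic A (Pets) and two under Topic B (Food), which gives the 33% and 67% split.
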
  31. A pioneer in custom and large-scale web data extraction. www.promptcloud.com | sales@promptcloud.com