SlideShare a Scribd company logo
1 of 15
Download to read offline
Sentiment analysis of tweets using Neural Networks
Adri´an Palacios
Universidad Polit´ecnica de Valencia
June 6th, 2013
1 de 15
Introduction
The objective of this work is:
• To use Neural Networks (using the April toolkit) for the polarity
classification of tweets.
• To check how NNs behave when applying different techniques for
preprocessing the data.
• We don’t look for good results, we are just experimenting with
these techniques.
2 de 15
Preprocessing of tweets
Prior to the training of NNs, we need to obtain a feature vector
representation for the samples (tweets):
3 de 15
Preprocessing techniques
To achieve this, we create a bag of words after applying one of the
following preprocessing techniques:
1. Unigrams.
2. Bigrams.
3. Stemming.
4. Lemmatization.
5. Part-of-Speech tagging.
4 de 15
Stemming
Stemming: A process that chops off the suffixes of a given word
following some predefined rules.
Examples:
• Stem(run): run.
• Stem(ran): ran.
• Stem(running): run.
5 de 15
Lemmatization
Lemmatization: A process that determines the lemma (canonical
form of the lexeme) of a given word.
Examples:
• Lemma(run): run.
• Lemma(ran): run.
• Lemma(running): run.
6 de 15
PoS tagging
PoS tagging: The assignation Part-of-Speech tags to the words of a
given sentence.
7 de 15
Learning techniques
The polarity classification will be made:
• Using a Multilayer Perceptron with a single layer,
• after 5-fold cross-validation technique,
• and an ensemble of the resulting MLPs.
8 de 15
Hyper-parameter search
We will perform a random search for hyper-parameter optimization
instead of a grid search.
9 de 15
Ensemble methods
After training is done, since we use 5-fold cross-validation, we get 5
MLPs for each set of parameters.
To be consistent, we merge these 5 classifiers into a single one using
the bootstrap aggregating method (votes have equal weight) for the
ensemble.
10 de 15
Corpus
We will work with the corpus provided at the 2012 edition of the
Workshop on Sentiment Analysis at SEPLN.
Training Test
Samples 7219 60798
11 de 15
Training results
Accuracy of the validation set classification:
3 levels 5 levels
Unigrams 54.44 45.62
Bigrams 54.09 39.99
Stemming 62.34 47.49
Lemmatization 61.60 46.75
PoS-tagging 52.58 38.40
12 de 15
Test results
Accuracy of the test set classification (average and ensemble):
3 levels 5 levels
Unigrams 32.13 26.12
Bigrams 32.39 28.21
Stem. 32.34 26.81
Lemma. 31.84 26.18
PoS-tag. 35.22 35.22
3 levels 5 levels
Unigrams 32.16 26.52
Bigrams 32.32 29.32
Stem. 32.23 27.16
Lemma. 31.80 26.49
PoS-tag. 35.22 35.22
13 de 15
Conclusions
Results are bad, but we can improve by:
• Using more complex techniques for preprocessing.
• Using more complex models for learning.
• Exploring more values for random hyper-parameter search.
• Learning from PoS tagged tweets in a different way.
14 de 15
Questions?
The tools used for the experiments can be found at:
• The NLTK: nltk.org
• Freeling: nlp.lsi.upc.edu/freeling
• The April toolkit: github.com/pakozm/april-ann
15 de 15

More Related Content

What's hot

Machine Learning presentation.
Machine Learning presentation.Machine Learning presentation.
Machine Learning presentation.
butest
 
Internship project report,Predictive Modelling
Internship project report,Predictive ModellingInternship project report,Predictive Modelling
Internship project report,Predictive Modelling
Amit Kumar
 
Basics of Machine Learning
Basics of Machine LearningBasics of Machine Learning
Basics of Machine Learning
butest
 
SENTIMENT ANALYSIS OF TWITTER DATA
SENTIMENT ANALYSIS OF TWITTER DATASENTIMENT ANALYSIS OF TWITTER DATA
SENTIMENT ANALYSIS OF TWITTER DATA
Parvathy Devaraj
 
Introduction to Machine Learning
Introduction to Machine LearningIntroduction to Machine Learning
Introduction to Machine Learning
Lior Rokach
 

What's hot (18)

Sentiment analysis using naive bayes classifier
Sentiment analysis using naive bayes classifier Sentiment analysis using naive bayes classifier
Sentiment analysis using naive bayes classifier
 
Introduction to Machine Learning
Introduction to Machine LearningIntroduction to Machine Learning
Introduction to Machine Learning
 
Machine Learning presentation.
Machine Learning presentation.Machine Learning presentation.
Machine Learning presentation.
 
Sentimental Analysis - Naive Bayes Algorithm
Sentimental Analysis - Naive Bayes AlgorithmSentimental Analysis - Naive Bayes Algorithm
Sentimental Analysis - Naive Bayes Algorithm
 
Svm and maximum entropy model for sentiment analysis of tweets
Svm and maximum entropy model for sentiment analysis of tweetsSvm and maximum entropy model for sentiment analysis of tweets
Svm and maximum entropy model for sentiment analysis of tweets
 
Sentiment analysis using ml
Sentiment analysis using mlSentiment analysis using ml
Sentiment analysis using ml
 
Machine Learning Interview Questions and Answers
Machine Learning Interview Questions and AnswersMachine Learning Interview Questions and Answers
Machine Learning Interview Questions and Answers
 
Internship project report,Predictive Modelling
Internship project report,Predictive ModellingInternship project report,Predictive Modelling
Internship project report,Predictive Modelling
 
Applications in Machine Learning
Applications in Machine LearningApplications in Machine Learning
Applications in Machine Learning
 
Presentation on supervised learning
Presentation on supervised learningPresentation on supervised learning
Presentation on supervised learning
 
Machine Learning Unit 1 Semester 3 MSc IT Part 2 Mumbai University
Machine Learning Unit 1 Semester 3  MSc IT Part 2 Mumbai UniversityMachine Learning Unit 1 Semester 3  MSc IT Part 2 Mumbai University
Machine Learning Unit 1 Semester 3 MSc IT Part 2 Mumbai University
 
Basics of Machine Learning
Basics of Machine LearningBasics of Machine Learning
Basics of Machine Learning
 
SENTIMENT ANALYSIS OF TWITTER DATA
SENTIMENT ANALYSIS OF TWITTER DATASENTIMENT ANALYSIS OF TWITTER DATA
SENTIMENT ANALYSIS OF TWITTER DATA
 
Semi supervised learning machine learning made simple
Semi supervised learning  machine learning made simpleSemi supervised learning  machine learning made simple
Semi supervised learning machine learning made simple
 
Introduction to Machine Learning
Introduction to Machine LearningIntroduction to Machine Learning
Introduction to Machine Learning
 
Matrix Factorization Technique for Recommender Systems
Matrix Factorization Technique for Recommender SystemsMatrix Factorization Technique for Recommender Systems
Matrix Factorization Technique for Recommender Systems
 
Learning
LearningLearning
Learning
 
Four machine learning methods to predict academic achievement of college stud...
Four machine learning methods to predict academic achievement of college stud...Four machine learning methods to predict academic achievement of college stud...
Four machine learning methods to predict academic achievement of college stud...
 

Viewers also liked

Evolutionary Multi-Agent Systems for RTS Games
Evolutionary Multi-Agent Systems for RTS GamesEvolutionary Multi-Agent Systems for RTS Games
Evolutionary Multi-Agent Systems for RTS Games
Adrián Palacios Corella
 

Viewers also liked (8)

Can Deep Learning solve the Sentiment Analysis Problem
Can Deep Learning solve the Sentiment Analysis ProblemCan Deep Learning solve the Sentiment Analysis Problem
Can Deep Learning solve the Sentiment Analysis Problem
 
Evolutionary Multi-Agent Systems for RTS Games
Evolutionary Multi-Agent Systems for RTS GamesEvolutionary Multi-Agent Systems for RTS Games
Evolutionary Multi-Agent Systems for RTS Games
 
Adaptación de skip-gramas a modelos conexionistas del lenguaje
Adaptación de skip-gramas a modelos conexionistas del lenguajeAdaptación de skip-gramas a modelos conexionistas del lenguaje
Adaptación de skip-gramas a modelos conexionistas del lenguaje
 
Multimedia data minig and analytics sentiment analysis using social multimedia
Multimedia data minig and analytics sentiment analysis using social multimediaMultimedia data minig and analytics sentiment analysis using social multimedia
Multimedia data minig and analytics sentiment analysis using social multimedia
 
CNN for Sentiment Analysis on Italian Tweets
CNN for Sentiment Analysis on Italian TweetsCNN for Sentiment Analysis on Italian Tweets
CNN for Sentiment Analysis on Italian Tweets
 
Model selection and tuning at scale
Model selection and tuning at scaleModel selection and tuning at scale
Model selection and tuning at scale
 
Turning Analysis into Action with APIs - Superweek 2017
Turning Analysis into Action with APIs - Superweek 2017Turning Analysis into Action with APIs - Superweek 2017
Turning Analysis into Action with APIs - Superweek 2017
 
How Sentiment Analysis works
How Sentiment Analysis worksHow Sentiment Analysis works
How Sentiment Analysis works
 

Similar to Sentiment analysis of tweets using Neural Networks

CIKM14: Fixing grammatical errors by preposition ranking
CIKM14: Fixing grammatical errors by preposition rankingCIKM14: Fixing grammatical errors by preposition ranking
CIKM14: Fixing grammatical errors by preposition ranking
eXascale Infolab
 
Poster_Reseau_Neurones_Journees_2013
Poster_Reseau_Neurones_Journees_2013Poster_Reseau_Neurones_Journees_2013
Poster_Reseau_Neurones_Journees_2013
Pedro Lopes
 
NITW_Improving Deep Neural Networks.pptx
NITW_Improving Deep Neural Networks.pptxNITW_Improving Deep Neural Networks.pptx
NITW_Improving Deep Neural Networks.pptx
ssuserd23711
 
LETS PUBLISH WITH MORE RELIABLE & PRESENTABLE MODELLING.pptx
LETS PUBLISH WITH MORE RELIABLE & PRESENTABLE MODELLING.pptxLETS PUBLISH WITH MORE RELIABLE & PRESENTABLE MODELLING.pptx
LETS PUBLISH WITH MORE RELIABLE & PRESENTABLE MODELLING.pptx
shamsul2010
 

Similar to Sentiment analysis of tweets using Neural Networks (20)

Training Deep Networks with Backprop (D1L4 Insight@DCU Machine Learning Works...
Training Deep Networks with Backprop (D1L4 Insight@DCU Machine Learning Works...Training Deep Networks with Backprop (D1L4 Insight@DCU Machine Learning Works...
Training Deep Networks with Backprop (D1L4 Insight@DCU Machine Learning Works...
 
CIKM14: Fixing grammatical errors by preposition ranking
CIKM14: Fixing grammatical errors by preposition rankingCIKM14: Fixing grammatical errors by preposition ranking
CIKM14: Fixing grammatical errors by preposition ranking
 
Ensemble methods in Machine learning technology
Ensemble methods in Machine learning technologyEnsemble methods in Machine learning technology
Ensemble methods in Machine learning technology
 
Getting started with Machine Learning
Getting started with Machine LearningGetting started with Machine Learning
Getting started with Machine Learning
 
ANN - UNIT 3.pptx
ANN - UNIT 3.pptxANN - UNIT 3.pptx
ANN - UNIT 3.pptx
 
ANN - UNIT 3.pptx
ANN - UNIT 3.pptxANN - UNIT 3.pptx
ANN - UNIT 3.pptx
 
crossvalidation.pptx
crossvalidation.pptxcrossvalidation.pptx
crossvalidation.pptx
 
Cross validation.pptx
Cross validation.pptxCross validation.pptx
Cross validation.pptx
 
Ensemble learning
Ensemble learningEnsemble learning
Ensemble learning
 
Week 11 Model Evalaution Model Evaluation
Week 11 Model Evalaution Model EvaluationWeek 11 Model Evalaution Model Evaluation
Week 11 Model Evalaution Model Evaluation
 
Simple Ensemble Learning
Simple Ensemble LearningSimple Ensemble Learning
Simple Ensemble Learning
 
Poster_Reseau_Neurones_Journees_2013
Poster_Reseau_Neurones_Journees_2013Poster_Reseau_Neurones_Journees_2013
Poster_Reseau_Neurones_Journees_2013
 
Artificial Neural Networks , Recurrent networks , Perceptron's
Artificial Neural Networks , Recurrent networks , Perceptron'sArtificial Neural Networks , Recurrent networks , Perceptron's
Artificial Neural Networks , Recurrent networks , Perceptron's
 
NITW_Improving Deep Neural Networks (1).pptx
NITW_Improving Deep Neural Networks (1).pptxNITW_Improving Deep Neural Networks (1).pptx
NITW_Improving Deep Neural Networks (1).pptx
 
NITW_Improving Deep Neural Networks.pptx
NITW_Improving Deep Neural Networks.pptxNITW_Improving Deep Neural Networks.pptx
NITW_Improving Deep Neural Networks.pptx
 
Random Forest Decision Tree.pptx
Random Forest Decision Tree.pptxRandom Forest Decision Tree.pptx
Random Forest Decision Tree.pptx
 
AWS Certified Machine Learning Specialty
AWS Certified Machine Learning Specialty AWS Certified Machine Learning Specialty
AWS Certified Machine Learning Specialty
 
LETS PUBLISH WITH MORE RELIABLE & PRESENTABLE MODELLING.pptx
LETS PUBLISH WITH MORE RELIABLE & PRESENTABLE MODELLING.pptxLETS PUBLISH WITH MORE RELIABLE & PRESENTABLE MODELLING.pptx
LETS PUBLISH WITH MORE RELIABLE & PRESENTABLE MODELLING.pptx
 
EssentialsOfMachineLearning.pdf
EssentialsOfMachineLearning.pdfEssentialsOfMachineLearning.pdf
EssentialsOfMachineLearning.pdf
 
Testing
TestingTesting
Testing
 

Recently uploaded

Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
panagenda
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Safe Software
 

Recently uploaded (20)

Artificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyArtificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : Uncertainty
 
Platformless Horizons for Digital Adaptability
Platformless Horizons for Digital AdaptabilityPlatformless Horizons for Digital Adaptability
Platformless Horizons for Digital Adaptability
 
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ..."I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
 
WSO2's API Vision: Unifying Control, Empowering Developers
WSO2's API Vision: Unifying Control, Empowering DevelopersWSO2's API Vision: Unifying Control, Empowering Developers
WSO2's API Vision: Unifying Control, Empowering Developers
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
 
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of Terraform
 
Corporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptxCorporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptx
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
 
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
 
[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdf[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdf
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...
 
FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024
 
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
 
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
 
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
 
Exploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with MilvusExploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with Milvus
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century education
 

Sentiment analysis of tweets using Neural Networks

  • 1. Sentiment analysis of tweets using Neural Networks Adri´an Palacios Universidad Polit´ecnica de Valencia June 6th, 2013 1 de 15
  • 2. Introduction The objective of this work is: • To use Neural Networks (using the April toolkit) for the polarity classification of tweets. • To check how NNs behave when applying different techniques for preprocessing the data. • We don’t look for good results, we are just experimenting with these techniques. 2 de 15
  • 3. Preprocessing of tweets Prior to the training of NNs, we need to obtain a feature vector representation for the samples (tweets): 3 de 15
  • 4. Preprocessing techniques To achieve this, we create a bag of words after applying one of the following preprocessing techniques: 1. Unigrams. 2. Bigrams. 3. Stemming. 4. Lemmatization. 5. Part-of-Speech tagging. 4 de 15
  • 5. Stemming Stemming: A process that chops off the suffixes of a given word following some predefined rules. Examples: • Stem(run): run. • Stem(ran): ran. • Stem(running): run. 5 de 15
  • 6. Lemmatization Lemmatization: A process that determines the lemma (canonical form of the lexeme) of a given word. Examples: • Lemma(run): run. • Lemma(ran): run. • Lemma(running): run. 6 de 15
  • 7. PoS tagging PoS tagging: The assignation Part-of-Speech tags to the words of a given sentence. 7 de 15
  • 8. Learning techniques The polarity classification will be made: • Using a Multilayer Perceptron with a single layer, • after 5-fold cross-validation technique, • and an ensemble of the resulting MLPs. 8 de 15
  • 9. Hyper-parameter search We will perform a random search for hyper-parameter optimization instead of a grid search. 9 de 15
  • 10. Ensemble methods After training is done, since we use 5-fold cross-validation, we get 5 MLPs for each set of parameters. To be consistent, we merge these 5 classifiers into a single one using the bootstrap aggregating method (votes have equal weight) for the ensemble. 10 de 15
  • 11. Corpus We will work with the corpus provided at the 2012 edition of the Workshop on Sentiment Analysis at SEPLN. Training Test Samples 7219 60798 11 de 15
  • 12. Training results Accuracy of the validation set classification: 3 levels 5 levels Unigrams 54.44 45.62 Bigrams 54.09 39.99 Stemming 62.34 47.49 Lemmatization 61.60 46.75 PoS-tagging 52.58 38.40 12 de 15
  • 13. Test results Accuracy of the test set classification (average and ensemble): 3 levels 5 levels Unigrams 32.13 26.12 Bigrams 32.39 28.21 Stem. 32.34 26.81 Lemma. 31.84 26.18 PoS-tag. 35.22 35.22 3 levels 5 levels Unigrams 32.16 26.52 Bigrams 32.32 29.32 Stem. 32.23 27.16 Lemma. 31.80 26.49 PoS-tag. 35.22 35.22 13 de 15
  • 14. Conclusions Results are bad, but we can improve by: • Using more complex techniques for preprocessing. • Using more complex models for learning. • Exploring more values for random hyper-parameter search. • Learning from PoS tagged tweets in a different way. 14 de 15
  • 15. Questions? The tools used for the experiments can be found at: • The NLTK: nltk.org • Freeling: nlp.lsi.upc.edu/freeling • The April toolkit: github.com/pakozm/april-ann 15 de 15