This document summarizes research on using convolutional neural networks for sentiment analysis on Italian tweets. It describes training a CNN model using word embeddings and sentiment-specific embeddings generated from tweets. The best performing model used sentiment-specific embeddings trained on 500k tweets, with filters of sizes 7, 8, 9, 10. This model achieved an F-score of 0.6837 on the binary sentiment classification task, outperforming the official run and models using plain word embeddings. The research demonstrated that CNNs and sentiment-specific embeddings are effective for sentiment analysis of Italian tweets.
CNN Sentiment Analysis of Italian Tweets
1. Convolutional Neural Networks for Sentiment Analysis on Italian Tweets
Giuseppe Attardi, Daniele Sartiano, Chiara Alzetta, Federica Semplici
Dipartimento di Informatica, Università di Pisa
2. Task 2: Polarity Classification
G. Attardi, D. Sartiano (2016) SemEval 2016, Task 4
[Figure: Convolutional Neural Network architecture. The example tweet "Not going to the beach tomorrow :-(" is represented as embeddings for each word, fed to a convolutional layer with multiple filters, followed by max-over-time pooling and a multilayer perceptron with dropout.]
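The pipeline in the figure (word embeddings, a convolutional layer with multiple filter widths, max-over-time pooling, feeding an MLP with dropout) can be sketched in plain Python. The MLP head and dropout are omitted here, and all sizes and weights are illustrative, not the trained model's.

```python
import math
import random

random.seed(0)

def conv_max_pool(embeddings, filter_widths, n_filters=4):
    """Slide filters of several widths over the word-embedding matrix
    (one row per word), compute a tanh activation per window position,
    and max-pool each feature map over time."""
    n_words = len(embeddings)
    dim = len(embeddings[0])
    features = []
    for w in filter_widths:
        for _ in range(n_filters):
            # random filter weights, shape (w, dim); illustrative only
            W = [[random.gauss(0, 0.1) for _ in range(dim)] for _ in range(w)]
            fmap = [
                math.tanh(sum(W[i][d] * embeddings[p + i][d]
                              for i in range(w) for d in range(dim)))
                for p in range(n_words - w + 1)
            ]
            # max-over-time pooling keeps the strongest activation
            features.append(max(fmap))
    return features

# toy "tweet": 7 words with 50-dimensional embeddings (sizes are illustrative)
tweet = [[random.gauss(0, 1) for _ in range(50)] for _ in range(7)]
feats = conv_max_pool(tweet, filter_widths=[2, 3, 5])
print(len(feats))  # 3 widths x 4 filters = 12 pooled features
```

Each (width, filter) pair contributes one pooled feature, so the feature vector length is independent of tweet length, which is what lets the MLP sit on top of variable-length input.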
3. Training the network
Plain Word Embeddings:
- word2vec on 167 million Italian tweets
- parameters: embedding size 300, window dimension 5, discarding words with frequency < 5
- 450k word embeddings obtained
Sentiment-Specific Word Embeddings:
- starting from the plain word embeddings
- inject the sentiment polarity of texts into the embeddings
- positive and negative tweets identified by emoticons
- more negative tweets than positive tweets
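Labeling tweets as positive or negative by emoticon presence can be sketched as below; the emoticon sets are illustrative, not the actual lists used in the paper.

```python
import re

# Illustrative emoticon sets; the paper's actual lists are not given here.
POS_RE = re.compile(r'(:-?\)|:-?D|;-?\))')
NEG_RE = re.compile(r'(:-?\(|:-?/)')

def emoticon_polarity(tweet):
    """Label a tweet positive/negative by emoticon presence; tweets with
    no emoticon, or with emoticons of both polarities, are discarded (None)."""
    pos = bool(POS_RE.search(tweet))
    neg = bool(NEG_RE.search(tweet))
    if pos and not neg:
        return 'positive'
    if neg and not pos:
        return 'negative'
    return None

print(emoticon_polarity('Not going to the beach tomorrow :-('))  # negative
```

Discarding tweets that match both polarities keeps the distant labels cleaner at the cost of corpus size.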
4. Distant Supervision
Silver corpus created as follows:
- randomly choose at most 10k tweets per class (mixed and neutral classes added)
- select tweets that are assigned the same class by:
  1. emoticon presence (regular-expression match)
  2. a classifier trained on the task training set (gold)
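The agreement filter above can be sketched as follows; `heuristic` and `classifier` are stand-ins for the emoticon matcher and the gold-trained classifier, not the paper's actual components.

```python
def build_silver_corpus(tweets, heuristic, classifier, max_per_class=10_000):
    """Keep a tweet only when the emoticon heuristic and the gold-trained
    classifier agree on its class; cap each class at max_per_class."""
    silver = {}
    for tweet in tweets:
        h = heuristic(tweet)
        c = classifier(tweet)
        if h is not None and h == c:
            bucket = silver.setdefault(h, [])
            if len(bucket) < max_per_class:
                bucket.append(tweet)
    return silver

# toy stand-ins: they agree only on tweets containing "good"
heur = lambda t: 'positive' if 'good' in t else 'negative'
clf = lambda t: 'positive' if 'good' in t else 'neutral'
corpus = build_silver_corpus(['good day', 'bad day', 'good food'], heur, clf)
print(corpus)  # {'positive': ['good day', 'good food']}
```

Requiring agreement between two independent labelers trades recall for precision, which is the usual bargain in distant supervision.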
5. Experiments
Extensive experiments with various configurations of the classifier:
- filters
- plain or sentiment-specific word embeddings
- gold or silver training set
Best settings:

                Run 1 (WE skipgram)        Run 2 (SWE)
Training set    Gold       Silver          Gold                    Silver
Filters         2, 3, 5    4, 5, 6, 7      7, 7, 7, 7, 8, 8, 8, 8  7, 8, 9, 10
6. Results
Top official results for polarity classification.
The extended silver corpus did not help, possibly because the resulting corpus was still unbalanced.

System       Positive F-score   Negative F-score   Combined F-score
UniPI_2.c    0.685              0.6426             0.6638
team1_1.u    0.6354             0.6885             0.662
team1_2.u    0.6312             0.6838             0.6575
team4_.c     0.644              0.6605             0.6522
team3_.1.c   0.6265             0.6743             0.6504
team5_2.c    0.6426             0.648              0.6453
team3_.2.c   0.6395             0.6469             0.6432
UniPI_1.u    0.6699             0.6146             0.6422
UniPI_1.c    0.6766             0.6002             0.6384
UniPI_2.u    0.6586             0.5654             0.612
7. New Results

UniPI_2.c run               Positive   Negative   Combined F-score
official run                0.685      0.6426     0.6638
plain embeddings            0.6851     0.6612     0.6731
SE 200k tweets, 25 epochs   0.6779     0.6826     0.6803
SE 500k tweets, 4 epochs    0.6818     0.6856     0.6837
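The combined F-score column is consistent (to rounding) with the mean of the positive and negative F-scores, the usual convention for this task; a quick check:

```python
# Combined F-score as the mean of the positive and negative F-scores
# (assuming the standard macro-averaged convention for this task).
runs = {
    'official run':             (0.685,  0.6426),
    'plain embeddings':         (0.6851, 0.6612),
    'SE 200k tweets 25 epochs': (0.6779, 0.6826),
    'SE 500k tweets 4 epochs':  (0.6818, 0.6856),
}
combined = {name: (pos + neg) / 2 for name, (pos, neg) in runs.items()}
for name, score in combined.items():
    print(f'{name}: {score:.4f}')
```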
8. Conclusions
The experiments confirmed the effectiveness of Convolutional Neural Networks for Twitter sentiment classification, also for the Italian language.
Sentiment-specific embeddings proved to be effective for sentiment classification.
Editor's Notes
We are going to talk about the results of task 2, polarity classification.
We used a Deep Learning approach, i.e. a Convolutional Neural Network
The same neural network was used at SemEval 2016 for English tweets.
We now apply the same approach to Italian tweets.
The architecture of the ConvNet is composed of 4 steps described in the picture
Architecture: the neural network is trained:
- once with word embeddings (created with word2vec)
- once with sentiment-specific word embeddings.
For both types we used preprocessed tweet text:
classic sentence splitting, tokenization and normalization of elements not useful for the task, such as URLs, mentions and numbers.
The corpus is a collection of 167 million Italian tweets, enlarged with a further 1.3 million tweets from Integris.
We created word embeddings with these parameters and obtained 450k of them.
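The normalization step can be sketched with a few regular expressions; the placeholder tokens below are illustrative, since the exact ones used in the paper are not specified.

```python
import re

def normalize_tweet(text):
    """Replace elements not useful for the task (URLs, mentions, numbers)
    with placeholder tokens."""
    text = re.sub(r'https?://\S+', '<URL>', text)      # URLs
    text = re.sub(r'@\w+', '<MENTION>', text)          # user mentions
    text = re.sub(r'\d+([.,]\d+)?', '<NUM>', text)     # integers and decimals
    return text

print(normalize_tweet('@utente guarda http://t.co/xyz alle 18,30'))
# <MENTION> guarda <URL> alle <NUM>
```

Collapsing these elements into shared tokens keeps the vocabulary small, so the frequency cutoff (freq < 5) does not discard them.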
Sentiment-specific word embeddings are word embeddings created from the same corpus, but with the sentences labeled with polarity.
We defined the polarity of tweets, positive or negative, based on the emoticons that appear in the text of the tweet.
Some emoticons are unambiguous; for the others we inspected a sample of tweets where they appear and made a decision.
Positive tweets are much more frequent than negative tweets.
Since
1. we noticed that the polarity distribution of the gold training set is skewed, and
2. the training set is quite small,
we created a silver corpus with distant supervision to add more training examples.
To create the silver corpus we selected no more than 10,000 tweets from each class.
We first assigned the polarity to the tweets with regular expressions looking for emoticons, this time belonging to 4 classes.
We assigned 2 new labels (mixed and neutral) based on the annotation of the gold corpus.
The 2 new classes were added to the original corpus.
Then we took the same tweets and tried to classify them using the classifier trained on the task training set.
If the two techniques assign the same label (both positive, both negative, etc.), then the tweet can be used for the silver corpus.
As a result the silver corpus is still unbalanced, with very few 'mixed' examples.
At this point we were able to run the classifier with many different configurations, varying several parameters.
In the end we found that the settings in the table are the best.
In the end our approach obtained the best score for polarity classification.
We also applied the same approach to subjectivity classification, without performing extensive experiments.
We again obtained quite successful results, even though not the top score.
In conclusion, our experiments confirm the effectiveness of CNNs and sentiment-specific embeddings for sentiment classification of Italian tweets.