Relevance-Based Ranking of Video Comments on YouTube
Authors: Andrei Șerbănoiu, Traian Rebedea (traian.rebedea@cs.pub.ro)
University Politehnica of Bucharest
Overview
• Introduction
• Motivation
• System architecture
• Classification of relevant comments
• Ranking of relevant comments
• Results
• Conclusions
22.09.13 Bachelor's Thesis Session - July 2012 2
Introduction
• Text classification and ranking for comments on
YouTube videos
– First: classify whether each comment is relevant to
the given video
– Second: rank the relevant comments
• Focus on identifying relevant information
• Comments are very short: sometimes fewer than 10
words, and on the order of tens of words on average
• Relevance is evaluated with respect to the information
collected from other online sources about the video
22.09.13 CSCS 2013 – Bucharest, Romania 3
Existing research
• We could not find any previous research on
identifying relevant comments
• YouTube research
– Identify the relevant features of community acceptance
(comments with many “likes”)
– Extract the sentiment orientation
– Differentiate between clean and noisy comments
• Other research
– Ranking Comments on the Social Web (uses Digg)
Motivation
• The Police – Every Breath You Take
Motivation
• Most commented video
• “10 questions that every intelligent Christian
must answer”
• 1,429,425 comments on 30 May 2013 (early
morning)
• How many of these comments are spam?
• Which ones would be most relevant to the
video?
Solution
• Ranking of the comments according to relevance
• Steps:
1. Automatically link video with other online sources
relevant to it
2. Filter comments to remove noisy comments
3. Rank the remaining comments according to
relevance computed using NLP techniques
• Our solution works for music videos
System architecture
Processing pipeline
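// Fetch the comments and drop the noisy ones
comments = fetchYouTubeComments();
comments = filterComments(comments);
// Extract the topics of each remaining comment
commentTopics = createCommentTopics(comments);
// Gather the external sources that describe the video
resources = getResources(wikipedia, allmusic, lyrics);
// Score every comment against those sources
for (int i = 0; i < commentTopics.length; i++) {
    computeRelevance(commentTopics[i], resources);
}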
Preprocessing
• Comments are retrieved with the YouTube Data API
– Only the last 100 comments per video are used
• Comments not written in English are filtered out using
JLangDetect
• The main topics of each comment are extracted
using Mallet => 5 topics per comment
• The topics are expanded with synonyms and
hypernyms from WordNet
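
The expansion step can be sketched as follows. This is only an illustration: the `RELATED` map is a hypothetical, hand-written stand-in for a WordNet lookup, and the class and method names are not taken from the original system.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

final class TopicExpansion {
    // Hypothetical stand-in for WordNet: topic word -> synonyms and hypernyms.
    static final Map<String, List<String>> RELATED = Map.of(
            "song", List.of("tune", "music"),
            "singer", List.of("vocalist", "musician"));

    // Returns the lower-cased topic words plus any related words found in the map.
    static Set<String> expand(List<String> topics) {
        Set<String> expanded = new LinkedHashSet<>();
        for (String topic : topics) {
            String key = topic.toLowerCase(Locale.ROOT);
            expanded.add(key);
            expanded.addAll(RELATED.getOrDefault(key, List.of()));
        }
        return expanded;
    }

    public static void main(String[] args) {
        System.out.println(expand(List.of("Song", "guitar")));
    }
}
```

Words without an entry in the map, such as "guitar" here, are kept unchanged.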
Pre-classification of comments
• Objective: reduce the number of comments considered for
ranking by identifying noise
• Classification uses a neural network over a set of
simple linguistic features
• Multilayer perceptron implemented in Weka
• Features
– Number of non-ASCII characters
– Number of capital letters
– Number of newlines
– Number of digits
– Number of trivial words and swear words
– Number of words in comment
– Average word length
– Number of punctuation marks
– Count of common spam phrases
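
A minimal sketch of how most of these surface features could be computed for one comment. The class name, the feature order, and the punctuation set are assumptions. The trivial-word, swear-word, and spam counts are left out because they need word lists that the slides do not give.

```java
final class CommentFeatures {
    // Returns: non-ASCII chars, capital letters, newlines, digits,
    // word count, average word length, punctuation marks.
    static double[] extract(String comment) {
        int nonAscii = 0, capitals = 0, newlines = 0, digits = 0, punctuation = 0;
        for (int i = 0; i < comment.length(); i++) {
            char c = comment.charAt(i);
            if (c > 127) nonAscii++;
            if (Character.isUpperCase(c)) capitals++;
            if (c == '\n') newlines++;
            if (Character.isDigit(c)) digits++;
            // Assumed punctuation set; adjust as needed.
            if (".,;:!?'\"-()".indexOf(c) >= 0) punctuation++;
        }
        String trimmed = comment.trim();
        String[] words = trimmed.isEmpty() ? new String[0] : trimmed.split("\\s+");
        int totalLength = 0;
        for (String word : words) {
            totalLength += word.length();
        }
        double avgWordLength = words.length == 0 ? 0.0 : (double) totalLength / words.length;
        return new double[] {nonAscii, capitals, newlines, digits,
                             words.length, avgWordLength, punctuation};
    }

    public static void main(String[] args) {
        double[] f = extract("Step 1: Pause\nthis VIDEO!");
        System.out.println(java.util.Arrays.toString(f));
    }
}
```

The resulting vector is the kind of input a classifier such as a multilayer perceptron would consume, one row per comment.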
Pre-classification of comments
• Trained on a small corpus with 100 relevant
comments and 100 noisy comments
• Examples of noisy comments:
– "Step 1: Pause this video
Step 2: Google 'Rainymood'
Step 3: Click the first link
Step 4: Unpause this video
Step 5: Thumbs up this comment, enjoy and thank
me later"
– "Those 3,175 haters listen to 'Techno'."
– "IF YOU LIKE DIRTY DIANA SONG THE SINGER '' STEFANO GIORGINI '' DID A GREAT REMAKE
STEFANO IS A VERY GOOD SINGER SONGWRITER I THINK YOU WILL LIKE HIS VERSION JUST LOOK FOR
'' STEFANO GIORGINI '' DIRTY DIANA"
Pre-classification of comments
• Results of the pre-classification stage

Type of instances                  No. of instances   %
Correctly classified instances     174                87.46
Incorrectly classified instances   26                 12.54
Total number of instances          200                -
Relevance scoring stage
• Initial approach
• Extract topics from comments as previously
mentioned (Mallet + WordNet)
• Fetch Wikipedia articles for artist and song name
• Score computed from the number of occurrences
of the comment's topics in the articles
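
A minimal sketch of this occurrence-count score, assuming case-insensitive substring matching (so "peace" would also match inside "peaceful"). The slides do not specify the matching rule, and the names below are illustrative.

```java
import java.util.List;
import java.util.Locale;

final class OccurrenceScore {
    // Counts non-overlapping occurrences of term in text.
    static int countOccurrences(String text, String term) {
        if (term.isEmpty()) {
            return 0;
        }
        int count = 0;
        int from = 0;
        while ((from = text.indexOf(term, from)) >= 0) {
            count++;
            from += term.length();
        }
        return count;
    }

    // Sums, over all articles, how often each comment topic occurs.
    static int score(List<String> commentTopics, List<String> articles) {
        int total = 0;
        for (String article : articles) {
            String lower = article.toLowerCase(Locale.ROOT);
            for (String topic : commentTopics) {
                total += countOccurrences(lower, topic.toLowerCase(Locale.ROOT));
            }
        }
        return total;
    }

    public static void main(String[] args) {
        List<String> topics = List.of("beatles", "peace");
        List<String> articles = List.of(
                "The Beatles were an English rock band.",
                "Beatles songs about peace and love; Peace!");
        System.out.println(score(topics, articles));
    }
}
```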
Relevance scoring stage
• Second approach: topic-based scoring
• Similar to the previous one, but topics are also
extracted from the Wikipedia articles with
Mallet
• Scoring is done based on:
– Number of topics extracted from each comment
– Wikipedia topic matches for each comment
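
The slides do not give the exact formula that combines these two quantities. One plausible reading, shown here purely as an assumption, is the fraction of a comment's topics that also appear among the topics extracted from the Wikipedia articles.

```java
import java.util.HashSet;
import java.util.Set;

final class TopicMatchScore {
    // Assumed formula: matched topics divided by the number of comment topics.
    static double score(Set<String> commentTopics, Set<String> articleTopics) {
        if (commentTopics.isEmpty()) {
            return 0.0;
        }
        Set<String> matched = new HashSet<>(commentTopics);
        matched.retainAll(articleTopics);
        return (double) matched.size() / commentTopics.size();
    }

    public static void main(String[] args) {
        Set<String> comment = Set.of("beatles", "peace", "love", "mom");
        Set<String> article = Set.of("beatles", "love", "band");
        System.out.println(score(comment, article));
    }
}
```

Normalising by the number of comment topics keeps the score between 0 and 1, so comments with many topics are not favoured automatically.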
Relevance scoring stage
• Third approach: multiple-source topic-based scoring
• Additional sources added alongside the Wikipedia articles
– Information from allmusic.com website on artists and
songs
– Information from song lyrics
• Topics matched between comments and Wikipedia +
Allmusic articles, plus exact match of lyrics
• Final relevance score is a weighted sum of the previous
factors
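
The weighted sum can be sketched as below. The three per-source scores and the weight values are hypothetical, since the slides do not state the weights that were used.

```java
final class WeightedRelevance {
    // weights[0..2] apply to the Wikipedia, Allmusic, and lyrics scores, in that order.
    static double combine(double wikipediaScore, double allmusicScore,
                          double lyricsScore, double[] weights) {
        if (weights.length != 3) {
            throw new IllegalArgumentException("expected exactly three weights");
        }
        return weights[0] * wikipediaScore
                + weights[1] * allmusicScore
                + weights[2] * lyricsScore;
    }

    public static void main(String[] args) {
        // Hypothetical weights chosen only for illustration.
        double[] weights = {0.5, 0.25, 0.25};
        System.out.println(combine(0.5, 0.25, 1.0, weights));
    }
}
```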
Results
Comment (relevance score)
• maybe your friend should know that being english, have a picture in abbey road
and "sing" all you need is love" won't make one direction a group like the
beatles... (relevance: 662)
• my mom said she doesn't like the beatles and she said that john was only good
to look at not to hear. my dad said, " haha so true!." i'm an orphan now.
(relevance: 968)
• you shouldn't be listening to the beatles since these seem to turn your friends
into enemies! beatles are all about peace! you are not getting their message!
(relevance: 983)
• please read this ! hey i know u just wanna listen to the song but i still have to
write this hoping someone will see it and that someone will care .i'm a young
musician from croatia so this spam is my only chance to get noticed.please check
out my channel and i promise u won't be sorry.i appreciate your time because
music means everything to me, thank you! (relevance: 1309)
• i didn't mean fight other places. i meant focus on the hurt people in your own
country first, then expand to the others. if people don't agree with peace that's
an opinion. not a fact, and people often take offense to opinions. there isn't
anything to take offense to, they say something that's all it is. they said it, don't
put meaning to it. world peace - i meant the whole world having peace there
(relevance: 1639)
Results
• Difficult to assess the impact of the
relevance measure
• Interpreting the comments is subjective
– Need human annotators
• The order of the comments is completely
different from the one currently shown on
YouTube (correlation lower than 0.031 for the
first 100 comments)
• Method 1 (the initial approach) is also not
correlated with the other two methods
• Methods 2 and 3 have a higher correlation: 0.124
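
The slides do not say which correlation coefficient was used. Assuming Spearman's rank correlation over rankings without ties, the comparison between two orderings of the same comments could be sketched as:

```java
final class RankCorrelation {
    // rankA[i] and rankB[i] are the positions (1..n) of comment i in the
    // two orderings; ties are assumed not to occur.
    static double spearman(int[] rankA, int[] rankB) {
        if (rankA.length != rankB.length || rankA.length < 2) {
            throw new IllegalArgumentException("need two rankings of equal length >= 2");
        }
        int n = rankA.length;
        double sumSquaredDiff = 0.0;
        for (int i = 0; i < n; i++) {
            double d = rankA[i] - rankB[i];
            sumSquaredDiff += d * d;
        }
        // rho = 1 - 6 * sum(d^2) / (n * (n^2 - 1))
        return 1.0 - 6.0 * sumSquaredDiff / (n * ((double) n * n - 1.0));
    }

    public static void main(String[] args) {
        System.out.println(spearman(new int[] {1, 2, 3}, new int[] {1, 3, 2}));
    }
}
```

A value near 0, like those reported above, means that the two orderings are essentially unrelated; 1 means identical order and -1 means reversed order.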
Conclusions
• Two-stage method for ranking comments on YouTube
• The first stage removes noisy comments
• The second stage tries to link the comments with
information from other web pages relevant for the
video
• Relevance is computed based on topic-modeling with
Mallet
• Results are encouraging, but a more rigorous
method of assessing them is needed
• Results are better than the usual ordering provided by
YouTube; however, the processing time for each video
should not be neglected
Thank you!
• Questions?
• Discussion

More Related Content

What's hot

HashCount for SemEval-2018 Task 3: Concatenative Featurization of Tweet and H...
HashCount for SemEval-2018 Task 3: Concatenative Featurization of Tweet and H...HashCount for SemEval-2018 Task 3: Concatenative Featurization of Tweet and H...
HashCount for SemEval-2018 Task 3: Concatenative Featurization of Tweet and H...
WarNik Chow
 
Improving Sentiment Analysis of Short Informal Indonesian Product Reviews usi...
Improving Sentiment Analysis of Short Informal Indonesian Product Reviews usi...Improving Sentiment Analysis of Short Informal Indonesian Product Reviews usi...
Improving Sentiment Analysis of Short Informal Indonesian Product Reviews usi...
TELKOMNIKA JOURNAL
 
Extraction of Socio-Semantic Data from Chat Conversations in Collaborative Le...
Extraction of Socio-Semantic Data from Chat Conversations in Collaborative Le...Extraction of Socio-Semantic Data from Chat Conversations in Collaborative Le...
Extraction of Socio-Semantic Data from Chat Conversations in Collaborative Le...
Traian Rebedea
 

What's hot (20)

Using lexical chains for text summarization
Using lexical chains for text summarizationUsing lexical chains for text summarization
Using lexical chains for text summarization
 
Enriching Word Vectors with Subword Information
Enriching Word Vectors with Subword InformationEnriching Word Vectors with Subword Information
Enriching Word Vectors with Subword Information
 
Frontiers of Natural Language Processing
Frontiers of Natural Language ProcessingFrontiers of Natural Language Processing
Frontiers of Natural Language Processing
 
Research on AITV
Research on AITVResearch on AITV
Research on AITV
 
Link prediction
Link predictionLink prediction
Link prediction
 
Its2016 presentations-stefan-trausan
Its2016 presentations-stefan-trausanIts2016 presentations-stefan-trausan
Its2016 presentations-stefan-trausan
 
Question Answering - Application and Challenges
Question Answering - Application and ChallengesQuestion Answering - Application and Challenges
Question Answering - Application and Challenges
 
Learning Probabilistic Relational Models
Learning Probabilistic Relational ModelsLearning Probabilistic Relational Models
Learning Probabilistic Relational Models
 
Open domain Question Answering System - Research project in NLP
Open domain  Question Answering System - Research project in NLPOpen domain  Question Answering System - Research project in NLP
Open domain Question Answering System - Research project in NLP
 
Meta-evaluation of machine translation evaluation methods
Meta-evaluation of machine translation evaluation methodsMeta-evaluation of machine translation evaluation methods
Meta-evaluation of machine translation evaluation methods
 
Word Tagging with Foundational Ontology Classes
Word Tagging with Foundational Ontology ClassesWord Tagging with Foundational Ontology Classes
Word Tagging with Foundational Ontology Classes
 
Random Generation of Relational Bayesian Networks
Random Generation of Relational Bayesian NetworksRandom Generation of Relational Bayesian Networks
Random Generation of Relational Bayesian Networks
 
Presentation of Domain Specific Question Answering System Using N-gram Approach.
Presentation of Domain Specific Question Answering System Using N-gram Approach.Presentation of Domain Specific Question Answering System Using N-gram Approach.
Presentation of Domain Specific Question Answering System Using N-gram Approach.
 
Chinese Character Decomposition for Neural MT with Multi-Word Expressions
Chinese Character Decomposition for  Neural MT with Multi-Word ExpressionsChinese Character Decomposition for  Neural MT with Multi-Word Expressions
Chinese Character Decomposition for Neural MT with Multi-Word Expressions
 
HashCount for SemEval-2018 Task 3: Concatenative Featurization of Tweet and H...
HashCount for SemEval-2018 Task 3: Concatenative Featurization of Tweet and H...HashCount for SemEval-2018 Task 3: Concatenative Featurization of Tweet and H...
HashCount for SemEval-2018 Task 3: Concatenative Featurization of Tweet and H...
 
ADAPT Centre and My NLP journey: MT, MTE, QE, MWE, NER, Treebanks, Parsing.
ADAPT Centre and My NLP journey: MT, MTE, QE, MWE, NER, Treebanks, Parsing.ADAPT Centre and My NLP journey: MT, MTE, QE, MWE, NER, Treebanks, Parsing.
ADAPT Centre and My NLP journey: MT, MTE, QE, MWE, NER, Treebanks, Parsing.
 
Improving Sentiment Analysis of Short Informal Indonesian Product Reviews usi...
Improving Sentiment Analysis of Short Informal Indonesian Product Reviews usi...Improving Sentiment Analysis of Short Informal Indonesian Product Reviews usi...
Improving Sentiment Analysis of Short Informal Indonesian Product Reviews usi...
 
Aspect Opinion Mining From User Reviews on the web
Aspect Opinion Mining From User Reviews on the webAspect Opinion Mining From User Reviews on the web
Aspect Opinion Mining From User Reviews on the web
 
MultiMWE: Building a Multi-lingual Multi-Word Expression (MWE) Parallel Corpora
MultiMWE: Building a Multi-lingual Multi-Word Expression (MWE) Parallel CorporaMultiMWE: Building a Multi-lingual Multi-Word Expression (MWE) Parallel Corpora
MultiMWE: Building a Multi-lingual Multi-Word Expression (MWE) Parallel Corpora
 
Extraction of Socio-Semantic Data from Chat Conversations in Collaborative Le...
Extraction of Socio-Semantic Data from Chat Conversations in Collaborative Le...Extraction of Socio-Semantic Data from Chat Conversations in Collaborative Le...
Extraction of Socio-Semantic Data from Chat Conversations in Collaborative Le...
 

Viewers also liked

Istoria Web-ului - part 2 - tentativ How to Web 2009
Istoria Web-ului - part 2 - tentativ How to Web 2009Istoria Web-ului - part 2 - tentativ How to Web 2009
Istoria Web-ului - part 2 - tentativ How to Web 2009
Traian Rebedea
 
Istoria Web-ului - part 1 - tentativ How to Web 2009
Istoria Web-ului - part 1 - tentativ How to Web 2009Istoria Web-ului - part 1 - tentativ How to Web 2009
Istoria Web-ului - part 1 - tentativ How to Web 2009
Traian Rebedea
 
Istoria Web-ului - part 1 (2) - tentativ How to Web 2009
Istoria Web-ului - part 1 (2) - tentativ How to Web 2009Istoria Web-ului - part 1 (2) - tentativ How to Web 2009
Istoria Web-ului - part 1 (2) - tentativ How to Web 2009
Traian Rebedea
 
Automatic assessment of collaborative chat conversations with PolyCAFe - EC-T...
Automatic assessment of collaborative chat conversations with PolyCAFe - EC-T...Automatic assessment of collaborative chat conversations with PolyCAFe - EC-T...
Automatic assessment of collaborative chat conversations with PolyCAFe - EC-T...
Traian Rebedea
 
Automatic plagiarism detection system for specialized corpora
Automatic plagiarism detection system for specialized corporaAutomatic plagiarism detection system for specialized corpora
Automatic plagiarism detection system for specialized corpora
Traian Rebedea
 
Algorithm Design and Complexity - Course 10
Algorithm Design and Complexity - Course 10Algorithm Design and Complexity - Course 10
Algorithm Design and Complexity - Course 10
Traian Rebedea
 
Conclusions and Recommendations of the Romanian ICT RTD Survey
Conclusions and Recommendations of the Romanian ICT RTD SurveyConclusions and Recommendations of the Romanian ICT RTD Survey
Conclusions and Recommendations of the Romanian ICT RTD Survey
Traian Rebedea
 
Importanța algoritmilor pentru problemele de la interviuri
Importanța algoritmilor pentru problemele de la interviuriImportanța algoritmilor pentru problemele de la interviuri
Importanța algoritmilor pentru problemele de la interviuri
Traian Rebedea
 
Propunere de dezvoltare a carierei universitare
Propunere de dezvoltare a carierei universitarePropunere de dezvoltare a carierei universitare
Propunere de dezvoltare a carierei universitare
Traian Rebedea
 
Algorithm Design and Complexity - Course 9
Algorithm Design and Complexity - Course 9Algorithm Design and Complexity - Course 9
Algorithm Design and Complexity - Course 9
Traian Rebedea
 
Algorithm Design and Complexity - Course 8
Algorithm Design and Complexity - Course 8Algorithm Design and Complexity - Course 8
Algorithm Design and Complexity - Course 8
Traian Rebedea
 
Algorithm Design and Complexity - Course 11
Algorithm Design and Complexity - Course 11Algorithm Design and Complexity - Course 11
Algorithm Design and Complexity - Course 11
Traian Rebedea
 
Algorithm Design and Complexity - Course 7
Algorithm Design and Complexity - Course 7Algorithm Design and Complexity - Course 7
Algorithm Design and Complexity - Course 7
Traian Rebedea
 
Algorithm Design and Complexity - Course 12
Algorithm Design and Complexity - Course 12Algorithm Design and Complexity - Course 12
Algorithm Design and Complexity - Course 12
Traian Rebedea
 
Algorithm Design and Complexity - Course 3
Algorithm Design and Complexity - Course 3Algorithm Design and Complexity - Course 3
Algorithm Design and Complexity - Course 3
Traian Rebedea
 

Viewers also liked (19)

Software Services in Romania – Academia and Industry
Software Services in Romania – Academia and IndustrySoftware Services in Romania – Academia and Industry
Software Services in Romania – Academia and Industry
 
Istoria Web-ului - part 2 - tentativ How to Web 2009
Istoria Web-ului - part 2 - tentativ How to Web 2009Istoria Web-ului - part 2 - tentativ How to Web 2009
Istoria Web-ului - part 2 - tentativ How to Web 2009
 
Istoria Web-ului - part 1 - tentativ How to Web 2009
Istoria Web-ului - part 1 - tentativ How to Web 2009Istoria Web-ului - part 1 - tentativ How to Web 2009
Istoria Web-ului - part 1 - tentativ How to Web 2009
 
Istoria Web-ului - part 1 (2) - tentativ How to Web 2009
Istoria Web-ului - part 1 (2) - tentativ How to Web 2009Istoria Web-ului - part 1 (2) - tentativ How to Web 2009
Istoria Web-ului - part 1 (2) - tentativ How to Web 2009
 
Automatic assessment of collaborative chat conversations with PolyCAFe - EC-T...
Automatic assessment of collaborative chat conversations with PolyCAFe - EC-T...Automatic assessment of collaborative chat conversations with PolyCAFe - EC-T...
Automatic assessment of collaborative chat conversations with PolyCAFe - EC-T...
 
Automatic plagiarism detection system for specialized corpora
Automatic plagiarism detection system for specialized corporaAutomatic plagiarism detection system for specialized corpora
Automatic plagiarism detection system for specialized corpora
 
Algorithm Design and Complexity - Course 10
Algorithm Design and Complexity - Course 10Algorithm Design and Complexity - Course 10
Algorithm Design and Complexity - Course 10
 
Conclusions and Recommendations of the Romanian ICT RTD Survey
Conclusions and Recommendations of the Romanian ICT RTD SurveyConclusions and Recommendations of the Romanian ICT RTD Survey
Conclusions and Recommendations of the Romanian ICT RTD Survey
 
Importanța algoritmilor pentru problemele de la interviuri
Importanța algoritmilor pentru problemele de la interviuriImportanța algoritmilor pentru problemele de la interviuri
Importanța algoritmilor pentru problemele de la interviuri
 
Propunere de dezvoltare a carierei universitare
Propunere de dezvoltare a carierei universitarePropunere de dezvoltare a carierei universitare
Propunere de dezvoltare a carierei universitare
 
Algorithm Design and Complexity - Course 9
Algorithm Design and Complexity - Course 9Algorithm Design and Complexity - Course 9
Algorithm Design and Complexity - Course 9
 
Algorithm Design and Complexity - Course 8
Algorithm Design and Complexity - Course 8Algorithm Design and Complexity - Course 8
Algorithm Design and Complexity - Course 8
 
Algorithm Design and Complexity - Course 11
Algorithm Design and Complexity - Course 11Algorithm Design and Complexity - Course 11
Algorithm Design and Complexity - Course 11
 
Algorithm Design and Complexity - Course 7
Algorithm Design and Complexity - Course 7Algorithm Design and Complexity - Course 7
Algorithm Design and Complexity - Course 7
 
Algorithm Design and Complexity - Course 12
Algorithm Design and Complexity - Course 12Algorithm Design and Complexity - Course 12
Algorithm Design and Complexity - Course 12
 
Algorithm Design and Complexity - Course 1&2
Algorithm Design and Complexity - Course 1&2Algorithm Design and Complexity - Course 1&2
Algorithm Design and Complexity - Course 1&2
 
Algorithm Design and Complexity - Course 3
Algorithm Design and Complexity - Course 3Algorithm Design and Complexity - Course 3
Algorithm Design and Complexity - Course 3
 
Intro to Deep Learning for Question Answering
Intro to Deep Learning for Question AnsweringIntro to Deep Learning for Question Answering
Intro to Deep Learning for Question Answering
 
What is word2vec?
What is word2vec?What is word2vec?
What is word2vec?
 

Similar to Relevance based ranking of video comments on YouTube

The Community is Where the Rapport is - On Sense and Structure in the YouTube...
The Community is Where the Rapport is - On Sense and Structure in the YouTube...The Community is Where the Rapport is - On Sense and Structure in the YouTube...
The Community is Where the Rapport is - On Sense and Structure in the YouTube...
Dana Rotman
 

Similar to Relevance based ranking of video comments on YouTube (8)

Exploiting User Comments for Audio-visual Content Indexing and Retrieval (ECI...
Exploiting User Comments for Audio-visual Content Indexing and Retrieval (ECI...Exploiting User Comments for Audio-visual Content Indexing and Retrieval (ECI...
Exploiting User Comments for Audio-visual Content Indexing and Retrieval (ECI...
 
The Community is Where the Rapport is - On Sense and Structure in the YouTube...
The Community is Where the Rapport is - On Sense and Structure in the YouTube...The Community is Where the Rapport is - On Sense and Structure in the YouTube...
The Community is Where the Rapport is - On Sense and Structure in the YouTube...
 
Andreessen's Corollary: Ethical Dilemmas in Software Engineering
Andreessen's Corollary: Ethical Dilemmas in Software EngineeringAndreessen's Corollary: Ethical Dilemmas in Software Engineering
Andreessen's Corollary: Ethical Dilemmas in Software Engineering
 
Crowsourcing for Social Multimedia Task: Crowsorting Timed Comments about Music
Crowsourcing for Social Multimedia Task: Crowsorting Timed Comments about MusicCrowsourcing for Social Multimedia Task: Crowsorting Timed Comments about Music
Crowsourcing for Social Multimedia Task: Crowsorting Timed Comments about Music
 
Recommender Systems Explained
Recommender Systems ExplainedRecommender Systems Explained
Recommender Systems Explained
 
Movies recommendation system in R Studio, Machine learning
Movies recommendation system in  R Studio, Machine learning Movies recommendation system in  R Studio, Machine learning
Movies recommendation system in R Studio, Machine learning
 
Automated Evaluation of Comments in a MOOC Discussion Forum
Automated Evaluation of Comments in a  MOOC Discussion ForumAutomated Evaluation of Comments in a  MOOC Discussion Forum
Automated Evaluation of Comments in a MOOC Discussion Forum
 
Automated Evaluation of Comments in a MOOC Discussion Forum
Automated Evaluation of Comments in a MOOC Discussion ForumAutomated Evaluation of Comments in a MOOC Discussion Forum
Automated Evaluation of Comments in a MOOC Discussion Forum
 

More from Traian Rebedea

Algorithm Design and Complexity - Course 6
Algorithm Design and Complexity - Course 6Algorithm Design and Complexity - Course 6
Algorithm Design and Complexity - Course 6
Traian Rebedea
 
Algorithm Design and Complexity - Course 5
Algorithm Design and Complexity - Course 5Algorithm Design and Complexity - Course 5
Algorithm Design and Complexity - Course 5
Traian Rebedea
 

More from Traian Rebedea (6)

AI @ Wholi - Bucharest.AI Meetup #5
AI @ Wholi - Bucharest.AI Meetup #5AI @ Wholi - Bucharest.AI Meetup #5
AI @ Wholi - Bucharest.AI Meetup #5
 
Deep neural networks for matching online social networking profiles
Deep neural networks for matching online social networking profilesDeep neural networks for matching online social networking profiles
Deep neural networks for matching online social networking profiles
 
A focused crawler for romanian words discovery
A focused crawler for romanian words discoveryA focused crawler for romanian words discovery
A focused crawler for romanian words discovery
 
Algorithm Design and Complexity - Course 6
Algorithm Design and Complexity - Course 6Algorithm Design and Complexity - Course 6
Algorithm Design and Complexity - Course 6
 
Algorithm Design and Complexity - Course 5
Algorithm Design and Complexity - Course 5Algorithm Design and Complexity - Course 5
Algorithm Design and Complexity - Course 5
 
Algorithm Design and Complexity - Course 4 - Heaps and Dynamic Progamming
Algorithm Design and Complexity - Course 4 - Heaps and Dynamic ProgammingAlgorithm Design and Complexity - Course 4 - Heaps and Dynamic Progamming
Algorithm Design and Complexity - Course 4 - Heaps and Dynamic Progamming
 

Recently uploaded

+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
?#DUbAI#??##{{(☎️+971_581248768%)**%*]'#abortion pills for sale in dubai@
 

Recently uploaded (20)

How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
Deploy with confidence: VMware Cloud Foundation 5.1 on next gen Dell PowerEdg...
Deploy with confidence: VMware Cloud Foundation 5.1 on next gen Dell PowerEdg...Deploy with confidence: VMware Cloud Foundation 5.1 on next gen Dell PowerEdg...
Deploy with confidence: VMware Cloud Foundation 5.1 on next gen Dell PowerEdg...
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
 
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century education
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CV
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
 
Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 

Relevance based ranking of video comments on YouTube

  • 1. Authors University Politehnica of Bucharest Relevance-Based Ranking of Video Comments on YouTube Andrei Șerbănoiu Traian Rebedea traian.rebedea@cs.pub.ro
  • 2. Overview • Introduction • Motivation • System architecture • Classification of relevant comments • Ranking of relevant comments • Results • Conclusions 22.09.13 Sesiunea de Licenţe - Iulie 2012 2
  • 3. Introduction • Text classification and raking for comments on YouTube videos – First: classification whether the comment is relevant or not for the given video file – Second: ranking the relevant comments • Focus on identifying relevant information • Comments have a very small number of words – sometimes less than 10, on average of the order of tens • Relevance is evaluated with respect to the information collected from other online sources about the video 22.09.13 CSCS 2013 – Bucharest, Romania 3
  • 4. Existing research • We have not been able to identify any previous research in the direction of identifying relevant comments • YouTube research – Identify the relevant features of community acceptance (comments with many “likes”) – Extract the sentiment orientation – Differentiate between clean and noisy comments • Other research – Ranking Comments on the Social Web (uses Digg) 22.09.13 CSCS 2013 – Bucharest, Romania 4
  • 5. Motivation • The Police – Every Breath You Take 22.09.13 CSCS 2013 – Bucharest, Romania 5
  • 6. Motivation • Most commented video • “10 questions that every intelligent Christian must answer” • 1,429,425 comments on 30th May 2013 (early morning) • How many of these comments are spam? • Which ones would be most relevant to the video? 22.09.13 CSCS 2013 – Bucharest, Romania 6
  • 7. Solution • Ranking of the comments according to relevance • Steps: 1. Automatically link video with other online sources relevant to it 2. Filter comments to remove noisy comments 3. Rank the remaining comments according to relevance computed using NLP techniques • Our solution works for music videos 22.09.13 CSCS 2013 – Bucharest, Romania 7
  • 8. System architecture
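  • 9. Processing pipeline
      // Fetch the comments, then drop the noisy ones
      comments = fetchYouTubeComments();
      comments = filterComments(comments);
      // Extract the topics of each remaining comment
      commentTopics = createCommentTopics(comments);
      // Collect the external sources describing the video
      resources = getResources(wikipedia, allmusic, lyrics);
      // Score every comment against those sources
      for (int i = 0; i < commentTopics.length; i++) {
          computeRelevance(commentTopics[i], resources);
      }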
  • 10. Preprocessing • Comments retrieved with the YouTube Data API – Only the last 100 comments per video were used • Comments not written in English were filtered out using JLangDetect • The main topics of each comment were extracted using Mallet => 5 topics per comment • The topics were expanded with synonyms and hypernyms from WordNet
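The topic-expansion step can be illustrated with a minimal sketch. This is not the authors' code: the `TopicExpander` class and the tiny in-memory `SYNONYMS` and `HYPERNYMS` maps are hypothetical stand-ins for a real WordNet lookup, used only to show how a comment's topics might grow before matching.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class TopicExpander {
    // Hypothetical toy lexicon standing in for WordNet; the entries are illustrative only.
    static final Map<String, List<String>> SYNONYMS = Map.of(
        "song", List.of("tune", "melody"),
        "band", List.of("group"));
    static final Map<String, List<String>> HYPERNYMS = Map.of(
        "song", List.of("music"),
        "band", List.of("ensemble"));

    // Returns each topic (lower-cased) followed by its synonyms and hypernyms, without duplicates.
    static Set<String> expand(List<String> topics) {
        Set<String> expanded = new LinkedHashSet<>();
        for (String topic : topics) {
            String key = topic.toLowerCase();
            expanded.add(key);
            expanded.addAll(SYNONYMS.getOrDefault(key, List.of()));
            expanded.addAll(HYPERNYMS.getOrDefault(key, List.of()));
        }
        return expanded;
    }

    public static void main(String[] args) {
        System.out.println(expand(List.of("song", "beatles")));
    }
}
```

Topics missing from the lexicon (here "beatles") pass through unchanged, so proper nouns such as artist names are not lost.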
  • 11. Pre-classification of comments • Objective: reduce the number of comments considered for ranking by identifying noise • Classification with a neural network over a set of simple linguistic features • Multilayer perceptron implemented in Weka • Features – Number of non-ASCII characters – Number of capital letters – Number of newlines – Number of digits – Number of trivial and swear-words – Number of words in comment – Average word size – Number of punctuation marks – Common text spam count
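A minimal sketch of how some of these surface features could be computed is shown below. It is an illustration under stated assumptions, not the authors' extractor: the `CommentFeatures` class, the punctuation set and the whitespace tokenization are all hypothetical. The swear-word and spam counts are left out because they depend on word lists that the slides do not give.

```java
import java.util.Arrays;

final class CommentFeatures {
    // Assumed punctuation set; the slides do not say which marks were counted.
    private static final String PUNCTUATION = ".,;:!?'\"()-";

    // Returns [nonAscii, capitals, newlines, digits, words, avgWordSize, punctuation].
    // Characters are counted as UTF-16 code units.
    static double[] extract(String comment) {
        int nonAscii = 0, capitals = 0, newlines = 0, digits = 0, punctuation = 0;
        for (int i = 0; i < comment.length(); i++) {
            char c = comment.charAt(i);
            if (c > 127) nonAscii++;
            if (Character.isUpperCase(c)) capitals++;
            if (c == '\n') newlines++;
            if (Character.isDigit(c)) digits++;
            if (PUNCTUATION.indexOf(c) >= 0) punctuation++;
        }
        // Words are whitespace-separated tokens (an assumption).
        String trimmed = comment.trim();
        String[] words = trimmed.isEmpty() ? new String[0] : trimmed.split("\\s+");
        int totalChars = 0;
        for (String w : words) totalChars += w.length();
        double avgWordSize = words.length == 0 ? 0.0 : (double) totalChars / words.length;
        return new double[] {nonAscii, capitals, newlines, digits, words.length, avgWordSize, punctuation};
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(extract("HI there 42!")));
    }
}
```

A vector of this shape is the kind of numeric input a classifier such as Weka's multilayer perceptron expects, with one attribute per feature.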
  • 12. Pre-classification of comments • Trained on a small corpus of 100 relevant and 100 noisy comments • Examples of noisy comments: – "Step 1: Pause this video Step 2: Google 'Rainymood' Step 3: Click the first link Step 4: Unpause this video Step 5: Thumbs up this comment, enjoy and thank me later" – "Those 3,175 haters listen to 'Techno'." – "IF YOU LIKE DIRTY DIANA SONG THE SINGER '' STEFANO GIORGINI '' DID A GREAT REMAKE STEFANO IS A VERY GOOD SINGER SONGWRITER I THINK YOU WILL LIKE HIS VERSION JUST LOOK FOR '' STEFANO GIORGINI '' DIRTY DIANA"
  • 13. Pre-classification of comments • Results of pre-classification stage
      Type of Instances                  No. Instances   %
      Correctly Classified Instances     174             87.46
      Incorrectly Classified Instances   26              12.54
      Total Number of Instances          200             -
  • 14. Relevance scoring stage • Initial approach • Extract topics from comments as previously mentioned (Mallet + WordNet) • Fetch Wikipedia articles for artist and song name • Score computed from the number of appearances of the comment's topics in the articles
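The initial approach amounts to counting how often a comment's topic words occur in the fetched articles. A minimal sketch follows. The `OccurrenceScorer` class, the lower-casing and the split on non-alphanumeric characters are assumptions for illustration, since the slides do not describe tokenization or normalization.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class OccurrenceScorer {
    // Sums, over all topics of a comment, how many times each topic word appears in the article.
    static int score(List<String> topics, String articleText) {
        // Build a word-frequency table for the article (lower-cased; split on anything
        // that is not a letter or digit, which is an assumed tokenization).
        Map<String, Integer> counts = new HashMap<>();
        for (String token : articleText.toLowerCase().split("[^\\p{L}\\p{Nd}]+")) {
            if (!token.isEmpty()) counts.merge(token, 1, Integer::sum);
        }
        int score = 0;
        for (String topic : topics) {
            score += counts.getOrDefault(topic.toLowerCase(), 0);
        }
        return score;
    }

    public static void main(String[] args) {
        String article = "The Beatles were an English rock band. The Beatles sang about peace.";
        System.out.println(score(List.of("beatles", "peace"), article));
    }
}
```

A raw count like this favors long articles and frequent words, which may be one reason the later approaches match topics extracted from the articles.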
  • 15. Relevance scoring stage • Second approach: topic-based scoring • Similar to the previous one, but topics are also extracted from the Wikipedia articles with Mallet • Scoring is done based on: – Number of topics extracted from each comment – Wikipedia topic matches for each comment
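The slides name the two quantities used by the second approach but not how they are combined. The sketch below assumes one plausible combination, the fraction of a comment's topics that also appear among the Wikipedia topics. The `TopicMatchScorer` class and this ratio are hypothetical, not the authors' formula.

```java
import java.util.HashSet;
import java.util.Set;

final class TopicMatchScorer {
    // Assumed combination: matched topics divided by the number of topics in the comment.
    static double score(Set<String> commentTopics, Set<String> articleTopics) {
        if (commentTopics.isEmpty()) {
            return 0.0; // no topics extracted, so nothing can match
        }
        Set<String> matches = new HashSet<>(commentTopics);
        matches.retainAll(articleTopics); // keep only topics present in both sets
        return (double) matches.size() / commentTopics.size();
    }

    public static void main(String[] args) {
        Set<String> comment = Set.of("beatles", "peace", "love", "friend", "channel");
        Set<String> wikipedia = Set.of("beatles", "love", "album");
        System.out.println(score(comment, wikipedia));
    }
}
```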
  • 16. Relevance scoring stage • Third approach: multiple-source topic-based scoring • Additional sources added to the Wikipedia articles – Information from the allmusic.com website on artists and songs – Information from song lyrics • Topics matched between comments and Wikipedia + Allmusic articles, plus exact match of lyrics • Final relevance score is a weighted sum of the previous factors
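The weighted sum can be sketched as below. The weights are placeholders, since the slides do not report the values used. The `WeightedRelevance` class and its reading of "exact match of lyrics" (counting lyric lines quoted verbatim in the comment, ignoring case) are likewise assumptions.

```java
import java.util.List;

final class WeightedRelevance {
    // Hypothetical weights; the actual values are not given in the slides.
    static final double W_WIKIPEDIA = 1.0;
    static final double W_ALLMUSIC = 1.0;
    static final double W_LYRICS = 2.0;

    // Counts how many lyric lines occur verbatim (case-insensitive) inside the comment.
    static int lyricsMatches(String comment, List<String> lyricLines) {
        String text = comment.toLowerCase();
        int matches = 0;
        for (String line : lyricLines) {
            String needle = line.trim().toLowerCase();
            if (!needle.isEmpty() && text.contains(needle)) matches++;
        }
        return matches;
    }

    // Final relevance: weighted sum of the per-source match counts.
    static double score(int wikipediaMatches, int allmusicMatches, int lyricsMatches) {
        return W_WIKIPEDIA * wikipediaMatches
             + W_ALLMUSIC * allmusicMatches
             + W_LYRICS * lyricsMatches;
    }

    public static void main(String[] args) {
        int lyrics = lyricsMatches("i love the line All you need is love!",
                                   List.of("All you need is love", "Love is all you need"));
        System.out.println(score(3, 2, lyrics));
    }
}
```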
  • 17. Results
      Relevance   Comment
      662         maybe your friend should know that being english, have a picture in abbey road and "sing" all you need is love" won't make one direction a group like the beatles...
      968         my mom said she doesn't like the beatles and she said that john was only good to look at not to hear. my dad said, " haha so true!." i'm an orphan now.
      983         you shouldn't be listening to the beatles since these seem to turn your friends into enemies! beatles are all about peace! you are not getting their message!
      1309        please read this ! hey i know u just wanna listen to the song but i still have to write this hoping someone will see it and that someone will care .i'm a young musician from croatia so this spam is my only chance to get noticed.please check out my channel and i promise u won't be sorry.i appreciate your time because music means everything to me, thank you!
      1639        i didn't mean fight other places. i meant focus on the hurt people in your own country first, then expand to the others. if people don't agree with peace that's an opinion. not a fact, and people often take offense to opinions. there isn't anything to take offense to, they say something that's all it is. they said it, don't put meaning to it. world peace - i meant the whole world having peace there
  • 18. Results • Difficult to assess the impact of the relevance measure • Interpreting the comments is subjective – Need human annotators • The order of the comments is completely different from the one currently presented on YouTube (correlation lower than 0.031 for the first 100 comments) • Method 1 is also not correlated with the other two methods • Methods 2 and 3 have a higher correlation: 0.124
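The slides do not say which correlation measure was used to compare the orderings. As one possibility, the sketch below computes Spearman's rank correlation between two rankings of the same comments, assuming no tied ranks: rho = 1 - 6 * sum(d_i^2) / (n * (n^2 - 1)), where d_i is the difference between the two ranks of comment i. The `RankCorrelation` class is hypothetical.

```java
final class RankCorrelation {
    // Spearman's rho for two rankings of the same n items (ranks 1..n, no ties assumed).
    static double spearman(int[] ranksA, int[] ranksB) {
        if (ranksA.length != ranksB.length || ranksA.length < 2) {
            throw new IllegalArgumentException("need two rankings of equal length >= 2");
        }
        long n = ranksA.length;
        long sumSquaredDiff = 0;
        for (int i = 0; i < ranksA.length; i++) {
            long d = ranksA[i] - ranksB[i];
            sumSquaredDiff += d * d;
        }
        return 1.0 - (6.0 * sumSquaredDiff) / (n * (n * n - 1));
    }

    public static void main(String[] args) {
        int[] youtubeOrder = {1, 2, 3, 4};
        int[] relevanceOrder = {4, 3, 2, 1};
        System.out.println(spearman(youtubeOrder, relevanceOrder));
    }
}
```

Under this measure, values near 0, like those reported above, would mean the two orderings are essentially unrelated. Values near 1 or -1 would mean nearly identical or nearly reversed orderings.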
  • 19. Conclusions • Two-stage method for ranking comments on YouTube • The first stage removes noisy comments • The second stage tries to link the comments with information from other web pages relevant to the video • Relevance is computed based on topic modeling with Mallet • Results are encouraging, but a more rigorous method of assessing them is needed • Results are better than the usual results provided by YouTube; however, the processing time for each video should not be neglected
  • 20. Thank you! • Questions? • Discussion