Spot Deceptive TripAdvisor Hotel Reviews
By: Yousef Fadila
Project Notebook:
https://github.com/yousef-fadila/cs548-project-5/blob/master/notebook.ipynb
CS548: Text Mining Project
Motivation - Fake reviews in the news
TripAdvisor warns of hotels posting fake reviews
http://abcnews.go.com/Technology/story?id=8094231
Twitter campaign takes aim at fake restaurant reviews on TripAdvisor
https://www.theguardian.com/travel/2015/oct/24/twitter-campaign-targets-fake-tripadvisor-restaurant-reviews
Datasets
1. Deceptive Opinion Spam Corpus
Consists of:
400 deceptive positive reviews
400 deceptive negative reviews
⇒ From Amazon Mechanical Turk workers
400 truthful positive reviews
400 truthful negative reviews
⇒ From trusted users on TripAdvisor
2. TripAdvisor hotel reviews
Consists of:
878,561 reviews from 4,333 hotels crawled from TripAdvisor.
⇒ Includes metadata (hotel name, rating, stars, location, ...)
Outline
Guiding Questions:
1. Which is more prevalent among the 200,000 sample reviews: positive deceptive or
negative deceptive reviews?
2. Which hotel star rating most commonly has deceptive reviews? Which are the top ten
hotels by deceptive positive reviews?
3. Is there enough support to claim that deceptive positive reviews are used to cover
previous negative reviews?
Extra:
1. Would a 2-step approach based on domain knowledge (like the one presented in the
anomaly detection showcase) improve the accuracy of the text classification model?
2. Demo: Try it yourself.
3. Are computers better than humans at detecting deceptive reviews?
Text Classification Model
1. n-grams with n from 1 to 3 (ngram_range=(1, 3))
2. min_df=3
3. max_df=0.96
4. LinearSVC classifier
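To make the vectorizer settings concrete, here is a minimal pure-Python sketch of what they select. The names ngram_range, min_df and max_df match scikit-learn's vectorizer options, but the slide does not say which vectorizer was used, so the functions below (ngrams, build_vocabulary) and the toy documents are illustrative assumptions, with whitespace tokenization as a simplification. The LinearSVC step is not reproduced here.

```python
from collections import Counter


def ngrams(tokens, n_min=1, n_max=3):
    """Yield every n-gram (space-joined) for n_min <= n <= n_max."""
    for n in range(n_min, n_max + 1):
        for i in range(len(tokens) - n + 1):
            yield " ".join(tokens[i:i + n])


def build_vocabulary(docs, min_df=3, max_df=0.96):
    """Keep n-grams found in at least min_df documents and in at most
    a max_df fraction of all documents."""
    doc_freq = Counter()
    for doc in docs:
        # A set counts each n-gram once per document (document frequency).
        doc_freq.update(set(ngrams(doc.lower().split())))
    max_count = max_df * len(docs)
    return sorted(t for t, df in doc_freq.items() if min_df <= df <= max_count)


# Toy documents, invented for illustration only.
docs = [
    "the room was clean",
    "the room was small",
    "the room was noisy",
    "the staff was friendly",
]
vocab = build_vocabulary(docs)
print(vocab)
```

In this toy corpus, "the" and "was" occur in every document and are dropped by max_df, while n-grams seen in fewer than three documents are dropped by min_df.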
Positive deceptive vs. negative deceptive ratio
1. Which is more prevalent, positive deceptive or negative deceptive reviews among
the 200,000 sample reviews?
Answer: Positive deceptive reviews are more prevalent.
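The comparison amounts to counting the predicted labels. A minimal sketch, assuming each review has already been given a polarity label and a deceptive/truthful label by the classifier; the function name and the sample rows are hypothetical, not taken from the project notebook:

```python
from collections import Counter


def deceptive_polarity_counts(predictions):
    """Count reviews predicted as deceptive, grouped by polarity.

    predictions: iterable of (polarity, veracity) label pairs.
    """
    return Counter(pol for pol, ver in predictions if ver == "deceptive")


# Invented sample predictions, for illustration only.
sample = [
    ("positive", "deceptive"),
    ("positive", "truthful"),
    ("negative", "deceptive"),
    ("positive", "deceptive"),
]
counts = deceptive_polarity_counts(sample)
print(counts["positive"], counts["negative"])
```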
Hotel Star Rating vs. Deceptive Review Rate
2. Which hotel star rating most commonly has deceptive reviews? Which hotels have the
highest ratio of deceptive positive reviews?
Top “deceptive” Hotels:
********Inn Houston
******** York Hotel
********ose Hotel
********a Inn Houston Wirt Road
********lmonico
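A ranking like the one above can be sketched as a per-hotel ratio of deceptive positive reviews to all reviews. The row layout, the function name top_deceptive_hotels and the min_reviews cutoff are assumptions for illustration; the slides do not state how ties or hotels with very few reviews were handled.

```python
from collections import defaultdict


def top_deceptive_hotels(rows, k=10, min_reviews=1):
    """Rank hotels by the share of their reviews predicted deceptive positive.

    rows: iterable of (hotel, polarity, veracity) tuples.
    Returns up to k (hotel, ratio) pairs, highest ratio first.
    """
    totals = defaultdict(int)
    deceptive_pos = defaultdict(int)
    for hotel, polarity, veracity in rows:
        totals[hotel] += 1
        if polarity == "positive" and veracity == "deceptive":
            deceptive_pos[hotel] += 1
    ratios = {
        hotel: deceptive_pos[hotel] / n
        for hotel, n in totals.items()
        if n >= min_reviews  # hypothetical guard against tiny samples
    }
    # Sort by ratio descending, then by name for a stable order.
    return sorted(ratios.items(), key=lambda kv: (-kv[1], kv[0]))[:k]


# Invented rows, for illustration only.
rows = [
    ("Hotel A", "positive", "deceptive"),
    ("Hotel A", "positive", "deceptive"),
    ("Hotel A", "negative", "truthful"),
    ("Hotel B", "positive", "deceptive"),
    ("Hotel B", "positive", "truthful"),
    ("Hotel C", "negative", "truthful"),
]
ranking = top_deceptive_hotels(rows, k=2)
print(ranking)
```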
Frequent Sequences Leading to Positive Deceptive Reviews
1. Pick 20 hotels with deceptive reviews.
2. Export all reviews of the selected hotels to an ARFF file.
3. Set the sequence ID to the hotel ID.
4. Run the GSP algorithm in Weka.
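The export step might look like the sketch below, which writes one row per review with the hotel ID as a nominal attribute that can then be chosen as the sequence ID in Weka's GSP settings. The attribute names, the combined label values such as positive_deceptive, and the function reviews_to_arff are assumptions; the slides do not show the actual ARFF layout.

```python
def reviews_to_arff(rows, relation="hotel_review_sequences"):
    """Build ARFF text from (hotel_id, label) pairs.

    rows must already be in chronological order within each hotel,
    since the order of rows defines the order of events in a sequence.
    """
    hotel_ids = sorted({str(hotel) for hotel, _ in rows})
    labels = sorted({label for _, label in rows})
    lines = [
        f"@relation {relation}",
        "",
        # Nominal attributes: the hotel ID doubles as the sequence ID.
        "@attribute hotel_id {" + ",".join(hotel_ids) + "}",
        "@attribute review_label {" + ",".join(labels) + "}",
        "",
        "@data",
    ]
    lines += [f"{hotel},{label}" for hotel, label in rows]
    return "\n".join(lines) + "\n"


# Invented rows, for illustration only.
arff = reviews_to_arff([
    ("h1", "negative_truthful"),
    ("h1", "positive_deceptive"),
    ("h2", "positive_truthful"),
])
print(arff)
```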
2-Step Approach
1. Would a 2-step approach based on domain knowledge (like the one presented in the anomaly
detection showcase) improve the accuracy of the text classification model?
What features could be used to distinguish deceptive from truthful reviews?
False Positive vs False Negative.
Supervised vs Unsupervised
Content Based Features
Some online reviews are too good to be true; Cornell computers spot 'opinion spam' http://bit.ly/2g6ou9X
"The researchers then applied computer analysis based on subtle features of text. Truthful hotel reviews, for
example, are more likely to use concrete words relating to the hotel, like "bathroom," "check-in" or "price."
Deceivers write more about things that set the scene, like "vacation," "business trip" or "my husband." Truth-tellers
and deceivers also differ in the use of keywords referring to human behavior and personal life, and
sometimes in features like the amount of punctuation or frequency of "large words." In parallel with previous
analysis of imaginative vs. informative writing, deceivers use more verbs and truth-tellers use more nouns."
Features to extract from the review text:
1) Amount of punctuation
2) Total nouns minus total verbs
3) Length of the review
4) Adjective and adverb ratio
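The four features could be computed roughly as follows. This is a sketch under stated assumptions: the review has already been part-of-speech tagged with Penn Treebank tags by some tagger (not shown), and the "adjective and adverb ratio" is read as the share of tokens that are adjectives or adverbs, which the slide does not define. The function name content_features is hypothetical.

```python
import string


def content_features(text, tagged_tokens):
    """Compute the four content-based features for one review.

    tagged_tokens: list of (token, tag) pairs with Penn Treebank tags,
    produced by any POS tagger.
    """
    tags = [tag for _, tag in tagged_tokens]
    nouns = sum(tag.startswith("NN") for tag in tags)
    verbs = sum(tag.startswith("VB") for tag in tags)
    adjectives = sum(tag.startswith("JJ") for tag in tags)
    adverbs = sum(tag.startswith("RB") for tag in tags)
    return {
        "punctuation": sum(ch in string.punctuation for ch in text),
        "nouns_minus_verbs": nouns - verbs,
        "length": len(text),  # length in characters (an assumption)
        # Assumed definition: adjectives plus adverbs over all tokens.
        "adj_adv_ratio": (adjectives + adverbs) / len(tags) if tags else 0.0,
    }


# Hand-tagged example, for illustration only.
text = "The room was really clean!"
tagged = [
    ("The", "DT"), ("room", "NN"), ("was", "VBD"),
    ("really", "RB"), ("clean", "JJ"), ("!", "."),
]
features = content_features(text, tagged)
print(features)
```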
1st Try: Unsupervised Anomaly Detection (AD) Followed by a Supervised Classifier
No Improvement!
2nd Try: Single-Step Supervised Model
Merge the “bag of words” features and the extracted content-based features into one
feature set for a single supervised classifier.
No Improvement!
3rd Try: Change Topology
Two supervised text classification models:
The positive-negative model is based only on “bag of words” features.
The deceptive-truthful model uses both bag-of-words and content-based features.
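One way to read this topology is as two independent classifiers whose outputs are combined into a joint label. The sketch below only shows that wiring: the slides do not say how the two models were connected, and the stub models and feature extractor are toy rules invented for illustration, not trained classifiers.

```python
def classify_review(text, polarity_model, veracity_model, extract_features):
    """Combine two models into one joint label such as 'positive_deceptive'.

    polarity_model(text) sees only the text (bag of words in the slides).
    veracity_model(text, features) also receives content-based features.
    """
    polarity = polarity_model(text)
    veracity = veracity_model(text, extract_features(text))
    return f"{polarity}_{veracity}"


# Toy stand-ins, invented for illustration only.
def polarity_stub(text):
    return "positive" if "great" in text.lower() else "negative"


def features_stub(text):
    return {"punctuation": sum(ch in "!?.,;:" for ch in text)}


def veracity_stub(text, features):
    return "deceptive" if features["punctuation"] > 2 else "truthful"


label = classify_review(
    "Great stay!!! My husband loved it!",
    polarity_stub, veracity_stub, features_stub,
)
print(label)
```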
3rd Try: Change Topology - Result
Overall improvement of 7%!
Demo: Try it yourself
www.yousef.fadila.net/cs548
REST API:
POST request to:
www.yousef.fadila.net/cs548/review_checker
Payload: {'review_text': text}
Sample response: {"result": "Likely Fake"}
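A client for this endpoint could be sketched with the Python standard library as below. The http:// scheme, the JSON request encoding and the Content-Type header are assumptions (the slide gives only the host, path, payload shape and a sample response), and the service may no longer be online; build_request and check_review are hypothetical helper names.

```python
import json
import urllib.request

# Scheme is assumed; the slide lists only host and path.
API_URL = "http://www.yousef.fadila.net/cs548/review_checker"


def build_request(review_text, url=API_URL):
    """Build the POST request without sending it."""
    body = json.dumps({"review_text": review_text}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},  # assumed encoding
        method="POST",
    )


def check_review(review_text, timeout=10):
    """Send the review and return the 'result' field of the response."""
    with urllib.request.urlopen(build_request(review_text), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))["result"]


# Only builds the request; call check_review(...) to actually send it.
req = build_request("The hotel was wonderful, my husband loved it!")
print(req.get_method(), req.full_url)
```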
Computers vs. Humans
Are computers better than humans at detecting deceptive reviews?
Survey of WPI students
74 WPI students responded.
Students were given 5 positive reviews and asked to decide whether each one
was truthful or deceptive.
The list intentionally includes reviews that were not classified correctly by
the model from the 1st experiment.
Computers vs. Humans - Result
This is neither a scientific study nor a statistical one!
This is only a game! In fact, it is an unfair game, because the reviews come
from the same dataset the model was trained on!
The purpose of the game is to show whether humans' truth bias (assuming that
what they are reading is true until they find evidence to the contrary) could
affect their ability to spot deceptive reviews.
Review   Computers   Humans
1        1           1
2        1           0
3        0           0
4        1           1
5        1           1
Total    4           3
Any Questions?
