Igor Santos
Igor Miñambres-Marcos
Carlos Laorden
Patxi Galán-García
Aitor Santamaría-Ibirika
Pablo G. Bringas
Detecting spammer accounts
Content-based analysis
[Figures omitted. Recoverable labels: spam (TweetSpike) and ham (Legitimate); items t1 to t3 and m1 to m11; legitimate, spam, testing, probability]
Dynamic Markov Chain (DMC)
Prediction by Partial Match (PPM)
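The idea behind compression-based filtering can be illustrated with a minimal sketch: a message is assigned to the class whose training text helps compress it most. This sketch uses Python's zlib as a stand-in compressor, not DMC or PPM, and it is not the WEKA library mentioned in the slides. The function names and the toy training strings are hypothetical.

```python
import zlib


def compressed_size(text: str) -> int:
    """Number of bytes zlib needs to store the text."""
    return len(zlib.compress(text.encode("utf-8"), 9))


def extra_cost(corpus: str, message: str) -> int:
    """Extra bytes needed to compress the message once the corpus is known."""
    return compressed_size(corpus + "\n" + message) - compressed_size(corpus)


def classify(message: str, spam_corpus: str, ham_corpus: str) -> str:
    """Assign the class whose training text compresses the message best."""
    spam_cost = extra_cost(spam_corpus, message)
    ham_cost = extra_cost(ham_corpus, message)
    return "spam" if spam_cost < ham_cost else "legitimate"


# Hypothetical toy training data, not taken from the dataset in the slides.
SPAM = "\n".join([
    "win a free phone now, click here to claim your prize",
    "get thousands of followers fast, click here to claim your bonus",
])
HAM = "\n".join([
    "heading to the conference tomorrow, see you at the poster session",
    "great talk today about malware detection, slides coming soon",
])

if __name__ == "__main__":
    print(classify("click here to claim your prize", SPAM, HAM))
    print(classify("see you at the poster session", SPAM, HAM))
```

A general-purpose compressor such as zlib only approximates what an adaptive statistical model like DMC or PPM does, so this is meant to convey the principle rather than reproduce the reported results.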
Classifier                              Acc. (%)  Sp    Sr    F-Measure  AUC
Random Forest N=50                      96.42     0.98  0.94  0.96       0.99
DMC without Adaptation                  95.99     0.96  0.95  0.96       0.99
Random Forest N=10                      95.96     0.97  0.94  0.95       0.99
PPM without Adaptation                  94.80     0.97  0.91  0.94       0.99
Naive Bayes Multinomial Word Frequency  94.94     0.95  0.93  0.94       0.98
Bayes K2                                94.12     0.99  0.88  0.93       0.98
DMC with Adaptation                     93.11     0.94  0.90  0.92       0.98
C4.5                                    95.79     0.98  0.92  0.95       0.97
KNN K=3                                 93.71     0.97  0.89  0.93       0.97
SVM PVK                                 95.81     0.97  0.93  0.95       0.96
PPM with Adaptation                     76.50     0.78  0.69  0.72       0.86
Naive Bayes                             72.72     0.64  0.89  0.75       0.76
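As a reading aid for the columns above, the following sketch shows how such metrics are usually computed, assuming that Sp and Sr denote spam precision and spam recall and that F-Measure is their harmonic mean. The slides do not spell out these definitions, so treat them as an assumption. The confusion-matrix counts in the usage lines are made up for illustration and are not results from the deck.

```python
def f_measure(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def spam_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Metrics from confusion-matrix counts, with spam as the positive class.

    tp: spam classified as spam, fp: legitimate classified as spam,
    tn: legitimate classified as legitimate, fn: spam classified as legitimate.
    """
    sp = tp / (tp + fp) if tp + fp else 0.0  # spam precision (assumed meaning of Sp)
    sr = tp / (tp + fn) if tp + fn else 0.0  # spam recall (assumed meaning of Sr)
    return {
        "accuracy": 100.0 * (tp + tn) / (tp + fp + tn + fn),
        "sp": sp,
        "sr": sr,
        "f_measure": f_measure(sp, sr),
    }


if __name__ == "__main__":
    # Hypothetical counts, chosen only to exercise the functions.
    print(spam_metrics(tp=94, fp=2, tn=98, fn=6))
    # Combining the rounded Sp and Sr of the Random Forest N=50 row.
    print(round(f_measure(0.98, 0.94), 2))
```

Because the table reports Sp and Sr rounded to two decimals, recomputing F-Measure from them can differ from the reported value in the last digit for some rows.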
Contributions
A new, public dataset of Twitter spam to serve as an evaluation benchmark
Adaptation of content-based spam filtering to Twitter
A new compression-based text filtering library for the ML tool WEKA
Future work
Enhance this approach using social network features
Add semantic capabilities by studying the linguistic relationships

Twitter Content-based Spam Filtering - CISIS 2013
