A Quality Type-aware Annotated Corpus and Lexicon for Harassment Research
Presenter: Saeedeh Shekarpour
Authors: Mohammadreza Rezvan, Saeedeh Shekarpour, Lakshika Balasuriya, Krishnaprasad Thirunarayan, Valerie L. Shalin, Amit Sheth
Motivation
● Social media is used extensively by people of all age groups
● Participants in these media experience insults, bullying, harassment, etc.
● 66% of internet users who have experienced harassment say it happened on a social media site or app (18% report serious harassment)
● Harassment undermines social engagement and trust, causing emotional distress, privacy concerns, and threats to physical safety
Our Contribution (considering type/context)
Goal
Developing a high-quality, type-aware corpus
Steps
➔ Lexicon Creation
➔ Corpus Development
➔ Annotation
➔ Agreement Rate
Lexicon Creation
● We compiled our lexicon from online resources.
● Our lexicon is categorized into six types of harassment.
Category             Count
Sexual                 453
Appearance-related      15
Intellectual            34
Racial                 168
Political               23
Generic                 44
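As a rough illustration, a category-keyed lexicon like the one above might be stored as a mapping from harassment type to a set of terms and queried per tweet. This is a minimal sketch assuming naive lowercase whitespace tokenization; the term lists are hypothetical placeholders (the actual lexicon entries are not reproduced here), and only the per-category counts come from the table.

```python
from collections import Counter

# Per-category lexicon sizes, taken from the table above.
LEXICON_COUNTS = {
    "sexual": 453,
    "appearance-related": 15,
    "intellectual": 34,
    "racial": 168,
    "political": 23,
    "generic": 44,
}

# Hypothetical placeholder terms; the real lexicon entries are not listed here.
TOY_LEXICON = {
    "sexual": {"term_s1", "term_s2"},
    "appearance-related": {"term_a1"},
    "intellectual": {"term_i1"},
    "racial": {"term_r1"},
    "political": {"term_p1"},
    "generic": {"term_g1"},
}


def matched_categories(text: str, lexicon: dict[str, set[str]]) -> Counter:
    """Count lexicon hits per category (assumes lowercase whitespace tokenization)."""
    tokens = text.lower().split()
    hits = Counter()
    for category, terms in lexicon.items():
        for token in tokens:
            if token in terms:
                hits[category] += 1
    return hits


total_terms = sum(LEXICON_COUNTS.values())
print(total_terms)  # 737
print(matched_categories("term_s1 and term_p1 and term_s2", TOY_LEXICON))
```

Summing the table gives 737 lexicon terms across the six types.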
Corpus Development
● We chose Twitter (313 million monthly active users posting over 500 million tweets per day)
● We used our lexicon terms as seeds to collect tweets from Twitter between December 18, 2016 and January 10, 2017.
● We collected 10,000 tweets for each of the five contextual types, for a total of 50,000 tweets.
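The seed-based collection step can be sketched as a filter over a stream of tweets: a tweet containing a seed term for some contextual type is added to that type's bucket until the bucket reaches its quota. This is a hypothetical sketch, not the authors' crawler: the input is a plain iterable of strings rather than any specific Twitter API, assigning each tweet to the first matching type that still has room is an assumption, and the quota defaults to the 10,000 tweets per type stated above.

```python
from typing import Iterable


def collect_by_type(
    tweets: Iterable[str],
    seeds: dict[str, set[str]],
    quota: int = 10_000,
) -> dict[str, list[str]]:
    """Bucket tweets by contextual type using seed terms.

    Assumptions of this sketch: each tweet goes to the first type (in dict order)
    whose seed terms it contains and whose bucket is not yet full; collection
    stops once every type has `quota` tweets.
    """
    buckets: dict[str, list[str]] = {ctype: [] for ctype in seeds}
    for tweet in tweets:
        tokens = set(tweet.lower().split())
        for ctype, terms in seeds.items():
            if len(buckets[ctype]) < quota and tokens & terms:
                buckets[ctype].append(tweet)
                break  # assign each tweet to a single type
        if all(len(bucket) >= quota for bucket in buckets.values()):
            break  # every type has reached its quota
    return buckets


# Hypothetical seed terms and a tiny toy stream.
seeds = {"sexual": {"term_s1"}, "political": {"term_p1"}}
stream = ["term_s1 a", "term_p1 b", "term_s1 c", "nothing here"]
result = collect_by_type(stream, seeds, quota=1)
print({ctype: len(bucket) for ctype, bucket in result.items()})  # {'sexual': 1, 'political': 1}
```

With five types and the default quota, a full run would yield 5 × 10,000 = 50,000 tweets, matching the corpus size above.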
Annotation
● Containing profane words does not guarantee that a tweet is harassing
● Human judges annotated the corpus to distinguish harassing from non-harassing tweets, improving the quality of the corpus.
Contextual Type       Annotated Tweets   Yes (harassing)  No (non-harassing)
Sexual                           3,855               230               3,619
Racial                           4,976               701               4,275
Appearance-related               4,828               678               4,150
Intellectual                     4,867               811               4,056
Political                        5,663               699               4,964
Combined                        24,189             3,119              21,070
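The counts above make the point that keyword matching alone is not enough: one can compute the share of annotated tweets that judges labeled harassing in each contextual type. This small sketch uses only the table's counts, taking the "Annotated Tweets" column as the denominator.

```python
# (annotated, yes) counts per contextual type, from the table above.
ANNOTATION_COUNTS = {
    "Sexual": (3_855, 230),
    "Racial": (4_976, 701),
    "Appearance-related": (4_828, 678),
    "Intellectual": (4_867, 811),
    "Political": (5_663, 699),
    "Combined": (24_189, 3_119),
}


def harassing_share(annotated: int, yes: int) -> float:
    """Fraction of annotated tweets that judges labeled harassing."""
    return yes / annotated


for ctype, (annotated, yes) in ANNOTATION_COUNTS.items():
    print(f"{ctype:<20} {harassing_share(annotated, yes):6.1%}")
```

Even in the type with the highest share (intellectual, about 16.7%), most seed-matched tweets were judged non-harassing, and overall only about 12.9% were labeled harassing.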
Agreement Rate
● The corpus includes only tweets that received at least two “yes” or at least two “no” labels
● We measure annotation quality (inter-annotator agreement) with Cohen’s kappa coefficient
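The inclusion rule above can be sketched as a small label-resolution function. This is an illustrative sketch assuming each tweet carries a list of binary "yes"/"no" judgments (for example, from three judges; the number of judges per tweet is an assumption here). A tweet is kept only if exactly one label reaches at least two votes; the tweet IDs are hypothetical.

```python
from collections import Counter
from typing import Optional


def resolve_label(labels: list[str], min_votes: int = 2) -> Optional[str]:
    """Return "yes" or "no" if exactly one label has at least `min_votes` votes.

    Returns None when neither label reaches `min_votes` (or, with four or more
    judges, when both do); such tweets are left out of the corpus.
    """
    counts = Counter(labels)
    winners = [label for label in ("yes", "no") if counts[label] >= min_votes]
    return winners[0] if len(winners) == 1 else None


# Hypothetical judgments keyed by made-up tweet IDs.
judgments = {
    "tweet_1": ["yes", "yes", "no"],
    "tweet_2": ["no", "no", "no"],
    "tweet_3": ["yes", "no"],  # too few votes for either label
}

corpus = {}
for tweet_id, labels in judgments.items():
    label = resolve_label(labels)
    if label is not None:
        corpus[tweet_id] = label

print(corpus)  # {'tweet_1': 'yes', 'tweet_2': 'no'}
```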
Harassment Type      Agreement Rate (Cohen's kappa)
Sexual               0.70
Racial               0.84
Appearance-related   1.00
Intellectual         0.80
Political            0.69
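Cohen's kappa corrects raw agreement for chance: κ = (p_o − p_e) / (1 − p_e), where p_o is the proportion of items on which two annotators agree and p_e is the agreement expected from each annotator's label frequencies. The sketch below computes kappa for a single pair of annotators (how agreement was aggregated across annotator pairs is not stated in these slides) and reads the per-type values above against the commonly cited Landis and Koch (1977) bands. The two judges' labels are made-up toy data.

```python
from collections import Counter


def cohens_kappa(a: list[str], b: list[str]) -> float:
    """Cohen's kappa for two annotators who labeled the same items, in the same order."""
    if len(a) != len(b) or not a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(a)
    # Observed agreement p_o: share of items given the same label by both annotators.
    p_o = sum(x == y for x, y in zip(a, b)) / n
    # Chance agreement p_e: sum over labels of the product of both annotators' label proportions.
    freq_a, freq_b = Counter(a), Counter(b)
    p_e = sum(freq_a[k] * freq_b[k] for k in freq_a.keys() | freq_b.keys()) / (n * n)
    if p_e == 1.0:
        # Both annotators used one identical label everywhere; kappa is undefined (0/0).
        # Returning 1.0 is a convention chosen for this sketch.
        return 1.0
    return (p_o - p_e) / (1 - p_e)


def landis_koch(kappa: float) -> str:
    """Map kappa to the Landis and Koch (1977) agreement bands."""
    if kappa < 0:
        return "poor"
    for upper, band in ((0.20, "slight"), (0.40, "fair"), (0.60, "moderate"), (0.80, "substantial")):
        if kappa <= upper:
            return band
    return "almost perfect"


# Per-type agreement rates, from the table above.
AGREEMENT = {
    "Sexual": 0.70,
    "Racial": 0.84,
    "Appearance-related": 1.00,
    "Intellectual": 0.80,
    "Political": 0.69,
}

# Toy labels from two hypothetical judges on ten tweets.
judge_1 = ["yes", "yes"] + ["no"] * 8
judge_2 = ["yes"] + ["no"] * 9
print(round(cohens_kappa(judge_1, judge_2), 2))  # 0.62

for ctype, kappa in sorted(AGREEMENT.items(), key=lambda item: item[1], reverse=True):
    print(f"{ctype:<20} {kappa:.2f}  {landis_koch(kappa)}")
```

In the toy example the judges agree on 9 of 10 tweets (p_o = 0.9), but chance agreement is already 0.74, so kappa is only about 0.62. On that scale, every type in the table shows at least substantial agreement, with appearance-related highest and political lowest.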
Comparison
● The Golbeck corpus provides only generic (binary) annotation: (i) harassing and (ii) non-harassing.
● This corpus contains 20,428 non-redundant annotated tweets, of which only 5,277 are labeled as harassing.
Contextual Type       # of Tweets
Sexual                        380
Racial                      4,148
Appearance-related            145
Intellectual                  381
Political                     163
Non-harassing                  41
Total                       5,277
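The difference between generic (binary) and type-aware annotation can be illustrated with a small record type: a type-aware label carries both the harassing/non-harassing judgment and the contextual type, so it can always be collapsed to a Golbeck-style binary label, but not the reverse. The `AnnotatedTweet` and `ContextualType` names and fields are hypothetical and do not describe the released dataset's actual file format.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class ContextualType(Enum):
    """Hypothetical encoding of the five contextual types used in the corpus."""
    SEXUAL = "sexual"
    RACIAL = "racial"
    APPEARANCE_RELATED = "appearance-related"
    INTELLECTUAL = "intellectual"
    POLITICAL = "political"


@dataclass(frozen=True)
class AnnotatedTweet:
    text: str
    harassing: bool
    # Contextual type; None stands for a generic, untyped annotation.
    ctype: Optional[ContextualType] = None

    def to_binary(self) -> str:
        """Collapse to a generic binary label, discarding the contextual type."""
        return "harassing" if self.harassing else "non-harassing"


typed = AnnotatedTweet("example text", harassing=True, ctype=ContextualType.POLITICAL)
print(typed.to_binary(), typed.ctype.value)  # harassing political
```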
Conclusion and Future Research
● We developed a high-quality, type-aware (contextual type) corpus for harassment research
● This dataset is publicly available at:
https://github.com/Mrezvan94/Harassment-Corpus
● Learning harassing language from the contextual types
● A schema for distinguishing among the various definitions of offensive language/hate speech
Acknowledgment
This project is supported by National Science Foundation (NSF) award CNS 1513721: Context-Aware Harassment Detection on Social Media.
Last but not least
We wish a Web for
❖ TRUST
❖ SYMPATHY
❖ LOVE

Editor's Notes

  1. The appearance-related context shows the highest agreement rate, whereas the political and sexual contexts show the lowest, indicating that these contexts are harder to judge (their ambiguity is higher).