Quora ML Workshop: Content Moderation & Machine Learning

Presentation by Alana Glassco, anti-abuse engineer at Smyte, at Quora ML Workshop: Protecting Online Spaces with Applied Machine Learning, on September 27, 2017.

Be Nice, Be Respectful:
Protecting Online Spaces with Applied
Machine Learning

Content Moderation &
Machine Learning
Common Pitfalls & How to Avoid Them
Alana Glassco
Anti-abuse Engineer at Smyte
Alana@smyte.com

Content Policies
● Context is key
● Not black & white
● Designed for humans, not machines

Content moderation flow

Tips & tricks

Understand the problem
● Business goals
● Nature of the problem
● Is ML a good fit?

For example...
● Business goals
○ Enforce company values
○ Gain good press
● Nature of the problem
○ Short-term
○ High false-positive (FP) cost
● Is ML a good fit?
○ No

● Business goals
○ Reduce bad press
○ Recover advertising loss
● Nature of the problem
○ Long-term
○ High false-negative (FN) cost
● Is ML a good fit?
○ Yes
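
One way to make the FP/FN cost language above concrete is to price each kind of mistake and see where the decision threshold lands. The sketch below is illustrative only and is not taken from the talk: the scores, labels, cost values, and the function names `total_cost` and `best_threshold` are all hypothetical.

```python
def total_cost(scored_items, threshold, fp_cost, fn_cost):
    """Sum the cost of mistakes made when flagging items scored >= threshold.

    scored_items: list of (model_score, is_abusive) pairs, where is_abusive
    is the label assigned by a human reviewer.
    """
    cost = 0.0
    for score, is_abusive in scored_items:
        flagged = score >= threshold
        if flagged and not is_abusive:
            cost += fp_cost  # false positive: acceptable content was actioned
        elif not flagged and is_abusive:
            cost += fn_cost  # false negative: abusive content was missed
    return cost


def best_threshold(scored_items, fp_cost, fn_cost):
    """Return the candidate threshold (0.1 to 0.9) with the lowest total cost."""
    candidates = [i / 10 for i in range(1, 10)]
    return min(candidates,
               key=lambda t: total_cost(scored_items, t, fp_cost, fn_cost))


# Hypothetical model scores paired with human-review labels (True = abusive).
SAMPLE = [(0.95, True), (0.82, True), (0.65, False), (0.55, True),
          (0.42, False), (0.35, True), (0.22, False), (0.05, False)]

# Assumed costs: a 10:1 ratio in each direction, chosen only for illustration.
high_fp_threshold = best_threshold(SAMPLE, fp_cost=10.0, fn_cost=1.0)
high_fn_threshold = best_threshold(SAMPLE, fp_cost=1.0, fn_cost=10.0)
print(high_fp_threshold, high_fn_threshold)
```

On this toy sample, a high FP cost pushes the threshold up (fewer items flagged), while a high FN cost pushes it down (more items flagged). Whether ML fits a given problem still depends on the business goals and time horizon listed above.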

Get the right training data
● Understand policies in practice
● “Free” data won’t cut it
● Invest in a human review team

Example: building a “spam” classifier
● Repetitive content
○ Behavioral signals: Bots / fake accounts
○ Optics: Looks fine in isolation
○ Severity: Harms reputation
● Keyword stuffing
○ Behavioral signals: Real users
○ Optics: Easy to identify
○ Severity: Harms search results
● Artificial traffic
○ Behavioral signals: Bots / fake accounts
○ Optics: Invisible w/o account signals
○ Severity: Harms ranking
● Scams / phishing
○ Behavioral signals: Bots or real users
○ Optics: Looks bad to a trained reviewer
○ Severity: Harms users
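
The “spam” example suggests that a single label hides several distinct problems. A minimal sketch of that breakdown as data follows, assuming a hypothetical `SpamType` record. The `needs_account_signals` flag is an interpretation of the "Optics" row made for this example, not something the slide states.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SpamType:
    name: str
    behavioral_signals: str      # who typically produces it
    optics: str                  # how it looks to a reviewer
    severity: str                # what it harms
    needs_account_signals: bool  # assumption inferred from the optics column


SPAM_TYPES = [
    SpamType("repetitive content", "bots / fake accounts",
             "looks fine in isolation", "harms reputation", True),
    SpamType("keyword stuffing", "real users",
             "easy to identify", "harms search results", False),
    SpamType("artificial traffic", "bots / fake accounts",
             "invisible w/o account signals", "harms ranking", True),
    SpamType("scams / phishing", "bots or real users",
             "looks bad to a trained reviewer", "harms users", False),
]


def content_only_candidates(spam_types):
    """Names of subtypes that might be labeled from the content alone."""
    return [t.name for t in spam_types if not t.needs_account_signals]


print(content_only_candidates(SPAM_TYPES))
```

Splitting the label this way makes it easier to decide, per subtype, what training data to collect and which features a model would need.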

Design a solution
● Model selection
● Implementation
● Maintenance & retraining
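
For the maintenance and retraining point, one possible check is to compare the model's precision on a recent human-reviewed sample against a baseline. This is a sketch under assumptions and does not come from the talk: the `max_drop` margin, the baseline values, and the function names are hypothetical.

```python
def precision(decisions):
    """Precision over (model_flagged, reviewer_says_abusive) pairs.

    Returns None when the model flagged nothing, since precision is undefined.
    """
    reviewed_flags = [truth for model_flagged, truth in decisions if model_flagged]
    if not reviewed_flags:
        return None
    return sum(reviewed_flags) / len(reviewed_flags)


def needs_retraining(recent_decisions, baseline_precision, max_drop=0.05):
    """True if recent precision fell more than max_drop below the baseline."""
    current = precision(recent_decisions)
    if current is None:
        return False
    return current < baseline_precision - max_drop


# Hypothetical review sample: 10 flagged items (7 confirmed), 10 unflagged.
RECENT = [(True, True)] * 7 + [(True, False)] * 3 + [(False, False)] * 10

print(precision(RECENT))
print(needs_retraining(RECENT, baseline_precision=0.9))
```

A check like this depends on a steady stream of human-reviewed decisions, which ties back to investing in a review team.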

Questions?
alana@smyte.com