Be Nice, Be Respectful:
Protecting Online Spaces with Applied
Machine Learning
Maintaining high quality user-generated
content through machine learning
Nikhil Dandekar
Quora: Nikhil-Dandekar
Twitter: @nikhilbd
Paula Griffin
Quora: Paula-Griffin-1
Twitter: @paulajgriffin
What is Quora?
Quora is a platform to ask
questions, get useful
answers, and share what
you know with the world.
Incredible answers from credible sources
Not everyone is Peter Norvig.
● Biggest challenges of any user-generated-content site are quality and moderation
● Two (mostly distinct) sets of users to deal with
○ Bad actors trying to cause harm
○ Well-meaning users who miss the mark
Bad actors
Well-meaning users
Growing challenges
● Millions of questions, answers, users, and topics
○ More incentives for bad actors
○ More users who aren’t familiar with Quora norms
● Without active effort, quality gets worse as we scale
● We need solutions that get better as our content grows
Solving these problems together
Writing the rulebook
● First step: deciding what you want on your platform
● “Be Nice, Be Respectful” policy since before our public launch in 2010
○ No hate speech
○ No harassment
○ No retaliation
● Almost all other policies flow from “being helpful” to someone viewing the page
○ Don’t write joke answers
○ Tag content with appropriate topics
Enforcing the rules
● Users can report content and users for violating Quora’s policies
● Starting out: manual review of all reports
● Problems:
○ Many man-hours needed to review all reports
○ Low reporting rates
○ The worst part: someone actually has to see the bad content
Enforcing the rules at scale
● Heuristics and machine learning help us reduce the burden of handling user reports, and
can proactively identify bad content
○ Deal with reported content faster and more cheaply
○ Catch spam, harassment, and other problems before other users see it
○ Automatically fix formatting and grammar in some cases
● Benefits of scale:
○ More content → more choice of good content
○ Ongoing feedback from human review systems
○ More data to train our models
Maintaining high content quality using
Machine Learning
ML Models for quality
● Questions: Adult detection, Question quality classification,
Duplicate questions detector, Overly personal question detector,
Question autocorrection etc.
● Answers + Comments: Adult detection, Answer ranking for
questions, Answer collapsing, BNBR classifier, Harassment classifier,
Spam classifier etc.
● Topics: Duplicate Topics detector, Bad Topic classifier etc.
● Users: Bad actor detection, Bad user-credentials classifier, Fake
name detection, User-topic bio classifier etc.
● Classifiers on other content types, e.g. answer wikis.
Machine Learning for quality: Overview
Algorithms
● RNNs (LSTMs/GRUs) and other deep networks,
Gradient Boosted Decision Trees, Random Forests,
Logistic Regression, LambdaMART, k-means and other
clustering techniques, k-NNs, PageRank etc.
Libraries
● TensorFlow, Keras, scikit-learn, XGBoost, LightGBM,
fastText, RankLib, NLTK, spaCy etc.
Machine Learning model decision flow
[Flowchart: Content → ML model → High-confidence decision? Yes: take automatic action. No: ask a human to verify the action.]
ML decision flow examples
● Some examples of this decision flow:
○ Spam detection
○ BNBR violation detection
○ Question quality classifier
○ Duplicate question detection
○ ...and more
● The more nuanced and sensitive the decision, the
greater the need for human verification
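The decision flow can be sketched in a few lines of Python. This is only an illustration, not Quora's implementation: the threshold value, the `Decision` type, and the `route_prediction` function are assumptions made for the example, since the slides do not say what counts as "high confidence".

```python
from dataclasses import dataclass

# Hypothetical threshold; the slides do not give an actual value.
AUTO_ACTION_THRESHOLD = 0.95


@dataclass
class Decision:
    label: str         # label predicted by the model, e.g. "spam" or "ok"
    confidence: float  # model's probability for that label
    route: str         # "automatic" or "human_review"


def route_prediction(label: str, confidence: float,
                     threshold: float = AUTO_ACTION_THRESHOLD) -> Decision:
    """Act automatically on confident predictions; queue the rest for a human."""
    route = "automatic" if confidence >= threshold else "human_review"
    return Decision(label, confidence, route)


print(route_prediction("spam", 0.99).route)  # automatic
print(route_prediction("spam", 0.70).route)  # human_review
```

A more nuanced or sensitive classifier could pass a higher `threshold`, so that more of its decisions go to human review.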
Machine Learning data feedback loop
[Diagram: Training data → Train models → Run model on content → User actions and human reviews → Training data]
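One step of this loop, folding human review outcomes back into the training data, might look like the sketch below. The tuple layout and the `add_reviewed_examples` name are assumptions for illustration; the slides do not describe how the data is stored.

```python
from typing import List, Tuple

Example = Tuple[str, str]       # (content text, label)
Review = Tuple[str, str, str]   # (content text, model label, human label)


def add_reviewed_examples(training_data: List[Example],
                          reviews: List[Review]) -> List[Example]:
    """Return the training data extended with human-verified labels.

    The human label wins over the model label, so model mistakes become
    corrected examples for the next training run.
    """
    extended = list(training_data)  # leave the caller's list untouched
    for text, _model_label, human_label in reviews:
        extended.append((text, human_label))
    return extended
```

Retraining on the extended set and running the new model on fresh content would then close the loop.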
Case study: Question quality and automatic
question correction
● Users often ask questions with grammatical and spelling errors
● Example:
○ Original: Which coin/token is next big thing in crypto currencies? And why?
○ Corrected: Which coin/token is the next big thing in cryptocurrencies? Why?
● These are good questions, but the lack of correct phrasing hurts them
○ Less likely to be answered by experts
○ Harder to catch duplicate questions
○ Can hurt the perception of “quality” of Quora
“Bad” questions on Quora
● Types of errors in questions
○ Grammatical errors, e.g., “How I can ...”
○ Spelling mistakes
○ Missing preposition or article
○ Wrong/missing punctuation
○ Wrong capitalization
○ etc.
● Can we use Machine Learning to automatically correct these questions?
● Started off as an “offroad” hack-week project
● Since shipped
Automatic question correction: research
● Frame this as a problem similar to machine translation
● Final Model:
○ Sequence-to-sequence, character-level RNN (GRU)
with attention
Automatic question correction: Model
● Model Details:
○ Sequence to sequence (encoder-decoder) model
○ Character-level
○ GRUs (Gated Recurrent Units)
○ Attention-based
○ Bidirectional
○ Beam search for decoding
● Tried solving the subproblems individually, but that didn’t work as well
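Of the model details listed, beam search is the easiest to show in isolation. The sketch below is a generic character-level beam search, not the production decoder: `next_char_probs` stands in for the trained GRU decoder, and the `END` marker and the toy probability table are made up for the example.

```python
import math
from typing import Callable, Dict, List, Tuple

END = "\n"  # hypothetical end-of-sequence marker


def beam_search(next_char_probs: Callable[[str], Dict[str, float]],
                beam_width: int = 3, max_len: int = 20) -> str:
    """Return the highest-scoring sequence found by beam search.

    next_char_probs(prefix) maps the text generated so far to a
    probability for each possible next character.
    """
    beams: List[Tuple[float, str]] = [(0.0, "")]  # (log-probability, prefix)
    finished: List[Tuple[float, str]] = []
    for _ in range(max_len):
        candidates: List[Tuple[float, str]] = []
        for score, prefix in beams:
            for ch, p in next_char_probs(prefix).items():
                if p > 0.0:
                    candidates.append((score + math.log(p), prefix + ch))
        # Keep only the best `beam_width` extensions.
        candidates.sort(key=lambda c: c[0], reverse=True)
        beams = []
        for score, text in candidates[:beam_width]:
            if text.endswith(END):
                finished.append((score, text[:-1]))
            else:
                beams.append((score, text))
        if not beams:
            break
    pool = finished + beams
    return max(pool, key=lambda c: c[0])[1] if pool else ""


def toy_model(prefix: str) -> Dict[str, float]:
    """Toy stand-in for a decoder: 'a' looks best first, but 'bz' wins overall."""
    table = {
        "": {"a": 0.6, "b": 0.4},
        "a": {"x": 0.5, "y": 0.5},
        "b": {"z": 0.9, "y": 0.1},
    }
    return table.get(prefix, {END: 1.0})


print(beam_search(toy_model, beam_width=1))  # ax (greedy decoding)
print(beam_search(toy_model, beam_width=2))  # bz (higher total probability)
```

With a beam width of 1 the search is greedy and commits to "a" too early; a wider beam keeps "b" alive long enough to find the more probable full sequence.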
Automatic question correction: System Details
● Training
○ Training data: Pairs of [bad question, corrected question]
○ TensorFlow, on a single box with GPUs
○ Training time: 2-3 hours
● Serving:
○ TensorFlow, GPU-based serving
○ Latency: <500 ms p99
● Run on new questions added to Quora
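For readers unfamiliar with the "p99" latency figure: it is the value that 99% of requests come in at or under. A minimal sketch using the nearest-rank definition follows; the function name and sample numbers are illustrative, and the slides do not say how the figure was measured.

```python
import math
from typing import Sequence


def percentile_nearest_rank(values: Sequence[float], pct: int) -> float:
    """Smallest sample with at least pct% of all samples at or below it."""
    if not values:
        raise ValueError("need at least one sample")
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


# Hypothetical latencies in milliseconds: 1, 2, ..., 100.
latencies_ms = list(range(1, 101))
print(percentile_nearest_rank(latencies_ms, 99))  # 99
```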
Automatic question correction: Results
BNBR classification
● Checks for BNBR violations on questions, answers, comments.
● Binary classifier
● Training data:
○ Positive: Confirmed BNBR violations
○ Negative: False BNBR reports, other good content
● Model: NN with 1 hidden layer (fastText)
● Same ML decision flow as before
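A fastText-style classifier averages token embeddings into a single hidden layer and applies a linear output layer on top. The forward pass below illustrates that shape in plain Python; it does not use the fastText library, the weights are untrained placeholders, and the sizes, hashing scheme, and function names are assumptions for the example.

```python
import math
import zlib
from typing import List

DIM = 4         # hypothetical embedding size
BUCKETS = 1000  # hypothetical number of hash buckets for tokens


def token_ids(text: str) -> List[int]:
    """Map words and word bigrams to bucket ids with a stable hash."""
    words = text.lower().split()
    grams = words + [a + " " + b for a, b in zip(words, words[1:])]
    return [zlib.crc32(g.encode("utf-8")) % BUCKETS for g in grams]


def predict_violation(text: str, embeddings: List[List[float]],
                      out_weights: List[float], out_bias: float) -> float:
    """Probability that `text` is a violation under the given weights."""
    ids = token_ids(text)
    if not ids:
        return 1.0 / (1.0 + math.exp(-out_bias))
    # Hidden layer: the average of the token embeddings.
    hidden = [sum(embeddings[i][d] for i in ids) / len(ids) for d in range(DIM)]
    logit = sum(w * h for w, h in zip(out_weights, hidden)) + out_bias
    return 1.0 / (1.0 + math.exp(-logit))


# With all-zero (untrained) weights the model is maximally unsure.
zero_embeddings = [[0.0] * DIM for _ in range(BUCKETS)]
print(predict_violation("you are awful", zero_embeddings, [0.0] * DIM, 0.0))  # 0.5
```

Training would fit `embeddings`, `out_weights`, and `out_bias` on the positive and negative examples described above; the resulting probability would then feed the same high-confidence decision flow.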
In conclusion
● Quality is one of the most important problems we face at Quora
● There are various systems to maintain quality, and we need to use all of them in order to keep up
● Machine learning solutions help us maintain quality at scale
○ ...but you can’t totally bypass human efforts
Quora ML Workshop: Maintaining High Quality User-Generated Content through Machine Learning

Quora ML Workshop: Maintaining High Quality User-Generated Content through Machine Learning

  • 1. Be Nice, Be Respectful: Protecting Online Spaces with Applied Machine Learning
  • 3. Maintaining high quality user-generated content through machine learning Nikhil Dandekar Quora: Nikhil-Dandekar Twitter: @nikhilbd Paula Griffin Quora: Paula-Griffin-1 Twitter: @paulajgriffin
  • 4. What is Quora? Quora is a platform to ask questions, get useful answers, and share what you know with the world.
  • 5. Incredible answers from credible sources
  • 6. Not everyone is Peter Norvig. ● Biggest challenges of any user-generated-content site are quality and moderation ● Two (mostly distinct) sets of users to deal with ○ Bad actors trying to cause harm ○ Well-meaning users who miss the mark
  • 9. Growing challenges ● Millions of questions, answers, users, and topics ○ More incentives for bad actors ○ More users who aren’t familiar with Quora norms ● Without active effort, quality gets worse as we scale ● We need solutions that get better as our content grows
  • 11. Writing the rulebook ● First step: deciding what you want on your platform ● “Be Nice, Be Respectful” policy since before our public launch in 2010 ○ No hate speech ○ No harassment ○ No retaliation ● Almost all other policies flow from “being helpful” to someone viewing the page ○ Don’t write joke answers ○ Tag content with appropriate topics
  • 12. Enforcing the rules ● Users can report content and users for violating Quora’s policies ● Starting out: manual review of all reports ● Problems: ○ Many man-hours needed to review all reports ○ Low reporting rates ○ The worst part: someone actually has to see the bad content
  • 13. Enforcing the rules at scale ● Heuristics and machine learning help us reduce the burden of handling user reports, and can proactively identify bad content ○ Deal with reported content faster and more cheaply ○ Catch spam, harassment, and other problems before other users see it ○ Automatically fix formatting and grammar in some cases ● Benefits of scale: ○ More content → more choice of good content ○ Ongoing feedback from human review systems ○ More data to train our models
  • 14. Maintaining high content quality using Machine Learning
  • 15. Machine Learning for quality: Overview ML models for quality ● Questions: Adult detection, Question quality classification, Duplicate questions detector, Overly personal question detector, Question autocorrection etc. ● Answers + Comments: Adult detection, Answer ranking for questions, Answer collapsing, BNBR classifier, Harassment classifier, Spam classifier etc. ● Topics: Duplicate Topics detector, Bad Topic classifier etc. ● Users: Bad actor detection, Bad user-credentials classifier, Fake name detection, User-topic bio classifier etc. ● Classifiers on other content types, e.g. answer wikis.
  • 16. Machine Learning for quality: Overview Algorithms ● RNNs (LSTMs/GRUs) and other deep networks, Gradient Boosted Decision Trees, Random Forests, Logistic Regression, LambdaMART, k-means and other clustering techniques, k-NNs, PageRank etc. Libraries ● TensorFlow, Keras, scikit-learn, XGBoost, LightGBM, fastText, RankLib, NLTK, spaCy etc.
  • 17. Machine Learning model decision flow ● Content → ML model → High-confidence decision? ○ Yes: Take automatic action ○ No: Ask a human to verify the action
  • 18. ML decision flow examples ● Some examples of this decision flow: ○ Spam detection ○ BNBR violation detection ○ Question quality classifier ○ Duplicate question detection ○ ...and more ● The more nuanced and sensitive the decision, the greater the need for human verification
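The decision flow on slides 17 and 18 can be sketched as a simple routing function. This is a minimal illustration, not Quora's implementation: the threshold values, the `Decision` type, and the `decide` function are all hypothetical, and the sketch assumes the model outputs a probability that the content violates a policy.

```python
from dataclasses import dataclass

# Hypothetical thresholds for illustration; the slides give no actual values.
AUTO_ACTION_THRESHOLD = 0.95  # at or above: confident the content violates policy
AUTO_CLEAR_THRESHOLD = 0.05   # at or below: confident the content is fine


@dataclass
class Decision:
    route: str    # "auto_action", "auto_clear", or "human_review"
    score: float  # model's estimated probability of a violation


def decide(score: float,
           high: float = AUTO_ACTION_THRESHOLD,
           low: float = AUTO_CLEAR_THRESHOLD) -> Decision:
    """Route one piece of content based on the model's confidence."""
    if not 0.0 <= score <= 1.0:
        raise ValueError("score must be a probability in [0, 1]")
    if score >= high:
        return Decision("auto_action", score)   # high confidence: act automatically
    if score <= low:
        return Decision("auto_clear", score)    # high confidence: nothing to do
    return Decision("human_review", score)      # uncertain: ask a human to verify


# A more sensitive decision can use a stricter threshold,
# which sends more borderline content to human review.
print(decide(0.97).route)             # auto_action
print(decide(0.97, high=0.99).route)  # human_review
```

Raising `high` for nuanced decisions is one way to express the point on slide 18: the more sensitive the action, the more content is routed to a person.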
  • 19. Machine Learning data feedback loop ● Training data → Train models → Run model on content → User actions + Human reviews → Training data
  • 20. Case study: Question quality and automatic question correction
  • 21. “Bad” questions on Quora ● Users often ask questions with grammatical and spelling errors ● Example: ○ Which coin/token is next big thing in crypto currencies? And why? ○ Which coin/token is the next big thing in cryptocurrencies? Why? ● These are good questions, but the lack of correct phrasing hurts them ○ Less likely to be answered by experts ○ Harder to catch duplicate questions ○ Can hurt the perception of “quality” of Quora
  • 22. “Bad” questions on Quora ● Types of errors in questions ○ Grammatical errors, e.g., “How I can ...” ○ Spelling mistakes ○ Missing preposition or article ○ Wrong/missing punctuation ○ Wrong capitalization ○ etc. ● Can we use Machine Learning to automatically correct these questions? ● Started off as an “offroad” hack-week project ● Since shipped
  • 24. Automatic question correction: Model ● Frame this problem as a machine translation problem: translate the “bad” question into the corrected question ● Final Model: ○ Sequence-to-sequence, character-level RNN (GRU) with attention
  • 25. Automatic question correction: Model ● Model Details: ○ Sequence to sequence (encoder-decoder) model ○ Character-level ○ GRUs (Gated Recurrent Units) ○ Attention-based ○ Bidirectional ○ Beam search for decoding ● Tried solving the subproblems individually, but that didn’t work as well
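To illustrate the "beam search for decoding" item, here is a minimal, framework-free sketch of character-level beam search. It is not the model from the slides: `toy_model` is a hypothetical stand-in for the decoder's next-character distribution, and `"$"` is an assumed end-of-sequence marker.

```python
import math
from typing import Callable, Dict, List, Tuple


def beam_search(next_log_probs: Callable[[str], Dict[str, float]],
                beam_width: int = 3, max_len: int = 20, eos: str = "$") -> str:
    """Return the highest-scoring string found, keeping `beam_width` prefixes per step."""
    beams: List[Tuple[str, float]] = [("", 0.0)]  # (prefix, summed log-probability)
    finished: List[Tuple[str, float]] = []
    for _ in range(max_len):
        # Extend every live prefix by every possible next character.
        candidates = [(prefix + ch, score + lp)
                      for prefix, score in beams
                      for ch, lp in next_log_probs(prefix).items()]
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = []
        for text, score in candidates[:beam_width]:
            if text.endswith(eos):
                finished.append((text[:-len(eos)], score))  # hypothesis is complete
            else:
                beams.append((text, score))
        if not beams:
            break
    finished.extend(beams)  # fall back to unfinished prefixes if max_len was reached
    if not finished:
        return ""
    return max(finished, key=lambda c: c[1])[0]


def toy_model(prefix: str) -> Dict[str, float]:
    """Hypothetical next-character log-probabilities, for illustration only."""
    table = {
        "": {"a": math.log(0.6), "b": math.log(0.4)},
        "a": {"$": math.log(0.55), "x": math.log(0.45)},
        "b": {"$": math.log(0.9), "x": math.log(0.1)},
    }
    return table.get(prefix, {"$": 0.0})


print(beam_search(toy_model, beam_width=1))  # greedy decoding picks "a"
print(beam_search(toy_model, beam_width=2))  # a wider beam finds the likelier "b"
```

With a beam of 1 the search commits to the locally best first character and ends with probability 0.33; with a beam of 2 it keeps the second-best prefix alive and finds the sequence with probability 0.36, which is the reason to prefer beam search over greedy decoding.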
  • 26. Automatic question correction: System Details ● Training ○ Training data: Pairs of [bad question, corrected question] ○ TensorFlow, on a single box with GPUs ○ Training time: 2-3 hours ● Serving: ○ TensorFlow, GPU-based serving ○ Latency: <500 ms p99 ● Run on new questions added to Quora
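The "<500 ms p99" figure means that 99% of requests complete in under 500 ms. As a small worked example, the sketch below computes a nearest-rank percentile over a list of latency samples. The `percentile` helper and the sample values are hypothetical; the slides do not say how latency was measured.

```python
from typing import Sequence


def percentile(samples: Sequence[float], pct: int) -> float:
    """Nearest-rank percentile: the smallest sample with at least pct% of samples at or below it."""
    if not samples:
        raise ValueError("need at least one sample")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in (0, 100]")
    ordered = sorted(samples)
    rank = -(-pct * len(ordered) // 100)  # integer ceiling of pct/100 * n
    return ordered[rank - 1]


# Made-up latencies in milliseconds: 1, 2, ..., 100.
latencies_ms = list(range(1, 101))
p99 = percentile(latencies_ms, 99)
print(p99, p99 < 500)
```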
  • 28. BNBR classification ● Checks for BNBR violations on questions, answers, comments. ● Binary classifier ● Training data: ○ Positive: Confirmed BNBR violations ○ Negative: False BNBR reports, other good content ● Model: NN with 1 hidden layer (fastText) ● Same ML decision flow as before
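The training data described on slide 28 could be assembled into fastText's supervised input format, where each line holds one example prefixed with a `__label__` tag. This is a sketch under assumptions: the label names `violation` and `ok`, the function name, and the example texts are hypothetical, and the slides do not describe the actual preprocessing.

```python
from typing import Iterable, List


def to_fasttext_lines(confirmed_violations: Iterable[str],
                      false_reports: Iterable[str],
                      other_good_content: Iterable[str]) -> List[str]:
    """Build one labeled line per example in fastText's `__label__<name> <text>` format."""

    def clean(text: str) -> str:
        # fastText reads one example per line, so collapse newlines and extra spaces.
        return " ".join(text.split())

    # Positive class: confirmed BNBR violations.
    lines = [f"__label__violation {clean(t)}" for t in confirmed_violations]
    # Negative class: reports that turned out to be false, plus other good content.
    for text in list(false_reports) + list(other_good_content):
        lines.append(f"__label__ok {clean(text)}")
    return lines


lines = to_fasttext_lines(
    confirmed_violations=["you are\nan idiot"],
    false_reports=["I disagree with this answer"],
    other_good_content=["Great explanation, thanks"],
)
for line in lines:
    print(line)
```

Including false reports in the negative class is useful because they are the borderline cases the classifier most needs to tell apart from real violations. Lines in this format can be written to a file and passed to a fastText supervised trainer.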
  • 29. In conclusion ● Quality is one of the most important problems we face at Quora ● There are various systems to maintain quality, and we need to use all of them in order to keep up ● Machine Learning solutions help us maintain quality at scale ○ ...but you can’t totally bypass human efforts