Orion: An Integrated Multimedia Content
Moderation System for Web Services
Yusuke Fujisaka
Akihabara Lab., CyberAgent, Inc...

  1. Orion: An Integrated Multimedia Content Moderation System for Web Services
     Yusuke Fujisaka
     Akihabara Lab., CyberAgent, Inc.
     fujisaka_yusuke@cyberagent.co.jp
  2. Our business
     ● Media
     ● Internet AD
     ● Game
     ● Startup
  3. Our media services
     AbemaTV (AbemaTV, Inc.)
     ● Free-to-view internet TV with TVCFs
     ● 30M+ downloads
     Ameba
     ● “Ameblo”: Japan’s largest blog service
     ● 20,000+ official blogs
     Tapple (MatchingAgent, Inc.)
     ● Japan’s largest dating app
     ● 3.5M+ users, 100M+ matches
     AWA (AWA, Inc.)
     ● Music subscription service
     ● 16M+ downloads, 45M+ songs
  4. Agenda
     1. Motivation
     2. System overview
     3. Orion’s effect
     4. Conclusion
  5. Motivation
     ● Social Networking Services (SNS) rely on User Generated Content (UGC)
     ● Some UGC is viewed as spam
     ● Platforms need to eliminate spam from SNS
  7. Spam characteristics
     ● Only a small fraction of content and users are involved with spam
     [Figure: all posts vs. spam posts; labels: 〜 1/1000, 〜 1/200]
  8. Spam characteristics
     ● Types of spam include:
       ○ Adult content
       ○ Grotesque content
       ○ Duplicate posts originating from bots
       ○ Abusive posts
       ○ Criminal posts
       ○ etc.
     ● Spam affects users not only psychologically, but also physically
     ● Spam may reduce the reliability of SNS
     ● Spam trends change
  9. Filtering vs. Operator
     Case 1: Deploy filter systems to moderate UGC
     Pros:
     ● Cost efficient
     ● Able to handle a huge amount of data
     Cons:
     ● Models must be updated to follow spam trends
     ● False positives and false negatives occur
       ○ Spam UGC remains on the service (false negative)
       ○ Obviously safe UGC is mistakenly deleted (false positive)
       ○ → Service satisfaction may decrease
  10. Filtering vs. Operator
      Case 2: Operators control spam messages
      Pros:
      ● Humans always follow trends
        ○ Operators judge UGC from the same viewpoint as users
      ● Fewer incorrect labels
        ○ Provided operators can moderate content effectively
      Cons:
      ● Cost inefficient
      ● Limited resources
  11. Filtering with Operator
      ● We need to handle a large amount of data cost-efficiently while avoiding incorrect labelling
      ● Two-step process
        ○ Step 1: Deploy automatic filters to extract content that includes suspicious words or behavior
        ○ Step 2: Perform manual moderation to detect the actual spam content and remove it
      [Figure: Step 1 separates suspicious content from safe data (not caught by the filter); Step 2 identifies spam among the suspicious content]
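The two-step process on this slide can be sketched roughly as follows. This is a minimal illustration, not Orion's implementation: the word list, the `Post` type, and modelling the operator as a callback are all assumptions made for the example.

```python
from dataclasses import dataclass

# Hypothetical word list; the slides do not list the real suspicious words.
SUSPICIOUS_WORDS = {"free money", "click here"}


@dataclass
class Post:
    post_id: int
    text: str


def step1_filter(posts):
    """Step 1: split posts into suspicious ones and ones not caught by the filter."""
    suspicious, safe = [], []
    for post in posts:
        lowered = post.text.lower()
        if any(word in lowered for word in SUSPICIOUS_WORDS):
            suspicious.append(post)
        else:
            safe.append(post)
    return suspicious, safe


def step2_moderate(suspicious, operator_decision):
    """Step 2: a human decision (modelled here as a callback) picks out actual spam."""
    return [post for post in suspicious if operator_decision(post)]


posts = [
    Post(1, "Lunch photos from today"),
    Post(2, "Click here for FREE MONEY"),
    Post(3, "Is this 'free money' offer a scam? Be careful."),
]
suspicious, safe = step1_filter(posts)
# Stand-in for the operator: only post 2 is judged to be real spam.
spam = step2_moderate(suspicious, lambda post: post.post_id == 2)
print([p.post_id for p in suspicious], [p.post_id for p in spam])
```

Post 3 shows why the second step matters: the filter flags it because it contains a suspicious phrase, but a human reader can see that it is a warning rather than spam.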
  12. System overview
      ● Orion: integrated content moderation system
        ○ Combination of “automatic filtering” and “manual moderation”
      [Figure: architecture diagram. Components: Service log, Service, Streaming, Metadata DB, Filter, Moderation API, Admin API, Web Server, Operator, Feedback, Queue, Retrieval Engine, Content DB; grouped into automatic modules and manual modules]
  13. Streaming module
      ● Collects user posts from services
      ● Filters suspicious content as defined by each service
        ○ 300+ filters to mark content for moderation
        ○ Maximum coverage and low latency are required
        ○ Determines whether an operator check is required
      [Figure: pipeline diagram. Labels: Gather UGCs from service, Correction check, User level check, Filtering / moderation mark, Save to DB. Filters: Word filter, Repeat post filter, ML-based filter, Image filter]
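A small sketch of how several filters might each mark a post for moderation is shown below. It covers only a word filter and a repeat post filter; the blocked words, the repeat threshold, the hashing of normalized text, and all function names are assumptions for illustration and are not taken from the slides.

```python
import hashlib
from collections import Counter

# Hypothetical settings; the slides do not give word lists or thresholds.
BLOCKED_WORDS = ("casino", "xxx")
REPEAT_THRESHOLD = 3  # flag once the same text has been seen this many times


def word_filter(text, seen_counts):
    """Flag text that contains any blocked word (the counter is unused here)."""
    return any(word in text.lower() for word in BLOCKED_WORDS)


def repeat_post_filter(text, seen_counts):
    """Flag text whose normalized form has been seen REPEAT_THRESHOLD times."""
    normalized = " ".join(text.lower().split())
    digest = hashlib.sha256(normalized.encode("utf-8")).hexdigest()
    seen_counts[digest] += 1
    return seen_counts[digest] >= REPEAT_THRESHOLD


FILTERS = {"word": word_filter, "repeat": repeat_post_filter}


def mark_post(text, seen_counts):
    """Return the names of the filters that flagged the post.

    Every filter runs on every post, so the repeat counter is always updated.
    A non-empty result means the post is marked for an operator check.
    """
    return [name for name, check in FILTERS.items() if check(text, seen_counts)]


seen_counts = Counter()
stream = ["Hello world", "Visit my CASINO", "buy now", "Buy  now", "BUY NOW"]
marks = [mark_post(text, seen_counts) for text in stream]
print(marks)
```

Keeping the filters in a dictionary keyed by name makes it straightforward to enable a different subset per service, which is one plausible reading of "as defined by each service".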
  14. User level
      ● “Well-behaved” users are considered not to require content checking
      ● What is a “well-behaved” user?
        ○ Those who post frequently without spam
      ● User level
        ○ Posts by “problem users” must be checked regardless of filtering
        ○ Posts by “safe users” need not be checked as often
      [Figure: user levels (Problem user, General user, New user, Safe user); axes: Total post #, Deleted post #]
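One way to picture the user level idea is a classifier over the two quantities named in the figure, total posts and deleted posts. The thresholds below and the rule that safe users skip checks entirely are simplifying assumptions for this sketch; the slides only say that safe users are checked less often and do not give the actual criteria.

```python
from enum import Enum


class UserLevel(Enum):
    NEW = "new"
    GENERAL = "general"
    SAFE = "safe"
    PROBLEM = "problem"


# Hypothetical thresholds, chosen only to make the example concrete.
MIN_POSTS_FOR_HISTORY = 20   # below this, there is too little history to judge
SAFE_MIN_POSTS = 200         # "post frequently"
PROBLEM_DELETE_RATE = 0.05   # share of deleted posts that marks a problem user


def user_level(total_posts, deleted_posts):
    """Classify a user from the total and deleted post counts."""
    if total_posts < MIN_POSTS_FOR_HISTORY:
        return UserLevel.NEW
    if deleted_posts / total_posts >= PROBLEM_DELETE_RATE:
        return UserLevel.PROBLEM
    if total_posts >= SAFE_MIN_POSTS and deleted_posts == 0:
        return UserLevel.SAFE
    return UserLevel.GENERAL


def needs_operator_check(level, flagged_by_filter):
    """Decide whether a post goes to an operator.

    Problem users are always checked, regardless of filtering.
    Safe users are skipped here, which simplifies "not checked as often".
    Everyone else is checked only when a filter flagged the post.
    """
    if level is UserLevel.PROBLEM:
        return True
    if level is UserLevel.SAFE:
        return False
    return flagged_by_filter


print(user_level(500, 0), needs_operator_check(UserLevel.PROBLEM, False))
```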
  15. Moderation service
      ● Operators moderate in a window dedicated to each service
      ● Dummy posts & quality checks are included
  16. Analyze / Reporting
      ● We collect information from a variety of sources
        ○ Spam category, service, operator...
        ○ Unique IDs sent from each service are used to identify the information
      ● Reporting assures quality of moderation
        ○ If an operator fails to identify dummy spam data, the failure is shown in the report
        ○ Reports are displayed on a Tableau server
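The dummy-spam quality check could be summarized per operator along these lines. The log format (tuples of operator, whether the post was a dummy, and whether the operator marked it as spam) and the operator names are hypothetical; the slides do not describe how the report is computed.

```python
from collections import defaultdict


def dummy_detection_report(decisions):
    """Return, per operator, the share of dummy spam posts that were caught.

    decisions: iterable of (operator, is_dummy_spam, marked_as_spam) tuples.
    Posts that are not dummies are ignored, since only dummies have a known answer.
    """
    seen = defaultdict(int)
    caught = defaultdict(int)
    for operator, is_dummy_spam, marked_as_spam in decisions:
        if not is_dummy_spam:
            continue
        seen[operator] += 1
        if marked_as_spam:
            caught[operator] += 1
    return {operator: caught[operator] / seen[operator] for operator in seen}


log = [
    ("op_a", True, True),
    ("op_a", False, False),
    ("op_a", True, True),
    ("op_b", True, True),
    ("op_b", True, False),  # missed dummy: this should surface in the report
    ("op_b", False, True),
]
report = dummy_detection_report(log)
print(report)
```

An operator whose rate drops below some agreed level would then stand out in the report.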
  17. Effect > Spam removal efficiency
      ● 35+ services in use
      ● Orion filters and moderates millions of pages of content
      [Figure: All post, Suspicious post, Deleted post; annotations: New service, User level applies, New service]
  18. Effect > Spam removal efficiency
      ● Ratio comparing 2014-2015 vs. 2017-2018 (values in %; "Change" compares the averages of the two periods)

      (%)      | Check/All            | Delete/Check         | Delete/All
               | Min    Max    Ave    | Min    Max    Ave    | Min    Max    Ave
      ‘14-’15  | 1.17   26.44  7.62   | 0.10   2.86   0.43   | 0.004  0.756  0.034
      Change   |               0.61x  |               5.04x  |               2.97x
      ‘17-’18  | 3.09   6.32   4.66   | 1.51   3.64   2.17   | 0.063  0.165  0.101
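The three ratios are related: going by their names, Delete/All should be roughly Check/All multiplied by Delete/Check. The short check below recomputes that product and the change factors from the average values on the slide. The dictionary keys and function name are made up for the example, and the results only land close to the reported figures, since the slide values are rounded and a product of averages is not the same as an average of products.

```python
# Average values (in %) as reported on the slide.
averages = {
    "2014-2015": {"check_all": 7.62, "delete_check": 0.43, "delete_all": 0.034},
    "2017-2018": {"check_all": 4.66, "delete_check": 2.17, "delete_all": 0.101},
}


def implied_delete_all(check_all_pct, delete_check_pct):
    """Delete/All (%) implied by Check/All (%) and Delete/Check (%)."""
    return check_all_pct * delete_check_pct / 100.0


for period, row in averages.items():
    implied = implied_delete_all(row["check_all"], row["delete_check"])
    print(f"{period}: implied Delete/All = {implied:.4f}%, reported = {row['delete_all']}%")

# Change factor per column: later-period average divided by earlier-period average.
old, new = averages["2014-2015"], averages["2017-2018"]
changes = {key: new[key] / old[key] for key in old}
for key, factor in changes.items():
    print(f"{key}: {factor:.2f}x")
```

Read this way, the table says that a smaller share of posts was sent to operators in the later period, while a larger share of the checked posts turned out to be spam.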
  19. Effect > Moderation effect
      ● Orion has been effective since deployment
        ○ Criminal activity among our company’s services has greatly declined
        ○ No criminal cases were observed in late 2017
      [Figure: criminal case # over time period; annotation: Orion operational]
  20. Conclusion
      ● Content moderation should not rely solely on either automatic classification or manual moderation
      ● We introduced Orion, which integrates automatic filtering and manual moderation
        ○ UGC is screened by various filters, and suspicious UGC is sent for manual moderation
        ○ Operators are monitored to ensure high moderation quality
      ● After Orion was deployed, the amount of UGC requiring manual moderation decreased, and the number of criminal posts sharply declined
  22. Thank you.
