SlideShare a Scribd company logo
1 of 44
Download to read offline
Content-Driven
Author Reputation and Text Trust
          for the Wikipedia
              Luca de Alfaro
               UC Santa Cruz

               Joint work with
  Bo Adler, Ian Pye, Caitlin Sadowski (UCSC)


                            Wikimania, August 2007
Author Reputation and Text Trust

Author Reputation:
• Goal: Encourage authors to provide lasting contributions.
Author Reputation and Text Trust

Author Reputation:
• Goal: Encourage authors to provide lasting contributions.
Text Trust:
• Goal: provide a measure of the reliability of the text.
• Method: computed from the reputation of the authors
  who create and revise the text.
Reputation: Our guiding principles

• Do not alter the Wikipedia user experience
  – Compute reputation from content evolution, rather
    than user-to-user comments.
• Be welcoming to all users
  – Never publicly display user reputation values.
    Authors know only their own reputation.
• Be objective
  – Rely on content evolution rather than comments.
  – Quantitatively evaluate how well it works.
Content-driven reputation

• Authors of long-lived contributions gain reputation
• Authors of reverted contributions lose reputation

                 A Wikipedia article
time
Content-driven reputation

• Authors of long-lived contributions gain reputation
• Authors of reverted contributions lose reputation

                 A Wikipedia article

                        A
                             edits
time
Content-driven reputation

• Authors of long-lived contributions gain reputation
• Authors of reverted contributions lose reputation

                 A Wikipedia article

                        A
                             edits
time




                        B
                             builds on A’s edit
Content-driven reputation

• Authors of long-lived contributions gain reputation
• Authors of reverted contributions lose reputation

                 A Wikipedia article

                        A
                             edits
                   +
time




                        B
                             builds on A’s edit
Content-driven reputation

• Authors of long-lived contributions gain reputation
• Authors of reverted contributions lose reputation

                  A Wikipedia article

                         A
                              edits
                   +
time




              +          B
                              builds on A’s edit
                   -
                         C    reverts to A’s version
Content-driven reputation mitigates
reputation wars
                                      -2
Wars in user-driven reputation:   A        B
Content-driven reputation mitigates
reputation wars
                                      -2
Wars in user-driven reputation:   A        B
                                      -3
Content-driven reputation mitigates
reputation wars
                                             -2
Wars in user-driven reputation:      A                 B
                                             -3

Wars in content-driven reputation:

                   • B can badmouth A by undoing
                      her work
         A
             -     • But this is risky: if others then
         B            re-instate A’s work, it is B’s
                      reputation that suffers.
Content-driven reputation mitigates
reputation wars
                                              -2
Wars in user-driven reputation:      A                  B
                                              -3

Wars in content-driven reputation:

                     • B can badmouth A by undoing
                       her work
         A
             -       • But this is risky: if others then
                 +
         B             re-instate A’s work, it is B’s
           -           reputation that suffers.
          others?
Validation: Does our reputation have
predictive value?
              Time
Article 1
Article 2
Article 3
Article 4
  ...


              = edits by user A
Validation: Does our reputation have
predictive value?
                       Time
Article 1
Article 2
                                   E
Article 3
Article 4
  ...


                                       The longevity of an edit E
   The reputation of author A
                                        depends on the history
at the time of an edit E depends
                                            after the edit.
 on the history before the edit.


              Can we show a correlation between
            author reputation and edit longevity ?
Building a content-driven
             reputation system
                for Wikipedia



This is a summary; for details see:
B.T. Adler, L. de Alfaro. A Content Driven Reputation
System for the Wikipedia. In Proc. of WWW 2007.
What is a “contribution”?

     Text                        Edit

     bla   ei        bla   yak            bla bla


                                        buy viagra!
    bla    yak ei   yak    bla

                                          bla bla
We measure how
long the added
                    We measure how long the “edit”
text survives.
                    (reorganization) survives.
Based on text
                    Based on edit distance.
tracking.
Text


version 9      bla   bla wuga boink
                5     8   9    6

 version 10    bla   bla   wuga   wuga   wuga boink
                5     8     10     10     9    6


 We label each word with the version where it was
 introduced. This enables us to keep track of how
 long it lives.
Text: the destiny of a contribution


      of words
      number



                                       time
                                    (versions)

    Amount of        Amount of
     new text      surviving text



    The life of the text introduced at a revision.
Text: Longevity

            Tk
                                                 j-k
                                   Tk ¢ α text
          of words
          number



                                                  time
                               j
                     k                     (versions)

• Text longevity: the α text 2 [0,1] that yields the best
  geometrical approximation for the amount of residual
  text.
• Short-lived text: α text < 0.2 (at most 20% of the text
  makes it from one version to the next).
Text: Reputation update

          Tk
                                 Tj
        of words
        number



                                         time
                            j
                   k                  (versions)
                   Ak       Aj        (authors)


As a consequence of edit j, we increase the reputation
  of Ak by an amount proportional to Tj and to the
  reputation of Aj
Measuring surviving text

                                          “Dead” text
                “Live” text

Version
         wuga     boing bla ble     stored as “dead”
 9
          7        9     6     6


                                           wuga   boing bla ble
          buy    viagra now!
 10
                                            7      9     6    6
          10       10    10           t
                                   es h
                                   bc
                                     at
                                   m
         wuga     boing bla ble
 11
          7        9     6     6
      We track authorship of deleted text, and we match the text of
      new versions both with live and with dead text.
Edit

 k-1 d(k
        -   1,
                 k)

                      k judged
  d(k




                      d(k, j)
     -1,




                                 k<j
    j)




                 j     judge




We compute the edit distance between versions k-1, k, and
 j, with k < j
                                       (see paper for details on the distance)
Edit: good or bad?

        the past
                                                     k judged
  k-1
                                 the past
                                      k-1
                 k judged




                                                    , j)
  d(k




                                    d(k
                 d(k, j)




                                                 d(k
     -1,




                                       -1,
        j)




                                       j)
                                             j
             j    judge                          judge

     the future                       the future


k is good: d(k-1, j) > d(k, j)     k is bad: d(k-1, j) < d(k, j)
 “k went towards the future”        “k went against the future”
Edit: Longevity
 the past
                                                   Edit Longevity:
                        “w
                          ork
       k-1
                        d(k done
                           -1     ”
                              ,k)
  d(k
“ pr
     -1,j




                                    k
 ogr
          )-d
              (k,j
              es s




                                        The fraction of change that is in
                   )
                    ”




                                          the same direction of the future.
                                            α edit ' 1: k is a good edit
                                        •
                                j           α edit ' -1: k is reverted
                                        •
                  the future
Edit: Updating reputation
 the past
                                         Edit Longevity:
                        “w
                          ork
       k-1
                        d(k done
                           -1     ”
                              ,k)
  d(k
“ pr
     -1,j




                                  kA
 ogr
          )-d
              (k,j
              es s




                                     k
                                         Reputation update:
                   )
                    ”




                                         The reputation of Ak

                                         • increases if α edit > 0,
                                j Aj
                                         • decreases if α edit < 0.
                  the future

                                          (see paper for details)
Data Sets

• English till Feb 07 1,988,627 pages, 40,455,416 versions

• French till Feb 07    452,577 pages,   5,643,636 versions

• Italian till May 07   301,584 pages,   3,129,453 versions



The entire Wikipedias, with the whole history, not just a
  sample (we wanted to compute the reputation using all edits
  of each user).
Results: English Wikipedia, in detail

                                      % of edits below a given longevity

                       Bin   %_data    l<0.8   l<0.4   l<0.0   l<-0.4      l<-0.8
                         0   16.922    93.11   91.65   89.15   83.76       73.53
                         1    1.191    77.24   69.83   65.60   61.11       56.00
                         2    1.335    69.53   57.08   49.79   45.71       41.25
log (1 + reputation)




                         3    1.627    38.00   28.61   20.23   16.16       13.62
                         4    2.780    32.84   22.31   13.32    9.57        8.04
                         5    4.408    41.70   15.76    5.90    3.80        2.57
                         6    6.698    29.40   16.74    7.54    4.35        3.12
                         7    8.281    32.04   15.16    5.44    2.25        1.40
                         8   12.233    34.06   16.64    6.78    3.79        2.73
                         9   44.524    32.55   15.51    5.05    1.88        1.14
Results: English Wikipedia, in detail

                                       % of edits below a given longevity

                        Bin   %_data    l<0.8   l<0.4   l<0.0   l<-0.4      l<-0.8
low                       0   16.922    93.11   91.65   89.15   83.76       73.53
rep                       1    1.191    77.24   69.83   65.60   61.11       56.00
                          2    1.335    69.53   57.08   49.79   45.71       41.25
 log (1 + reputation)




                          3    1.627    38.00   28.61   20.23   16.16       13.62
                          4    2.780    32.84   22.31   13.32    9.57        8.04
                          5    4.408    41.70   15.76    5.90    3.80        2.57
                          6    6.698    29.40   16.74    7.54    4.35        3.12
                          7    8.281    32.04   15.16    5.44    2.25        1.40
                          8   12.233    34.06   16.64    6.78    3.79        2.73
                          9   44.524    32.55   15.51    5.05    1.88        1.14

                                                                  Short-Lived
Predictive power of low reputation

Low-reputation:      Lower 20% of range


            Short-lived edits         Short-lived text
                                          α text
                  α edit                           · 0.2
                           · -0.8
                                         (less than 20%
           (almost entirely undone)
                                      survives each revision)
Text trust

      Yadda yadda wuga wuga bla bla bla bing bong
                                A


Yadda yadda wuga wuga yak yak yuk bla bla bla bing bong




  Old text is colored according       New text is colored
  to the reputation of its original   according to the
  author, and of all subsequent       reputation of A
  revisors (including A).
Text trust

      Yadda yadda wuga wuga bla bla bla bing bong
                           A


Yadda yadda wuga wuga yak yak yuk bla bla bla bing bong


• On the English Wikipedia, we should be able to spot
  untrusted content with over 80% recall and 60%
  precision!
   – In fact, we do even better than this, as new content
     is always flagged lower trust (see next).
Demo: http://trust.cse.ucsc.edu/
Text trust: How is “Fogh” spelled?
Text Trust: more examples from the demo
Text Trust: Details

Trust depends on:
• Authorship: Author lends 50% of their reputation to
  the text they create.
   – Thus, even text from high-rep authors is only medium-
     rep when added: high trust is achieved only via multiple
     reviews, never via a single author.
• Revision: When an author of reputation r preserves a
  word of trust t < r, the word increases in trust to
                          t + 0.3(r – t)
• The algorithms still need fine-tuning.
From fresh to trusted text
From fresh to trusted text
From fresh to trusted text
From fresh to trusted text
From fresh to trusted text
Batch Implementation


                      periodic xml dumps
                          (to initialize)

                           edit feed
                       (to keep updated)
                                              Trust server
Wikipedia servers

• No need to affect the main Wikipedia servers
• People can click “check trust” and visit the trust server.
• Good for experimenting with new ideas
• Necessary to color the past (come up to speed).
On-Line Implementation

Process edits as they arrive:
• Benefit: real-time colorization of text
• Need to integrate the code in MediaWiki
• Time to process an edit: < 1s (not much longer than
  parsing it).
• Storage required: proportional to the size of the last
  revision (not to the total history size!)
• Can be easily used for other Wikis
My questions:
• Feedback?
• Do you like it?
• Should we try to set up a “trust server” with
  an edit feed from the Wikipedia?
• Try the demo:

       http://trust.cse.ucsc.edu/

Your questions?

More Related Content

Viewers also liked

E-reputation : contexte, outils, stratégie et contenus
E-reputation : contexte, outils, stratégie et contenusE-reputation : contexte, outils, stratégie et contenus
E-reputation : contexte, outils, stratégie et contenusRégis Vansnick
 
Communiquer sur les réseaux sociaux
Communiquer sur les réseaux sociauxCommuniquer sur les réseaux sociaux
Communiquer sur les réseaux sociauxQuinchy Riya
 
Les clés de l'E-Reputation en 2014 [HUB Report]
Les clés de l'E-Reputation en 2014 [HUB Report]Les clés de l'E-Reputation en 2014 [HUB Report]
Les clés de l'E-Reputation en 2014 [HUB Report]HUB INSTITUTE
 
Social Networking Ppt
Social Networking PptSocial Networking Ppt
Social Networking Pptkmlaughl
 
Reputation Management and Social Media
Reputation Management and Social MediaReputation Management and Social Media
Reputation Management and Social MediaPaul Marsden
 
Accessibilité et Réseaux Sociaux
Accessibilité et Réseaux Sociaux Accessibilité et Réseaux Sociaux
Accessibilité et Réseaux Sociaux Hicham Sabre
 
E-Reputation : 10 erreurs à éviter, ou comment réussir son buzzmonitoring
E-Reputation : 10 erreurs à éviter, ou comment réussir son buzzmonitoringE-Reputation : 10 erreurs à éviter, ou comment réussir son buzzmonitoring
E-Reputation : 10 erreurs à éviter, ou comment réussir son buzzmonitoringHUB INSTITUTE
 
Datagency iab capsules_20150211_versiontechdays_vf
Datagency iab capsules_20150211_versiontechdays_vfDatagency iab capsules_20150211_versiontechdays_vf
Datagency iab capsules_20150211_versiontechdays_vfEmarketing.fr
 
Social networking PPT
Social networking PPTSocial networking PPT
Social networking PPTvarun0912
 
Reputation Model Based on Rating Data and Application in Recommender Systems
Reputation Model Based on Rating Data and Application in Recommender SystemsReputation Model Based on Rating Data and Application in Recommender Systems
Reputation Model Based on Rating Data and Application in Recommender SystemsAhmad Jawdat
 
E-reputation : le livre blanc
E-reputation : le livre blancE-reputation : le livre blanc
E-reputation : le livre blancAref Jdey
 
E-réputation et marques : état de l'art et enjeux - by Vanksen
E-réputation et marques : état de l'art et enjeux - by VanksenE-réputation et marques : état de l'art et enjeux - by Vanksen
E-réputation et marques : état de l'art et enjeux - by VanksenVanksen
 

Viewers also liked (13)

E-reputation : contexte, outils, stratégie et contenus
E-reputation : contexte, outils, stratégie et contenusE-reputation : contexte, outils, stratégie et contenus
E-reputation : contexte, outils, stratégie et contenus
 
Communiquer sur les réseaux sociaux
Communiquer sur les réseaux sociauxCommuniquer sur les réseaux sociaux
Communiquer sur les réseaux sociaux
 
Les clés de l'E-Reputation en 2014 [HUB Report]
Les clés de l'E-Reputation en 2014 [HUB Report]Les clés de l'E-Reputation en 2014 [HUB Report]
Les clés de l'E-Reputation en 2014 [HUB Report]
 
Online Reputation Management
Online Reputation ManagementOnline Reputation Management
Online Reputation Management
 
Social Networking Ppt
Social Networking PptSocial Networking Ppt
Social Networking Ppt
 
Reputation Management and Social Media
Reputation Management and Social MediaReputation Management and Social Media
Reputation Management and Social Media
 
Accessibilité et Réseaux Sociaux
Accessibilité et Réseaux Sociaux Accessibilité et Réseaux Sociaux
Accessibilité et Réseaux Sociaux
 
E-Reputation : 10 erreurs à éviter, ou comment réussir son buzzmonitoring
E-Reputation : 10 erreurs à éviter, ou comment réussir son buzzmonitoringE-Reputation : 10 erreurs à éviter, ou comment réussir son buzzmonitoring
E-Reputation : 10 erreurs à éviter, ou comment réussir son buzzmonitoring
 
Datagency iab capsules_20150211_versiontechdays_vf
Datagency iab capsules_20150211_versiontechdays_vfDatagency iab capsules_20150211_versiontechdays_vf
Datagency iab capsules_20150211_versiontechdays_vf
 
Social networking PPT
Social networking PPTSocial networking PPT
Social networking PPT
 
Reputation Model Based on Rating Data and Application in Recommender Systems
Reputation Model Based on Rating Data and Application in Recommender SystemsReputation Model Based on Rating Data and Application in Recommender Systems
Reputation Model Based on Rating Data and Application in Recommender Systems
 
E-reputation : le livre blanc
E-reputation : le livre blancE-reputation : le livre blanc
E-reputation : le livre blanc
 
E-réputation et marques : état de l'art et enjeux - by Vanksen
E-réputation et marques : état de l'art et enjeux - by VanksenE-réputation et marques : état de l'art et enjeux - by Vanksen
E-réputation et marques : état de l'art et enjeux - by Vanksen
 

More from nextlib

Hadoop Map Reduce Arch
Hadoop Map Reduce ArchHadoop Map Reduce Arch
Hadoop Map Reduce Archnextlib
 
D Rb Silicon Valley Ruby Conference
D Rb   Silicon Valley Ruby ConferenceD Rb   Silicon Valley Ruby Conference
D Rb Silicon Valley Ruby Conferencenextlib
 
Multi-core architectures
Multi-core architecturesMulti-core architectures
Multi-core architecturesnextlib
 
Aldous Huxley Brave New World
Aldous Huxley Brave New WorldAldous Huxley Brave New World
Aldous Huxley Brave New Worldnextlib
 
Social Graph
Social GraphSocial Graph
Social Graphnextlib
 
Ajax Prediction
Ajax PredictionAjax Prediction
Ajax Predictionnextlib
 
Closures for Java
Closures for JavaClosures for Java
Closures for Javanextlib
 
SVD review
SVD reviewSVD review
SVD reviewnextlib
 
Mongrel Handlers
Mongrel HandlersMongrel Handlers
Mongrel Handlersnextlib
 
Blue Ocean Strategy
Blue Ocean StrategyBlue Ocean Strategy
Blue Ocean Strategynextlib
 
日本7-ELEVEN消費心理學
日本7-ELEVEN消費心理學日本7-ELEVEN消費心理學
日本7-ELEVEN消費心理學nextlib
 
Comparing State-of-the-Art Collaborative Filtering Systems
Comparing State-of-the-Art Collaborative Filtering SystemsComparing State-of-the-Art Collaborative Filtering Systems
Comparing State-of-the-Art Collaborative Filtering Systemsnextlib
 
Item Based Collaborative Filtering Recommendation Algorithms
Item Based Collaborative Filtering Recommendation AlgorithmsItem Based Collaborative Filtering Recommendation Algorithms
Item Based Collaborative Filtering Recommendation Algorithmsnextlib
 
Agile Adoption2007
Agile Adoption2007Agile Adoption2007
Agile Adoption2007nextlib
 
Modern Compiler Design
Modern Compiler DesignModern Compiler Design
Modern Compiler Designnextlib
 
透过众神的眼睛--鸟瞰非洲
透过众神的眼睛--鸟瞰非洲透过众神的眼睛--鸟瞰非洲
透过众神的眼睛--鸟瞰非洲nextlib
 
Improving Quality of Search Results Clustering with Approximate Matrix Factor...
Improving Quality of Search Results Clustering with Approximate Matrix Factor...Improving Quality of Search Results Clustering with Approximate Matrix Factor...
Improving Quality of Search Results Clustering with Approximate Matrix Factor...nextlib
 
Support Vector Machines
Support Vector MachinesSupport Vector Machines
Support Vector Machinesnextlib
 
Bigtable
BigtableBigtable
Bigtablenextlib
 

More from nextlib (20)

Nio
NioNio
Nio
 
Hadoop Map Reduce Arch
Hadoop Map Reduce ArchHadoop Map Reduce Arch
Hadoop Map Reduce Arch
 
D Rb Silicon Valley Ruby Conference
D Rb   Silicon Valley Ruby ConferenceD Rb   Silicon Valley Ruby Conference
D Rb Silicon Valley Ruby Conference
 
Multi-core architectures
Multi-core architecturesMulti-core architectures
Multi-core architectures
 
Aldous Huxley Brave New World
Aldous Huxley Brave New WorldAldous Huxley Brave New World
Aldous Huxley Brave New World
 
Social Graph
Social GraphSocial Graph
Social Graph
 
Ajax Prediction
Ajax PredictionAjax Prediction
Ajax Prediction
 
Closures for Java
Closures for JavaClosures for Java
Closures for Java
 
SVD review
SVD reviewSVD review
SVD review
 
Mongrel Handlers
Mongrel HandlersMongrel Handlers
Mongrel Handlers
 
Blue Ocean Strategy
Blue Ocean StrategyBlue Ocean Strategy
Blue Ocean Strategy
 
日本7-ELEVEN消費心理學
日本7-ELEVEN消費心理學日本7-ELEVEN消費心理學
日本7-ELEVEN消費心理學
 
Comparing State-of-the-Art Collaborative Filtering Systems
Comparing State-of-the-Art Collaborative Filtering SystemsComparing State-of-the-Art Collaborative Filtering Systems
Comparing State-of-the-Art Collaborative Filtering Systems
 
Item Based Collaborative Filtering Recommendation Algorithms
Item Based Collaborative Filtering Recommendation AlgorithmsItem Based Collaborative Filtering Recommendation Algorithms
Item Based Collaborative Filtering Recommendation Algorithms
 
Agile Adoption2007
Agile Adoption2007Agile Adoption2007
Agile Adoption2007
 
Modern Compiler Design
Modern Compiler DesignModern Compiler Design
Modern Compiler Design
 
透过众神的眼睛--鸟瞰非洲
透过众神的眼睛--鸟瞰非洲透过众神的眼睛--鸟瞰非洲
透过众神的眼睛--鸟瞰非洲
 
Improving Quality of Search Results Clustering with Approximate Matrix Factor...
Improving Quality of Search Results Clustering with Approximate Matrix Factor...Improving Quality of Search Results Clustering with Approximate Matrix Factor...
Improving Quality of Search Results Clustering with Approximate Matrix Factor...
 
Support Vector Machines
Support Vector MachinesSupport Vector Machines
Support Vector Machines
 
Bigtable
BigtableBigtable
Bigtable
 

Recently uploaded

Long journey of Ruby standard library at RubyConf AU 2024
Long journey of Ruby standard library at RubyConf AU 2024Long journey of Ruby standard library at RubyConf AU 2024
Long journey of Ruby standard library at RubyConf AU 2024Hiroshi SHIBATA
 
Modern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
Modern Roaming for Notes and Nomad – Cheaper Faster Better StrongerModern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
Modern Roaming for Notes and Nomad – Cheaper Faster Better Strongerpanagenda
 
UiPath Community: Communication Mining from Zero to Hero
UiPath Community: Communication Mining from Zero to HeroUiPath Community: Communication Mining from Zero to Hero
UiPath Community: Communication Mining from Zero to HeroUiPathCommunity
 
Time Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directionsTime Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directionsNathaniel Shimoni
 
Assure Ecommerce and Retail Operations Uptime with ThousandEyes
Assure Ecommerce and Retail Operations Uptime with ThousandEyesAssure Ecommerce and Retail Operations Uptime with ThousandEyes
Assure Ecommerce and Retail Operations Uptime with ThousandEyesThousandEyes
 
4. Cobus Valentine- Cybersecurity Threats and Solutions for the Public Sector
4. Cobus Valentine- Cybersecurity Threats and Solutions for the Public Sector4. Cobus Valentine- Cybersecurity Threats and Solutions for the Public Sector
4. Cobus Valentine- Cybersecurity Threats and Solutions for the Public Sectoritnewsafrica
 
JET Technology Labs White Paper for Virtualized Security and Encryption Techn...
JET Technology Labs White Paper for Virtualized Security and Encryption Techn...JET Technology Labs White Paper for Virtualized Security and Encryption Techn...
JET Technology Labs White Paper for Virtualized Security and Encryption Techn...amber724300
 
React JS; all concepts. Contains React Features, JSX, functional & Class comp...
React JS; all concepts. Contains React Features, JSX, functional & Class comp...React JS; all concepts. Contains React Features, JSX, functional & Class comp...
React JS; all concepts. Contains React Features, JSX, functional & Class comp...Karmanjay Verma
 
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyesHow to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyesThousandEyes
 
[Webinar] SpiraTest - Setting New Standards in Quality Assurance
[Webinar] SpiraTest - Setting New Standards in Quality Assurance[Webinar] SpiraTest - Setting New Standards in Quality Assurance
[Webinar] SpiraTest - Setting New Standards in Quality AssuranceInflectra
 
Tampa BSides - The No BS SOC (slides from April 6, 2024 talk)
Tampa BSides - The No BS SOC (slides from April 6, 2024 talk)Tampa BSides - The No BS SOC (slides from April 6, 2024 talk)
Tampa BSides - The No BS SOC (slides from April 6, 2024 talk)Mark Simos
 
Landscape Catalogue 2024 Australia-1.pdf
Landscape Catalogue 2024 Australia-1.pdfLandscape Catalogue 2024 Australia-1.pdf
Landscape Catalogue 2024 Australia-1.pdfAarwolf Industries LLC
 
QCon London: Mastering long-running processes in modern architectures
QCon London: Mastering long-running processes in modern architecturesQCon London: Mastering long-running processes in modern architectures
QCon London: Mastering long-running processes in modern architecturesBernd Ruecker
 
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...Wes McKinney
 
Irene Moetsana-Moeng: Stakeholders in Cybersecurity: Collaborative Defence fo...
Irene Moetsana-Moeng: Stakeholders in Cybersecurity: Collaborative Defence fo...Irene Moetsana-Moeng: Stakeholders in Cybersecurity: Collaborative Defence fo...
Irene Moetsana-Moeng: Stakeholders in Cybersecurity: Collaborative Defence fo...itnewsafrica
 
Abdul Kader Baba- Managing Cybersecurity Risks and Compliance Requirements i...
Abdul Kader Baba- Managing Cybersecurity Risks  and Compliance Requirements i...Abdul Kader Baba- Managing Cybersecurity Risks  and Compliance Requirements i...
Abdul Kader Baba- Managing Cybersecurity Risks and Compliance Requirements i...itnewsafrica
 
Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...
Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...
Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...panagenda
 
Email Marketing Automation for Bonterra Impact Management (fka Social Solutio...
Email Marketing Automation for Bonterra Impact Management (fka Social Solutio...Email Marketing Automation for Bonterra Impact Management (fka Social Solutio...
Email Marketing Automation for Bonterra Impact Management (fka Social Solutio...Jeffrey Haguewood
 
So einfach geht modernes Roaming fuer Notes und Nomad.pdf
So einfach geht modernes Roaming fuer Notes und Nomad.pdfSo einfach geht modernes Roaming fuer Notes und Nomad.pdf
So einfach geht modernes Roaming fuer Notes und Nomad.pdfpanagenda
 
Generative AI - Gitex v1Generative AI - Gitex v1.pptx
Generative AI - Gitex v1Generative AI - Gitex v1.pptxGenerative AI - Gitex v1Generative AI - Gitex v1.pptx
Generative AI - Gitex v1Generative AI - Gitex v1.pptxfnnc6jmgwh
 

Recently uploaded (20)

Long journey of Ruby standard library at RubyConf AU 2024
Long journey of Ruby standard library at RubyConf AU 2024Long journey of Ruby standard library at RubyConf AU 2024
Long journey of Ruby standard library at RubyConf AU 2024
 
Modern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
Modern Roaming for Notes and Nomad – Cheaper Faster Better StrongerModern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
Modern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
 
UiPath Community: Communication Mining from Zero to Hero
UiPath Community: Communication Mining from Zero to HeroUiPath Community: Communication Mining from Zero to Hero
UiPath Community: Communication Mining from Zero to Hero
 
Time Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directionsTime Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directions
 
Assure Ecommerce and Retail Operations Uptime with ThousandEyes
Assure Ecommerce and Retail Operations Uptime with ThousandEyesAssure Ecommerce and Retail Operations Uptime with ThousandEyes
Assure Ecommerce and Retail Operations Uptime with ThousandEyes
 
4. Cobus Valentine- Cybersecurity Threats and Solutions for the Public Sector
4. Cobus Valentine- Cybersecurity Threats and Solutions for the Public Sector4. Cobus Valentine- Cybersecurity Threats and Solutions for the Public Sector
4. Cobus Valentine- Cybersecurity Threats and Solutions for the Public Sector
 
JET Technology Labs White Paper for Virtualized Security and Encryption Techn...
JET Technology Labs White Paper for Virtualized Security and Encryption Techn...JET Technology Labs White Paper for Virtualized Security and Encryption Techn...
JET Technology Labs White Paper for Virtualized Security and Encryption Techn...
 
React JS; all concepts. Contains React Features, JSX, functional & Class comp...
React JS; all concepts. Contains React Features, JSX, functional & Class comp...React JS; all concepts. Contains React Features, JSX, functional & Class comp...
React JS; all concepts. Contains React Features, JSX, functional & Class comp...
 
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyesHow to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
 
[Webinar] SpiraTest - Setting New Standards in Quality Assurance
[Webinar] SpiraTest - Setting New Standards in Quality Assurance[Webinar] SpiraTest - Setting New Standards in Quality Assurance
[Webinar] SpiraTest - Setting New Standards in Quality Assurance
 
Tampa BSides - The No BS SOC (slides from April 6, 2024 talk)
Tampa BSides - The No BS SOC (slides from April 6, 2024 talk)Tampa BSides - The No BS SOC (slides from April 6, 2024 talk)
Tampa BSides - The No BS SOC (slides from April 6, 2024 talk)
 
Landscape Catalogue 2024 Australia-1.pdf
Landscape Catalogue 2024 Australia-1.pdfLandscape Catalogue 2024 Australia-1.pdf
Landscape Catalogue 2024 Australia-1.pdf
 
QCon London: Mastering long-running processes in modern architectures
QCon London: Mastering long-running processes in modern architecturesQCon London: Mastering long-running processes in modern architectures
QCon London: Mastering long-running processes in modern architectures
 
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
 
Irene Moetsana-Moeng: Stakeholders in Cybersecurity: Collaborative Defence fo...
Irene Moetsana-Moeng: Stakeholders in Cybersecurity: Collaborative Defence fo...Irene Moetsana-Moeng: Stakeholders in Cybersecurity: Collaborative Defence fo...
Irene Moetsana-Moeng: Stakeholders in Cybersecurity: Collaborative Defence fo...
 
Abdul Kader Baba- Managing Cybersecurity Risks and Compliance Requirements i...
Abdul Kader Baba- Managing Cybersecurity Risks  and Compliance Requirements i...Abdul Kader Baba- Managing Cybersecurity Risks  and Compliance Requirements i...
Abdul Kader Baba- Managing Cybersecurity Risks and Compliance Requirements i...
 
Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...
Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...
Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...
 
Email Marketing Automation for Bonterra Impact Management (fka Social Solutio...
Email Marketing Automation for Bonterra Impact Management (fka Social Solutio...Email Marketing Automation for Bonterra Impact Management (fka Social Solutio...
Email Marketing Automation for Bonterra Impact Management (fka Social Solutio...
 
So einfach geht modernes Roaming fuer Notes und Nomad.pdf
So einfach geht modernes Roaming fuer Notes und Nomad.pdfSo einfach geht modernes Roaming fuer Notes und Nomad.pdf
So einfach geht modernes Roaming fuer Notes und Nomad.pdf
 
Generative AI - Gitex v1Generative AI - Gitex v1.pptx
Generative AI - Gitex v1Generative AI - Gitex v1.pptxGenerative AI - Gitex v1Generative AI - Gitex v1.pptx
Generative AI - Gitex v1Generative AI - Gitex v1.pptx
 

UC Santa Cruz Author Provides Content-Driven Reputation Model for Wikipedia

  • 1. Content-Driven Author Reputation and Text Trust for the Wikipedia Luca de Alfaro UC Santa Cruz Joint work with Bo Adler, Ian Pye, Caitlin Sadowski (UCSC) Wikimania, August 2007
  • 2. Author Reputation and Text Trust Author Reputation: • Goal: Encourage authors to provide lasting contributions.
  • 3. Author Reputation and Text Trust Author Reputation: • Goal: Encourage authors to provide lasting contributions. Text Trust: • Goal: provide a measure of the reliability of the text. • Method: computed from the reputation of the authors who create and revise the text.
  • 4. Reputation: Our guiding principles • Do not alter the Wikipedia user experience – Compute reputation from content evolution, rather than user-to-user comments. • Be welcoming to all users – Never publicly display user reputation values. Authors know only their own reputation. • Be objective – Rely on content evolution rather than comments. – Quantitatively evaluate how well it works.
  • 5. Content-driven reputation • Authors of long-lived contributions gain reputation • Authors of reverted contributions lose reputation A Wikipedia article time
  • 6. Content-driven reputation • Authors of long-lived contributions gain reputation • Authors of reverted contributions lose reputation A Wikipedia article A edits time
  • 7. Content-driven reputation • Authors of long-lived contributions gain reputation • Authors of reverted contributions lose reputation A Wikipedia article A edits time B builds on A’s edit
  • 8. Content-driven reputation • Authors of long-lived contributions gain reputation • Authors of reverted contributions lose reputation A Wikipedia article A edits + time B builds on A’s edit
  • 9. Content-driven reputation • Authors of long-lived contributions gain reputation • Authors of reverted contributions lose reputation A Wikipedia article A edits + time + B builds on A’s edit - C reverts to A’s version
  • 10. Content-driven reputation mitigates reputation wars -2 Wars in user-driven reputation: A B
  • 11. Content-driven reputation mitigates reputation wars -2 Wars in user-driven reputation: A B -3
  • 12. Content-driven reputation mitigates reputation wars -2 Wars in user-driven reputation: A B -3 Wars in content-driven reputation: • B can badmouth A by undoing her work A - • But this is risky: if others then B re-instate A’s work, it is B’s reputation that suffers.
  • 13. Content-driven reputation mitigates reputation wars -2 Wars in user-driven reputation: A B -3 Wars in content-driven reputation: • B can badmouth A by undoing her work A - • But this is risky: if others then + B re-instate A’s work, it is B’s - reputation that suffers. others?
  • 14. Validation: Does our reputation have predictive value? Time Article 1 Article 2 Article 3 Article 4 ... = edits by user A
  • 15. Validation: Does our reputation have predictive value? Time Article 1 Article 2 E Article 3 Article 4 ... The longevity of an edit E The reputation of author A depends on the history at the time of an edit E depends after the edit. on the history before the edit. Can we show a correlation between author reputation and edit longevity ?
  • 16. Building a content-driven reputation system for Wikipedia This is a summary; for details see: B.T. Adler, L. de Alfaro. A Content Driven Reputation System for the Wikipedia. In Proc. of WWW 2007.
  • 17. What is a “contribution”? Text Edit bla ei bla yak bla bla buy viagra! bla yak ei yak bla bla bla We measure how long the added We measure how long the “edit” text survives. (reorganization) survives. Based on text Based on edit distance. tracking.
  • 18. Text version 9 bla bla wuga boink 5 8 9 6 version 10 bla bla wuga wuga wuga boink 5 8 10 10 9 6 We label each word with the version where it was introduced. This enables us to keep track of how long it lives.
  • 19. Text: the destiny of a contribution of words number time (versions) Amount of Amount of new text surviving text The life of the text introduced at a revision.
  • 20. Text: Longevity Tk j-k Tk ¢ α text of words number time j k (versions) • Text longevity: the α text 2 [0,1] that yields the best geometrical approximation for the amount of residual text. • Short-lived text: α text < 0.2 (at most 20% of the text makes it from one version to the next).
  • 21. Text: Reputation update Tk Tj of words number time j k (versions) Ak Aj (authors) As a consequence of edit j, we increase the reputation of Ak by an amount proportional to Tj and to the reputation of Aj
  • 22. Measuring surviving text “Dead” text “Live” text Version wuga boing bla ble stored as “dead” 9 7 9 6 6 wuga boing bla ble buy viagra now! 10 7 9 6 6 10 10 10 t es h bc at m wuga boing bla ble 11 7 9 6 6 We track authorship of deleted text, and we match the text of new versions both with live and with dead text.
  • 23. Edit k-1 d(k - 1, k) k judged d(k d(k, j) -1, k<j j) j judge We compute the edit distance between versions k-1, k, and j, with k < j (see paper for details on the distance)
  • 24. Edit: good or bad? the past k judged k-1 the past k-1 k judged , j) d(k d(k d(k, j) d(k -1, -1, j) j) j j judge judge the future the future k is good: d(k-1, j) > d(k, j) k is bad: d(k-1, j) < d(k, j) “k went towards the future” “k went against the future”
  • 25. Edit: Longevity the past Edit Longevity: “w ork k-1 d(k done -1 ” ,k) d(k “ pr -1,j k ogr )-d (k,j es s The fraction of change that is in ) ” the same direction of the future. α edit ' 1: k is a good edit • j α edit ' -1: k is reverted • the future
  • 26. Edit: Updating reputation the past Edit Longevity: “w ork k-1 d(k done -1 ” ,k) d(k “ pr -1,j kA ogr )-d (k,j es s k Reputation update: ) ” The reputation of Ak • increases if α edit > 0, j Aj • decreases if α edit < 0. the future (see paper for details)
  • 27. Data Sets • English till Feb 07 1,988,627 pages, 40,455,416 versions • French till Feb 07 452,577 pages, 5,643,636 versions • Italian till May 07 301,584 pages, 3,129,453 versions The entire Wikipedias, with the whole history, not just a sample (we wanted to compute the reputation using all edits of each user).
  • 28. Results: English Wikipedia, in detail % of edits below a given longevity Bin %_data l<0.8 l<0.4 l<0.0 l<-0.4 l<-0.8 0 16.922 93.11 91.65 89.15 83.76 73.53 1 1.191 77.24 69.83 65.60 61.11 56.00 2 1.335 69.53 57.08 49.79 45.71 41.25 log (1 + reputation) 3 1.627 38.00 28.61 20.23 16.16 13.62 4 2.780 32.84 22.31 13.32 9.57 8.04 5 4.408 41.70 15.76 5.90 3.80 2.57 6 6.698 29.40 16.74 7.54 4.35 3.12 7 8.281 32.04 15.16 5.44 2.25 1.40 8 12.233 34.06 16.64 6.78 3.79 2.73 9 44.524 32.55 15.51 5.05 1.88 1.14
  • 29. Results: English Wikipedia, in detail % of edits below a given longevity Bin %_data l<0.8 l<0.4 l<0.0 l<-0.4 l<-0.8 low 0 16.922 93.11 91.65 89.15 83.76 73.53 rep 1 1.191 77.24 69.83 65.60 61.11 56.00 2 1.335 69.53 57.08 49.79 45.71 41.25 log (1 + reputation) 3 1.627 38.00 28.61 20.23 16.16 13.62 4 2.780 32.84 22.31 13.32 9.57 8.04 5 4.408 41.70 15.76 5.90 3.80 2.57 6 6.698 29.40 16.74 7.54 4.35 3.12 7 8.281 32.04 15.16 5.44 2.25 1.40 8 12.233 34.06 16.64 6.78 3.79 2.73 9 44.524 32.55 15.51 5.05 1.88 1.14 Short-Lived
  • 30. Predictive power of low reputation Low-reputation: Lower 20% of range Short-lived edits Short-lived text α text α edit · 0.2 · -0.8 (less than 20% (almost entirely undone) survives each revision)
  • 31. Text trust Yadda yadda wuga wuga bla bla bla bing bong A Yadda yadda wuga wuga yak yak yuk bla bla bla bing bong Old text is colored according New text is colored to the reputation of its original according to the author, and of all subsequent reputation of A revisors (including A).
  • 32. Text trust Yadda yadda wuga wuga bla bla bla bing bong A Yadda yadda wuga wuga yak yak yuk bla bla bla bing bong • On the English Wikipedia, we should be able to spot untrusted content with over 80% recall and 60% precision! – In fact, we do even better than this, as new content is always flagged lower trust (see next).
  • 34. Text trust: How is “Fogh” spelled?
  • 35. Text Trust: more examples from the demo
  • 36. Text Trust: Details Trust depends on: • Authorship: Author lends 50% of their reputation to the text they create. – Thus, even text from high-rep authors is only medium- rep when added: high trust is achieved only via multiple reviews, never via a single author. • Revision: When an author of reputation r preserves a word of trust t < r, the word increases in trust to t + 0.3(r – t) • The algorithms still need fine-tuning.
  • 37. From fresh to trusted text
  • 38. From fresh to trusted text
  • 39. From fresh to trusted text
  • 40. From fresh to trusted text
  • 41. From fresh to trusted text
  • 42. Batch Implementation periodic xml dumps (to initialize) edit feed (to keep updated) Trust server Wikipedia servers • No need to affect the main Wikipedia servers • People can click “check trust” and visit the trust server. • Good for experimenting with new ideas • Necessary to color the past (come up to speed).
  • 43. On-Line Implementation Process edits as they arrive: • Benefit: real-time colorization of text • Need to integrate the code in MediaWiki • Time to process an edit: < 1s (not much longer than parsing it). • Storage required: proportional to the size of the last revision (not to the total history size!) • Can be easily used for other Wikis
  • 44. My questions: • Feedback? • Do you like it? • Should we try to set up a “trust server” with an edit feed from the Wikipedia? • Try the demo: http://trust.cse.ucsc.edu/ Your questions?