Dissecting Wikipedia




                                     Andrew Gray

           Wikipedian in Residence, British Library

              andrew.gray@bl.uk // @generalising
Wikipedia & Wikimedia



   Wikimedia
      Movement and charitable body
      80-100,000 contributors in 280 languages
        and eleven core projects
      Image repository, dictionary, news site…
      …used by almost 500,000,000 people



   Wikipedia
      25,000,000 articles, 4,000,000 in English
      representing 8-9,000,000 topics & entities
      6,500 articles and 235,000 edits per day

    (…and twelve years ago, this was all fields…)
…so what is Wikipedia?



   …an encyclopedia (more or less)

   …written neutrally

   …and verifiably

   …using previously published information

   …free to use, distribute, or reuse

   …a collaborative community

   …with no firm rules
A developing internal infrastructure



   All edits are visible through watchlists and page histories
      About 7% are vandalism or malicious; processes exist to detect these
      Median time to correction is under 2 minutes… but some stay much longer

   Individual discussion pages for all articles – “talk”

   Quality review and assessment process

   Specialised working groups and central noticeboards
      e.g. content topics, style, dispute resolution, copyright
Quality of Wikipedia as a source



   On average… it’s not bad
      In 2005: about four errors per article, versus three in Britannica
      In 2011, in English, Spanish & Arabic:
            “…the Wikipedia articles in this sample scored higher overall than the
            comparison articles with respect to accuracy, references,
            style/readability and overall judgment…”

   Millions of articles – so many are, individually, problematic
      Various ways of identifying “signs” of quality
      Markers for quality are both obvious and subtle



   Very effective “springboard” tool
Moving to other content



   Other languages – not translations, and may have more content

   Mousing over footnote markers

   Within the references:
      Links through DOIs and other identifiers
      ISBNs go to a special landing page
           …and then out to libraries, booksellers, etc
      ISSNs go to WorldCat
      If the article is about an author, look for authority control links
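
   A minimal sketch of how the identifiers in a reference turn into outbound links. The function names are hypothetical, the DOI is a placeholder, and the URL patterns (the doi.org resolver, Special:BookSources on English Wikipedia, WorldCat's ISSN path) are assumptions about current services, not something stated in this talk.

```python
from urllib.parse import quote


def doi_url(doi: str) -> str:
    """Build a resolver link for a DOI (assumes the public doi.org resolver)."""
    return "https://doi.org/" + quote(doi.strip(), safe="/")


def book_sources_url(isbn: str, wiki: str = "en.wikipedia.org") -> str:
    """Build a link to the ISBN landing page (Special:BookSources) on a wiki."""
    # Keep only digits and the X check character; drop hyphens and spaces.
    cleaned = "".join(ch for ch in isbn if ch.isdigit() or ch in "xX")
    return f"https://{wiki}/wiki/Special:BookSources/{cleaned}"


def issn_worldcat_url(issn: str) -> str:
    """Build a WorldCat link for an ISSN (URL pattern is an assumption)."""
    return f"https://www.worldcat.org/issn/{issn.strip()}"


if __name__ == "__main__":
    print(doi_url("10.1000/example"))         # placeholder DOI
    print(book_sources_url("0-306-40615-2"))  # sample ISBN
    print(issn_worldcat_url("0000-0000"))     # placeholder ISSN
```

   The landing page for ISBNs is the useful part: one link fans out to many library catalogues and booksellers, so the article itself does not have to pick one.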
Other research tools



   Some tools available – “toolserver” allows live DB queries
      Complex to use, but rewarding




   CatScan: look for intersection of categories
      “all physicists born in 1912” – 53 in English, 35 in German
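
   The idea behind a category intersection can be sketched in a few lines. This is not CatScan's own code: the function name and the sample memberships below are hypothetical, and a real tool would also walk subcategories rather than use flat sets.

```python
from __future__ import annotations


def intersect_categories(
    members_by_category: dict[str, set[str]], *categories: str
) -> set[str]:
    """Return the articles that appear in every one of the named categories."""
    sets = [members_by_category[name] for name in categories]
    return set.intersection(*sets) if sets else set()


if __name__ == "__main__":
    # Hypothetical membership data; real data would come from the live
    # database or from a dump.
    sample = {
        "Physicists": {"Article A", "Article B", "Article C"},
        "1912 births": {"Article B", "Article C", "Article D"},
    }
    print(sorted(intersect_categories(sample, "Physicists", "1912 births")))
```

   Running the same intersection against different language editions is what produces differing counts, since each edition categorises its own articles independently.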




   Full dumps of all data available – http://dumps.wikipedia.org/
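
   A minimal sketch of reading page titles out of an XML dump without loading the whole file into memory. It assumes the dump is a MediaWiki XML export with <page> elements that each contain a <title>; the sample document and the function names are made up for illustration. Real dumps are compressed, so the stream would typically be opened with something like bz2.open first.

```python
import io
import xml.etree.ElementTree as ET
from typing import BinaryIO, Iterator


def local_name(tag: str) -> str:
    """Strip the XML namespace, e.g. '{ns}page' becomes 'page'."""
    return tag.rsplit("}", 1)[-1]


def iter_page_titles(stream: BinaryIO) -> Iterator[str]:
    """Yield the title of each <page> element as it finishes parsing."""
    for _event, elem in ET.iterparse(stream, events=("end",)):
        if local_name(elem.tag) != "page":
            continue
        for child in elem:
            if local_name(child.tag) == "title":
                yield child.text or ""
                break
        elem.clear()  # free the page's children once it has been handled


# Hand-written stand-in for a dump file (hypothetical content).
SAMPLE = b"""<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.10/">
  <page><title>Example one</title><ns>0</ns></page>
  <page><title>Example two</title><ns>0</ns></page>
</mediawiki>"""

if __name__ == "__main__":
    for title in iter_page_titles(io.BytesIO(SAMPLE)):
        print(title)
```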



   Reusers – Freebase, DBpedia, Wolfram Alpha
Wikidata



   Wikidata: our new linked data repository
      Phase I: cross-language links
      Phase II: structured data elements
      Phase III: dynamic lists




   Very loosely defined schema

   Currently harvesting structured data from WP

   Public API, open to reusers

   CC-0 licensed data – fully open
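
   A rough sketch of what using the public API for Phase I data (cross-language links) might look like. It builds a wbgetentities request URL and reads the sitelinks out of a response. The helper names are hypothetical and the response below is a hand-written, trimmed stand-in, not real API output.

```python
from urllib.parse import urlencode

API = "https://www.wikidata.org/w/api.php"


def entity_request_url(item_id: str, props: str = "sitelinks") -> str:
    """Build a wbgetentities request URL for a single item."""
    params = {
        "action": "wbgetentities",
        "ids": item_id,
        "props": props,
        "format": "json",
    }
    return API + "?" + urlencode(params)


def sitelinks(response: dict, item_id: str) -> dict:
    """Map each wiki (e.g. 'enwiki') to the linked article title."""
    entity = response["entities"][item_id]
    return {
        site: link["title"]
        for site, link in entity.get("sitelinks", {}).items()
    }


# Hand-written, trimmed stand-in for an API response (assumption).
SAMPLE_RESPONSE = {
    "entities": {
        "Q42": {
            "sitelinks": {
                "enwiki": {"site": "enwiki", "title": "Douglas Adams"},
                "dewiki": {"site": "dewiki", "title": "Douglas Adams"},
            }
        }
    }
}

if __name__ == "__main__":
    print(entity_request_url("Q42"))
    print(sitelinks(SAMPLE_RESPONSE, "Q42"))
```

   Keeping URL construction separate from response handling means the parsing can be tried against saved responses without touching the network.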
Research about Wikipedia



   Thriving research around Wikimedia communities & content
      by mid-2011, 2100 peer-reviewed articles and 38 PhD theses
      Active research committee and WMF support

   Regular community-produced monthly newsletter
      http://enwp.org/meta:Research // @wikiresearch

   Topics include:
      Community and content creation
      Reading and researching by users
      Quality of content
      Technical research
      Large-scale content examination
Research on communities



   Research on the Wikipedia communities:


        Dynamics of community conflict, discussions, collaboration,
         voting, contribution, mentoring…
        Demographics, motivation and specialisms of contributors
        Patterns of growth and content creation/deletion
        Effect of central programs on volunteer activity
        Cross-cultural interaction
Visualisation: discussion dynamics




                                     http://notabilia.net/
Editor activity and motivation




                        http://commons.wikimedia.org/wiki/File:Effect_of_barnstars_on_productivity.png
Research on users



   Research on usage of Wikipedia:


        Specific searching behaviour
        Patterns of usage (yearly, daily)
        Tracking external events through Wikipedia
        Search engine rankings
        Change in usage by students
        Effect of Wikipedia publication on wider literature
Visualising editing patterns




                       http://commons.wikimedia.org/wiki/File:WikiTrip_egyptian_revolution_screenshot.png
Research on content



   Research on the content of Wikipedia:


        Evolution of content
        Accuracy, coverage and quality
        Biases – geographic, cultural, gender
        Linguistic analysis
        Effect of external publications on Wikipedia
Quality assessment comparisons




           http://commons.wikimedia.org/wiki/File:Boxplot_of_Average_Article_Feedback_ratings_by_project_rated_quality.svg
Research on technical aspects



   Research on the technical side of Wikipedia:


      Extensive work on scaling open-content services
      Tools for detecting and handling vandalism
      Algorithmic detection and identification of bias, spam
      Practical research on uses of wikis
Research using content



   Research using content from Wikipedia

   Hard to distinguish from “conventional” research, but some
    examples:


      Geographical analysis
      Visualisations of content
      Source for extracted datasets


        ...and Wikidata still to come!
Visualising art history




                          http://commons.wikimedia.org/wiki/File:Wikiarthistory.png
Visualising place




                    https://commons.wikimedia.org/wiki/File:Imageworld-artphp3.png

Dissecting Wikipedia

  • 1. Dissecting Wikipedia Andrew Gray Wikipedian in Residence, British Library andrew.gray@bl.uk // @generalising
  • 2. Wikipedia & Wikimedia
      • Wikimedia
          • Movement and charitable body
          • 80-100,000 contributors in 280 languages and eleven core projects
          • Image repository, dictionary, news site…
          • …used by almost 500,000,000 people
      • Wikipedia
          • 25,000,000 articles, 4,000,000 in English
          • representing 8-9,000,000 topics & entities
          • 6,500 articles and 235,000 edits per day
      • (…and twelve years ago, this was all fields…)
  • 3. …so what is Wikipedia?
      • …an encyclopedia (more or less)
      • …written neutrally
      • …and verifiably
      • …using previously published information
      • …free to use, distribute, or reuse
      • …a collaborative community
      • …with no firm rules
  • 4. A developing internal infrastructure
      • All edits are visible through watchlists and page histories
          • About 7% are vandalism or malicious; processes to detect these
          • Median time to correction < 2 minutes… but some stay much longer
      • Individual discussion pages for all articles – “talk”
      • Quality review and assessment process
      • Specialised working groups and central noticeboards
          • e.g. content topics; style; dispute resolution; copyright; etc.
  • 5. Quality of Wikipedia as a source
      • On average… it’s not bad
          • In 2005 four errors per article, versus three in Britannica
          • In 2011, in English, Spanish & Arabic: “…the Wikipedia articles in this sample scored higher overall than the comparison articles with respect to accuracy, references, style/readability and overall judgment…”
      • Millions of articles – so many are, individually, problematic
          • Various ways of identifying “signs” of quality
          • Markers for quality are both obvious and subtle
      • Very effective “springboard” tool
  • 6. Moving to other content
      • Other languages – not translations, and may have more content
      • Mousing over footnote markers
      • Within the references:
      • Links through DOIs and other identifiers
      • ISBNs go to a special landing page
      • …and then out to libraries, booksellers, etc
      • ISSNs go to WorldCat
      • If an author, look for authority control links:
  • 7. Other research tools
      • Some tools available – “toolserver” allows live DB queries
      • Complex to use, but rewarding
      • CatScan: look for intersection of categories
      • “all physicists born in 1912” – 53 in English, 35 in German
      • Full dumps of all data available – http://dumps.wikipedia.org/
      • Reusers – Freebase, DBpedia, Wolfram Alpha
  • 8. Wikidata
      • Wikidata: our new linked data repository
      • Phase I: cross-language links
      • Phase II: structured data elements
      • Phase III: dynamic lists
      • Very loosely defined schema
      • Currently harvesting structured data from WP
      • Public API, open to reusers
      • CC-0 licensed data – fully open
  • 9. Research about Wikipedia
      • Thriving research around Wikimedia communities & content
      • by mid-2011, 2100 peer-reviewed articles and 38 PhD theses
      • Active research committee and WMF support
      • Regular community-produced monthly newsletter
      • http://enwp.org/meta:Research // @wikiresearch
      • Topics include:
      • Community and content creation
      • Reading and researching by users
      • Quality of content
      • Technical research
      • Large-scale content examination
  • 10. Research on communities
      • Research on the Wikipedia communities:
      • Dynamics of community conflict, discussions, collaboration, voting, contribution, mentoring…
      • Demographics, motivation and specialisms of contributors
      • Patterns of growth and content creation/deletion
      • Effect of central programs on volunteer activity
      • Cross-cultural interaction
  • 11. Visualisation: discussion dynamics http://notabilia.net/
  • 12. Editor activity and motivation http://commons.wikimedia.org/wiki/File:Effect_of_barnstars_on_productivity.png
  • 13. Research on users
      • Research on usage of Wikipedia:
      • Specific searching behaviour
      • Patterns of usage (yearly, daily)
      • Tracking external events through Wikipedia
      • Search engine rankings
      • Change in usage by students
      • Effect of Wikipedia publication on wider literature
  • 14. Visualising editing patterns http://commons.wikimedia.org/wiki/File:WikiTrip_egyptian_revolution_screenshot.png
  • 15. Research on content
      • Research on the content of Wikipedia:
      • Evolution of content
      • Accuracy, coverage and quality
      • Biases – geographic, cultural, gender
      • Linguistic analysis
      • Effect of external publications on Wikipedia
  • 16. Quality assessment comparisons http://commons.wikimedia.org/wiki/File:Boxplot_of_Average_Article_Feedback_ratings_by_project_rated_quality.svg
  • 17. Research on technical aspects
      • Research on the technical side of Wikipedia:
      • Extensive work on scaling open-content services
      • Tools for detecting and handling vandalism
      • Algorithmic detection and identification of bias, spam
      • Practical research on uses of wikis
  • 18. Research using content
      • Research using content from Wikipedia
      • Hard to distinguish from “conventional” research, but some examples:
      • Geographical analysis
      • Visualisations of content
      • Source for extracted datasets
      • ...and Wikidata still to come!
  • 19. Visualising art history http://commons.wikimedia.org/wiki/File:Wikiarthistory.png
  • 20. Visualising place https://commons.wikimedia.org/wiki/File:Imageworld-artphp3.png
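
Slide 7 describes CatScan as looking for the intersection of categories (“all physicists born in 1912”). The idea can be sketched in a few lines of Python as a plain set intersection. This is only an illustration and not how CatScan itself is implemented: the function name `intersect_categories`, the category names and the page titles are all hypothetical placeholders, and the membership data is hard-coded rather than fetched from Wikipedia.

```python
def intersect_categories(members_by_category, wanted):
    """Return the page titles that appear in every category listed in `wanted`.

    `members_by_category` maps a category name to a set of page titles.
    A category missing from the mapping is treated as empty.
    """
    sets = [set(members_by_category.get(name, ())) for name in wanted]
    if not sets:
        return set()
    return set.intersection(*sets)


# Hypothetical membership data; real data would come from a dump or a query.
members_by_category = {
    "Physicists": {"Person A", "Person B", "Person C"},
    "1912 births": {"Person B", "Person C", "Person D"},
}

matches = intersect_categories(members_by_category, ["Physicists", "1912 births"])
print(sorted(matches))  # ['Person B', 'Person C']
```

Running the same intersection against the membership data of two language editions would give per-language counts of the kind quoted on the slide.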