SlideShare a Scribd company logo
1 of 104
A Tinkerer’s Toolbox:
Data Driven Journalism




                                           Tony Hirst
                     Dept of Communication and Systems
                                   The Open University
               Visiting Senior Research Fellow, University of Lincoln
@psychemedia

blog.ouseful.info

      #???
Where I situate myself…
Visualising data helps me make
sense of the world around me
Do you know
   what’s
 possible?
#ddj
Google
Spreadsheets
Data Distributions




                     Outliers
Trends and (anti)correlations...
Explanatory visualization
Data visualizations that are used to
transmit information or a point of
view from the designer to the
reader. Explanatory visualizations
typically have a specific “story” or
information that they are intended
to transmit.

Exploratory visualization
Data visualizations that are used by
the designer for self-informative
purposes to discover patterns,
trends, or sub-problems in a
dataset. Exploratory visualizations
typically don’t have an already-
known story.
Exploiting
Structure
Hierarchical data and treemaps - medals




Pivot tables
Templated data views
Macroscopes
Look for
Differences
Data Can Tell a
    Story
http://www.musik-therapie.at/PederHill/Structure&Plot.htm
Visual Data
Summaries
ggplot() +
geom_linerange(data = d1,aes(x= car, ymin = ymin,ymax = ymax)) +
geom_point(data = d2,aes(x= car, y= value,shape = variable),size = 2) +
opts(title="F1 2011 Korea nRace Summary Chart",
    axis.text.x=theme_text(angle=-90, hjust=0)) +
labs(x = NULL, y = "Position", shape = "")
Data
Clean(s)ing
Google Refine
(Inner) Joins &
 Reconciliation
Google Fusion
Tables
Google Refine
OpenHeatMap
“Data Flow”
“Analog Synth Meeting”, Todd Huffman
Google
                                                   Yahoo! Pipe
Wikipedia       HTML    Spreadsheet        CSV
                                                   Import CSV
                        =importHTML



            Embedded
                       <embed>        Google Map    KML
            object
Find the data…

                        Google
                                                   Yahoo! Pipe
Wikipedia       HTML    Spreadsheet        CSV
                                                   Import CSV
                        =importHTML



            Embedded
                       <embed>        Google Map    KML
            object
Get the data as data…

                        Google
                                                   Yahoo! Pipe
Wikipedia       HTML    Spreadsheet        CSV
                                                   Import CSV
                        =importHTML



            Embedded
                       <embed>        Google Map    KML
            object
Transform the data…

                        Google
                                                   Yahoo! Pipe
Wikipedia       HTML    Spreadsheet        CSV
                                                   Import CSV
                        =importHTML



            Embedded
                       <embed>        Google Map    KML
            object
Enrich the data and transform again…

                        Google
                                                   Yahoo! Pipe
Wikipedia       HTML    Spreadsheet        CSV
                                                   Import CSV
                        =importHTML



            Embedded
                       <embed>        Google Map    KML
            object
Display the data…

                        Google
                                                   Yahoo! Pipe
Wikipedia       HTML    Spreadsheet        CSV
                                                   Import CSV
                        =importHTML



            Embedded
                       <embed>        Google Map    KML
            object
Publish the displayed data…

                        Google
                                                   Yahoo! Pipe
Wikipedia       HTML    Spreadsheet        CSV
                                                   Import CSV
                        =importHTML



            Embedded
                       <embed>        Google Map    KML
            object
The onlineCSV file
      becomes a spreadsheet
          becomes A DATABASE
Finding data…
site:.gov.uk
filetype:xls
underspend
inurl:http://phx.corporate-ir.net/phoenix.zhtml?
intitle:press
site:phx.corporate-ir.net
inurl:http://phx.corporate-ir.net/phoenix.zhtml?
intitle:press
site:phx.corporate-ir.net
Tapping the
Data Burden
Reporting body              Receiving body


                      Data tap




Data Burdens and FOI
Opening Data
 Up via FOI
“Public Data” &
 Social Media
   Mapping
Emergent views
 of structural
  properties
My “journalism” is tracking down
tools and working out recipes that
     help datasets tell stories
http://delicious.com/stacks/view/CROBXt
Build lazy…
Electrical Safety 101

We get a lot of stuff from
Asia, so it all comes with
funny plugs, travelling just
adds to the fun.

Left to right top to bottom we
have:

Singapore wall socket UK
Adapter UK -> NZ/AU
Double adapter NZ/AU
My cell charger NZ/AU
Adapter NZ/AU -> everything
Andreas cell charger Euro
Camera charger US




                    tolomea
“Hands Passing Baton at Sporting Event”, tableatny
@psychemedia

blog.ouseful.info

More Related Content

More from Tony Hirst

Virtual computing.pptx
Virtual computing.pptxVirtual computing.pptx
Virtual computing.pptxTony Hirst
 
ouseful-parlihacks
ouseful-parlihacksouseful-parlihacks
ouseful-parlihacksTony Hirst
 
Gors appropriate
Gors appropriateGors appropriate
Gors appropriateTony Hirst
 
Gors appropriate
Gors appropriateGors appropriate
Gors appropriateTony Hirst
 
Robotlab jupyter
Robotlab   jupyterRobotlab   jupyter
Robotlab jupyterTony Hirst
 
Fco open data in half day th-v2
Fco open data in half day  th-v2Fco open data in half day  th-v2
Fco open data in half day th-v2Tony Hirst
 
Notes on the Future - ILI2015 Workshop
Notes on the Future - ILI2015 WorkshopNotes on the Future - ILI2015 Workshop
Notes on the Future - ILI2015 WorkshopTony Hirst
 
Community Journalism Conf - hyperlocal data wire
Community Journalism Conf - hyperlocal data wireCommunity Journalism Conf - hyperlocal data wire
Community Journalism Conf - hyperlocal data wireTony Hirst
 
Residential school 2015_robotics_interest
Residential school 2015_robotics_interestResidential school 2015_robotics_interest
Residential school 2015_robotics_interestTony Hirst
 
Data Mining - Separating Fact From Fiction - NetIKX
Data Mining - Separating Fact From Fiction - NetIKXData Mining - Separating Fact From Fiction - NetIKX
Data Mining - Separating Fact From Fiction - NetIKXTony Hirst
 
A Quick Tour of OpenRefine
A Quick Tour of OpenRefineA Quick Tour of OpenRefine
A Quick Tour of OpenRefineTony Hirst
 
Conversations with data
Conversations with dataConversations with data
Conversations with dataTony Hirst
 
Data reuse OU workshop bingo
Data reuse OU workshop bingoData reuse OU workshop bingo
Data reuse OU workshop bingoTony Hirst
 
Inspiring content - You Don't Need Big Data to Tell Good Data Stories
Inspiring content - You Don't Need Big Data to Tell Good Data Stories Inspiring content - You Don't Need Big Data to Tell Good Data Stories
Inspiring content - You Don't Need Big Data to Tell Good Data Stories Tony Hirst
 
Lincoln jun14datajournalism
Lincoln jun14datajournalismLincoln jun14datajournalism
Lincoln jun14datajournalismTony Hirst
 
Lincoln Journalism Research Day - Data Journalism
Lincoln Journalism Research Day - Data JournalismLincoln Journalism Research Day - Data Journalism
Lincoln Journalism Research Day - Data JournalismTony Hirst
 
Hestia linear tales
Hestia linear talesHestia linear tales
Hestia linear talesTony Hirst
 

More from Tony Hirst (20)

Virtual computing.pptx
Virtual computing.pptxVirtual computing.pptx
Virtual computing.pptx
 
ouseful-parlihacks
ouseful-parlihacksouseful-parlihacks
ouseful-parlihacks
 
Gors appropriate
Gors appropriateGors appropriate
Gors appropriate
 
Gors appropriate
Gors appropriateGors appropriate
Gors appropriate
 
Robotlab jupyter
Robotlab   jupyterRobotlab   jupyter
Robotlab jupyter
 
Fco open data in half day th-v2
Fco open data in half day  th-v2Fco open data in half day  th-v2
Fco open data in half day th-v2
 
Notes on the Future - ILI2015 Workshop
Notes on the Future - ILI2015 WorkshopNotes on the Future - ILI2015 Workshop
Notes on the Future - ILI2015 Workshop
 
Community Journalism Conf - hyperlocal data wire
Community Journalism Conf - hyperlocal data wireCommunity Journalism Conf - hyperlocal data wire
Community Journalism Conf - hyperlocal data wire
 
Residential school 2015_robotics_interest
Residential school 2015_robotics_interestResidential school 2015_robotics_interest
Residential school 2015_robotics_interest
 
Data Mining - Separating Fact From Fiction - NetIKX
Data Mining - Separating Fact From Fiction - NetIKXData Mining - Separating Fact From Fiction - NetIKX
Data Mining - Separating Fact From Fiction - NetIKX
 
Week4
Week4Week4
Week4
 
A Quick Tour of OpenRefine
A Quick Tour of OpenRefineA Quick Tour of OpenRefine
A Quick Tour of OpenRefine
 
Conversations with data
Conversations with dataConversations with data
Conversations with data
 
Data reuse OU workshop bingo
Data reuse OU workshop bingoData reuse OU workshop bingo
Data reuse OU workshop bingo
 
Inspiring content - You Don't Need Big Data to Tell Good Data Stories
Inspiring content - You Don't Need Big Data to Tell Good Data Stories Inspiring content - You Don't Need Big Data to Tell Good Data Stories
Inspiring content - You Don't Need Big Data to Tell Good Data Stories
 
Lincoln jun14datajournalism
Lincoln jun14datajournalismLincoln jun14datajournalism
Lincoln jun14datajournalism
 
Lincoln Journalism Research Day - Data Journalism
Lincoln Journalism Research Day - Data JournalismLincoln Journalism Research Day - Data Journalism
Lincoln Journalism Research Day - Data Journalism
 
Calrg14 tm351
Calrg14 tm351Calrg14 tm351
Calrg14 tm351
 
Calrg14 tm351
Calrg14 tm351Calrg14 tm351
Calrg14 tm351
 
Hestia linear tales
Hestia linear talesHestia linear tales
Hestia linear tales
 

Recently uploaded

Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machinePadma Pradeep
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsMemoori
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Mark Simos
 
The Future of Software Development - Devin AI Innovative Approach.pdf
The Future of Software Development - Devin AI Innovative Approach.pdfThe Future of Software Development - Devin AI Innovative Approach.pdf
The Future of Software Development - Devin AI Innovative Approach.pdfSeasiaInfotech2
 
Vector Databases 101 - An introduction to the world of Vector Databases
Vector Databases 101 - An introduction to the world of Vector DatabasesVector Databases 101 - An introduction to the world of Vector Databases
Vector Databases 101 - An introduction to the world of Vector DatabasesZilliz
 
Search Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfSearch Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfRankYa
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek SchlawackFwdays
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Commit University
 
Training state-of-the-art general text embedding
Training state-of-the-art general text embeddingTraining state-of-the-art general text embedding
Training state-of-the-art general text embeddingZilliz
 
Powerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time ClashPowerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time Clashcharlottematthew16
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupFlorian Wilhelm
 
My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024The Digital Insurer
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsMark Billinghurst
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubKalema Edgar
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Scott Keck-Warren
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Mattias Andersson
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationRidwan Fadjar
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxNavinnSomaal
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsRizwan Syed
 

Recently uploaded (20)

Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machine
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial Buildings
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
 
The Future of Software Development - Devin AI Innovative Approach.pdf
The Future of Software Development - Devin AI Innovative Approach.pdfThe Future of Software Development - Devin AI Innovative Approach.pdf
The Future of Software Development - Devin AI Innovative Approach.pdf
 
Vector Databases 101 - An introduction to the world of Vector Databases
Vector Databases 101 - An introduction to the world of Vector DatabasesVector Databases 101 - An introduction to the world of Vector Databases
Vector Databases 101 - An introduction to the world of Vector Databases
 
Search Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfSearch Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdf
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!
 
Training state-of-the-art general text embedding
Training state-of-the-art general text embeddingTraining state-of-the-art general text embedding
Training state-of-the-art general text embedding
 
Powerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time ClashPowerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time Clash
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project Setup
 
My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR Systems
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding Club
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 Presentation
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptx
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL Certs
 

Lincoln ddj

Editor's Notes

  1. Do we have a hashtag for the workshop?
  2. Collaborative commentary
  3. Through the provision of an API on top of the aggregated local council data, OpenlyLocal can also be treated as a database in its own right. In the example shown here, committee membership is displayed via a treemap showing party affiliations of committee members. (Hovering over a particular grouping displays a list of names of council members on that committee from that party political grouping.) Whilst it would be a major task to take data from every council website in a variety of formats in order to generate similar views for other councils, the work done by OpenlyLocal in aggregating this data and then republishing it via a single API in a single format means that the treemap view can be applied to each council whose data is stored in OpenlyLocal.In passing, it is also worth mentioning how the use of visualisations can be helpful in cleaning data or identifying possible errors in it. In the above example, we see that party affiliations for councillors on the Isle of Wight Council are declared as both Liberal Democrat and and Liberal Democrat Group.
  4. The top, blue strip shows the gear (1 to 7); the green strip shows the throttle pedal depression (0-100%), and the red strip shows the brake (0-100%). The light blue strip is a composite of the previous three strips. The whiter the pixel, the closer it is to 100% throttle in 7th gear with no braking.The bottom two traces show the longitudinal and lateral g-force respectively. For the longitudinal trace, red shows braking – being forced into the steering wheel; green shows acceleration – being forced back into your seat. You’ll see the greatest g-force under braking occurs when the brakes are slapped full on… (the red bits in the third and fifth traces line up). For the latitudinal g-force, the red shows the driving being flung to the left (i.e. right hand corner), the green shows them being pushed out to the right.
  5. Analogsynth – pretty much ultimate freedom to linlk audio processing effects modules together. Simplified by having a common plug.
  6. Some scene setting about what I mean by “flow”…
  7. Suppose we have a table of numerical data associated with placenames on something like Wikipedia. How do we knock up a quick map view of the data?
  8. UK city population search onwikipedia
  9. This can all be a bit flakey – a bit like balancing stones… But It can also be surprisingly stable (for a time at least!)
  10. Here we see the result of pulling data into a Google Spreadsheet from a CSV file published at a particular web address. We now have the ability to run the full range of spreadsheet tools over the data – data which is being pulled in from the datastore, remember.(A similar functionality presumably exists in Microsoft Excel?)
  11. Emergent Social Positioning: origins: 1.5 degree egonet (how followers follow each other, how hashtaggers follow each other)- projection maps from followers to folk they commonly follow;-- projection maps from hashtaggers to folk they commonly follow- projection maps from friends to folk who commonly follow them
  12. Lots of the time, things don’t quite fit: the import format for one tool does not match up with the export formats of another… so sometimes we need an adapter. (Cf. also the notion of impedance mismatch.)
  13. Do we have a hashtag for the workshop?