2. CONTENT
• Introduction
• What is data journalism
• Processes in data journalism
• Data journalism tools
• How to become a data journalist
• Benefits of using data journalism skills
3. INTRODUCTION
• The aim of this presentation is to provide a starting place for
introducing data journalism into the newsroom. It aims to help
journalists start developing the skills needed to write data stories.
• Data journalism is a journalism specialty reflecting the
increased role that numerical data plays in the production and
distribution of information in the digital era. It reflects the
increased interaction between content producers (journalists)
and several other fields such as design, computer
science and statistics. From the point of view of journalists, it
represents "an overlapping set of competencies drawn from
disparate fields".
4. WHAT IS DATA JOURNALISM?
• What makes data journalism different to the rest of
journalism? Perhaps it is the new possibilities that open up
when you combine the traditional ‘nose for news’ and ability
to tell a compelling story, with the sheer scale and range of
digital information now available.
• Data journalism can help a journalist tell a complex story
through engaging infographics.
• Getting started with data journalism does not require huge
resources.
5. PROCESSES IN DATA JOURNALISM
• 1. Finding data
Expert knowledge, contacts and databases.
The ability to use computer-assisted reporting skills and technical skills such as
MySQL or Python in data gathering.
• 2. Interrogating data
A good understanding of jargon, statistics and the wider context within which the data
sits, plus a familiarity with the requisite tools.
• 3. Cleaning Data
Raw data can be difficult to make sense of, so you need to
process the data set into a format that is easily analysed.
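As a rough illustration of the Cleaning Data step above, here is a minimal Python sketch. The sample rows, column names and date formats are hypothetical and invented for this example; a real data set will have its own problems to handle.

```python
import csv
import io
from datetime import datetime

# Hypothetical raw export: inconsistent names, dates stored as text
# in two different formats, and one missing value.
RAW = """district,report_date,cases
Accra ,01/02/2012,14
ACCRA,2012-02-08,9
Kumasi,15/02/2012,
"""

DATE_FORMATS = ("%d/%m/%Y", "%Y-%m-%d")  # assumed formats for this sample


def parse_date(text):
    """Try each known format; return an ISO date string, or None if none fit."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text.strip(), fmt).date().isoformat()
        except ValueError:
            continue
    return None


def clean_rows(raw_text):
    """Normalise names, parse dates, and turn blank counts into None."""
    cleaned = []
    for row in csv.DictReader(io.StringIO(raw_text)):
        cases = row["cases"].strip()
        cleaned.append({
            "district": row["district"].strip().title(),
            "report_date": parse_date(row["report_date"]),
            "cases": int(cases) if cases else None,
        })
    return cleaned


rows = clean_rows(RAW)
for r in rows:
    print(r)
```

Keeping missing values as None, rather than 0, avoids quietly distorting totals and averages later on.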
6. Processes
4. Getting The Story Out of Data
The most useful output of analysis is getting a story out of the data through
statistical or mathematical processing.
5. Visualising data
At this stage, data is converted into information through the use of graphics,
pictures, diagrams and charts.
6. Mashing data
The output at this stage is integrated with a web page or with complementary
elements from two or more sources. Mash-ups are often created by using a
development approach called Ajax.
7. Publish, Distribute, Measure:
11. DATA JOURNALISM TOOLS
1. The spreadsheet
Almost every data journalist begins with the spreadsheet.
2. SQL
SQL allows you to describe exactly the subset of data you want to extract or the exact
changes you want to make, and it allows you to perform these queries across related
data sets.
3. Data cleaning tools
Most data sets are “dirty.” To clean the data and get it into a useful format, you will
need a variety of tools including Google Refine.
4. Visualization tools
A good visualization will allow you to see outliers and trends in ways that can
profoundly alter your understanding of the data. Two Web-based visualization
tools used in data journalism are Google Fusion Tables and Tableau Public.
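To make the SQL item above more concrete, here is a small sketch that runs a query across two related data sets using Python's built-in sqlite3 module. The table names, column names and figures are hypothetical and exist only for this example.

```python
import sqlite3

# An in-memory database holding two related, made-up data sets.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE departments (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE payments (dept_id INTEGER, year INTEGER, amount REAL);
INSERT INTO departments VALUES (1, 'Health'), (2, 'Roads');
INSERT INTO payments VALUES
    (1, 2011, 120.0), (1, 2012, 150.0),
    (2, 2011, 80.0),  (2, 2012, 60.0);
""")

# Describe exactly the subset wanted: 2012 payments, totalled per
# department, with the department name pulled from the related table.
query = """
SELECT d.name, SUM(p.amount) AS total
FROM payments AS p
JOIN departments AS d ON d.id = p.dept_id
WHERE p.year = 2012
GROUP BY d.name
ORDER BY total DESC;
"""
results = conn.execute(query).fetchall()
for name, total in results:
    print(name, total)
conn.close()
```

The JOIN is what lets one query span both tables. The same pattern applies to larger databases such as MySQL, although connection details differ.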
12. DATA JOURNALISM TOOLS
5. Mapping software
Google Fusion Tables and Tableau Public both include quick and intuitive mapping
capabilities.
6. Scripting language
Python and Ruby seem to be the current favourites among journalists.
7. Web framework
A framework will keep the boring, repetitive work out of your way, help you adopt
best practices, keep you organized and make it easier to collaborate with others.
8. A flexible editor
To write code, you need a code editor.
9. Revision control
10. Document analysis tools
13. HOW TO BECOME A DATA JOURNALIST
• Journalists have to balance their role in responding to events
with their role as an active seeker of stories - and data is no
different. The New York Times' Aron Pilhofer recommends that
you "Start small, and start with something you already know
and already do. And always, always, always remember that the
goal here is journalism." It is better to find a story that will be
best told through numbers.
• There is no shortage of data being released that you can get
your journalistic teeth into. The open data movement in Ghana
and internationally is seeing a continual release of newsworthy
data, and it's relatively easy to find datasets being released by
MDAs.
14. HOW TO BECOME A DATA JOURNALIST
• A second approach, however, is to start with a question - "Do speed cameras cost
or save money?" for example, and then search for the data that might answer it.
• Whichever approach you take, it's likely that the real work will lie in finding the
further bits of information and data to fill out the picture you're trying to
clarify. Government data, for example, will often come littered with jargon and
codes you'll need to understand. A call to the relevant organisation can shed some
light. If that's taking too long, an advanced search for one of the more obscure
codes can help too.
• You'll also need to contextualise the initial data with further data. Say you have
some information about a government department's changing wage bill, for
example: has the department workforce expanded? How does it compare to other
government departments? What about wider wages within the industry? What
about inflation and changes in the cost of living? This context can make a
difference between missing and spotting a story.
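The wage bill questions above can be worked through with simple arithmetic. The sketch below uses entirely hypothetical figures (wage bill, headcount and price index are invented for illustration) to show how context can change the story.

```python
def pct_change(old, new):
    """Percentage change from old to new."""
    return (new - old) / old * 100


# Hypothetical figures for one department, for illustration only.
wage_bill = {2010: 50_000_000, 2012: 60_000_000}
staff = {2010: 1_000, 2012: 1_250}
price_index = {2010: 100.0, 2012: 110.0}  # assumed cost-of-living index

# Headline figure: how much has the wage bill grown?
nominal = pct_change(wage_bill[2010], wage_bill[2012])

# Context 1: has the workforce expanded? Compare pay per head.
per_head = {year: wage_bill[year] / staff[year] for year in wage_bill}
per_head_change = pct_change(per_head[2010], per_head[2012])

# Context 2: inflation. Express the 2012 bill in 2010 prices.
real_2012 = wage_bill[2012] * price_index[2010] / price_index[2012]
real_change = pct_change(wage_bill[2010], real_2012)

print(f"Wage bill change: {nominal:.1f}%")
print(f"Pay per head change: {per_head_change:.1f}%")
print(f"Inflation-adjusted wage bill change: {real_change:.1f}%")
```

With these made-up numbers the wage bill rises by 20%, yet pay per head falls by 4% because the workforce grew faster than the bill.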
15. HOW TO BECOME A DATA JOURNALIST
• Quite often your data will need cleaning up: look out for different names for the
same thing, spelling and punctuation errors, poorly formatted fields (e.g. dates that
are formatted as text), incorrectly entered data and information that is missing
entirely. Tools like Freebase Gridworks can help here.
• At other times the dataset you need will come in an inconvenient format, such as a
PDF, PowerPoint, or a rather ugly webpage. If you're lucky, you may be able to copy
and paste the data into a spreadsheet. But you won't always be lucky.
• At these moments some programming knowledge comes in handy. There's a sliding
scale here: at one end are those who can write scripts from scratch that scrape a
webpage and store the information in a spreadsheet. Alternatively, you can use a
website like Scraperwiki which already has example scripts that you can customise
to your ends - and a community to help. Then there are online tools like Yahoo!
Pipes and the Firefox plugin OutWit Hub. If the data is in an HTML table you can
even write a one-line formula in Google Spreadsheets to pull it in. Failing all the
above, you might just have to record it by hand - but whatever you do, make sure
you publish your spreadsheet online and blog about it so others don't have to
repeat your hard work.
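For those at the write-it-yourself end of the scale, here is a minimal sketch of scraping an HTML table into spreadsheet-ready CSV, using only Python's standard library. The HTML fragment and its figures are hypothetical stand-ins for a downloaded page, and the TableParser class is written for this example rather than taken from any library.

```python
import csv
import io
from html.parser import HTMLParser


class TableParser(HTMLParser):
    """Collect the text of every <td>/<th> cell, row by row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None   # cells of the row being read, if any
        self._cell = None  # text pieces of the cell being read, if any

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


# Hypothetical page fragment standing in for a downloaded web page.
HTML = """
<table>
  <tr><th>Region</th><th>Schools</th></tr>
  <tr><td>Ashanti</td><td>412</td></tr>
  <tr><td>Volta</td><td>198</td></tr>
</table>
"""

parser = TableParser()
parser.feed(HTML)

# Write the rows as CSV text, which any spreadsheet can open.
out = io.StringIO()
csv.writer(out).writerows(parser.rows)
csv_text = out.getvalue()
print(csv_text)
```

Real pages are usually messier (nested tables, merged cells), so treat this as a starting point rather than a general-purpose scraper.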
16. HOW TO BECOME A DATA JOURNALIST
• Once you have the data you need to tell the story, you need to get it ready
to visualise. Trim off everything peripheral to what you need in order to
visualise your story. There are dozens of free online tools you can use to
do this. ManyEyes and Tableau Public are good places to start for charts.
• Play around. If you're good with a graphics package, try making the
visualisation clearer through colour and labelling. And always include a
piece of text giving a link to the data and its source - because infographics
tend to become separated from their original context as they make their
way around the web.
• For maps, the wonderful OpenHeatMap is very easy to use - as long as
your data is categorised by country, local authority, constituency, region or
county. Or you can use Yahoo! Pipes to map the points of interest. Both of
these are actually examples of mashups.
17. HOW TO BECOME A DATA JOURNALIST
• Data literacy includes statistical literacy but
also understanding how to work with large
data sets, how they were produced, how to
connect various data sets and how to interpret
them.
• You can easily become a seasoned data
journalist by asking 3 very simple questions.
18. HOW TO BECOME A DATA JOURNALIST
1. You need to ask yourself: How was the data collected?
• When in doubt about a number’s credibility, always double check.
2. The second question you need to ask yourself is: What’s in there to learn?
• Always take the distribution and base rate into account. Checking the mean
and median, as well as the mode (the most frequent value in the distribution), helps
you gain insights into the data. Knowing the order of magnitude makes
contextualization easier. Finally, reporting in natural frequencies (1 in 100) is much
easier for readers to understand than using percentages (1%).
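The checks described above can be tried with Python's standard statistics module. This is a minimal sketch: the salary figures are hypothetical, and natural_frequency is a helper written for this example.

```python
from fractions import Fraction
from statistics import mean, median, mode

# Hypothetical monthly salaries; one very large value skews the mean.
salaries = [900, 1000, 1000, 1100, 1200, 1300, 9000]

avg = mean(salaries)     # pulled upwards by the single outlier
mid = median(salaries)   # the middle value, unaffected by the outlier
common = mode(salaries)  # the most frequent value


def natural_frequency(percent):
    """Express a non-zero percentage as 'about 1 in N' for readers."""
    share = Fraction(str(percent)) / 100
    return f"about 1 in {round(1 / share)}"


print(f"mean={avg:.0f} median={mid} mode={common}")
print(natural_frequency(1))
print(natural_frequency(0.5))
```

A mean that sits far above the median, as it does here, is a sign that a few extreme values are distorting the average, so the median is often the fairer figure to report.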
19. BENEFITS OF USING DATA JOURNALISM SKILLS
• It helps you find and write stories from the huge amount of
data released by governments every day.
• It helps you find the strongest stories more quickly.
• Data journalism can help a journalist tell a complex story
through engaging visualizations.
• It can help explain why and how a story relates to the
individual.
• Data journalism helps open up the news gathering process
itself.