This document summarizes a presentation on data visualization. It introduces data visualization and its uses for exploring data, explaining results, and distant reading. It discusses the building blocks of visualization, such as charts and networks, and how to visualize different data types. It explores some scholarly visualizations and includes an exercise critiquing them. It also covers extracting data from text, images and video using computational methods, and preparing messy humanities data for visualization, including dealing with uncertainty. The presentation emphasizes choosing visualizations based on purpose, data, audience and structure. It recommends tools, such as Viewshare, for creating simple visualizations without programming.
1. Beyond the Black Box:
Data Visualisation
Dr Mia Ridge, @mia_out
Digital Curator, British Library
Beyond the Black Box, University of Edinburgh, February 2017
2. Before we get started
• Go to http://viewshare.org/ and sign up for an
account
• Raise your hand or ask your neighbour for
help if you get stuck
3. Overview
• Foundations of data visualisation
– What is data visualisation and why use it?
– The building blocks of visualisation
– Create simple visualisations in Viewshare
• Exploring and critiquing interactive, scholarly
visualisations
• What happens in the black box?
– Explore different algorithms for entity recognition
for images
5. Visualisation is the graphical display of
quantitative or qualitative information to create
insights by highlighting patterns, trends,
variations and anomalies.
9. Why visualise information?
For 'sense-making (also called data analysis) and
communication' (Stephen Few)
'…showing quantitative and qualitative information
so that a viewer can see patterns, trends, or
anomalies, constancy or variation' (Michael
Friendly)
'…interactive, visual representations of abstract
data to amplify cognition' (Card et al)
'Distant reading' (Moretti) - focus on the shape
rather than detail of a collection
11. Introductions
• In a sentence or two, what's your interest in
data visualisation?
– What kinds of data do you work with?
– Who or what will the visualisations you're creating be for?
17. Charles Minard's figurative map, 1869
'Figurative Map of the successive losses in men of the French Army in the Russian campaign
1812-1813'. Drawn up by M. Minard, Inspector General of Bridges and Roads in retirement.
Paris, November 20, 1869.
18. Web 2.0 and the mashup, 2006
http://www.bombsight.org
25. Networks
Every point on this diagram represents a male film producer. The pink dots represent men who worked exclusively with other men in the period
surveyed, and the green dots represent those who worked with women.
https://theconversation.com/women-arent-the-problem-in-the-film-industry-men-are-68740 Deb Verhoeven and Stuart Palmer
27. Visualising images and video
http://www.flickr.com/photos/culturevis/5883371358/
'Mondrian vs. Rothko', Lev Manovich, 2010. Image preparation: Xiaoda Wang
32. Scholarly data visualisations
• Exploring or explaining datasets / arguments
• Sometimes 'distant reading' - providing a sense
of overall connections and patterns by pulling back
from detail and close reading (Moretti, Stanford
LitLab)
• Inspiring curiosity and research questions
• But - which questions do they privilege and
what do they leave out?
33. Exercise: critiquing scholarly visualisations
Go to http://bit.ly/2lHMyQB and follow the
steps for Exercise 1
Pair up and discuss together before reporting
back.
49. From the data you have to the
visualisation you want
50. How do you get data to visualise?
• Make it
– Mark up text or type/copy data into a structured
format
• Automate it
– Extract it from text, images, audio or video
• Find it
– Lots of freely available data to practice with or
check and enhance
56. Computational data generation
• Generate data from attributes of text, images,
etc
• Allows visualisation at scale
• Can be used in conjunction with manual
methods
• Tools often require calibration or 'training'
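As a rough illustration of generating data from attributes of text, the sketch below counts word frequencies in a short passage. It is a toy stand-in rather than any of the tools used in the session: the function name `word_frequencies` and the sample sentence are assumptions made for this example, and real entity-extraction tools are far more sophisticated.

```python
import re
from collections import Counter

def word_frequencies(text, top_n=5):
    """Return the top_n most common lowercase word tokens with their counts."""
    # Tokenise very crudely: runs of letters and apostrophes only.
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(words).most_common(top_n)

# Hypothetical sample text, invented for this example.
sample = "The army marched east. The army retreated west. The winter came."
print(word_frequencies(sample, top_n=2))
```

The resulting counts are the kind of derived data that could then feed a bar chart or word cloud.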
63. Exercise: Explore computational data
generation and entity extraction
Go to http://bit.ly/2lHMyQB and follow the
steps for Exercise 2:
1. Find a sample image
2. Load it onto the listed browser-based tools
3. Review and discuss the outputs
64. Exercise: learning about black boxes
• What attributes does each tool report on? Which attributes, if any, were
unique to a service?
• Based on this, what does each vendor seem to think is important to them (or
to their users)?
• How many possible entities (e.g. concepts, people, places, events,
references to time or dates) did it pick up?
• Is any of the information presented useful? Did it label anything
incorrectly?
• What options for exporting or saving the results do the demo or full
service offer?
• For tools with configuration options - what could you configure? What
difference did changing classifiers or other parameters make?
• If you tried it with a few images, did it do better with some than others?
Why might that be?
66. Considerations for humanities data
Commercial tools often assume complete, born-
digital datasets
• Historical records often contain uncertainty
and fuzziness (e.g. date ranges, multiple
values, uncertain or unavailable information;
data entry standards change over time)
• Humanities data often multi-layered, multi-
relational
• 'Data' = metadata, data, digital surrogates,
born-digital items
67. Messiness in historical data
• 'Begun in Kiryu, Japan, finished in France'
• 'Bali? Java? Mexico?'
• Variations on USA:
– U.S.
– U.S.A
– U.S.A.
– USA
– United States of America
– USA ?
– United States (case)
• Inconsistency in uncertainty
– U.S.A. or England
– U.S.A./England ?
– England & U.S.A.
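One possible way to tame variations like these is to split each value into place names, map known variants to a single label, and record uncertainty as a separate flag. The sketch below assumes a hand-made variant list and treats "?" and "or" as markers of uncertainty; the names `USA_VARIANTS` and `normalise_place` are hypothetical, and a tool such as OpenRefine could do the same job interactively.

```python
import re

# Hypothetical variant list based on the examples above; extend it for real data.
USA_VARIANTS = {"u.s.", "u.s.a", "u.s.a.", "usa",
                "united states", "united states of america"}

# Separators seen in the examples: "/", "&", "?" and the word "or".
SEPARATORS = re.compile(r"[/&?]|\s+or\s+", flags=re.IGNORECASE)

def normalise_place(raw):
    """Return (sorted list of place names, uncertain flag) for a raw value."""
    # Assumption: a "?" or the word "or" signals uncertainty; "&" does not.
    uncertain = "?" in raw or re.search(r"\s+or\s+", raw, flags=re.IGNORECASE) is not None
    places = []
    for part in SEPARATORS.split(raw):
        name = part.strip()
        if not name:
            continue
        if name.lower() in USA_VARIANTS:
            name = "USA"
        places.append(name)
    return sorted(places), uncertain

for value in ["U.S.A. or England", "U.S.A./England ?", "England & U.S.A.", "USA ?"]:
    print(value, "->", normalise_place(value))
```

Keeping the uncertainty flag in its own column means the visualisation can show or filter doubtful records instead of silently dropping them.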
69. Preparing data for visualisations
Historical data often needs manual cleaning to:
• remove rows where vital information is missing
• tidy inconsistencies in term lists or spelling
• convert words to numbers (e.g. dates)
• remove hard returns and non-ASCII characters (or
change data format)
• split multiple values in one field into other
columns (e.g. author name, date in single field)
• expand coded values (e.g. countries, language)
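Several of these cleaning steps can also be scripted once the rules are clear. The following is a minimal sketch, assuming records held as dictionaries with hypothetical `title`, `author_date` and `country` fields and an invented code list; it drops rows without a title, removes hard returns, splits a combined author/date field, pulls out a four-digit year, and expands country codes. It does not attempt spelling fixes or character-set conversion.

```python
import re

# Hypothetical code list; a real one would come from the dataset's documentation.
COUNTRY_CODES = {"GB": "United Kingdom", "FR": "France"}

def clean_rows(rows):
    """Apply a few of the cleaning steps above to a list of dict records."""
    cleaned = []
    for row in rows:
        # Remove hard returns and surplus whitespace from the title.
        title = " ".join(row.get("title", "").split())
        # Remove rows where vital information (here, the title) is missing.
        if not title:
            continue
        # Split "author, date" held in a single field into two values.
        author, _, date_text = row.get("author_date", "").partition(",")
        # Convert the date text to a number: keep the first four-digit year.
        match = re.search(r"\b(\d{4})\b", date_text)
        year = int(match.group(1)) if match else None
        # Expand coded values, leaving unknown codes untouched.
        code = row.get("country", "").strip()
        country = COUNTRY_CODES.get(code.upper(), code)
        cleaned.append({"title": title, "author": author.strip(),
                        "year": year, "country": country})
    return cleaned

# Invented sample records for illustration only.
rows = [
    {"title": "Lace\nshawl", "author_date": "Unknown maker, c. 1850", "country": "fr"},
    {"title": "  ", "author_date": "A. Smith, 1901", "country": "GB"},
]
print(clean_rows(rows))
```

Reducing 'c. 1850' to 1850 discards the qualifier, so it may be worth keeping the original text in a separate column.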
72. Data Preparation
• Generally needs to be in tables, one row per
item, one column per value. One bit of data
per cell!
• Decide on aggregate or individual values -
might need to calculate totals in advance
• Data should be made as consistent as possible
with tools like Excel, OpenRefine
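To illustrate the difference between individual and aggregate values, the sketch below starts from a table with one row per item and calculates totals per category in advance. The records and the helper name `totals_by` are invented for this example; a pivot table in Excel would give the same result.

```python
from collections import Counter

# One row per item, one column per value (invented sample records).
items = [
    {"id": 1, "title": "Kimono", "country": "Japan"},
    {"id": 2, "title": "Shawl", "country": "France"},
    {"id": 3, "title": "Obi", "country": "Japan"},
]

def totals_by(rows, column):
    """Return an aggregate table: one row per distinct value, with its count."""
    counts = Counter(row[column] for row in rows)
    return [{column: value, "count": n} for value, n in sorted(counts.items())]

print(totals_by(items, "country"))
```

The individual rows suit a map or timeline of items, while the aggregate table is what a bar or pie chart usually expects.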
73. Viewshare's advice on
spreadsheets
• Remove any data that is not in a solid rectangular area.
This includes white space, page titles, scattered cells,
and additional worksheets.
• Check that your formatting is consistent throughout
each column (e.g. column is all in date format, currency
format, etc. as appropriate).
• Make sure that data of the same type but in different
columns is formatted consistently (e.g. dates in
different columns are in the same date format).
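A consistency check like this can be automated before uploading a spreadsheet. The sketch below is one possible approach, not part of Viewshare: it tries a short list of assumed date formats against every value in a column and reports which formats it found, so more than one result (or a `None`) signals a column that needs tidying. The names `CANDIDATE_FORMATS`, `detect_format` and `column_formats` are hypothetical.

```python
from datetime import datetime

# Assumed candidate formats; adjust to whatever your sources actually use.
CANDIDATE_FORMATS = ["%Y-%m-%d", "%d/%m/%Y"]

def detect_format(value):
    """Return the first candidate format that parses value, or None."""
    for fmt in CANDIDATE_FORMATS:
        try:
            datetime.strptime(value, fmt)
            return fmt
        except ValueError:
            continue
    return None

def column_formats(values):
    """Return the set of date formats found in a column, ignoring empty cells."""
    return {detect_format(v.strip()) for v in values if v.strip()}

print(column_formats(["1869-11-20", "1812-06-24"]))   # consistent column
print(column_formats(["1869-11-20", "20/11/1869"]))   # mixed formats
```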
79. Publishing visualisations
• How can you contextualise, explain any
limitations of your visualisations? e.g.
– provenance and qualities of original dataset;
– what you needed to do to it to get it into software
(how transformed, how cleaned);
– what's left out of the visualisation, and why?
82. Purpose, data, audience, structure
• Intersections of format and purpose
• Data types: quantitative, qualitative,
geographic, time series, media, entities
(people, places, events, concepts, things)
• How clean are your sources? How much time
do you have?
83. Key format decisions
• Print or digital?
• Static or interactive?
• Narrative or 'factual'?
• Shape (distant view) or detail (close view)?
84. What do you want to do?
• See relationships between variables (data points)
• Compare sets of values
• Track change over time / distribution in space
• See the parts of a whole (composition)
85. Dealing with complex data
• Find a visualisation type that can accommodate the
data in a meaningful way, or reduce the data in
a meaningful way.
– e.g. go from individual values to distribution of
values
– e.g. introduce interaction: overview, zoom and
filter, details on demand (Ben Shneiderman)
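Going from individual values to a distribution can be as simple as grouping values into bins and counting them. A minimal sketch, assuming a list of years and a hypothetical helper named `distribution`:

```python
from collections import Counter

def distribution(values, bin_width):
    """Group integer values into bins of bin_width and count each bin."""
    bins = Counter((v // bin_width) * bin_width for v in values)
    # Sort by bin start so the result reads like a histogram.
    return dict(sorted(bins.items()))

# Invented sample years for illustration only.
years = [1812, 1813, 1865, 1869, 1901]
print(distribution(years, 50))
```

Each key is the start of a 50-year bin, so thousands of individual dates collapse into a handful of bars.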
86. If all else fails...
• Sketch out your visualisation on paper to test
it and work out what data is needed
• Iteration is key, and...
• Stubbornness is a virtue!
87. Practising with Viewshare
• Browser-based - no need to install software
• Supports a range of input formats, relatively
smart about processing them for you
• Relatively easy to get started with maps,
timelines, charts
• Interactive visualisations can be embedded in
web pages, can save images as screenshots for
print
• Supported by a heritage institution
88. Exercise: Create simple visualisations
with Viewshare
Go to http://bit.ly/2lHMyQB and follow the
steps for:
• Viewshare Exercise 1: Ten minute tutorial -
getting started with Viewshare
• Viewshare Exercise 2: Create new views and
widgets
90. Tools that don't require programming
• Excel
• Google Fusion Tables, Google Drive
• Viewshare
• Tableau Public
Directories listed at http://bit.ly/2lHMyQB
NB: be careful about sensitive data on cloud
platforms