20190711 dh-utrecht
1. The emerging paradigm of Bibliographic Data Science
Presenter: Leo Lahti / Finland
Helsinki Computational History Group
Digital Humanities / DH2019 @ Utrecht
2. Bibliographic (meta)data
Catalogue  Description                                    Earliest  1500-1800  Total
FNB        Finnish National Bibliography (OPEN!)          1488      16 365     ~1 million
SNB        Swedish National Bibliography                  1457      46 764     ~18 million
ESTC       English Short Title Catalogue                  1473      479 790    ~0.5 million
HPBD       Heritage of the Printed Book Database (CERL)   1446      2 095 628  ~6 million
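As a rough illustration of how the coverage of these catalogues differs, the figures above can be turned into approximate shares. This is only a sketch: it assumes the "1500-1800" column counts records from that period and that "Total" is the full catalogue size, and it uses the slide's rounded "~" totals, so the results are orders of magnitude rather than exact values. The names `CATALOGUES` and `early_modern_share` are hypothetical.

```python
# Counts transcribed from the table above. Totals are the slide's rounded
# "~" figures, so the resulting shares are rough estimates only.
CATALOGUES = {
    "FNB": (16_365, 1_000_000),
    "SNB": (46_764, 18_000_000),
    "ESTC": (479_790, 500_000),
    "HPBD": (2_095_628, 6_000_000),
}


def early_modern_share(records_1500_1800, total):
    """Return the fraction of a catalogue that falls in 1500-1800."""
    return records_1500_1800 / total


for name, (early, total) in CATALOGUES.items():
    print(f"{name}: {early_modern_share(early, total):.1%}")
```

Under these assumptions the ESTC is almost entirely an early modern catalogue, while the national bibliographies are dominated by later material.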
5. From library catalogues to research & reports?
Research potential
Open bibliographic data science ecosystem
Research cases
Workflow diagram: data sets and processing steps
Data
MARC_XML: location COMHIS/estc-data-verified/estc-xml-raw; dataType XML; dataSource British Library
ESTC_data_raw: location COMHIS/estc-data-verified/estc-csv-raw; dataType csv, rds
actor_summaries: location COMHIS/author_analysis; dataType markdown, csv; languages R 3.5.1
ESTC_actor_data_unified: location COMHIS/estc-data-unified/estc-viaf; dataType csv
VIAF_XML: location http://viaf.org/viaf/data/; dataType XML; dataSource viaf.org
ECCO_data_raw: location COMHIS/ecco-data-private/originals; dataType json
Processing steps
XML_parser: repository COMHIS/MARCdata; languages C++
unify_physicalextent: repository COMHIS/estc-physical-extent; languages R 3.5.1
unify_language: repository COMHIS/estc-language; languages R 3.5.1
unify_actors: repository COMHIS/estc-viaf; languages R
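The workflow metadata on this slide lends itself to a machine-readable form. The following is a minimal sketch, assuming each diagram node can be written as a small record; the dictionary layout, the "kind" field, and the helper `steps_by_language` are illustrative assumptions, not the group's actual schema. Only node names and attributes that appear on the slide are used, and no links between nodes are encoded because the transcript does not preserve them.

```python
from collections import defaultdict

# Node attributes transcribed from the workflow slide. The record layout and
# the "kind" field ("data" vs "step") are assumptions made for illustration.
NODES = [
    {"name": "MARC_XML", "kind": "data",
     "location": "COMHIS/estc-data-verified/estc-xml-raw",
     "dataType": ["XML"], "dataSource": "British Library"},
    {"name": "ESTC_data_raw", "kind": "data",
     "location": "COMHIS/estc-data-verified/estc-csv-raw",
     "dataType": ["csv", "rds"]},
    {"name": "VIAF_XML", "kind": "data",
     "location": "http://viaf.org/viaf/data/",
     "dataType": ["XML"], "dataSource": "viaf.org"},
    {"name": "ECCO_data_raw", "kind": "data",
     "location": "COMHIS/ecco-data-private/originals",
     "dataType": ["json"]},
    {"name": "XML_parser", "kind": "step",
     "repository": "COMHIS/MARCdata", "languages": ["C++"]},
    {"name": "unify_physicalextent", "kind": "step",
     "repository": "COMHIS/estc-physical-extent", "languages": ["R 3.5.1"]},
    {"name": "unify_language", "kind": "step",
     "repository": "COMHIS/estc-language", "languages": ["R 3.5.1"]},
    {"name": "unify_actors", "kind": "step",
     "repository": "COMHIS/estc-viaf", "languages": ["R"]},
]


def steps_by_language(nodes):
    """Group processing-step names by the language label given on the slide."""
    grouped = defaultdict(list)
    for node in nodes:
        if node["kind"] == "step":
            for lang in node["languages"]:
                grouped[lang].append(node["name"])
    return dict(grouped)


print(steps_by_language(NODES))
```

Keeping the language labels exactly as written ("R" versus "R 3.5.1") avoids guessing a version that the slide does not state.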
7. Manual curation & scalability by automation
Edition sheets, London 1637-1662
Fig: Iiro Tiihonen
Data: ESTC & Gant: Printers and Publishers in
London From 1637 to 1662: a Quantitative Approach
9.
Bibliographic metadata
Full texts: books, newspapers...
R for Data Science / H. Wickham
Supporting data
(Open) bibliographic data science ecosystem
Source: Wikimedia Commons / Public domain
Transparent reporting and communication have been part of academic culture since the early days
10.
11. Mikko Tolonen
Leo Lahti
Eetu Mäkelä
Tanja Säily
Jani Marjanen
Mark Hill
Ali Ijaz
Simon Hengchen
Ville Vaara
Antti Kanner
Hege Roivainen
Ruben Ros
Iiro Tiihonen
DH2019 Papers
SP-14 History & Historiographies Thu 9am-10:30 V Vaara, MJ Hill, M Tolonen:
Publishers, Printers and Booksellers - Implications of Properly Structured Metadata for Digital History
LP-28 History & Historiographies Thu 2-3:30 E Mäkelä, M Tolonen, A Kanner:
Charting the Material Development of Newspapers
SP-19 Cultural Heritage, Artifacts & Institutions Thu 2-3:30 A Ijaz, H Roivainen, L Lahti:
Analytical Edition Detection In Bibliographic Metadata
Now: V Vaara, A Ijaz, I Tiihonen, A Kanner, T Säily, L Lahti:
The Emerging Paradigm of Bibliographic Data Science
PS: Poster Session Thu 3:30-5:00 J Marjanen, H Roivainen:
Book Formats and Reading Habits in Early Modern Europe (poster)
LP-34 Cultural Heritage, Art/ifacts & Institutions Fri 11-12:30 M Hill, T Säily:
Patterns of Early Modern Authorship: Metadata as Historical Record
LP-37 History & Historiographies Fri 11-12:30 S Hengchen, R Ros, J Marjanen:
A data-driven approach to the changing vocabulary of the ‘nation’ in English, Dutch, Swedish and Finnish newspapers, 1750-1950