DANS, the Dutch Data Archiving and Networked Services, provides facilities for the deposit and archiving of archaeological data and provides a Trusted Digital Repository. Challenges involve the mass ingestion of datasets and the use of thesauri, data mining and Linked Open-Data techniques.
1. dans.knaw.nl
DANS is an institute of KNAW and NWO
My data manager is a robot!
Mass ingests and migrations & network integrations
Valentijn Gilissen, MA: Data Manager / Preservation Officer
April 2019, CAA, Krakow
2. Use-cases
• The SWORD-ingest of Dutch archaeological datasets by the network of governmental depots into the central DANS hub.
• Mass migrations and transformations of archived data to new standards.
• The promotion and integration of local data from the Portable Antiquities of the Netherlands (PAN) in an international network, making use of thesauri, data mining and Linked Open-Data techniques.
“How is humanity saved if it's not allowed to... evolve?”
--Ultron
Avengers: Age of Ultron. Directed by Joss Whedon. Marvel Studios, 2015
To support the ingest and validation of increasing volumes of data, the role of the data manager will need to adapt.
--Valentron
3. Data Archiving and Networked Services
• Institute of the Dutch Academy and Research Funding Organisation (KNAW & NWO) since 2005
• First predecessor dates back to 1964 (Steinmetz Foundation); Historical Data Archive, 1989
• Mission: promote and provide permanent access to digital research resources
https://dans.knaw.nl
6. The e-Depot for Dutch Archaeology
• >40.000 archaeological datasets
• 76% available without restrictions
• Field drawings/GIS, Images, Publications, Data tables, Photographs
8. What is a ‘Trusted Digital Repository’?
• Mission to provide the designated community with trustworthy long-term access to curated digital resources
• Constant monitoring, planning and maintenance
• Knowledge of/measures against: threats and risks within systems
• Regular checking and/or certification
• Certificates: 3 standards, 3 levels
http://www.trusteddigitalrepository.eu
Standards:
• OAIS (ISO 14721)
• Trusted Digital Repositories: Attributes and Responsibilities
• TRAC
• Audit and Certification of Trustworthy Digital Repositories (ISO 16363)
• Bodies Providing Audit and Certification (ISO 16919)
See http://wiki.digitalrepositoryauditandcertification.org and
http://www.alliancepermanentaccess.org/membership/member-resources/audit-and-certification
Standards will be available free from http://www.ccsds.org
[...] trustworthiness of digital repositories using ISO 16363. It covers principles needed to inspire confidence that third party certification of the management of the digital repository has been performed with impartiality, competence, responsibility, openness, confidentiality, and responsiveness to complaints.
Metrics concerning:
• Organizational Infrastructure
  • e.g. The repository shall have a documented history of the changes to its operations, procedures, software, and hardware.
• Digital Object Management
  • e.g. The repository shall have access to necessary tools and resources to provide authoritative Representation Information for all of the digital objects it contains.
• Infrastructure and Security Risk Management
  • e.g. The repository shall have procedures in place to evaluate when changes are needed to current software.
European Framework for Audit and Certification of Digital Repositories (to be promoted by the EU):
• Basic Certification: Data Seal of Approval, monitored self-audit using DSA metrics
• Extended Certification: monitored self-audit using ISO 16363 (or DIN 31644 in Germany)
• Formal Certification: audit by external auditors
9. EASY: Electronic Archiving SYstem
https://easy.dans.knaw.nl
CoreTrustSeal / Nestor Seal 2016
12. Data-managing
• Check Dublin Core, edit/modify where necessary
• Assign project codes (if required)
• Download files, check for completeness / privacy-sensitive data
• Migrate files to preferred formats (if required/necessary)
• Modify directory structure (if necessary)
• Upload preferred formats
• Check individual file metadata, edit/modify if necessary
• Add individual file metadata
• Publish files (set visibility/accessibility rights)
• Create a ‘Jumpoff’ presentation page
• Check workflow
• Publish dataset
• Relate dataset to related datasets or web pages
• End administration
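The first step in the list, checking the Dublin Core description, is one that a script can partly take over. The sketch below flags elements that are absent or empty. The set of required elements is an assumption made for illustration; the slides do not state which elements EASY actually requires.

```python
# Hypothetical set of required Dublin Core elements (not taken from the slides).
REQUIRED_DC = ("title", "creator", "date", "description", "rights")


def missing_dublin_core(metadata: dict) -> list:
    """Return the required elements that are absent or contain only blanks."""
    missing = []
    for element in REQUIRED_DC:
        values = metadata.get(element) or []
        if isinstance(values, str):
            values = [values]  # accept a single string as well as a list
        if not any(value.strip() for value in values):
            missing.append(element)
    return missing


# Invented example record: "date" is empty and "description" is absent.
record = {
    "title": "Excavation report, site A",
    "creator": ["Example Archaeology Ltd."],
    "date": "",
    "rights": "Open Access",
}
print(missing_dublin_core(record))
```

A check like this only finds gaps; judging whether a title or description is meaningful remains a task for the data manager.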
13. Case 1: I, Robot
The SWORD-ingest of Dutch archaeological datasets by the network of
governmental depots into the central DANS hub.
“I’d give you advice, but you wouldn’t listen. No one ever does.”
--Marvin the Paranoid Android
(Adams, Douglas, 1952-2001. The Hitchhiker's Guide to the Galaxy. New York: Harmony Books, 1980. Print.)
Reality: guidance => monitoring => feedback => effect change
--Valentijn the Preservation Officer
16. Open Archival Information System
Persistent Identifier Citation
Front-office
Machine to Machine
SWORD
OAI-PMH
REST-API
PRODUCER
CONSUMER
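On the consumer side, OAI-PMH harvesting comes down to HTTP requests with a small set of query parameters. The sketch below builds a ListRecords request. The verb and parameter names (verb, metadataPrefix, set, from) come from the OAI-PMH protocol; the endpoint URL is a placeholder, not the actual DANS address.

```python
from urllib.parse import urlencode


def list_records_url(base_url, metadata_prefix="oai_dc", set_spec=None, from_date=None):
    """Build an OAI-PMH ListRecords request URL."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if set_spec:
        params["set"] = set_spec  # restrict the harvest to one set
    if from_date:
        params["from"] = from_date  # only records changed since this date
    return base_url + "?" + urlencode(params)


# Placeholder endpoint, for illustration only.
url = list_records_url("https://repository.example.org/oai", from_date="2019-01-01")
print(url)
```

A full harvester would also follow the resumptionToken that the server returns for large result lists; that part is left out here.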
19. Guides to Good Practice
Before depositing
Metadata
What DANS does
Legal aspects
Quoting data
https://dans.knaw.nl/en
Deposit => Read more about depositing data
File Formats
http://www.parthenos-project.eu/portal/policies_guidelines
Documentation
During depositing
After depositing
20. Case 2: Transformers!
Mass migrations and transformations of archived data to new standards.
“Upgrading is compulsory.”
--the Cybermen
Doctor Who, BBC Studios, 1963-2019
Reality: guiding => monitoring => migrating where relevant => update documents
--the Archiving staff (Trusted Digital Repositories)
22. Preferred Formats
Non-preferred format(s)
As a general guideline, DANS considers that the file formats best suited for long-term preservation and accessibility are file formats which:
• are commonly used
• have open specifications
• are independent of specific software, developers or suppliers
23. Archaeological data deposited in EASY
• Publications: PDF/A
• Reports: PDF/A
• CAD drawings/GIS maps: DXF R12 / MID+MIF
• Field drawings (scans): JPEG + TIFF
• Photographs: JPEG + TIFF
• Data tables (databases / spreadsheets): CSV
• Vector Images: SVG
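A pairing of data types and preferred formats like the one on this slide can be held in a small lookup table, so that a script can flag files that need migration. The sketch below is one reading of the slide: the pairing of types to formats is inferred from the slide layout, and treating "JPEG + TIFF" as two acceptable alternatives is an assumption.

```python
# Data types and preferred formats as read from the slide (pairing is inferred).
PREFERRED = {
    "publications": {"PDF/A"},
    "reports": {"PDF/A"},
    "cad drawings/gis maps": {"DXF R12", "MID+MIF"},
    "field drawings (scans)": {"JPEG", "TIFF"},
    "photographs": {"JPEG", "TIFF"},
    "data tables": {"CSV"},
    "vector images": {"SVG"},
}


def needs_migration(data_type: str, file_format: str) -> bool:
    """Return True when the format is not a preferred one for this data type."""
    preferred = PREFERRED.get(data_type.lower())
    if preferred is None:
        raise KeyError(f"unknown data type: {data_type}")
    return file_format.upper() not in preferred


print(needs_migration("Photographs", "tiff"))
print(needs_migration("Data tables", "xlsx"))
```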
25. Mass migrations to Preferred Formats
• File identification (mediatype)
• Selection filter: visible files
• Extraction from archive (Python)
• Checksum validation (shown four times in the workflow)
• Double conversion (Python)
• Adding provenance metadata to file IDs
• Generating logfiles
• Archival storage
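The checksum validation that recurs in this workflow can be sketched as follows: a file is checked against its recorded checksum before it leaves storage and again after it has been copied to a working directory, and the result is logged. This is a minimal illustration, not the DANS scripts. SHA-256, the function names and the file name are assumptions, and the conversion step itself is left out.

```python
import hashlib
import logging
import shutil
import tempfile
from pathlib import Path

logging.basicConfig(level=logging.INFO, format="%(levelname)s %(message)s")
log = logging.getLogger("migration")


def sha256_of(path: Path) -> str:
    """Compute the SHA-256 checksum of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def extract_with_validation(source: Path, workdir: Path, expected: str) -> Path:
    """Copy a file out of storage and confirm the copy is bit-identical."""
    if sha256_of(source) != expected:
        raise ValueError(f"stored file does not match its checksum: {source.name}")
    target = workdir / source.name
    shutil.copy2(source, target)
    if sha256_of(target) != expected:
        raise ValueError(f"copy differs from original: {source.name}")
    log.info("extracted %s (sha256 %s...)", source.name, expected[:12])
    return target


# Demonstration with a throwaway file standing in for archival storage.
with tempfile.TemporaryDirectory() as tmp:
    archive = Path(tmp) / "archive"
    work = Path(tmp) / "work"
    archive.mkdir()
    work.mkdir()
    original = archive / "trench_plan.dxf"  # hypothetical file name
    original.write_bytes(b"demo content")
    recorded = sha256_of(original)
    copy = extract_with_validation(original, work, recorded)
    copy_ok = sha256_of(copy) == recorded
    print(copy_ok)
```

After a conversion the checksum necessarily changes, so a new checksum would be recorded for the converted file and validated again on its way back into storage.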
26. Case 3: Automatic for the People
The promotion and integration of local data from the Portable Antiquities of
the Netherlands (PAN) in an international network, making use of thesauri,
data mining and Linked Open-Data techniques.
“I am fluent in over six million forms of communication.”
--Protocol droid C-3PO
Star Wars: Episode VI - Return of the Jedi. Directed by Richard Marquand. Lucasfilm Ltd. LLC, 1983
Reality: mapping metadata => harvesting => adding sources => enable access
--Protocol-operating data manager V@L3NT1JN
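The first step, mapping metadata, can be pictured as a crosswalk from local field names to shared terms, with free-text values replaced by thesaurus URIs where a match exists. Everything specific in the sketch below is hypothetical: the local field names, the choice of Dublin Core terms and the thesaurus URI are placeholders, not the actual PAN schema or vocabulary.

```python
# Hypothetical crosswalk from local field names to Dublin Core terms.
FIELD_MAP = {
    "object_type": "dc:type",
    "period": "dcterms:temporal",
    "find_spot": "dcterms:spatial",
    "summary": "dc:description",
}

# Hypothetical thesaurus lookup: free-text label -> concept URI.
THESAURUS = {
    "fibula": "https://thesaurus.example.org/concept/fibula",
}


def to_dublin_core(local_record: dict) -> dict:
    """Map known local fields to Dublin Core terms; unmapped fields are dropped."""
    mapped = {}
    for field, value in local_record.items():
        term = FIELD_MAP.get(field)
        if term is None:
            continue  # fields outside the crosswalk are not exported
        # Use the thesaurus URI when the label is known, else keep the text.
        mapped[term] = THESAURUS.get(value.lower(), value)
    return mapped


print(to_dublin_core({"object_type": "Fibula", "period": "Roman", "finder_name": "private"}))
```

Dropping unmapped fields by default keeps information such as finder names out of the exported record.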
31. General contact: Info@DANS.KNAW.NL
Head Data Archive: Hella.Hollander@DANS.KNAW.NL
Senior Data Steward / Preservation Officer: Valentijn.Gilissen@DANS.KNAW.NL
Watch our videos on YouTube: https://www.youtube.com/user/DANSDataArchiving
Thanks for listening!