Department of Internal Affairs
Time Traveling Analyst: The Things Only
a Time Machine Can Tell Me…
Ross Spencer - @beet_keeper
Archives New Zealand
#ARANZ2015
Tuesday September 7 2015
Sun image, R24685027, E4, Archway,
Archives New Zealand.
http://www.archway.archives.govt.nz/ViewFullItem.do?code=24685027&digital=yes
Background
Two sets of born-digital ingest, Minister's Papers, 'code-named' E1
and E4, and E2 and E3.
First sets selected for simplicity.
Second sets followed numerical sequence and were used as a
learning exercise.
Complexity grew.
First sets enabled creation of CSV ingest mechanism, configuration
of Rosetta, creation of process.
Second sets enabled the proof of that method.
Approximate collection breakdowns at the beginning of the process…
● E1~
● 175 Files
● 10 Directories
● 0 Unidentified Objects
● 0 Unidentified Extensions
● 7 Known Formats
● E4~
● 1295 Files
● 6 Directories
● 2 Unidentified Objects
● 1 Unidentified Extensions
● 12 Known Formats
N.B. E4 also contained two identification false positives.
Approximate collection breakdowns at the
beginning of the process…
• E2~
• 2519 Files
• 177 Directories
• 5 Unidentified Objects
• 4 Unidentified Extensions
• 22 Known Formats
• 25 Extension Mismatches
• E3~
• 1748 Files
• 144 Directories
• 8 Unidentified Objects
• 5 Unidentified Extensions
• 12 Known Formats
• 37 Extension Mismatches
N.B. Both collections
contained empty folders,
empty files, and multiple-id
formats.
Let's begin with a story...
E1, the simplest... Enabled us to develop an ingest mechanism for
heterogeneous collections – and it worked!
E4, not that different, slightly larger, about as 'known', but!
An unexpected exception discovered in the relationship between
the preservation system and some of the filenames in the
collection...
Where do astronauts go for a beer?
The...
We had filenames with multiple spaces in
them...
E.g. 'A [space] [space] Filename.docx'
An innocuous enough looking problem... Our digital
preservation system couldn't handle them...
Investigate the system...
...
Confirm it's the system...
…
Ask vendor to fix the problem...
…
No fix forthcoming for next release...
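A check for this kind of filename can be run over a collection before ingest. The following is a minimal sketch, assuming the only pattern of interest is two or more consecutive spaces in a file or directory name; the function name `find_multi_space_names` is hypothetical and is not the tool used at Archives New Zealand.

```python
import os
import re
import tempfile

# Two or more consecutive space characters anywhere in a name.
MULTI_SPACE = re.compile(r" {2,}")


def find_multi_space_names(root):
    """Return sorted paths under root whose file or directory name
    contains two or more consecutive spaces."""
    hits = []
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            if MULTI_SPACE.search(name):
                hits.append(os.path.join(dirpath, name))
    return sorted(hits)


if __name__ == "__main__":
    # Demonstrate on a throwaway directory rather than a real collection.
    with tempfile.TemporaryDirectory() as root:
        for name in ("A  Filename.docx", "A Filename.docx"):
            open(os.path.join(root, name), "w").close()
        for path in find_multi_space_names(root):
            print(path)
```

Only the first of the two demonstration files is reported, because the second has a single space.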
What now...?
Change filenames?
...
Serious change, this is how we received them!
…
Record provenance...
…
Mechanisms in METS metadata schema [EVENT]
…
How to implement?
We continue...
Configure CSV to handle EVENT fields...
...
Modify CSV generation tool to output blank EVENT fields...
…
Test ingest in system until configuration is perfected
…
Mechanism works so pre-condition filenames...
...
Record R-Numbers* and design provenance note controlled list...
…
Add data to CSV
…
DONE!!!!
*Dependency on listing being fixed in Archway
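The steps above could be sketched roughly as follows: pre-condition each filename and record what changed in a CSV row alongside its R-number. This is an illustration only. The column names, the wording of the provenance note, and the R-numbers in the example are all assumptions, and the real CSV layout depends on how the preservation system is configured.

```python
import csv
import io
import re

# Hypothetical column names; the real layout is not given in the slides.
FIELDS = ["r_number", "original_filename", "new_filename", "event_note"]

# Hypothetical controlled-list entry for the provenance note.
NOTE_MULTI_SPACE = "Filename changed pre-ingest: consecutive spaces collapsed"


def collapse_spaces(name):
    """Replace each run of two or more spaces with a single space."""
    return re.sub(r" {2,}", " ", name)


def provenance_rows(items):
    """items: iterable of (r_number, filename) pairs.
    Yields one row per file; event_note stays blank when nothing changed."""
    for r_number, filename in items:
        new_name = collapse_spaces(filename)
        yield {
            "r_number": r_number,
            "original_filename": filename,
            "new_filename": new_name,
            "event_note": NOTE_MULTI_SPACE if new_name != filename else "",
        }


def write_csv(items, handle):
    """Write the header and one provenance row per item to handle."""
    writer = csv.DictWriter(handle, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(provenance_rows(items))


if __name__ == "__main__":
    out = io.StringIO()
    # The R-numbers below are made up for illustration.
    write_csv([("R00000001", "A  Filename.docx"), ("R00000002", "Plain.docx")], out)
    print(out.getvalue())
```

Keeping the note text in one constant mirrors the idea of a controlled list: every changed file carries the same wording, which makes the events easier to search later.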
Test in digital preservation system fails...
...
UTF-8 character encoding...
…
How to preserve in Excel?
…
…
Import using special ribbon in Excel...
…
Add notes to sheet...
…
DONE?!
…
Not even now... >.<
Nope...
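The slides do not say exactly where the encoding broke. One common cause, offered here only as an assumption, is a spreadsheet application guessing a legacy encoding when it opens a CSV that has no byte order mark. A minimal sketch of writing and reading a CSV as UTF-8 with Python's standard `utf-8-sig` codec, using hypothetical function names:

```python
import csv
import os
import tempfile


def write_utf8_csv(path, rows, bom=True):
    """Write rows as CSV in UTF-8; with bom=True a byte order mark is
    prepended (Python's 'utf-8-sig' codec)."""
    encoding = "utf-8-sig" if bom else "utf-8"
    with open(path, "w", newline="", encoding=encoding) as handle:
        csv.writer(handle).writerows(rows)


def read_utf8_csv(path):
    """Read a CSV as UTF-8, dropping a leading BOM if one is present."""
    with open(path, newline="", encoding="utf-8-sig") as handle:
        return list(csv.reader(handle))


if __name__ == "__main__":
    rows = [["filename", "note"], ["Ā file.docx", "macron test"]]
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "ingest.csv")
        write_utf8_csv(path, rows)
        with open(path, "rb") as raw:
            print(raw.read(3) == b"\xef\xbb\xbf")  # True when the BOM bytes are present
        print(read_utf8_csv(path) == rows)  # True when the text survives the round trip
```

Whether a given preservation system accepts a byte order mark is a separate question that would need testing against that system.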
It can become exhausting...
As a speaker! And for the audience!!! ^_^;
...Time and date based data becomes a problem...
...Asking non-expert users to do the same...
...Even power tools like Open Office suffer issues...
...E4 went in after solving the UTF-8 issues...
...E2 and E3 suffered from issues with time/date information on top
But we learn and move onwards and
upwards...
The work isn't straightforward
● It pushes out time-frames...
● And the problems we're solving aren't what we expected...
● We need to develop with the problem...
But we have new tools...
Tools to create provenance information in CSV for ingest into the
digital preservation system.
Tools to identify files with this issue up front.
The digital preservation system is fixed, so this specific use-case
for us is unlikely to occur again.
We have gained new experience.
For E2 and E3, we created mechanisms of creating an ingest
'mash-up' using a separate provenance spreadsheet.
For our next ingest we have a macro to automate an Excel
import!!!!! ← IN MICROSOFT?!!!!
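The 'mash-up' idea can be sketched as a join between the ingest listing and a separately maintained provenance sheet. This is a minimal illustration, assuming both are CSV files sharing a `filename` column; that key, the other column names, and the function name `mash_up` are hypothetical.

```python
import csv
import io


def mash_up(ingest_rows, provenance_rows, key="filename"):
    """Left-join provenance onto ingest rows by `key`.
    Ingest rows with no provenance entry get empty provenance fields."""
    provenance_rows = list(provenance_rows)
    first = provenance_rows[0] if provenance_rows else {}
    extra_fields = [field for field in first if field != key]
    by_key = {row[key]: row for row in provenance_rows}
    merged = []
    for row in ingest_rows:
        match = by_key.get(row[key], {})
        combined = dict(row)
        for field in extra_fields:
            combined[field] = match.get(field, "")
        merged.append(combined)
    return merged


if __name__ == "__main__":
    ingest_csv = "filename,title\na.docx,Letter\nb.docx,Memo\n"
    provenance_csv = "filename,event_note\na.docx,Filename changed pre-ingest\n"
    ingest = list(csv.DictReader(io.StringIO(ingest_csv)))
    provenance = list(csv.DictReader(io.StringIO(provenance_csv)))
    for row in mash_up(ingest, provenance):
        print(row)
```

Keeping provenance in its own sheet means the generated ingest listing can be rebuilt at any time without losing hand-entered notes.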
We have what seems like an inexhaustible
list...
● [Tools] Ability to handle multi-byte character encodings. Maori macrons,
‘Ā’, in DROID, digital preservation system, spreadsheets, etc.
● [Tools] Unidentified files and false positives - contribute to
● [Tools] Zero-byte files, empty folders
● [Tools] System files
● [Tools] Digital preservation system’s capabilities; dates, delivery,
metadata extraction, etc.
● [Files] Invalid objects
● [Files] Templates, objects with auto-fields
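Some of the items in this list can be detected mechanically before ingest. A minimal sketch for two of them, zero-byte files and empty folders, using only the Python standard library; the function name `find_empties` is hypothetical.

```python
import os
import tempfile


def find_empties(root):
    """Return (zero_byte_files, empty_dirs) found under root, each sorted.
    The root directory itself is never reported as empty."""
    zero_byte, empty_dirs = [], []
    for dirpath, dirnames, filenames in os.walk(root):
        if dirpath != root and not dirnames and not filenames:
            empty_dirs.append(dirpath)
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                if os.path.getsize(path) == 0:
                    zero_byte.append(path)
            except OSError:
                # Unreadable entry (for example a broken link): skip it here.
                continue
    return sorted(zero_byte), sorted(empty_dirs)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        os.mkdir(os.path.join(root, "empty folder"))
        open(os.path.join(root, "empty.txt"), "w").close()
        with open(os.path.join(root, "full.txt"), "w") as handle:
            handle.write("content")
        files, folders = find_empties(root)
        print(files)
        print(folders)
```

A folder that contains only other empty folders is not reported by this sketch; whether it should be is a policy decision.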
And we'd never have guessed these up
front...
● What are the next challenges?
● We'd be too conservative, or too O.T.T...
● WE NEED A TIME MACHINE!!!
Questions?
We don't need a time machine at all...
● We need evidence!
● We need to practice!
● We need to do!
● Time-frames will be pushed out
● In a world that loves strategy, it's
terribly detail focused.
● Can someone figure it out first?
● Definition of Leadership!
● But you will almost certainly find
new exceptions... as will we.
Ground process and policy in the real
world…
● We can reduce surprises...
● But we can't reduce them to zero...
● Find the exceptions, create rules, and encode them
in those policies...
● Move one step at a time, with modest increments.
● Flexible endpoints / reasonable / multiple goals...
● Q. HOW DID WE GET THESE FILES??
● A. It doesn't matter, we have to deal with them...
Evidence will…
● Inform policy
● Inform Procedures
➔ Tools
➔ Skills
➔ Appetite
➔ Strategy
Writing these documents becomes a much
more advanced thought experiment with a
greater number of inputs from a greater
number of people, and experiences...
Robustness Principle... (Postel's Law)
e.g. checksums
“Be conservative in what you do; be liberal in what you accept
from others.”
Follow standards... mechanisms should accept non-conforming
input as long as the meaning is clear...
Be prepared to understand material, be prepared to manage it.
A way of doing things... not the only way... WRITE OTHER
SOLUTIONS! RE-WRITE YOUR SOLUTIONS!
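Applied to the checksum example, the principle might look like the sketch below: accept supplied digests liberally (upper case, stray whitespace, an optional `algo:` prefix) and always emit one conservative form, lowercase hexadecimal. This is one possible reading of the slide, not the author's implementation. The function names are hypothetical, and the algorithm is inferred from the digest length alone, so any prefix is discarded rather than checked.

```python
import hashlib
import re

# Digest length in hex characters mapped to a hashlib algorithm name.
HEX_LENGTHS = {32: "md5", 40: "sha1", 64: "sha256"}


def normalise_checksum(text):
    """Accept a hex digest with stray whitespace, upper case, or an
    'algo:' prefix; return (algorithm, lowercase_hex).
    Raise ValueError when the meaning is not clear."""
    value = text.strip()
    if ":" in value:
        value = value.split(":", 1)[1]
    value = re.sub(r"\s+", "", value).lower()
    if not re.fullmatch(r"[0-9a-f]+", value) or len(value) not in HEX_LENGTHS:
        raise ValueError(f"unrecognised checksum: {text!r}")
    return HEX_LENGTHS[len(value)], value


def matches(data, supplied):
    """Return True when data hashes to the supplied checksum."""
    algorithm, expected = normalise_checksum(supplied)
    return hashlib.new(algorithm, data).hexdigest() == expected


if __name__ == "__main__":
    digest = hashlib.md5(b"example").hexdigest()
    # Upper case with a prefix and trailing newline is still accepted.
    print(matches(b"example", "MD5: " + digest.upper() + "\n"))
    print(normalise_checksum(digest.upper()))
```

Input whose meaning is not clear, such as a string of the wrong length, is rejected with an error instead of being guessed at.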
Other tools for you...
DROID (National Archives UK):
http://www.nationalarchives.gov.uk/information-management/manage-information/policy-proce
Or Siegfried (State Records NSW): https://github.com/richardlehane/siegfried
DROID Analysis Tool: https://github.com/exponential-decay/droid-sqlite-analysis
Other presentations: http://www.slideshare.net/RossSpencer/presentations
Blogs (Open Preservation Foundation):
http://openpreservation.org/knowledge/blogs/
Record Keeping Toolkit (Archives New Zealand):
http://www.records.archives.govt.nz/
Share yours too!
Who do digital preservation analysts
want to drink a beer with?
Commander Hadfield!
https://twitter.com/cmdr_hadfield
TED:
What I learned from going blind in space?
Star Talk:
http://www.startalkradio.net/show/social-media-i
“It’s almost comical that astronauts are stereotyped as daredevils and
cowboys. As a rule, we’re highly methodical and detail-oriented. Our
passion isn’t for thrills but for the grindstone, and pressing our noses to
it. We have to: we’re responsible for equipment that has cost taxpayers
many millions of dollars, and the best insurance policy we have on our
lives is our own dedication to training. Studying, simulating, practicing
until responses become automatic—astronauts don’t do all this only to
fulfill NASA’s requirements. Training is something we do to reduce the
odds that we’ll die.”
 
― Chris Hadfield, An Astronaut's Guide to Life on Earth
The Right Stuff
What next..?
Questions!
Thank you!
Department of Internal Affairs

More Related Content

What's hot

Information-Rich Programming in F# with Semantic Data
Information-Rich Programming in F# with Semantic DataInformation-Rich Programming in F# with Semantic Data
Information-Rich Programming in F# with Semantic DataSteffen Staab
 
Semantically-Enabled Digital Investigations
Semantically-Enabled Digital InvestigationsSemantically-Enabled Digital Investigations
Semantically-Enabled Digital Investigationsinbroker
 
Slow-cooked data and APIs in the world of Big Data: the view from a city per...
Slow-cooked data and APIs in the world of Big Data: the view from a city per...Slow-cooked data and APIs in the world of Big Data: the view from a city per...
Slow-cooked data and APIs in the world of Big Data: the view from a city per...Oscar Corcho
 
The Digital Archaeological Workflow: A Case Study from Sweden
The Digital Archaeological Workflow: A Case Study from SwedenThe Digital Archaeological Workflow: A Case Study from Sweden
The Digital Archaeological Workflow: A Case Study from SwedenMarcus Smith
 
Bigdive 2014 - RDF, principles and case studies
Bigdive 2014 - RDF, principles and case studiesBigdive 2014 - RDF, principles and case studies
Bigdive 2014 - RDF, principles and case studiesDiego Valerio Camarda
 

What's hot (6)

Information-Rich Programming in F# with Semantic Data
Information-Rich Programming in F# with Semantic DataInformation-Rich Programming in F# with Semantic Data
Information-Rich Programming in F# with Semantic Data
 
Semantically-Enabled Digital Investigations
Semantically-Enabled Digital InvestigationsSemantically-Enabled Digital Investigations
Semantically-Enabled Digital Investigations
 
Slow-cooked data and APIs in the world of Big Data: the view from a city per...
Slow-cooked data and APIs in the world of Big Data: the view from a city per...Slow-cooked data and APIs in the world of Big Data: the view from a city per...
Slow-cooked data and APIs in the world of Big Data: the view from a city per...
 
Unlocking Doors: recent initiatives in open and linked data at the National L...
Unlocking Doors: recent initiatives in open and linked data at the National L...Unlocking Doors: recent initiatives in open and linked data at the National L...
Unlocking Doors: recent initiatives in open and linked data at the National L...
 
The Digital Archaeological Workflow: A Case Study from Sweden
The Digital Archaeological Workflow: A Case Study from SwedenThe Digital Archaeological Workflow: A Case Study from Sweden
The Digital Archaeological Workflow: A Case Study from Sweden
 
Bigdive 2014 - RDF, principles and case studies
Bigdive 2014 - RDF, principles and case studiesBigdive 2014 - RDF, principles and case studies
Bigdive 2014 - RDF, principles and case studies
 

Similar to Time Travelling Analyst: The Things That Only a Time Machine Can Tell Me...

The Incremental Path to Observability
The Incremental Path to ObservabilityThe Incremental Path to Observability
The Incremental Path to ObservabilityEmily Nakashima
 
Monitoring Big Data Systems - "The Simple Way"
Monitoring Big Data Systems - "The Simple Way"Monitoring Big Data Systems - "The Simple Way"
Monitoring Big Data Systems - "The Simple Way"Demi Ben-Ari
 
Defending the Enterprise with Evernote at SourceBoston on May 27, 2015
Defending the Enterprise with Evernote at SourceBoston on May 27, 2015Defending the Enterprise with Evernote at SourceBoston on May 27, 2015
Defending the Enterprise with Evernote at SourceBoston on May 27, 2015grecsl
 
BSides Seattle 2024 - Stopping Ethan Hunt From Taking Your Data.pptx
BSides Seattle 2024 - Stopping Ethan Hunt From Taking Your Data.pptxBSides Seattle 2024 - Stopping Ethan Hunt From Taking Your Data.pptx
BSides Seattle 2024 - Stopping Ethan Hunt From Taking Your Data.pptxfenichawla
 
Lec1cgu13updated.ppt
Lec1cgu13updated.pptLec1cgu13updated.ppt
Lec1cgu13updated.pptRahulTr22
 
Data science programming .ppt
Data science programming .pptData science programming .ppt
Data science programming .pptGanesh E
 
Lec1cgu13updated.ppt
Lec1cgu13updated.pptLec1cgu13updated.ppt
Lec1cgu13updated.pptkalai75
 
Lec1cgu13updated.ppt
Lec1cgu13updated.pptLec1cgu13updated.ppt
Lec1cgu13updated.pptAravind Reddy
 
Dmitry Lebedev: Agile Testing Using Agile Tools
Dmitry Lebedev: Agile Testing Using Agile ToolsDmitry Lebedev: Agile Testing Using Agile Tools
Dmitry Lebedev: Agile Testing Using Agile ToolsAgile Lietuva
 
A living hell - lessons learned in eight years of parsing real estate data
A living hell - lessons learned in eight years of parsing real estate data  A living hell - lessons learned in eight years of parsing real estate data
A living hell - lessons learned in eight years of parsing real estate data lokku
 
Log Mining: Beyond Log Analysis
Log Mining: Beyond Log AnalysisLog Mining: Beyond Log Analysis
Log Mining: Beyond Log AnalysisAnton Chuvakin
 
High Performance and Scalability Database Design
High Performance and Scalability Database DesignHigh Performance and Scalability Database Design
High Performance and Scalability Database DesignTung Ns
 
Data Day Texas 2017: Scaling Data Science at Stitch Fix
Data Day Texas 2017: Scaling Data Science at Stitch FixData Day Texas 2017: Scaling Data Science at Stitch Fix
Data Day Texas 2017: Scaling Data Science at Stitch FixStefan Krawczyk
 
Behind the Scenes at Coolblue - Feb 2017
Behind the Scenes at Coolblue - Feb 2017Behind the Scenes at Coolblue - Feb 2017
Behind the Scenes at Coolblue - Feb 2017Pat Hermens
 
Digital Forensics
Digital ForensicsDigital Forensics
Digital ForensicsVikas Jain
 
New york data brewery meetup #1 – introduction
New york data brewery meetup #1 – introductionNew york data brewery meetup #1 – introduction
New york data brewery meetup #1 – introductionStefan Urbanek
 
Meet a 100% R-based CRO. The summary of a 5-year journey
Meet a 100% R-based CRO. The summary of a 5-year journeyMeet a 100% R-based CRO. The summary of a 5-year journey
Meet a 100% R-based CRO. The summary of a 5-year journeyAdrian Olszewski
 
Meet a 100% R-based CRO - The summary of a 5-year journey
Meet a 100% R-based CRO - The summary of a 5-year journeyMeet a 100% R-based CRO - The summary of a 5-year journey
Meet a 100% R-based CRO - The summary of a 5-year journeyAdrian Olszewski
 

Similar to Time Travelling Analyst: The Things That Only a Time Machine Can Tell Me... (20)

The Incremental Path to Observability
The Incremental Path to ObservabilityThe Incremental Path to Observability
The Incremental Path to Observability
 
Monitoring Big Data Systems - "The Simple Way"
Monitoring Big Data Systems - "The Simple Way"Monitoring Big Data Systems - "The Simple Way"
Monitoring Big Data Systems - "The Simple Way"
 
Defending the Enterprise with Evernote at SourceBoston on May 27, 2015
Defending the Enterprise with Evernote at SourceBoston on May 27, 2015Defending the Enterprise with Evernote at SourceBoston on May 27, 2015
Defending the Enterprise with Evernote at SourceBoston on May 27, 2015
 
BSides Seattle 2024 - Stopping Ethan Hunt From Taking Your Data.pptx
BSides Seattle 2024 - Stopping Ethan Hunt From Taking Your Data.pptxBSides Seattle 2024 - Stopping Ethan Hunt From Taking Your Data.pptx
BSides Seattle 2024 - Stopping Ethan Hunt From Taking Your Data.pptx
 
Data Science
Data Science Data Science
Data Science
 
Lec1cgu13updated.ppt
Lec1cgu13updated.pptLec1cgu13updated.ppt
Lec1cgu13updated.ppt
 
Data science programming .ppt
Data science programming .pptData science programming .ppt
Data science programming .ppt
 
Lec1cgu13updated.ppt
Lec1cgu13updated.pptLec1cgu13updated.ppt
Lec1cgu13updated.ppt
 
Lec1cgu13updated.ppt
Lec1cgu13updated.pptLec1cgu13updated.ppt
Lec1cgu13updated.ppt
 
Dmitry Lebedev: Agile Testing Using Agile Tools
Dmitry Lebedev: Agile Testing Using Agile ToolsDmitry Lebedev: Agile Testing Using Agile Tools
Dmitry Lebedev: Agile Testing Using Agile Tools
 
A living hell - lessons learned in eight years of parsing real estate data
A living hell - lessons learned in eight years of parsing real estate data  A living hell - lessons learned in eight years of parsing real estate data
A living hell - lessons learned in eight years of parsing real estate data
 
Log Mining: Beyond Log Analysis
Log Mining: Beyond Log AnalysisLog Mining: Beyond Log Analysis
Log Mining: Beyond Log Analysis
 
High Performance and Scalability Database Design
High Performance and Scalability Database DesignHigh Performance and Scalability Database Design
High Performance and Scalability Database Design
 
Data Day Texas 2017: Scaling Data Science at Stitch Fix
Data Day Texas 2017: Scaling Data Science at Stitch FixData Day Texas 2017: Scaling Data Science at Stitch Fix
Data Day Texas 2017: Scaling Data Science at Stitch Fix
 
Behind the Scenes at Coolblue - Feb 2017
Behind the Scenes at Coolblue - Feb 2017Behind the Scenes at Coolblue - Feb 2017
Behind the Scenes at Coolblue - Feb 2017
 
Digital Forensics
Digital ForensicsDigital Forensics
Digital Forensics
 
Ds
DsDs
Ds
 
New york data brewery meetup #1 – introduction
New york data brewery meetup #1 – introductionNew york data brewery meetup #1 – introduction
New york data brewery meetup #1 – introduction
 
Meet a 100% R-based CRO. The summary of a 5-year journey
Meet a 100% R-based CRO. The summary of a 5-year journeyMeet a 100% R-based CRO. The summary of a 5-year journey
Meet a 100% R-based CRO. The summary of a 5-year journey
 
Meet a 100% R-based CRO - The summary of a 5-year journey
Meet a 100% R-based CRO - The summary of a 5-year journeyMeet a 100% R-based CRO - The summary of a 5-year journey
Meet a 100% R-based CRO - The summary of a 5-year journey
 

Recently uploaded

31st World Press Freedom Day Conference.
31st World Press Freedom Day Conference.31st World Press Freedom Day Conference.
31st World Press Freedom Day Conference.Christina Parmionova
 
sponsor for poor old age person food.pdf
sponsor for poor old age person food.pdfsponsor for poor old age person food.pdf
sponsor for poor old age person food.pdfSERUDS INDIA
 
NGO working for orphan children’s education
NGO working for orphan children’s educationNGO working for orphan children’s education
NGO working for orphan children’s educationSERUDS INDIA
 
Kolkata Call Girls Halisahar 💯Call Us 🔝 8005736733 🔝 💃 Top Class Call Girl ...
Kolkata Call Girls Halisahar  💯Call Us 🔝 8005736733 🔝 💃  Top Class Call Girl ...Kolkata Call Girls Halisahar  💯Call Us 🔝 8005736733 🔝 💃  Top Class Call Girl ...
Kolkata Call Girls Halisahar 💯Call Us 🔝 8005736733 🔝 💃 Top Class Call Girl ...Namrata Singh
 
Vasai Call Girls In 07506202331, Nalasopara Call Girls In Mumbai
Vasai Call Girls In 07506202331, Nalasopara Call Girls In MumbaiVasai Call Girls In 07506202331, Nalasopara Call Girls In Mumbai
Vasai Call Girls In 07506202331, Nalasopara Call Girls In MumbaiPriya Reddy
 
World Press Freedom Day 2024; May 3rd - Poster
World Press Freedom Day 2024; May 3rd - PosterWorld Press Freedom Day 2024; May 3rd - Poster
World Press Freedom Day 2024; May 3rd - PosterChristina Parmionova
 
Election 2024 Presiding Duty Keypoints_01.pdf
Election 2024 Presiding Duty Keypoints_01.pdfElection 2024 Presiding Duty Keypoints_01.pdf
Election 2024 Presiding Duty Keypoints_01.pdfSamirsinh Parmar
 
Time, Stress & Work Life Balance for Clerks with Beckie Whitehouse
Time, Stress & Work Life Balance for Clerks with Beckie WhitehouseTime, Stress & Work Life Balance for Clerks with Beckie Whitehouse
Time, Stress & Work Life Balance for Clerks with Beckie Whitehousesubs7
 
Antisemitism Awareness Act: pénaliser la critique de l'Etat d'Israël
Antisemitism Awareness Act: pénaliser la critique de l'Etat d'IsraëlAntisemitism Awareness Act: pénaliser la critique de l'Etat d'Israël
Antisemitism Awareness Act: pénaliser la critique de l'Etat d'IsraëlEdouardHusson
 
Peace-Conflict-and-National-Adaptation-Plan-NAP-Processes-.pdf
Peace-Conflict-and-National-Adaptation-Plan-NAP-Processes-.pdfPeace-Conflict-and-National-Adaptation-Plan-NAP-Processes-.pdf
Peace-Conflict-and-National-Adaptation-Plan-NAP-Processes-.pdfNAP Global Network
 
31st World Press Freedom Day Conference in Santiago.
31st World Press Freedom Day Conference in Santiago.31st World Press Freedom Day Conference in Santiago.
31st World Press Freedom Day Conference in Santiago.Christina Parmionova
 
3 May, Journalism in the face of the Environmental Crisis.
3 May, Journalism in the face of the Environmental Crisis.3 May, Journalism in the face of the Environmental Crisis.
3 May, Journalism in the face of the Environmental Crisis.Christina Parmionova
 
1935 CONSTITUTION REPORT IN RIPH FINALLS
1935 CONSTITUTION REPORT IN RIPH FINALLS1935 CONSTITUTION REPORT IN RIPH FINALLS
1935 CONSTITUTION REPORT IN RIPH FINALLSarandianics
 
Call Girls Mehsana / 8250092165 Genuine Call girls with real Photos and Number
Call Girls Mehsana / 8250092165 Genuine Call girls with real Photos and NumberCall Girls Mehsana / 8250092165 Genuine Call girls with real Photos and Number
Call Girls Mehsana / 8250092165 Genuine Call girls with real Photos and NumberSareena Khatun
 
74th Amendment of India PPT by Piyush(IC).pptx
74th Amendment of India PPT by Piyush(IC).pptx74th Amendment of India PPT by Piyush(IC).pptx
74th Amendment of India PPT by Piyush(IC).pptxpiyushsinghrajput913
 
2024: The FAR, Federal Acquisition Regulations, Part 32
2024: The FAR, Federal Acquisition Regulations, Part 322024: The FAR, Federal Acquisition Regulations, Part 32
2024: The FAR, Federal Acquisition Regulations, Part 32JSchaus & Associates
 
Competitive Advantage slide deck___.pptx
Competitive Advantage slide deck___.pptxCompetitive Advantage slide deck___.pptx
Competitive Advantage slide deck___.pptxScottMeyers35
 

Recently uploaded (20)

31st World Press Freedom Day Conference.
31st World Press Freedom Day Conference.31st World Press Freedom Day Conference.
31st World Press Freedom Day Conference.
 
sponsor for poor old age person food.pdf
sponsor for poor old age person food.pdfsponsor for poor old age person food.pdf
sponsor for poor old age person food.pdf
 
NGO working for orphan children’s education
NGO working for orphan children’s educationNGO working for orphan children’s education
NGO working for orphan children’s education
 
Kolkata Call Girls Halisahar 💯Call Us 🔝 8005736733 🔝 💃 Top Class Call Girl ...
Kolkata Call Girls Halisahar  💯Call Us 🔝 8005736733 🔝 💃  Top Class Call Girl ...Kolkata Call Girls Halisahar  💯Call Us 🔝 8005736733 🔝 💃  Top Class Call Girl ...
Kolkata Call Girls Halisahar 💯Call Us 🔝 8005736733 🔝 💃 Top Class Call Girl ...
 
Vasai Call Girls In 07506202331, Nalasopara Call Girls In Mumbai
Vasai Call Girls In 07506202331, Nalasopara Call Girls In MumbaiVasai Call Girls In 07506202331, Nalasopara Call Girls In Mumbai
Vasai Call Girls In 07506202331, Nalasopara Call Girls In Mumbai
 
Panchayath circular KLC -Panchayath raj act s 169, 218
Panchayath circular KLC -Panchayath raj act s 169, 218Panchayath circular KLC -Panchayath raj act s 169, 218
Panchayath circular KLC -Panchayath raj act s 169, 218
 
World Press Freedom Day 2024; May 3rd - Poster
World Press Freedom Day 2024; May 3rd - PosterWorld Press Freedom Day 2024; May 3rd - Poster
World Press Freedom Day 2024; May 3rd - Poster
 
Election 2024 Presiding Duty Keypoints_01.pdf
Election 2024 Presiding Duty Keypoints_01.pdfElection 2024 Presiding Duty Keypoints_01.pdf
Election 2024 Presiding Duty Keypoints_01.pdf
 
Time, Stress & Work Life Balance for Clerks with Beckie Whitehouse
Time, Stress & Work Life Balance for Clerks with Beckie WhitehouseTime, Stress & Work Life Balance for Clerks with Beckie Whitehouse
Time, Stress & Work Life Balance for Clerks with Beckie Whitehouse
 
Antisemitism Awareness Act: pénaliser la critique de l'Etat d'Israël
Antisemitism Awareness Act: pénaliser la critique de l'Etat d'IsraëlAntisemitism Awareness Act: pénaliser la critique de l'Etat d'Israël
Antisemitism Awareness Act: pénaliser la critique de l'Etat d'Israël
 
Peace-Conflict-and-National-Adaptation-Plan-NAP-Processes-.pdf
Peace-Conflict-and-National-Adaptation-Plan-NAP-Processes-.pdfPeace-Conflict-and-National-Adaptation-Plan-NAP-Processes-.pdf
Peace-Conflict-and-National-Adaptation-Plan-NAP-Processes-.pdf
 
31st World Press Freedom Day Conference in Santiago.
31st World Press Freedom Day Conference in Santiago.31st World Press Freedom Day Conference in Santiago.
31st World Press Freedom Day Conference in Santiago.
 
3 May, Journalism in the face of the Environmental Crisis.
3 May, Journalism in the face of the Environmental Crisis.3 May, Journalism in the face of the Environmental Crisis.
3 May, Journalism in the face of the Environmental Crisis.
 
Sustainability by Design: Assessment Tool for Just Energy Transition Plans
Sustainability by Design: Assessment Tool for Just Energy Transition PlansSustainability by Design: Assessment Tool for Just Energy Transition Plans
Sustainability by Design: Assessment Tool for Just Energy Transition Plans
 
1935 CONSTITUTION REPORT IN RIPH FINALLS
1935 CONSTITUTION REPORT IN RIPH FINALLS1935 CONSTITUTION REPORT IN RIPH FINALLS
1935 CONSTITUTION REPORT IN RIPH FINALLS
 
Call Girls Mehsana / 8250092165 Genuine Call girls with real Photos and Number
Call Girls Mehsana / 8250092165 Genuine Call girls with real Photos and NumberCall Girls Mehsana / 8250092165 Genuine Call girls with real Photos and Number
Call Girls Mehsana / 8250092165 Genuine Call girls with real Photos and Number
 
74th Amendment of India PPT by Piyush(IC).pptx
74th Amendment of India PPT by Piyush(IC).pptx74th Amendment of India PPT by Piyush(IC).pptx
74th Amendment of India PPT by Piyush(IC).pptx
 
2024: The FAR, Federal Acquisition Regulations, Part 32
2024: The FAR, Federal Acquisition Regulations, Part 322024: The FAR, Federal Acquisition Regulations, Part 32
2024: The FAR, Federal Acquisition Regulations, Part 32
 
AHMR volume 10 number 1 January-April 2024
AHMR volume 10 number 1 January-April 2024AHMR volume 10 number 1 January-April 2024
AHMR volume 10 number 1 January-April 2024
 
Competitive Advantage slide deck___.pptx
Competitive Advantage slide deck___.pptxCompetitive Advantage slide deck___.pptx
Competitive Advantage slide deck___.pptx
 

Time Travelling Analyst: The Things That Only a Time Machine Can Tell Me...

  • 1. Department of Internal Affairs Time Traveling Analyst: The Things Only a Time Machine Can Tell Me… Ross Spencer - @beet_keeper Archives New Zealand #ARANZ2015 Tuesday September 7 2015
  • 2. Department of Internal Affairs Sun image, R24685027, E4, Archway, Archives New Zealand. http://www.archway.archives.govt.nz/ViewFullItem.do? code=24685027&digital=yes
  • 3. Department of Internal Affairs Background Two sets of born-digital ingest, Minister's Papers, 'code-named', E1 and E4, E2 and E3. First sets selected for simplicity. Second sets followed numerical sequence and were used as a learning exercise. Complexity grew. First sets enabled creation of CSV ingest mechanism, configuration of Rosetta, creation of process. Second sets enabled the proof of that method.
  • 4. Department of Internal Affairs ● E1~ ● 175 Files ● 10 Directories ● 0 Unidentified Objects ● 0 Unidentified Extensions ● 7 Known Formats N.B. E4 also contained two identification false positives. ● E4~ ● 1295 Files ● 6 Directories ● 2 Unidentified Objects ● 1 Unidentified Extensions ● 12 Known Formats Approximate collection breakdowns at the beginning of the process… Approximate collection breakdowns at the beginning of the process…
  • 5. Department of Internal Affairs Approximate collection breakdowns at the beginning of the process… • E2~ • 2519 Files • 177 Directories • 5 Unidentified Objects • 4 Unidentified Extensions • 22 Known Formats • 25 Extension Mismatches • E3~ • 1748 Files • 144 Directories • 8 Unidentified Objects • 5 Unidentified Extensions • 12 Known Formats • 37 Extension Mismatches N.B. Both collections contained empty folders, empty files, and multiple-id formats.
  • 6. Department of Internal Affairs Let's begin with a story... E1, the simplest... Enabled us to develop an ingest mechanism for heterogeneous collections – and it worked! E4, not that different, slightly larger, about as 'known', but! An unexpected exception discovered in the relationship between the preservation system and some of the filenames in the collection...
  • 7. Department of Internal Affairs Where do astronauts go for a beer?
  • 8. Department of Internal Affairs The...
  • 9. Department of Internal Affairs We had filenames with multiple spaces in them... E.g. 'A [space] [space] Filename.docx' An innocuous enough looking problem... Our digital preservation system couldn't handle them... Investigate the system... ... Confirm it's the system... … Ask vendor to fix the problem... … No fix forthcoming for next release...
  • 10. Department of Internal Affairs What now...? Change filenames? ... Serious change, this is how we received them! … Record provenance... … Mechanisms in METS metadata schema [EVENT] … How to implement?
  • 11. Department of Internal Affairs We continue... Configure CSV to handle EVENT fields... ... Modify CSV generation tool to output blank EVENT fields... … Test ingest in system until configuration is perfected … Mechanism works so pre-condition filenames... ... Record R-Numbers* and design provenance note controlled list... … Add data to CSV … DONE!!!! *Dependency on listing being fixed in Archway
  • 13. Department of Internal Affairs Test in digital preservation system fails... ... UTF-8 character encoding... … How to preserve in Excel? … … Import using special ribbon in Excel... … Add notes to sheet... … DONE?! … Not even now... >.< Nope...
  • 14. Department of Internal Affairs It can become exhausting... As a speaker! And for the audience!!! ^_^; ...Time and date based data becomes a problem... ...Asking non-expert users to do the same... ...Even power tools like Open Office suffer issues... ...E4 went in after solving the UTF-8 issues... ...E2 and E3 suffered from issues with time/date information on top
  • 15. Department of Internal Affairs But we learn and move onwards an upwards...
  • 16. Department of Internal Affairs The work isn't straight-forward ● It Pushes out time-frames... ● And the problems we're solving aren't what we expected... ● We need to develop with the problem...
  • 17. Department of Internal Affairs But we have new tools... Tools to create provenance information in CSV for ingest into the digital preservation system. Tools to identify files with this issue up front. The digital preservation system is fixed, so this specific use-case for us is unlikely to occur again. We have gained new experience. For E2 and E3, we created mechanisms of creating an ingest 'mash-up' using a separate provenance spreadsheet. For our next ingest we have a macro to automate an Excel import!!!!! ← IN MICROSOFT?!!!!
  • 18. Department of Internal Affairs We have what seems like an exhaust-less list... ● [Tools] Ability to handle multi-byte character encodings. Maori macrons, ‘Ā’, in DROID, digital preservation system, spreadsheets, etc. . • [Tools] Unidentified files and false positives - contribute to [Tools] Zero-byte files, empty folders ● [Tools] System files • [Tools] Digital preservation system’s capabilities; dates, delivery, metadata extraction, etc. • [Files] Invalid objects • [Files] Templates, objects with auto-fields
  • 19. Department of Internal Affairs And we'd never have guessed these up front... ● What are the next challenges? ● We'd be too conservative, or too O.T.T... ●WE NEED A TIME MACHINE!!!
  • 20. Questions?
  • 21. We don't need a time machine at all... ● We need evidence! ● We need to practice! ● We need to do! ● Time-frames will be pushed out ● In a world that loves strategy, it's terribly detail focused. ● Can someone figure it out first? ● Definition of Leadership! ● But you will almost certainly find new exceptions... as will we.
  • 22. Ground process and policy in the real world… ● We can reduce surprises... ● But we can't reduce them to zero... ● Find the exceptions, create rules, and encode them in those policies... ● Move one step at a time, with modest increments. ● Flexible endpoints / reasonable / multiple goals... ● Q. HOW DID WE GET THESE FILES?? ● A. It doesn't matter, we have to deal with them...
  • 23. Evidence will… ● Inform policy ● Inform Procedures ➔ Tools ➔ Skills ➔ Appetite ➔ Strategy
  • 24. Writing these documents becomes a much more advanced thought experiment with a greater number of inputs from a greater number of people, and experiences...
  • 25. Robustness Principle... (Postel's Law) e.g. checksums “Be conservative in what you do; be liberal in what you accept from others.” Follow standards... mechanisms should accept non-conforming input as long as the meaning is clear... Be prepared to understand material, be prepared to manage it. A way of doing things... not the only way... WRITE OTHER SOLUTIONS! RE-WRITE YOUR SOLUTIONS!
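One possible reading of the checksum example on the slide above, as a sketch: accept a supplied checksum liberally (any case, stray whitespace, a trailing filename as checksum utilities often print), but always produce one strict form. SHA-256 is an assumption here, since the slide names no algorithm, and all function names are hypothetical.

```python
import hashlib
import re

# Conservative in what we do: exactly 64 lowercase hexadecimal characters.
SHA256_HEX = re.compile(r"[0-9a-f]{64}")


def normalise_checksum(supplied):
    """Liberal in what we accept: trim, take the first token, lowercase it."""
    tokens = supplied.strip().split()
    candidate = tokens[0].lower() if tokens else ""
    if not SHA256_HEX.fullmatch(candidate):
        # The meaning is not clear, so refuse rather than guess.
        raise ValueError("not a SHA-256 hex digest: %r" % supplied)
    return candidate


def checksum_of(data):
    """Compute our own checksum in the canonical form."""
    return hashlib.sha256(data).hexdigest()


def matches(data, supplied):
    """Compare data against a supplied checksum in any accepted form."""
    return checksum_of(data) == normalise_checksum(supplied)
```

The design choice is that tolerance lives only at the input boundary: once normalised, every checksum stored or compared has a single form, and input whose meaning is unclear is rejected outright.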
  • 26. Other tools for you... DROID (National Archives UK): http://www.nationalarchives.gov.uk/information-management/manage-information/policy-proce Or Siegfried (State Records NSW): https://github.com/richardlehane/siegfried DROID Analysis Tool: https://github.com/exponential-decay/droid-sqlite-analysis Other presentations: http://www.slideshare.net/RossSpencer/presentations Blogs (Open Preservation Foundation): http://openpreservation.org/knowledge/blogs/ Record Keeping Toolkit (Archives New Zealand): http://www.records.archives.govt.nz/
  • 27. Share yours too!
  • 28. Who do digital preservation analysts want to drink a beer with?
  • 29. Commander Hadfield! https://twitter.com/cmdr_hadfield TED: What I learned from going blind in space? Star Talk: http://www.startalkradio.net/show/social-media-i
  • 30. “It’s almost comical that astronauts are stereotyped as daredevils and cowboys. As a rule, we’re highly methodical and detail-oriented. Our passion isn’t for thrills but for the grindstone, and pressing our noses to it. We have to: we’re responsible for equipment that has cost taxpayers many millions of dollars, and the best insurance policy we have on our lives is our own dedication to training. Studying, simulating, practicing until responses become automatic—astronauts don’t do all this only to fulfill NASA’s requirements. Training is something we do to reduce the odds that we’ll die.” ― Chris Hadfield, An Astronaut's Guide to Life on Earth The Right Stuff
  • 31. What next..?
  • 32. Questions! Thank you!
