Data Intro for Librarians: Data Carpentry Workshop eRA2017
1. eResearch Africa Conference
Library Carpentry Workshop
5 May 2017
Kayleigh Roos & Erika Mias
Digital Curation Officers, UCT Libraries
Isak van der Walt
Senior IT Consultant, UP DLS Strategic Innovation
4. ● Stickies
○ Problematic terms/concepts/phrases
○ Resolved terms/concepts/phrases
● Helpers
○ Instructors around the room who are not presenting
● Computers are stupid and can be frustrating, and because you all have different machines,
problems can be tricky to resolve.
○ Be patient, step aside, take a gulp of air, and put your red sticky up!
Intro to Data
Help!
Sticky labels: "Help!" and "All good"
5. ● Video: James Baker
https://youtu.be/40GX3AwgREg
Intro to Data
What is Library Carpentry?
8. What the data?
Jargon Busting
source: XKCD https://xkcd.com/1146/
9. List of technical terms, a.k.a. jargon
Jargon Busting
● Conference program e.g.:
○ Research Data Management
○ OpenStack
○ Data Repository
○ etc.
● Library Carpentry
○ OpenRefine
○ GitHub
○ Regular expressions
○ Python
○ etc.
● Other
15. Foundations
THE COMPUTER IS STUPID
● A computer only does what you tell it to. If it throws an error, it is often
not your fault; in most cases the computer has failed to interpret
what you mean, because it can only work with what it knows
● If you find an error message frustrating, remember that it isn’t the computer’s
fault that the message is archaic and incomprehensible; you might
just need to rethink the way in which you asked it to do something
16. Foundations
CARPENTRY: the skill of knowing which tool to use, and which tools to learn when need be
● A fundamental principle of carpentry is to ‘know which tools to use’ for a
particular task
There is general consensus that many library processes could be automated
with some simple programming skills
● Repetitive tasks + automation = time & effort saving
Automation
17. Foundations
Automation
● Borrow, borrow, and borrow again
● The correct language to learn is the one that
works in your local context
● Knowing (even a little) code helps you
evaluate projects that use code
● Automate to make the time to do something
else
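As a small illustration of automating a repetitive task, the sketch below tidies a batch of titles in one pass instead of correcting each one by hand. It is only a minimal example: the function name `tidy_title` and the sample titles are hypothetical and not part of the workshop material.

```python
def tidy_title(raw):
    """Strip leading/trailing spaces and collapse repeated whitespace."""
    return " ".join(raw.split())


# Hypothetical sample records; in practice these might come from an exported file.
titles = [
    "  Long  Walk to Freedom ",
    "Cry,   the Beloved Country",
    "Disgrace  ",
]

# One line of code applies the same fix to every record.
cleaned = [tidy_title(title) for title in titles]

for title in cleaned:
    print(title)
```

The same loop works unchanged for three titles or three thousand, which is where the time saving comes from.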
18. Foundations
Shortcuts
Keyboard shortcuts are your friend
● ctrl+s for save; ctrl+c for copy; ctrl+x for cut; ctrl+v for paste (cmd on Mac)
● alt+tab (Windows) or cmd+tab (Mac) for moving between programmes
● ctrl+tab (next) and ctrl+shift+tab (previous) for moving between browser tabs
… Your favourites?
19. Foundations
Open formats
Plain text (or open) formats are your friend
● Why?
○ All computers can process them
○ Interoperability
● Types of open file formats?
○ .txt
○ .csv
○ .html
○ .xml
○ .jpg
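To make the interoperability point concrete, here is a minimal sketch using Python's standard `csv` module. It writes a few rows as CSV, shows that the result is ordinary readable text, and reads it back. The sample rows are made up for illustration, and an in-memory buffer stands in for a real file.

```python
import csv
import io

# Hypothetical sample data: a header row followed by two records.
rows = [
    ["title", "year"],
    ["Disgrace", "1999"],
    ["Long Walk to Freedom", "1994"],
]

# Write the rows as CSV into an in-memory text buffer (standing in for a .csv file).
buffer = io.StringIO()
csv.writer(buffer, lineterminator="\n").writerows(rows)
text = buffer.getvalue()

# The stored form is plain text that any editor or program can open.
print(text)

# Reading it back needs no special software, only a CSV parser.
parsed = list(csv.reader(io.StringIO(text)))
print(parsed[1][0])
```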
20. File naming
Naming files sensible things is good for you and your computer!
• Three criteria to assist with naming files:
○ Organisation
○ Context
○ Consistency
• Elements to consider when naming files:
○ version numbers
○ creation / publication date
○ creator’s name / group name
○ content description
○ project number
• Always consider scalability when naming files
○ e.g. 001 vs 01
• Don’t
○ use punctuation or capital letters
○ use special characters or spaces
• Do
○ replace full-stops with underscores
○ replace spaces with dashes
○ keep to YYYY-MM-DD date format
○ keep file names relevant and as short as possible
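The do's and don'ts above could be applied automatically. The following is a rough sketch, assuming one possible ordering of the elements (date, description, creator, version); the function name `make_filename`, its parameters, and the sample values are hypothetical rather than a prescribed scheme.

```python
import re
from datetime import date


def make_filename(created, description, creator, version, extension):
    """Build a file name following the slide's rules (element order is an assumption)."""

    def clean(part):
        part = part.strip().lower()               # no capital letters
        part = part.replace(".", "_")             # full stops become underscores
        part = re.sub(r"\s+", "-", part)          # spaces become dashes
        return re.sub(r"[^a-z0-9_-]", "", part)   # drop other special characters

    stem = "_".join([
        created.isoformat(),                      # YYYY-MM-DD date format
        clean(description),
        clean(creator),
        f"v{version:03d}",                        # zero-padded for scalability (001, not 1)
    ])
    return f"{stem}.{extension.lower()}"


print(make_filename(date(2017, 5, 5), "Survey Results (raw)", "UCT Libraries", 1, "csv"))
# prints 2017-05-05_survey-results-raw_uct-libraries_v001.csv
```

Note that this sketch simply discards any character outside a-z, 0-9, dash, and underscore, so accented letters would be lost; a real workflow might transliterate them instead.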
Foundations
http://theawkwardyeti.com/comic/misc/
21. File versioning
credit: PHD Comics, “Final”.doc http://phdcomics.com/comics.php?f=1531
Always record changes to your data files, even if it seems
unnecessary!
● Don’t use the word “final” - instead, number or date
versions
● Avoid using labels, e.g. ‘draft’, ‘test’, ‘final’, ‘rev’, ‘corrected’,
etc.
● Indicate major version changes with:
○ YYYY-MM-DD_Title_Author_V1
○ YYYY-MM-DD_Title_Author_V2
● Indicate minor version changes with:
○ YYYY-MM-DD_Title_Author_V1-1
○ YYYY-MM-DD_Title_Author_V1-2
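The major/minor pattern above can be generated rather than typed. This is a minimal sketch, assuming a minor number of 0 means "no minor suffix"; the function name `versioned_name` and the sample title and author are hypothetical.

```python
from datetime import date


def versioned_name(created, title, author, major, minor=0):
    """Return YYYY-MM-DD_Title_Author_V<major>, plus -<minor> for minor changes."""
    label = f"V{major}" if minor == 0 else f"V{major}-{minor}"
    return f"{created.isoformat()}_{title}_{author}_{label}"


created = date(2017, 5, 5)  # hypothetical creation date

print(versioned_name(created, "DataIntro", "Roos", 1))
# prints 2017-05-05_DataIntro_Roos_V1
print(versioned_name(created, "DataIntro", "Roos", 1, 2))
# prints 2017-05-05_DataIntro_Roos_V1-2
```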
Foundations