Presentation given at the 'Unlocking Sources: WW1 & Europeana' conference located at the Staatsbibliothek zu Berlin, Germany on 31st January 2014.
http://www.europeana-collections-1914-1918.eu/unlocking-sources/
Wrapping and Unwrapping History: What’s Gained and What’s Lost
1. Wrapping and Unwrapping History:
What’s Gained and What’s Lost
Unlocking Sources: WW1 & Europeana
Staatsbibliothek zu Berlin, Germany. 31st January 2014
Adrian Stevenson
Senior Technical Innovations Coordinator
Mimas, University of Manchester, UK
@adrianstevenson
3. WW1 Discovery Project
• Proof-of-Concept illustrating
principles of the JISC Discovery
initiative
• Discovery about advocating
‘open’ and ‘aggregating’
• Make digital content more
discoverable by people and
machines
www.discovery.ac.uk
• Built WW1 aggregation API
and discovery layers
4.
5. What is an API?
• ‘Application Programming Interface’
• Allows machine readability of data
– Typically over the Web
• Provides access to content or functions for
other systems
• Many ways to do this – e.g.
– Google, Facebook, Flickr, Twitter APIs …
– OAI-PMH, Z39.50
– RDF - Linked Data, Semantic Web
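As a rough illustration of 'machine readability', the Python sketch below parses a JSON response body and pulls out record titles. The payload and its field names (`results`, `title`, `date`, `source`) are invented for this example and are not taken from any of the APIs named on this slide.

```python
import json

# Hypothetical API response body; the structure and field names are
# assumptions made for illustration only.
payload = (
    '{"results": [{"title": "Recruitment poster", '
    '"date": "1915", "source": "Example Museum"}]}'
)


def titles_from_response(body):
    """Return the title of every record in a JSON response body."""
    data = json.loads(body)
    # A response with no "results" key is treated as having no records.
    return [record["title"] for record in data.get("results", [])]


print(titles_from_response(payload))
```

A person would read the same information from a web page; here a program reads it directly, without having to scrape markup intended for humans.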
6. WW1 Discovery: How?
• Aggregate and ‘wrap’
data from existing APIs
– NMM, V&A,
Europeana
• Help others with
example API – BL,
Welsh Voices, Postal
Museum
• Formats: SOLR, RSS,
OpenSearch, OAI-PMH,
CSV
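One way to picture the 'wrap' step is as a mapping from each source's own field names onto a shared schema. The Python sketch below assumes two made-up sources; `source_a`, `source_b` and every field name are hypothetical and do not reflect the real NMM, V&A or Europeana APIs, or the project's actual implementation.

```python
# Hypothetical mappings: common field name -> field name used by each source.
# None of these are real field names from any API mentioned on this slide.
FIELD_MAPS = {
    "source_a": {"title": "name", "description": "summary", "url": "link"},
    "source_b": {
        "title": "objectTitle",
        "description": "objectDescription",
        "url": "objectUrl",
    },
}


def wrap_record(source, raw):
    """Map one source-specific record onto the common schema.

    Fields missing from the raw record become None instead of raising.
    """
    mapping = FIELD_MAPS[source]
    wrapped = {common: raw.get(native) for common, native in mapping.items()}
    wrapped["source"] = source  # keep provenance for click-through
    return wrapped


records = [
    wrap_record("source_a", {"name": "Trench map", "link": "http://example.org/1"}),
    wrap_record("source_b", {"objectTitle": "Medal", "objectDescription": "Bronze"}),
]
for record in records:
    print(record)
```

Keeping the source name on each wrapped record means an interface can still link back to the holding institution.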
18. Challenges
• Lack of APIs
• Difficulties merging data
– Varied content and formats
– APIs can change
– Relevance ranking dubious
• From Discovery ‘Technical Principles’ - “Discovery is distributed …
Discovery is concerned with a plethora of information resources and
services from a wide variety of sources and is prepared, where
appropriate, to deal with these in situ”
• Speed of API response
• Lack of content
– images
– geo-data and time data
• Content licenses not open
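Because relevance scores from different sources are not comparable, one common workaround is to interleave each source's own ranked list round-robin rather than sort on score. The Python sketch below shows that technique only; it is not necessarily what the WW1 Discovery API did, and the sample hits are invented.

```python
from itertools import zip_longest


def interleave(result_lists):
    """Round-robin merge: first hit from each source, then second, and so on.

    Each source keeps its own internal ordering; no cross-source
    score comparison is attempted.
    """
    missing = object()  # sentinel marking exhausted sources
    merged = []
    for tier in zip_longest(*result_lists, fillvalue=missing):
        merged.extend(hit for hit in tier if hit is not missing)
    return merged


# Invented sample hits from three hypothetical sources.
print(interleave([["a1", "a2", "a3"], ["b1"], ["c1", "c2"]]))
```

This gives every source some visibility near the top of the list, at the cost of ignoring how good each individual hit actually is.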
19. Contact
Adrian Stevenson
Mimas, University of Manchester, UK
adrian.stevenson@manchester.ac.uk
www.mimas.ac.uk
www.twitter.com/adrianstevenson
www.linkedin.com/in/adrianstevenson
www.slideshare.net/adrianstevenson
20. CC License
This presentation is available under a Creative Commons Non
Commercial-Share Alike license:
http://creativecommons.org/licenses/by-nc/2.0/uk/
Editor's Notes
Today: talk about WW1 Discovery. This page is from the blog, where you can get more info. Not all that useful as a one-liner. Will explore.
WW1 Discovery is part of a major JISC activity called Discovery. It has ceased under this name and lots is changing at Jisc, but discovery activities are still a major activity. Core of Discovery is advocating open data and licences, very similar to Europeana. Perhaps less familiar is aggregating data: how you bring resources together. We tried to use data where it sits rather than gathering it. Have created an overlay API: the 'wrapping', based on the subject of WW1. Created two user interfaces, which 'unwrap' the mediated data.
JISC Discovery vision doc. We have IWM and V&A. API is in area 2 of the diagram. Interfaces are a layer. Meets a still very common use case. Also mention the technical principles.
Lots could be said about APIs but will focus on the resource discovery aspects. About machine readability. Most info on the web is marked up for humans. Very useful but has its limits. APIs allow machines to read effectively the same info. Usually over the web but doesn’t have to be. Many ways of doing this. An API isn’t a standard, though standards do exist. E.g. Twitter clients such as TweetDeck, Hootsuite, Janetter use the API. Increasing interest in linked data and the web in the museums space. WW1 is not linked data: mention LOCAH and Linking Lives. Some have more of an interoperability focus, some are more proprietary.
More specifically, aggregating data from APIs. Using APIs and helping institutions set up APIs. Phase 1 King's College work identified institutions with good WW1 stuff. Unfortunately this work wasn’t focussed so much on technical provision. Only a few identified sources had APIs: V&A and NMM. IWM API under the radar. Also data from other aggregators such as Europeana and Culture Grid: Great War Archive and Europeana 1914-1918. Picking certain things. Revised project plan and tried to help data sources. Optimistic plan didn’t work. Have taken data from them and set up examples at Mimas. In addition, very few data source institutions have APIs. Aim was to take data in all sorts of formats. SOLR very popular. OpenSearch.
First version of API released November 2012. There have been many subsequent revisions and we are almost there with it. Worked through last set of bugs and fixes late January 2013. The API is like other search services such as Google, using query syntax. Available as XML and JSON. About 12 data sources.
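As a sketch of what a query-syntax search call with a choice of XML or JSON output might look like, the Python below builds a request URL. The endpoint `http://api.example.org/search` and the parameter names `q` and `format` are placeholders assumed for illustration; the real API's URL and parameters are not given in these notes.

```python
from urllib.parse import urlencode

# Placeholder endpoint; NOT the real WW1 Discovery API address.
BASE_URL = "http://api.example.org/search"


def build_query_url(terms, fmt="json"):
    """Build a search URL for the given query terms and response format.

    The parameter names "q" and "format" are assumptions for this sketch.
    """
    if fmt not in ("json", "xml"):
        raise ValueError("fmt must be 'json' or 'xml'")
    return BASE_URL + "?" + urlencode({"q": terms, "format": fmt})


print(build_query_url("trench warfare"))
print(build_query_url("trench warfare", fmt="xml"))
```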
Now onto the demonstrators. Worked with 2 suppliers. Home page here. KCL identified unexplored areas that were used as themes.
Can drag it around. Tries to present a nice exploratory format.
Tried to highlight the visual side of things. A challenge is that many things have no images.
Some nice stuff from NMM. Can get the usual things: description, larger image, and click through to the site and see licensing info.
Worked with WAWWD. Also has a search.
Note this is where the crowdsourcing comes in. Can tell your own story. Idea is to feed this info back to data sources.
Click street view to get overlay view.
Can place images
Also has a map view. Can pull around and click through for interesting stuff.
Challenges include the lack of APIs available to aggregate data. Data comes in all shapes and sizes. Tried to work live with data where it is, called federated searching. Decided to have another go. Means completely up to date. Don’t have to manage data locally: less maintenance. Tech principles suggest using data in situ here. Merits of this are that data doesn’t get stale and that, in principle, you shouldn’t have the data maintenance issues that centralisers such as Archives Hub have. Mimas also does lots of centralisation work so wanted to try a different approach. Also most APIs are suited to querying, not harvesting. No valid way to relevance rank the search results of the different data sources against each other. Acknowledged that even when you do centralise and have a view of all the metadata, it is still questionable how to rank, as records come in all shapes, sizes, quality and degrees of sparsity, as Europeana appear to have found. Of course, if not using APIs for cross searching, this may well not be a problem. APIs are not meant for aggregating. Historypin good for getting missing data but no good flow for feeding this data back to institutions.
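The federated approach described in these notes, querying each source live and coping with slow or changed APIs, could be sketched in Python roughly as follows. The three source functions are stand-ins invented for the example, and the shared time budget is an assumed design choice, not a description of the project's actual code.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def federated_search(sources, query, timeout=2.0):
    """Query every source live and in parallel within one shared time budget.

    `sources` maps a source name to a callable that takes the query and
    returns a list of hits. Sources that fail or overrun yield an empty list.
    """
    results = {}
    pool = ThreadPoolExecutor(max_workers=max(1, len(sources)))
    futures = {name: pool.submit(fn, query) for name, fn in sources.items()}
    deadline = time.monotonic() + timeout
    for name, future in futures.items():
        remaining = max(0.0, deadline - time.monotonic())
        try:
            results[name] = future.result(timeout=remaining)
        except Exception:  # timeout, network error, changed API, ...
            results[name] = []
    pool.shutdown(wait=False)  # do not block on stragglers
    return results


# Stand-in sources, invented for illustration.
def fast_source(query):
    return ["fast hit for " + query]


def slow_source(query):
    time.sleep(0.3)  # simulates a slow API response
    return ["slow hit for " + query]


def broken_source(query):
    raise RuntimeError("API changed")


demo = federated_search(
    {"fast": fast_source, "slow": slow_source, "broken": broken_source},
    "poster",
    timeout=0.1,
)
print(demo)
```

Nothing is stored locally, so results are as current as each source, but the slowest or least reliable API sets the limit on what the user sees.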