ResourceSync in 24x7
1. “Synchronize your resources with ResourceSync”
July 10, 2013, Open Repositories 2013, PEI, Canada
Synchronize your
resources with
ResourceSync
Simeon Warner
(Cornell University Library)
2.
Team sport
3.
more, still more missing
JISC
Richard Jones
Graham Klyne
Stuart Lewis
OCLC
Jeff Young
LOCKSS
David Rosenthal
RedHat
Christian Sadilek
Ex Libris Inc.
Shlomo Sanders
Library of Congress
Kevin Ford
4.
Alfred P. Sloan
Foundation
5.
Synchronize
• keep “in sync” (colloq.)
• Following changes over time
and
• Keeping copies on different systems the same
• Tackle only the unidirectional problem:
From a Source, to a Destination
6.
Resources
aka Web Resources:
have URI, HTTP GET representation(s)
Many / Few
Big / Small
Fast / Slow
8.
Scholarly repositories
• Replicate data/articles for mirroring, reuse,
indexing, ...
• OAI-PMH for metadata
• Many custom solutions
for full content
9.
Linked data
Fundamentally distributed but local copy often
required. Either:
1. cache
2. sync local copy...
• Many custom solutions
for local copy
Last.FM
MusicBrainz
GeoNames
DBpedia
others...
BBC
10.
Didn’t you sell us OAI-PMH?
Or... will ResourceSync replace OAI-PMH?
Proven metadata transfer protocol
Widely adopted in our community
X Predates REST, not “of the web”
X Not adopted for content transfer
Can replace, likely coexistence
12.
1. Baseline sync
Initial load, copy, or catch-up from source
• need list of all resources
• optional packaged content
Want to
• avoid out-of-band setup & customization
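
A minimal sketch of a baseline sync from the Destination's side, assuming the list of resource URIs has already been obtained. The function name `baseline_sync` and the injected `fetch` callable are hypothetical; a real Destination would issue an HTTP GET where this example reads from an in-memory dictionary.

```python
from typing import Callable, Dict, Iterable


def baseline_sync(uris: Iterable[str],
                  fetch: Callable[[str], bytes]) -> Dict[str, bytes]:
    """Copy every listed resource into a fresh local store (hypothetical helper)."""
    local: Dict[str, bytes] = {}
    for uri in uris:
        # One retrieval per listed resource; `fetch` stands in for HTTP GET.
        local[uri] = fetch(uri)
    return local


# Stand-in for the Source: a dictionary instead of a web server (assumption).
source = {
    "http://example.com/res1": b"first resource",
    "http://example.com/res2": b"second resource",
}

copy = baseline_sync(source.keys(), source.__getitem__)
```

Passing `fetch` in as a parameter keeps the sketch independent of any particular HTTP library.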
13.
2. Incremental sync
Keep up-to-date with changes at a source
• need information about changes
• optional packaged content
• minimal primitives: create/update/delete
Want
• allow catch-up after destination offline
• lower latency and/or greater efficiency than
repeated baseline sync
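
A minimal sketch of how a Destination might apply the create/update/delete primitives to a local copy. The `(change type, URI)` tuples, the `apply_changes` name, and the `fetch` callable are illustrative assumptions, not the framework's wire format.

```python
from typing import Callable, Dict, Iterable, Tuple

# (change type, URI); the string labels are hypothetical, chosen for this sketch.
Change = Tuple[str, str]


def apply_changes(local: Dict[str, bytes],
                  changes: Iterable[Change],
                  fetch: Callable[[str], bytes]) -> None:
    """Bring `local` up to date by replaying changes in order."""
    for kind, uri in changes:
        if kind in ("create", "update"):
            # Both cases need the current representation from the Source.
            local[uri] = fetch(uri)
        elif kind == "delete":
            # Tolerate deletes for resources we never copied.
            local.pop(uri, None)
        else:
            raise ValueError(f"unknown change type: {kind}")


# Stand-in Source state after the changes happened (assumption).
source = {
    "http://example.com/res1": b"version 2",
    "http://example.com/res3": b"brand new",
}
# Destination state from an earlier baseline sync.
local = {
    "http://example.com/res1": b"version 1",
    "http://example.com/res2": b"to be removed",
}
changes = [
    ("update", "http://example.com/res1"),
    ("delete", "http://example.com/res2"),
    ("create", "http://example.com/res3"),
]

apply_changes(local, changes, source.__getitem__)
```

Because the changes are replayed in order, a Destination that was offline can catch up by processing every change it missed.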
14.
3. Audit
Destination should be able to verify whether it is
synchronized with a source
• need list of all resources + fixity info
Want
• lower latency and/or greater efficiency than
baseline sync
• note: subject to some latency
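
A minimal sketch of an audit, assuming the fixity information is a SHA-256 digest per URI. The digest algorithm, the `audit` function, and the `AuditReport` type are assumptions made for illustration; the slide only says "fixity info".

```python
import hashlib
from typing import Dict, List, NamedTuple


class AuditReport(NamedTuple):
    missing: List[str]   # listed by the Source, absent locally
    changed: List[str]   # present locally, but digest differs
    extra: List[str]     # present locally, not listed by the Source


def audit(expected: Dict[str, str], local: Dict[str, bytes]) -> AuditReport:
    """Compare local copies against the Source's list of URI -> hex digest."""
    missing = sorted(uri for uri in expected if uri not in local)
    changed = sorted(
        uri for uri, digest in expected.items()
        if uri in local and hashlib.sha256(local[uri]).hexdigest() != digest
    )
    extra = sorted(uri for uri in local if uri not in expected)
    return AuditReport(missing, changed, extra)


# What the Source claims (hypothetical data).
expected = {
    "http://example.com/res1": hashlib.sha256(b"one").hexdigest(),
    "http://example.com/res2": hashlib.sha256(b"two").hexdigest(),
}
# What the Destination actually holds.
local = {
    "http://example.com/res1": b"one",
    "http://example.com/res2": b"TWO",
    "http://example.com/res3": b"three",
}

report = audit(expected, local)
in_sync = not (report.missing or report.changed or report.extra)
```

The Destination is synchronized only when all three lists are empty, and, as the slide notes, only as of the moment the Source's list was produced.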
17.
Minor?
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
        xmlns:rs="http://www.openarchives.org/rs/terms/">
  <rs:ln …/>
  <rs:md …/>
  <url>
    <loc>http://example.com/res1</loc>
    <lastmod>2013-01-02T13:00:00Z</lastmod>
    <rs:ln …/>
    <rs:md …/>
  </url>
  <url>
    …
  </url>
</urlset>
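
A minimal sketch of reading such a document with Python's standard library, assuming only the plain sitemap elements (`loc`, `lastmod`) are needed. The `rs:ln` and `rs:md` elements are left out because their attributes are elided on the slide; the second entry (`res2`) and the `parse_resource_list` helper are hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import List, Optional, Tuple

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"

# Sample document; res2 and its datestamp are invented for illustration.
RESOURCE_LIST = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>http://example.com/res1</loc>
    <lastmod>2013-01-02T13:00:00Z</lastmod>
  </url>
  <url>
    <loc>http://example.com/res2</loc>
    <lastmod>2013-01-03T09:30:00Z</lastmod>
  </url>
</urlset>"""


def parse_resource_list(xml_text: str) -> List[Tuple[Optional[str], Optional[str]]]:
    """Return (URI, lastmod) pairs for every <url> entry."""
    ns = {"sm": SITEMAP_NS}
    root = ET.fromstring(xml_text)
    entries = []
    for url in root.findall("sm:url", ns):
        loc = url.findtext("sm:loc", namespaces=ns)
        lastmod = url.findtext("sm:lastmod", namespaces=ns)  # None if absent
        entries.append((loc, lastmod))
    return entries


entries = parse_resource_list(RESOURCE_LIST)
```

Any sitemap-aware parser can read the document this way, since the ResourceSync elements live in their own namespace and can simply be ignored.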
18.
Baseline sync & Google
Most basic capability is Resource List:
• Snapshot of state of resources
• URI, datestamp + optional extra fixity info
• Destination does GET on each resource
ResourceSync Baseline sync & Audit
Google/Bing/Yahoo!/etc. harvest
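
A minimal sketch of the Source's side: producing a bare Resource List as a plain sitemap with Python's standard library. It emits only `loc` and `lastmod`; the ResourceSync-specific elements and the optional fixity attributes are omitted, and the `build_resource_list` name and sample data are hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Iterable, Tuple

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_resource_list(resources: Iterable[Tuple[str, str]]) -> str:
    """Serialize (URI, datestamp) pairs as a sitemap <urlset> document."""
    # Emit the sitemap namespace as the default namespace (no prefix).
    ET.register_namespace("", SITEMAP_NS)
    urlset = ET.Element(f"{{{SITEMAP_NS}}}urlset")
    for uri, lastmod in resources:
        url = ET.SubElement(urlset, f"{{{SITEMAP_NS}}}url")
        ET.SubElement(url, f"{{{SITEMAP_NS}}}loc").text = uri
        ET.SubElement(url, f"{{{SITEMAP_NS}}}lastmod").text = lastmod
    return ET.tostring(urlset, encoding="unicode")


# Snapshot of the Source's state (invented sample data).
snapshot = [
    ("http://example.com/res1", "2013-01-02T13:00:00Z"),
    ("http://example.com/res2", "2013-01-03T09:30:00Z"),
]

xml_text = build_resource_list(snapshot)
```

The same output serves a ResourceSync Destination doing a baseline sync and a search-engine crawler reading it as an ordinary sitemap.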
20.
Extensible
Extensible use of Link Relations from Atom
• Spec describes use for mirrors, patches,
historical, provenance, conneg...
• Use <rs:ln rel="your-relation-here" .../>
Extensible attributes for fixity etc.
• Includes lastmod, fixity, length, type...
Extensible framework -> new capabilities
21.
Push = Lower latency
Pull
• easy setup, no trust required
Push Changes
• lower latency, better scaling
• same descriptions as pull
• standard transports (XMPP, Websockets...)
• can push discovery info to trigger pull
22.
Timeline
January 2013: First beta
June 2013: Version 0.9
July 2013: Update and push spec
Fall 2013: NISO standardization
• Tools and libraries being developed to ease implementation
• Tutorials at major conferences (OAI8, OR, JCDL,...)
0:00 I will attempt, in the next 7 minutes, to motivate creation of the ResourceSync framework and explain what it means in a slightly less circular manner than the title. But first, I cannot claim that this is all my work...
0:17 Core team comprises
0:34 Technical committee
0:51 and all this would not have been possible without funding for in-person meetings and some core team time: primary funding from Sloan, UK participation funding from Jisc
1:08 Let me pull apart the two words of the title and framework name
1:25 ResourceSync is about Web Resources, things on the web with a URI identifier that can be dereferenced to get one or more representations
- the project is making an observation and a statement that repositories should exist really on the web
- from 10s on a small website to 10s of millions in big repositories
- large data resources, publications, linked data
- changes multiple times per second to infrequent changes of archival records
1:42 So far I’ve told you that a whole bunch of people are using up some generous funding to think about how to better synchronize web resources between systems. Why would we do this? What is the need? Going to give just two example use cases. More in a D-Lib article about a year ago.
1:59 Many contexts when copies of resources in scholarly repositories are necessary. From one repo I’m involved with, arXiv.org: mirroring, copy for index, copy for research. Currently either ad-hoc approaches or resort to the very blunt instrument of web crawling
2:16 Ironic perhaps that while linked data is fundamentally distributed, many applications require local copies. Ad-hoc approaches to bulk copy
2:33 OAI-PMH was introduced over 12 years ago (before the first JCDL, before OR was even imagined)
2:50 Knowing why we need this new protocol, what should it do? Took a BIG step back to look at the fundamentals of the synchronization problem. We came up with the following 3 operations.
3:07 Use Resource List or a Resource Dump which includes a Resource List as a manifest and the actual content
3:24
3:41
3:58 So, we have three operations, how do these get implemented? What is the lowest barrier, most widely compatible, most performant, and most future-proof way? Preferably inventing as little new stuff as possible.
4:15 Do everything with sitemaps. Considered many options but sitemaps won because good match, wide adoption, simple, extensible. Minor extensions required.
4:32 Yes, really minor. Two extra elements and attributes borrowed from several other specifications, notably Atom Link Extensions. In January the Sitemaps.org folks modified their schema to allow the top-level elements, and thus all ResourceSync documents are schema-valid sitemap (or sitemap index) documents.
4:49 Really cool thing about using sitemaps is that by implementing the most basic capability, the Resource List, you are also producing a sitemap that can be used by all the major search engines
5:06
5:23 It is just possible that we haven’t thought of everything or got everything perfect. Three areas of extensibility: expression of relations between resources, expression of fixity and other information about resources, and at the framework level new capabilities can be added