ResourceSync was funded by the Sloan Foundation and JISC to devise a specification for synchronizing web resources between a source and destinations as the resources change over time. It uses a modular framework based on the Sitemap protocol to publish resource lists and change lists and to send change notifications, minimally conveying each resource's URI, optionally accompanied by metadata and links. This allows destinations to perform a baseline synchronization, stay synchronized incrementally as resources are modified, and audit their synchronization against the source.
ResourceSync
A Modular Framework for Web-Based Resource Synchronization
Martin Klein, Los Alamos National Laboratory, @mart1nkle1n
Herbert Van de Sompel, Los Alamos National Laboratory, @hvdsomp
http://www.openarchives.org/rs #resourcesync
ResourceSync was funded by the Sloan Foundation & JISC
Herbert Van de Sompel, Martin Klein - ResourceSync
CNI Spring 2014, St. Louis, MO, March 31 2014
ResourceSync
• Collaboration between NISO and the Open Archives Initiative
• Funded by the Sloan Foundation and JISC
• Goal: Devise a specification for web-based resource
synchronization
This ResourceSync Presentation
• Problem Domain
• Scope
• Framework - Overview
• Framework – Technology
• Demonstration
• Status
Background - OAI-PMH
• Recurrent metadata exchange
from a Data Provider to Service
Providers
• XML metadata only
• Repository centric
• Devised 1999-2002, prior to
REST and prior to the dominance of
web search engines
Revisit the Problem Domain - ResourceSync
• Synchronization of resources
from a Source to Destinations
• Web resources, anything with
an HTTP URI & representation
• Resource centric
• Devised 2012-2013, leverages
key ingredients of web
interoperability, existing
specifications, existing Search
Engine Optimization practice
Problem Statement
• Consideration:
• Source (server) A has resources that change over time: they
get created, modified, deleted
• Destination (servers) X, Y, and Z leverage (some)
resources of Source A
• Problem:
• Destinations want to keep in step with the resource changes
at Source A
• Goal:
• Design an approach for resource synchronization aligned
with the Web Architecture that has a fair chance of adoption
by different communities
Scope – Collection Size
• Size of a Source’s resource collection:
• A few resources - small web sites, repositories
• Millions of resources – large repositories, datasets, linked
data collections
Scope – Change Frequency
• Change frequency of a Source’s resources:
• Low – daily, weekly, monthly
• High – seconds, minutes
Scope – Synchronization Latency
• Destination’s requirements regarding synchronization latency:
• High latency acceptable
• Low latency essential
Scope – Collection Coverage
• Destination’s requirements regarding the coverage of a Source’s
resources:
• Partial coverage of the Source’s resources acceptable
• Verifiable full coverage of the Source’s resources essential
Scope – Bitstream Accuracy
• Destination’s requirements regarding bitstream accuracy:
• Unverifiable bitstream accuracy acceptable
• Verifiable bitstream accuracy essential
A Source’s Resources Evolve over Time
Solution Perspective - Destination
• Destination needs regarding synchronization:
• Baseline synchronization: Initial catch-up operation to
align with the Source’s resources
• Incremental synchronization: Remain synchronized as
the Source’s resources evolve
• Audit: Destination determines whether it is effectively in
sync with the Source
- Bitstream accuracy
- Coverage of resources
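The coverage side of the audit can be sketched as a set comparison. This minimal Python sketch assumes the Destination already has the Source's current inventory as a list of URIs (for instance, the URIs listed in a Resource List); `coverage_audit` is a hypothetical helper name, not part of ResourceSync.

```python
from typing import Iterable, Set, Tuple


def coverage_audit(
    source_uris: Iterable[str], local_uris: Iterable[str]
) -> Tuple[Set[str], Set[str]]:
    """Compare the Source's inventory with the Destination's holdings.

    Returns (missing, extra): URIs the Destination still needs to obtain,
    and URIs it holds that the Source no longer lists.
    """
    source, local = set(source_uris), set(local_uris)
    return source - local, local - source


missing, extra = coverage_audit(
    ["http://example.com/res1", "http://example.com/res2"],
    ["http://example.com/res2", "http://example.com/res9"],
)
print(missing, extra)
```

Two empty sets indicate full coverage; bitstream accuracy is a separate check that needs content-based hashes.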
Solution Perspective - Source
• Source communicates about the state of its resources:
• Publish inventory: snapshot of the state of resources at a
moment in time
• Publish changes: enumeration of resource changes that
occurred during a temporal interval
• Notify about changes: send notifications as changes
occur
• Communication payload:
• Minimal, e.g. HTTP URI of resource
• Additional, e.g. content-based hash of resource
Resource List
• In order to meet a Destination’s need for baseline
synchronization, the Source may publish a Resource List
• A Resource List is an inventory, a snapshot of existing
resources
• Per resource, it minimally provides the resource’s URI
• Process:
- Destination obtains the Resource List
- Destination obtains listed resources by their URI
- Optimization: Resource Dump, a list pointing to ZIP files
that contain resource representations
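The baseline process above can be sketched in Python, assuming the Resource List has already been parsed into a list of URIs and that `fetch` is a hypothetical callable standing in for an HTTP GET (in practice, for example, a wrapper around `urllib.request.urlopen`). A dictionary plays the role of the Destination's local copy.

```python
from typing import Callable, Dict, Iterable


def baseline_sync(
    resource_uris: Iterable[str],
    fetch: Callable[[str], bytes],
) -> Dict[str, bytes]:
    """Build a fresh local copy from a Resource List (hypothetical helper).

    resource_uris: the URIs listed in the Source's Resource List.
    fetch: hypothetical function that dereferences a URI and returns its bytes.
    """
    destination: Dict[str, bytes] = {}
    for uri in resource_uris:
        # Obtain each listed resource by its URI.
        destination[uri] = fetch(uri)
    return destination


# Usage with an in-memory stand-in for the Source, so no network is needed.
source = {
    "http://example.com/res1": b"first resource",
    "http://example.com/res2": b"second resource",
}
local_copy = baseline_sync(source.keys(), source.__getitem__)
print(sorted(local_copy))
```

A Resource Dump would replace the per-URI loop with downloading and unpacking a few ZIP files, trading many small requests for a few large ones.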
Change List
• In order to meet a Destination’s need for incremental
synchronization, the Source may publish a Change List
• A Change List enumerates resource change events that
occurred in a temporal interval
• For each event, it minimally lists the datetime, the URI of the
resource, and the nature of the change
• Process:
- Destination obtains the Change List
- Destination obtains created/updated resources, removes
deleted resources
- Optimization: Change Dump
Publish Change List: Resource Changes During Interval Ty-Tz
Change List [Ty,Tz] = { A updated @Tc ; B updated @Tc ;
C created @Td ; D deleted @Te ; C updated @Tf }
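To illustrate how a Destination might process the Change List [Ty,Tz] above, the sketch below applies its events in order to a local copy. The event tuples, the hypothetical `fetch` callable, and both starting states are assumptions made for the example; a real Change List would be parsed from its Sitemap XML.

```python
from typing import Callable, Dict, List, Tuple

# (datetime label, resource URI, nature of the change)
Event = Tuple[str, str, str]


def apply_change_list(
    local: Dict[str, str],
    events: List[Event],
    fetch: Callable[[str], str],
) -> Dict[str, str]:
    """Apply change events in list order to a copy of the local state."""
    state = dict(local)
    for _when, uri, change in events:
        if change in ("created", "updated"):
            # Obtain the current representation of created/updated resources.
            state[uri] = fetch(uri)
        elif change == "deleted":
            # Remove deleted resources; ignore ones the Destination never had.
            state.pop(uri, None)
        else:
            raise ValueError(f"unknown change type: {change}")
    return state


# Change List [Ty,Tz] from the slide, with A..D standing in for URIs.
change_list = [
    ("Tc", "A", "updated"),
    ("Tc", "B", "updated"),
    ("Td", "C", "created"),
    ("Te", "D", "deleted"),
    ("Tf", "C", "updated"),
]
source_now = {"A": "A v2", "B": "B v2", "C": "C v2"}  # hypothetical Source state
destination = {"A": "A v1", "B": "B v1", "D": "D v1"}  # hypothetical state at Ty
print(apply_change_list(destination, change_list, source_now.__getitem__))
```

Because each fetch retrieves the resource's current state, processing "C created @Td" already yields C's latest version; the later "C updated @Tf" event simply fetches it again.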
Change Notification
• In order to meet a Destination’s need for incremental
synchronization and low latency, the Source may send Change
Notifications
• A Change Notification conveys resource change events as
they occur
• For each event, it minimally lists the datetime, the URI of the
resource, and the nature of the change
• Process:
- Destination receives Change Notification
- Destination obtains created/updated resources, removes
deleted resources
Send Change Notification – Resource Changes at Ta
Change Notification @Ta = { A updated @Ta }
Send Change Notification – Resource Changes at Tb
Change Notification @Tb = { D updated @Tb }
Send Change Notification – Resource Changes at Tc
Change Notification @Tc = { A updated @Tc ; B updated @Tc }
Send Change Notification – Resource Changes at Td
Change Notification @Td = { C created @Td }
Send Change Notification – Resource Changes at Te
Change Notification @Te = { D deleted @Te }
Send Change Notification – Resource Changes at Tf
Change Notification @Tf = { C updated @Tf }
Communication Payload – Metadata & Links
• A Source may provide additional metadata and links pertaining
to resources conveyed in Resource Lists, Change Lists, Change
Notifications
• Metadata about a resource: content encoding, content
length, MIME type, content-based hash
• Linking to related resources: mirror copies, alternate
representations, resource versions, diff between current and
previous version, metadata-to-content link, content-to-
metadata link, collection membership, etc.
Communication Payload – Metadata – Hash
• In order to meet a Destination’s need for audit, the Source may
provide a content-based hash pertaining to a resource
• Source computes the content-based hash for a resource
• Source provides the hash as metadata pertaining to the
resource in its communication payload
• Destination processes communication payload, obtains the
resource
• Destination computes the content-based hash for the
obtained resource and compares it with the Source’s hash
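Both sides of this audit can be sketched with Python's standard `hashlib`. The sketch assumes the hash is published as a single `md5:<hex digest>` value, the form used in the Resource List XML example of this presentation; other algorithms are deliberately rejected rather than guessed at. `md5_hash` and `audit_bitstream` are hypothetical names.

```python
import hashlib


def md5_hash(data: bytes) -> str:
    """Source side: compute a content-based hash in the 'md5:<hex>' form."""
    return "md5:" + hashlib.md5(data).hexdigest()


def audit_bitstream(obtained: bytes, source_hash: str) -> bool:
    """Destination side: recompute the hash of the obtained bytes and
    compare it with the value the Source published."""
    if not source_hash.startswith("md5:"):
        # This sketch only understands the md5 form used in the example.
        raise ValueError(f"unsupported hash: {source_hash}")
    return md5_hash(obtained) == source_hash


published = md5_hash(b"hello")  # what the Source would publish
print(audit_bitstream(b"hello", published))   # accurate copy: True
print(audit_bitstream(b"hello!", published))  # corrupted or stale copy: False
```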
Communication Payload – Link – Interlink Metadata & Content
• In order to allow a Destination to establish the relationship
between a Source’s metadata and a Source’s content, the
Source may provide appropriate links
• Metadata resources and content resources are just
resources identified by HTTP URIs
• Both can independently be subject to synchronization and
can be interlinked using appropriately typed links
Communication Payload – Link – Link to Diff
• In order to minimize content transfer, a Source may link to a diff
between the previous and the new version of a resource
• Destination can obtain the diff and patch its (previous)
version of the resource
• Connection between the resource and the diff is established
by means of an appropriately typed link
• Nature of the diff is established by means of MIME type
- Few diff MIME types exist. Communities can establish
their own.
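As an illustration of obtaining a diff and patching the previous version, the sketch below assumes the diff is a JSON Patch document (RFC 6902), one of the few standardized diff formats with a registered MIME type. It implements only the add, remove, and replace operations on an already parsed JSON document; `apply_json_patch` is a hypothetical name, and a production Destination would more likely rely on a complete, tested JSON Patch library.

```python
import copy
import json
from typing import Any, List


def _tokens(pointer: str) -> List[str]:
    """Split a JSON Pointer (RFC 6901) into unescaped reference tokens."""
    if not pointer.startswith("/"):
        raise ValueError(f"unsupported JSON Pointer: {pointer!r}")
    return [t.replace("~1", "/").replace("~0", "~") for t in pointer[1:].split("/")]


def apply_json_patch(previous: Any, patch: List[dict]) -> Any:
    """Apply the add/remove/replace subset of JSON Patch (RFC 6902).

    Works on a deep copy, so the Destination's previous version survives
    if a later operation fails.
    """
    doc = copy.deepcopy(previous)
    for op in patch:
        *parents, last = _tokens(op["path"])
        target = doc
        for token in parents:
            # Walk down to the container that holds the member being changed.
            target = target[int(token)] if isinstance(target, list) else target[token]
        kind = op["op"]
        if isinstance(target, list):
            if kind == "add":
                index = len(target) if last == "-" else int(last)
                if index > len(target):
                    raise IndexError(f"add index out of range: {last}")
                target.insert(index, op["value"])
            elif kind == "remove":
                del target[int(last)]
            elif kind == "replace":
                target[int(last)] = op["value"]
            else:
                raise ValueError(f"unsupported op: {kind}")
        else:
            if kind == "add":
                target[last] = op["value"]
            elif kind == "replace":
                if last not in target:
                    raise KeyError(last)
                target[last] = op["value"]
            elif kind == "remove":
                del target[last]
            else:
                raise ValueError(f"unsupported op: {kind}")
    return doc


previous = {"title": "Resource 3", "tags": ["a"], "draft": True}
diff = json.loads("""[
  {"op": "replace", "path": "/title", "value": "Resource 3, revised"},
  {"op": "add", "path": "/tags/-", "value": "b"},
  {"op": "remove", "path": "/draft"}
]""")
print(apply_json_patch(previous, diff))
```

The patch is small relative to the full representation, which is the point of linking to a diff: the Destination transfers only the changes.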
Further Framework Characteristics
• Modular: A Source does not have to implement all capabilities
• Source decides which capabilities to support based on local
and community requirements
• Sets of Resources: Division of a Source’s resource collection into
logical groupings
• Supported capabilities can differ per set
• Discovery: Mechanisms for Destinations to determine whether
and how a Source supports ResourceSync
• Based on conventions for web discovery and documents that
detail the level of support
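The discovery and modularity points can be sketched together. The ResourceSync specification (not detailed on these slides) advertises a Source's support through a document at a well-known URI; the `/.well-known/resourcesync` path below reflects that convention but should be verified against the specification. `plan_sync` is a hypothetical Destination policy that picks which synchronization steps to attempt from the capabilities a Source declares, using the capability names that appear in the Sitemap examples of this presentation.

```python
from typing import Iterable, List
from urllib.parse import urlsplit, urlunsplit

# Assumption: well-known path per the ResourceSync specification; verify before use.
WELL_KNOWN_PATH = "/.well-known/resourcesync"


def discovery_uri(resource_uri: str) -> str:
    """Derive the well-known discovery URI for the host serving a resource."""
    parts = urlsplit(resource_uri)
    if parts.scheme not in ("http", "https") or not parts.netloc:
        raise ValueError(f"not an absolute HTTP(S) URI: {resource_uri}")
    return urlunsplit((parts.scheme, parts.netloc, WELL_KNOWN_PATH, "", ""))


def plan_sync(capabilities: Iterable[str]) -> List[str]:
    """Hypothetical policy: choose steps based on what the Source supports.

    A Source need not implement every capability, so the Destination
    adapts to whatever subset is declared.
    """
    declared = set(capabilities)
    steps = []
    if "resourcelist" in declared:
        steps.append("baseline")
    if "changelist" in declared:
        steps.append("incremental")
    return steps


print(discovery_uri("http://example.com/res1?format=pdf"))
print(plan_sync(["resourcelist"]))
```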
Sitemap Protocol
• ResourceSync builds on the Sitemap protocol used by major
search engines
• Similarity between resource synchronization and resource
discovery/indexing
• Extends the Sitemap protocol to meet synchronization needs
• Cf. Metadata and Links
• Sitemap document format is used throughout the framework
to express Resource Lists, Change Lists, etc.
• Type of ResourceSync document can be determined
through explicit declaration
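Because every ResourceSync document is a Sitemap, a Destination can tell them apart by the explicit declaration alone. The Python sketch below, using only the standard library's `xml.etree.ElementTree`, reads the `capability` attribute of the document-level `rs:md` element (in the `http://www.openarchives.org/rs/terms/` namespace used by the examples in this presentation) and returns `None` for a plain Sitemap. `capability_of` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET
from typing import Optional

RS_NS = "http://www.openarchives.org/rs/terms/"


def capability_of(document: str) -> Optional[str]:
    """Return the declared capability of a ResourceSync document, or None."""
    root = ET.fromstring(document)
    # Only the rs:md that is a direct child of <urlset> describes the document;
    # rs:md elements inside <url> describe individual resources.
    md = root.find(f"{{{RS_NS}}}md")
    return None if md is None else md.get("capability")


resource_list = (
    '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9" '
    'xmlns:rs="http://www.openarchives.org/rs/terms/">'
    '<rs:md capability="resourcelist" at="2013-01-03T09:00:00Z"/>'
    "</urlset>"
)
plain_sitemap = '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"/>'
print(capability_of(resource_list), capability_of(plain_sitemap))
```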
Common Sitemap
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>http://example.com/res1</loc>
    <lastmod>2013-01-02T13:00:00Z</lastmod>
  </url>
  <url>
    <loc>http://example.com/res2</loc>
    <lastmod>2013-01-02T14:00:00Z</lastmod>
  </url>
  …
</urlset>
Resource List
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
        xmlns:rs="http://www.openarchives.org/rs/terms/">
  <rs:md capability="resourcelist"
         at="2013-01-03T09:00:00Z"/>
  <url>
    <loc>http://example.com/res1</loc>
    <lastmod>2013-01-02T13:00:00Z</lastmod>
    <rs:md hash="md5:1584abdf8ebdc9802ac0c6a7402c03b6"
           length="8876"
           type="application/pdf"/>
  </url>
  <url>
    …
  </url>
</urlset>
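A Destination might read such a Resource List with Python's standard `xml.etree.ElementTree`, as sketched below. The sample reproduces the slide's first `<url>` entry and omits the elided remainder; `ListedResource` and `parse_resource_list` are hypothetical names, and the field names avoid shadowing Python builtins.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from typing import List, Optional

NS = {
    "sm": "http://www.sitemaps.org/schemas/sitemap/0.9",
    "rs": "http://www.openarchives.org/rs/terms/",
}


@dataclass
class ListedResource:
    loc: str
    lastmod: Optional[str]
    hash_value: Optional[str]
    length: Optional[int]
    media_type: Optional[str]


def parse_resource_list(document: str) -> List[ListedResource]:
    """Extract each resource's URI and optional metadata from a Resource List."""
    root = ET.fromstring(document)
    resources = []
    for url in root.findall("sm:url", NS):
        loc = url.findtext("sm:loc", namespaces=NS)
        if loc is None:
            raise ValueError("<url> without <loc>")
        md = url.find("rs:md", NS)
        # Metadata is optional; a bare Sitemap entry only has <loc>/<lastmod>.
        attrs = md.attrib if md is not None else {}
        length = attrs.get("length")
        resources.append(
            ListedResource(
                loc=loc.strip(),
                lastmod=url.findtext("sm:lastmod", namespaces=NS),
                hash_value=attrs.get("hash"),
                length=int(length) if length is not None else None,
                media_type=attrs.get("type"),
            )
        )
    return resources


SAMPLE = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
        xmlns:rs="http://www.openarchives.org/rs/terms/">
  <rs:md capability="resourcelist" at="2013-01-03T09:00:00Z"/>
  <url>
    <loc>http://example.com/res1</loc>
    <lastmod>2013-01-02T13:00:00Z</lastmod>
    <rs:md hash="md5:1584abdf8ebdc9802ac0c6a7402c03b6"
           length="8876" type="application/pdf"/>
  </url>
</urlset>"""

for resource in parse_resource_list(SAMPLE):
    print(resource)
```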
Change List
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
        xmlns:rs="http://www.openarchives.org/rs/terms/">
  <rs:md capability="changelist"
         from="2013-01-02T09:00:00Z"
         until="2013-01-03T09:00:00Z"/>
  <url>
    <loc>http://example.com/res2</loc>
    <lastmod>2013-01-02T13:00:00Z</lastmod>
    <rs:md change="updated"/>
    <rs:ln href="http://example.com/res2/meta"
           rel="describedby"/>
  </url>
  <url>
    …
  </url>
</urlset>
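Parsing a Change List follows the same pattern. This sketch additionally reads the document-level `from`/`until` interval and, per entry, the `change` value and every typed `rs:ln` link, which is how a Destination would find, for example, the `describedby` metadata resource for `res2`. `ChangeEvent` and `parse_change_list` are hypothetical names, and the sample again drops the elided entries.

```python
import xml.etree.ElementTree as ET
from typing import List, NamedTuple, Optional, Tuple

NS = {
    "sm": "http://www.sitemaps.org/schemas/sitemap/0.9",
    "rs": "http://www.openarchives.org/rs/terms/",
}


class ChangeEvent(NamedTuple):
    loc: str
    datetime: Optional[str]
    change: Optional[str]
    links: List[Tuple[str, str]]  # (rel, href) pairs


Interval = Tuple[Optional[str], Optional[str]]


def parse_change_list(document: str) -> Tuple[Interval, List[ChangeEvent]]:
    """Return the declared (from, until) interval and the listed change events."""
    root = ET.fromstring(document)
    header = root.find("rs:md", NS)  # document-level declaration
    interval = (header.get("from"), header.get("until")) if header is not None else (None, None)
    events = []
    for url in root.findall("sm:url", NS):
        md = url.find("rs:md", NS)
        events.append(
            ChangeEvent(
                loc=url.findtext("sm:loc", default="", namespaces=NS).strip(),
                datetime=url.findtext("sm:lastmod", namespaces=NS),
                change=md.get("change") if md is not None else None,
                # Keep every typed link; a resource may carry several.
                links=[(ln.get("rel", ""), ln.get("href", "")) for ln in url.findall("rs:ln", NS)],
            )
        )
    return interval, events


SAMPLE = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
        xmlns:rs="http://www.openarchives.org/rs/terms/">
  <rs:md capability="changelist" from="2013-01-02T09:00:00Z"
         until="2013-01-03T09:00:00Z"/>
  <url>
    <loc>http://example.com/res2</loc>
    <lastmod>2013-01-02T13:00:00Z</lastmod>
    <rs:md change="updated"/>
    <rs:ln href="http://example.com/res2/meta" rel="describedby"/>
  </url>
</urlset>"""

print(parse_change_list(SAMPLE))
```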
Change Notification
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
        xmlns:rs="http://www.openarchives.org/rs/terms/">
  <url>
    <loc>http://example.com/res3</loc>
    <lastmod>2013-01-02T13:00:00Z</lastmod>
    <rs:md change="updated"
           type="application/json"/>
    <rs:ln href="http://example.com/res3/diff"
           rel="http://www.openarchives.org/rs/terms/patch"
           type="application/json-patch"/>
  </url>
  <url>
    …
  </url>
</urlset>
PubSubHubbub Protocol
• ResourceSync builds on the PuSH protocol used for syndication
of Atom/RSS feeds
• Introduces a novel use for pushing change notifications from
a Source to subscribing Destinations
• Destinations subscribe to a Source’s change notifications via
a Hub
• The Source pushes change notifications out to the Hub
• The Hub relays change notifications from the Source to the
subscribing Destinations
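The relay pattern can be illustrated with a toy, in-process hub. This sketch deliberately ignores the actual PubSubHubbub wire protocol (HTTP subscription requests, verification of intent, callback delivery over HTTP); `InMemoryHub`, the topic URI, and the notification strings are all hypothetical and only show who sends what to whom.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, List

Callback = Callable[[str], None]


class InMemoryHub:
    """Toy stand-in for a PubSubHubbub hub: it models only the relaying of
    notifications, not the HTTP handshake or delivery retries."""

    def __init__(self) -> None:
        self._subscribers: DefaultDict[str, List[Callback]] = defaultdict(list)

    def subscribe(self, topic: str, callback: Callback) -> None:
        # A Destination registers interest in a Source's notification topic.
        self._subscribers[topic].append(callback)

    def publish(self, topic: str, notification: str) -> int:
        # The Source pushes a notification; the hub relays it to every
        # subscriber and reports how many deliveries it made.
        delivered = 0
        for callback in self._subscribers.get(topic, []):
            callback(notification)
            delivered += 1
        return delivered


hub = InMemoryHub()
topic = "http://example.com/resourcesync/notifications"  # hypothetical topic URI
received_x: List[str] = []
received_y: List[str] = []
hub.subscribe(topic, received_x.append)  # Destination X
hub.subscribe(topic, received_y.append)  # Destination Y
count = hub.publish(topic, "A updated @Ta")
print(count, received_x, received_y)
```

The Source talks only to the Hub, so adding Destinations does not increase the Source's workload.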
Demonstration
Screencast of the demo at https://www.youtube.com/watch?v=H2Le9_Bbkdw
• Source Sends Change Notifications
• Destination Acts Upon Change Notifications
• Observe the State of Source and Destination
Status of Specifications
• ResourceSync Core Specification to be ANSI/NISO Z39.99-2014
by Summer 2014
• Spec on OAI web site fully aligned with NISO Standard
• ResourceSync Notification Specification currently in beta
• Stable
• Testing PubSubHubbub
• Hub, Source, Destination Python software to be released
• Public Hub for testing will be made available
• ResourceSync Archive Specification currently in beta
• Stable
Pointers
• Specification
http://www.openarchives.org/rs/
http://www.openarchives.org/rs/resourcesync
http://www.openarchives.org/rs/notification
http://www.openarchives.org/rs/archives
• List for public comment
https://groups.google.com/d/forum/resourcesync
• Building blocks
http://sitemaps.org
https://code.google.com/p/pubsubhubbub/
Papers
• Klein, M., and Van de Sompel, H. (2013) Extending Sitemaps for
ResourceSync. ACM/IEEE JCDL 2013. http://arxiv.org/abs/1305.4890
• Haslhofer, B., Warner, S., Lagoze, C., Klein, M., Sanderson, R., Nelson, M.L.,
and Van de Sompel, H. (2013) ResourceSync: Leveraging Sitemaps for
Resource Synchronization. WWW 2013 Developer Track.
http://arxiv.org/abs/1305.1476
• Klein, M., Sanderson, R., Van de Sompel, H., Warner, S., Haslhofer, B.,
Lagoze, C., and Nelson, M.L. (2013) A Technical Framework for Resource
Synchronization. D-Lib Magazine. http://dx.doi.org/10.1045/january2013-klein
• Van de Sompel, H., Sanderson, R., Klein, M., Nelson, M.L., Haslhofer, B.,
Warner, S., and Lagoze, C. (2012) A Perspective on Resource
Synchronization. D-Lib Magazine.
http://dx.doi.org/10.1045/september2012-vandesompel
Acknowledgments
• Editors: Martin Klein, Robert Sanderson, Herbert Van de Sompel (Los
Alamos National Laboratory), Simeon Warner (Cornell University), Graham
Klyne (University of Oxford), Bernhard Haslhofer (University of Vienna),
Michael L. Nelson (Old Dominion University), Carl Lagoze (University of
Michigan)
• Contributors: Peter Murray (Lyrasis), Todd Carpenter, Nettie Lagace (NISO),
Richard Jones (Cottage Labs), Stuart Lewis (University of Edinburgh), Paul
Walk (University of Edinburgh), Jeff Young (OCLC), Shlomo Sanders (Ex
Libris), Kevin Ford (Library of Congress)
• Demonstration: Martin Klein, Harihar Shankar, Herbert Van de Sompel (Los
Alamos National Laboratory)