DSpace & DuraCloud Integrations talk, presented as part of the DuraCloud Workshop at Open Repositories 2011 on June 6, 2011.
More information on the work presented in these slides can be found at:
* https://wiki.duraspace.org/display/DSPACE/ReplicationTaskSuite
DSpace & DuraCloud Integrations
1. DSpace + DuraCloud Integrations
Tim Donohue
DuraSpace
Licensed under Creative Commons Attribution-Share Alike 3.0 Unported License (CC BY-SA 3.0)
To request other use: info@duracloud.org
2. Basis for DSpace Integration
1. DSpace AIP Backup & Restore (1.7 +)
(Initial DuraCloud use case: Backup & Restore)
2. DSpace Curation Task System (1.7 +)
3. DSpace Replication Task Suite (1.8)
3. Intro to Archival Info Pkgs (1.7+)
• Primary Use Cases
– Backup & Restore of DSpace Content
• All content or just partial (Community/Collection/Item)
– Migration/Export of DSpace Content
• All content or just partial (Community/Collection/Item)
– DuraCloud Integration
4. How to Backup DSpace (pre-1.7)
[Diagram: Database with its Full Database Backup; Assetstore Folder with its Folder Backup]
5. How to Restore All (pre-1.7)
[Diagram: Database with its Full Database Backup; Assetstore Folder with its Folder Backup]
6. How to Restore a Collection (pre-1.7)
[Diagram: Database and Assetstore Folder; Temporary Database and Temporary Folder?; Full Database Backup and Folder Backup]
7. How to Restore a Collection (pre-1.7)
[Diagram: Database and Assetstore Folder; Temporary Database and Temporary Folder?; Full Database Backup and Folder Backup]
8. Backup via Archival Info Pkgs
Package for each Community, Collection & Item
AIP backup
9. Restore All via Archival Info Pkgs
Package for each Community, Collection & Item
AIP backup
10. Restore a Collection via AIPs
[Diagram, steps 1 and 2: Collection AIP; Items in Collection; AIP backup]
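The two numbered steps suggest that the Collection is restored before the Items it contains. A minimal sketch of that ordering, assuming each AIP records the handle of its parent object; the `Aip` class, its fields, and the handles are hypothetical and are not part of DSpace:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Aip:
    handle: str            # hypothetical identifier of the packaged object
    kind: str              # "community", "collection" or "item"
    parent: Optional[str]  # handle of the containing object, if any


def restore_order(aips):
    """Return the AIPs ordered so that every parent precedes its children."""
    children = {}
    for aip in aips:
        children.setdefault(aip.parent, []).append(aip)
    known = {aip.handle for aip in aips}
    # Start from AIPs whose parent is not part of this restore set.
    queue = [aip for aip in aips if aip.parent not in known]
    ordered = []
    while queue:
        current = queue.pop(0)
        ordered.append(current)
        queue.extend(children.get(current.handle, []))
    return ordered


backup = [
    Aip("123/11", "item", "123/10"),
    Aip("123/10", "collection", "123/1"),
    Aip("123/12", "item", "123/10"),
]
for aip in restore_order(backup):
    print(aip.kind, aip.handle)
```

The Collection AIP comes out first and its Item AIPs follow, whatever order the backup folder listed them in.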
11. What’s in an AIP?
Archival Information Package (AIP):
• METS (DIM / MODS / PREMIS / METSRights)
• Content Files
• License or Logos
• Other Files in Bundles (optional)
*Also a BagIt version in the works
12. What’s in an AIP?
• Related Object AIPs
• Content Files
• License or Logos
• Other Files in Bundles (optional)
• METS (DIM / MODS / PREMIS / METSRights)
– Descriptive Metadata: DIM & MODS
– Tech/Preservation Metadata: PREMIS
– Rights Metadata: METSRights
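To make the package layout concrete, here is a rough sketch of a zip file that holds a METS manifest next to the content files. It assumes the manifest is named `mets.xml` and uses a heavily simplified METS body (a plain `href` attribute instead of the full schema, and no DIM, MODS, PREMIS or METSRights sections), so it illustrates the shape only and is not a valid DSpace AIP. The function name and sample files are made up for the example:

```python
import io
import zipfile
import xml.etree.ElementTree as ET

METS_NS = "http://www.loc.gov/METS/"


def build_aip_sketch(content_files):
    """Return the bytes of a zip holding a METS manifest plus content files."""
    ET.register_namespace("mets", METS_NS)
    mets = ET.Element(f"{{{METS_NS}}}mets")
    file_sec = ET.SubElement(mets, f"{{{METS_NS}}}fileSec")
    file_grp = ET.SubElement(file_sec, f"{{{METS_NS}}}fileGrp")
    for index, name in enumerate(sorted(content_files)):
        entry = ET.SubElement(file_grp, f"{{{METS_NS}}}file", ID=f"file_{index}")
        # Simplified pointer to the packaged file (assumption, not schema-exact).
        ET.SubElement(entry, f"{{{METS_NS}}}FLocat", LOCTYPE="URL", href=name)

    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        archive.writestr("mets.xml", ET.tostring(mets, encoding="utf-8"))
        for name, data in content_files.items():
            archive.writestr(name, data)
    return buffer.getvalue()


aip_bytes = build_aip_sketch({
    "article.pdf": b"%PDF-1.4 sample",
    "license.txt": b"CC BY-SA 3.0",
})
with zipfile.ZipFile(io.BytesIO(aip_bytes)) as archive:
    print(archive.namelist())
```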
13. The “Site” AIP
Special AIP for site-wide info/metadata (e.g. Group Memberships, EPeople):
• METS (DIM / MODS / PREMIS / METSRights)
• Top-Level Community AIPs
14. What can AIPs restore?
Can restore:
• All In-Archive Content (Files + Metadata)
• All People & Groups
• All Permissions / Access Rights
• Community / Collection Logos, Metadata, Rights & Item Templates
• Community / Collection / Item Hierarchy
Cannot restore (as of 1.7):
• In-Process / Incomplete Items
• Collection OAI-PMH/ORE Harvest Settings
• Configuration files (dspace.cfg, etc.)
15. Migrate a Collection
[Diagram: One DSpace Install and Another DSpace Install; steps 1 and 2: Collection AIP; Items in Collection]
16. Migrate Content
[Diagram: One DSpace Install; (Future work); steps 1 and 2: Collection AIP; Items in Collection]
17. DuraCloud Integration (1.7.x)
[1] ./dspace packager -d
[2] java -jar synctool.jar
[Diagram, step 1: Package for each Community, Collection & Item; Local “Watch” Folder]
18. DuraCloud Integration (1.7.x)
[1] java -jar retrievaltool.jar
[2] ./dspace packager -r
[Diagram, step 2: Package for each Community, Collection & Item; Local Folder]
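The two-step flows on the 1.7.x slides (export then sync, retrieve then restore) can be sketched as command lists. Only `./dspace packager -d`, `./dspace packager -r` and the two jar names come from the slides. The extra packager options (`-a`, `-t AIP`, `-e`, `-i`) follow the packager usage as I recall it from the DSpace documentation and should be checked there. The handle, e-mail address and file paths are hypothetical, and the connection options for the DuraCloud tools are deliberately left out:

```python
import shlex


def export_command(handle, eperson, out_zip):
    """Disseminate an object (and, with -a, its children) as AIPs."""
    return ["./dspace", "packager", "-d", "-a", "-t", "AIP",
            "-e", eperson, "-i", handle, out_zip]


def restore_command(handle, eperson, in_zip):
    """Restore an object (and, with -a, its children) from AIPs."""
    return ["./dspace", "packager", "-r", "-a", "-t", "AIP",
            "-e", eperson, "-i", handle, in_zip]


# Hypothetical values, for illustration only.
HANDLE = "123456789/10"
EPERSON = "admin@example.edu"

backup_steps = [
    export_command(HANDLE, EPERSON, "watch/collection.zip"),
    ["java", "-jar", "synctool.jar"],       # DuraCloud options omitted
]
restore_steps = [
    ["java", "-jar", "retrievaltool.jar"],  # DuraCloud options omitted
    restore_command(HANDLE, EPERSON, "retrieved/collection.zip"),
]

for step in backup_steps + restore_steps:
    print(shlex.join(step))
```

Keeping each command as a list makes it safe to hand to `subprocess.run` later without shell quoting problems.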
19. New: DSpace Replication Suite in 1.8
20. DSpace Curation System (1.7+)
• Enables a basic ‘microservices’ approach to curating DSpace objects
• Anyone can build a task & share it.
• Currently tasks must be written in Java
– Working on JRuby & Jython integration (1.8?)
• “Frees” admin tasks from Command Line
– Can now run from Admin UI or CLI
21. DSpace Replication Suite (1.8)
• A set of curation tasks geared towards ‘replicating’ (backup/restore/audit) content
• “Wraps” 1.7 DSpace AIP Backup & Restore
• Backup content to AIP (filesystem or DuraCloud)
• Restore/Replace from AIP
• Audit AIP (compare to DSpace content)
• Basic IO Tracking of AIP Upload/Downloads
• All replication tasks can be run via Admin UI
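As an illustration of what an audit that compares an AIP to DSpace content could involve, the sketch below checks the files inside a zipped package against the current repository copies by checksum. This is not the Replication Task Suite's implementation. The `mets.xml` manifest name, the use of SHA-256, the function names and the sample data are all assumptions made for the example:

```python
import hashlib
import io
import zipfile

MANIFEST = "mets.xml"  # assumed manifest name; skipped during comparison


def sha256(data):
    return hashlib.sha256(data).hexdigest()


def audit_aip(aip_bytes, repository_files):
    """Map each file name to 'ok', 'differs', or a 'missing from ...' status."""
    with zipfile.ZipFile(io.BytesIO(aip_bytes)) as archive:
        packaged = {name: sha256(archive.read(name))
                    for name in archive.namelist() if name != MANIFEST}
    report = {}
    for name, data in repository_files.items():
        if name not in packaged:
            report[name] = "missing from AIP"
        elif packaged[name] != sha256(data):
            report[name] = "differs"
        else:
            report[name] = "ok"
    # Files that exist only in the package.
    for name in packaged.keys() - repository_files.keys():
        report[name] = "missing from DSpace"
    return report


# Build a small sample package in memory.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr(MANIFEST, "<mets/>")
    archive.writestr("article.pdf", b"original bytes")
    archive.writestr("old-logo.png", b"logo")

report = audit_aip(buffer.getvalue(),
                   {"article.pdf": b"edited bytes", "notes.txt": b"new"})
for name in sorted(report):
    print(name, "->", report[name])
```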
22. DuraCloud Integration (1.8.x)
“Replication Task Suite”:
• Suite of Curation Tasks
• One step Sync process
OR
Command line Curation Tools
• Via UI or CLI
[Diagram, step 1: Package for each Community, Collection & Item; Local Temp Folder (Cache)]
23. DuraCloud Integration (1.8.x)
“Replication Task Suite”:
• One step Retrieve process
• Via UI or CLI
OR
Command line Curation Tools
• Also ‘auditing’ tools
[Diagram, step 1: Package for each Community, Collection & Item; Local Temp Folder (Cache)]
24. DSpace Replication Suite Demo
29. Known Limitations
• Cannot yet take advantage of DuraCloud streaming capabilities (AIPs are zip files)
• Cannot yet take advantage of DuraCloud transformation services (AIPs are zip files)
30. Next Steps
• Working towards “unzipped” AIPs (1.8?)
– METS file & Content files stored in an AIP ‘folder’ but NOT zipped up
– Support for DuraCloud streaming, etc.
• DSpace UI Streaming Integration (@mire)
• ‘Auto-Sync’ options
– Updates in DSpace -> DuraCloud (queued?)
– Updates via DuraCloud services -> DSpace?
31. In Large Thanks to…
• MIT : Richard Rodgers & Wendy Bossons
– Developed Curation Task Framework
– Developed initial Replication Suite tasks
• @mire : Mark Diggory
– Look for @mire’s “Integrating DuraCloud Services in DSpace” talk on Friday at 3:30pm
32. For More Information
• Replication Task Suite:
– https://wiki.duraspace.org/display/DSPACE/ReplicationTaskSuite
• AIP Backup & Restore:
– https://wiki.duraspace.org/display/DSDOC/AIP+Backup+and+Restore
• Curation Task System:
– https://wiki.duraspace.org/display/DSDOC/Curation+System