Oxford Common File Layout
Rosalyn Metz (Emory),
Simeon Warner (Cornell)
Samvera Connect 2018
http://bit.ly/ocfl-samcon2018
Not just us...
OCFL Editorial Group
● Andrew Hankinson (Oxford)
● Neil Jefferies (Oxford)
● Julian Morley (Stanford)
● Andrew Woods (DuraSpace)
● and us (Rosalyn and Simeon)
Community input from pasig-discuss and
ocfl-community groups, and from others
Closest parents: BagIt & Moab
BagIt
Well established and implemented specification for handling sets of files
● Being formally standardized as RFC:
https://tools.ietf.org/html/draft-kunze-bagit-17
● Used for transfer and (somewhat less) for files at rest
● Good fixity support
● No explicit versioning support
○ Could use local conventions for version inside a bag
○ Could use bag-per-version
Moab: A Brief History
Slides adapted from Julian Morley's slides in the OR2018 OCFL presentation
● Moab is the closest ancestor of OCFL
● Developed at Stanford Libraries by Richard Anderson
○ Article: http://journal.code4lib.org/articles/8482
● Named after Moab, UT
Moab: A Brief History
● Moab is a versioned, forward delta file
structure that supports fixity and file
de-duplication.
● You can preserve anything with it (even
cat pictures found on the internet)
● The tools to manage and create Moabs
are an open source Ruby gem
○ https://github.com/sul-dlss/moab-versioning
Moab is part of the
Stanford Digital Repository
Here be Moabs!
Moab in Practice @ Stanford
We have many Moabs in the SDR
● 1.6 million Moab objects
● 5 million version directories
● 50+ million files
● 500+ TB of data (25TB added last month)
● Spread across 15 NFS volumes on NetApp filers
● Backed up by IBM Spectrum Protect (formerly TSM)
○ 1 tape copy kept in local tape frame; 1 sent to Iron Mountain
A sample Moab object on disk
ab123cd4567
    v0001
        data
            content
                title.jpg
                intro.jpg
                page1.jpg
                page2.jpg
                page3.jpg
            metadata
                versionMetadata.xml
                descMetadata.xml
                identityMetadata.xml
        manifests
            versionInventory.xml
            signatureCatalog.xml
            versionAdditions.xml
            fileInventoryDifference.xml
            manifestInventory.xml
    v0002
        data
            content
                page2.jpg
            metadata
                versionMetadata.xml
                technicalMetadata.xml
        manifests
            versionInventory.xml
            signatureCatalog.xml
            versionAdditions.xml
            fileInventoryDifference.xml
            manifestInventory.xml
Two version directories: /v0001 and /v0002
/data content comes from upstream and could
be anything, but our systems create data in
/content and /metadata directories.
/manifests directories are for Moab metadata.
This is where we store all the checksums and
change information for deduplication and
forward deltas.
Lessons from Cornell
CULAR @ 2017
It worked, what
now?
● Fedora 3 no longer being
developed, Fedora 4 not an
appropriate option
● Decision not to buy
"preservation services",
primarily on cost grounds
● Decision that we want one local
copy for legal access reasons
Short term ⇒ use local disk and
AWS S3. Build tools over
filesystem and object stores
Those files sure
are piling up!
Nearly 100TB now, planning
100TB/year digitization
● Plan to purchase a scalable
local (object) storage system
for 1 copy
● Two more copies in cloud
(perhaps tape)
● Content will outlast any
application or software system
● Content will outlast any storage
system
● Expect change and hence
migration ⇒ KISS
OCFL object
OCFL storage root(s)?
Shared Cornell and OCFL Goals
● Provide an application and vendor neutral storage arrangement that can be
used with filesystems and object stores
○ Allow easy replication between multiple storage environments
○ Allow easy migration between storage systems (modulo the inherent burdens)
○ Allow use with multiple and changing applications
● Support package versioning at low cost (complexity and storage use)
● Support internal package validation for completeness and fixity
● Support audit and self-description of entire store
● Have an easy migration path from current archival storage arrangements
● Develop a shared model that is useful at multiple institutions so that all benefit
from community developed tools and expertise.
Lessons from Emory
Lessons from Emory: Deliverables
Actively engaged in a multi-year effort to gather requirements, design, and
develop a digital repository based on the Samvera framework.
Selected deliverables included...
Develop object definitions/types (e.g.
collections, objects, other entities) and their
relationships to one another; determine
preservation objects inside and outside of
Fedora.
Identify needs for AIP structure.
Identify storage requirements (e.g. number of
copies, file access scenarios)
Lessons from Emory: Identified requirements
The means to distribute digital objects to third-party preservation services.
A well understood and well documented model for storing digital objects.
Ability to place multiple copies of digital objects into diverse storage services
(AWS, local storage, etc.).
Easily allow for fixity checking of digital objects.
Digital
Object
Content Files
(Primary or Supplemental)
Content file 1
Content file 2
Content file 3...
… + additional
… + additional
The content itself:
relationships provided in
structural metadata
Metadata (Actionable/Indexed)
Desc. metadata
Technical metadata (File-level)
Preservation Events/Audits
Administrative metadata
Structural metadata (PCDM)
Metadata converted to RDF
for Hyrax/Fedora - editable
and/or searchable
Supplemental Preservation Files
(Metadata/Administrative Files)
Source Metadata (binary file)
Desc. Metadata record (binary file)
METS (binary file)
License/agreement (binary file)
Supplemental PREMIS (binary file)
Variable supplemental info
stored as files (not directly
system-readable):
staff can view or download
file to read it
Collection
Ancient Egyptian
Collection
Administrative
Collection
Carlos Museum
Administrative
Collections reflect the
process the libraries
followed when deciding to
collect materials.
Digital Objects must be a
part of an Administrative
Collection and optionally in
one or more Collections
Digital Objects may
contain one or more files
Digital Objects,
Collections receive
Emory-defined metadata
and relationships
Major Emory
Entities PCDM
Context -
Simple Example
Individual Agreements
contain information about
the Administrative
Collection.
Individual Agreements
may contain one or more
files
Individual Agreements
are assigned to objects
through their parent
Collection
Is a member of
Is a member of
Individual Agreement
Carlos Museum
Agreement
Digital Object
Statuette of a Cat.
Collection
Divine Felines Exhibition
Is a member of
Is a member of
Goals of OCFL
OCFL Requirements
1) Completeness, so that a repository can be
rebuilt from the files it stores,
2) Parsability, both by humans and machines,
most importantly in the absence of original
software,
3) Robustness, against errors, corruption, and
migration between storage technologies, and
4) Storage, on a variety of infrastructures
including cloud object stores.
Many existing digital preservation
standards, such as:
● TDR (ISO 16363)
● OAIS (ISO 14721)
● NDSA Levels of Preservation
● BagIt
discuss the need for these
requirements, but none provides a
standardized way to meet them.
OCFL the specification
https://ocfl.io/draft/spec/
OCFL Object
A group of one or more content files and
administrative information identified by a
URI.
The object may contain a sequence of versions
of the files organized into version directories.
The base directory of the object may contain a
logs directory.
A NAMASTE file indicating conformance.
An object contains an inventory digest file
which provides a digest for the
inventory.json file.
[object root]
├── 0=ocfl_object_1.0
├── inventory.json
├── inventory.json.sha512
├── v1
│   ├── empty.txt
│   ├── foo
│   │   └── bar.xml
│   ├── image.tiff
│   ├── inventory.json
│   └── inventory.json.sha512
├── v2
│   ├── foo
│   │   └── bar.xml
│   ├── inventory.json
│   └── inventory.json.sha512
└── v3
    ├── inventory.json
    └── inventory.json.sha512
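The sidecar check described on this slide could be sketched roughly as follows. This is a minimal illustration, assuming the sidecar file holds the hex digest as its first whitespace-separated token (an assumption about the file layout, not something the slide states). The names `sidecar_matches` and `demo_sidecar_check` are hypothetical.

```python
import hashlib
import tempfile
from pathlib import Path


def sidecar_matches(object_root, algorithm="sha512"):
    """Return True if inventory.json hashes to the digest recorded in its sidecar."""
    object_root = Path(object_root)
    inventory = object_root / "inventory.json"
    sidecar = object_root / f"inventory.json.{algorithm}"
    actual = hashlib.new(algorithm, inventory.read_bytes()).hexdigest()
    # Assumption: the hex digest is the first whitespace-separated token.
    recorded = sidecar.read_text().split()[0]
    return actual == recorded.lower()


def demo_sidecar_check():
    """Build a throwaway object root and check it before and after tampering."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        inventory = root / "inventory.json"
        inventory.write_text('{"head": "v1"}')
        digest = hashlib.sha512(inventory.read_bytes()).hexdigest()
        (root / "inventory.json.sha512").write_text(f"{digest} inventory.json\n")
        before = sidecar_matches(root)
        inventory.write_text('{"head": "v2"}')  # simulate an unrecorded change
        after = sidecar_matches(root)
        return before, after
```

The same function could be pointed at a version directory such as `v1`, since each version in the tree above carries its own inventory and sidecar.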
OCFL Object
An object contains an inventory.json file
which inventories the contents of an object.
The manifest block lists all the digests and
existing file paths for all of the object’s content.
The versions block identifies the logical file path
and the digest for each version of the object’s
content.
Separating the logical file path from the
existing file path and using digests to refer to
files allows for deduplication of content.
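To make the link between logical and existing file paths concrete, here is a small sketch using the manifest and v1 state from this slide's example inventory. The abbreviated digests are copied as they appear on the slide and serve only as opaque dictionary keys. `resolve_state` and `RENAMED_STATE` are hypothetical, added for illustration.

```python
# Manifest: digest -> existing file paths inside the OCFL object.
MANIFEST = {
    "4d27c8...b53": ["v2/foo/bar.xml"],
    "7dcc35...c31": ["v1/foo/bar.xml"],
    "cf83e1...a3e": ["v1/empty.txt"],
    "ffccf6...62e": ["v1/image.tiff"],
}

# Version state: digest -> logical file paths in that version.
V1_STATE = {
    "7dcc35...c31": ["foo/bar.xml"],
    "cf83e1...a3e": ["empty.txt"],
    "ffccf6...62e": ["image.tiff"],
}

# Hypothetical later state in which image.tiff has been renamed.
RENAMED_STATE = {"ffccf6...62e": ["scans/page1.tiff"]}


def resolve_state(manifest, state):
    """Map each logical path in a version state to an existing path in the object."""
    resolved = {}
    for digest, logical_paths in state.items():
        # Every path listed under a digest holds the same bytes, so take the first.
        existing = manifest[digest][0]
        for logical in logical_paths:
            resolved[logical] = existing
    return resolved
```

Resolving `RENAMED_STATE` points `scans/page1.tiff` at `v1/image.tiff`, so the rename needs no second copy of the file.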
{
"head": "v3",
"id": "ark:/12345/bcd987",
"manifest": {
"4d27c8...b53": [ "v2/foo/bar.xml" ],
"7dcc35...c31": [ "v1/foo/bar.xml" ],
"cf83e1...a3e": [ "v1/empty.txt" ],
"ffccf6...62e": [ "v1/image.tiff" ]
},
"type": "Object",
"versions": [
{
"created": "2018-01-01T01:01:01Z",
"message": "Initial import",
"state": {
"7dcc35...c31": [ "foo/bar.xml" ],
"cf83e1...a3e": [ "empty.txt" ],
"ffccf6...62e": [ "image.tiff" ]
},
"type": "Version",
"user": {
"address": "alice@example.com",
"name": "Alice"
},
"version": "v1"
},
{
"created": "2018-02-02T02:02:02Z",
"message": "Fix bar.xml, remove image.tiff,
OCFL Storage Root
The base directory of an OCFL storage layout.
Should also contain the OCFL specification in
human-readable plain-text format.
Should contain the conformance declaration
OCFL Objects may conform to the same or
earlier version of the specification.
The storage hierarchy must terminate with an
OCFL Object Root.
[storage root]
├── 0=ocfl_1.0
├── ocfl_1.0.txt (optional)
├── ab12cd34
│   ├── 0=ocfl_object_1.0
│   ├── inventory.json
│   ├── inventory.json.sha512
│   └── v1
│       ├── file.txt
│       ├── inventory.json
│       └── inventory.json.sha512
└── ef56gh78
    ├── 0=ocfl_object_1.0
    ├── inventory.json
    ├── inventory.json.sha512
    ├── v1
    │   ├── empty.txt
    │   ├── foo
    │   │   └── bar.xml
    │   ├── image.tiff
    │   ├── inventory.json
    │   └── inventory.json.sha512
    └── v2
        ├── foo
        │   └── bar.xml
        ├── inventory.json
        └── inventory.json.sha512
OCFL Storage Root
Storage hierarchies must not include files
within intermediate directories
Storage hierarchies must be terminated by
OCFL Object Roots
Storage hierarchies within the same OCFL
Storage Root should use just one layout
pattern
Storage hierarchies within the same OCFL
Storage Root should consistently use either a
directory hierarchy of OCFL Objects or
top-level OCFL Objects
[storage root]
├── 0=ocfl_1.0
├── ocfl_1.0.txt (optional)
└── ab
    └── 12
        └── cd
            └── 34
                └── ab12cd34
                    ├── 0=ocfl_object_1.0
                    ├── inventory.json
                    ├── inventory.json.sha512
                    ├── v1
                    │   ├── empty.txt
                    │   ├── foo
                    │   │   └── bar.xml
                    │   ├── image.tiff
                    │   ├── inventory.json
                    │   └── inventory.json.sha512
                    └── v2
                        ├── foo
                        │   └── bar.xml
                        ├── inventory.json
                        └── inventory.json.sha512
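The hierarchy rules on this slide lend themselves to a simple walk over a filesystem storage root. The sketch below is one possible check, not a validator: it assumes object roots can be recognised by a file whose name starts with `0=ocfl_object_`, as in the trees shown here. `scan_storage_root` and `demo_scan` are hypothetical names.

```python
import os
import tempfile
from pathlib import Path


def scan_storage_root(storage_root):
    """Return (object_roots, problems) found under a filesystem storage root."""
    storage_root = Path(storage_root)
    object_roots, problems = [], []
    for dirpath, dirnames, filenames in os.walk(storage_root):
        current = Path(dirpath)
        if current == storage_root:
            continue  # the root itself holds the declaration and optional spec copy
        if any(name.startswith("0=ocfl_object_") for name in filenames):
            object_roots.append(current)
            dirnames.clear()  # do not descend into the object itself
        elif filenames:
            problems.append(f"files in intermediate directory: {current}")
        elif not dirnames:
            problems.append(f"hierarchy does not end in an object root: {current}")
    return sorted(object_roots), problems


def demo_scan():
    """Build the ab/12/cd/34 layout from the slide plus one stray file."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "0=ocfl_1.0").touch()
        obj = root / "ab" / "12" / "cd" / "34" / "ab12cd34"
        (obj / "v1").mkdir(parents=True)
        (obj / "0=ocfl_object_1.0").touch()
        (obj / "v1" / "file.txt").write_text("hello")
        (root / "ab" / "stray.txt").write_text("oops")  # breaks the first rule
        roots, problems = scan_storage_root(root)
        return [p.relative_to(root).as_posix() for p in roots], len(problems)
```

An object store has no real directories, so a comparable check there would presumably work on key prefixes instead.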
OCFL implementation patterns
https://ocfl.io/draft/implementation-notes/
Rebuildability
● Key OCFL goal -- be able to rebuild repo
from an OCFL storage root
● Therefore, in OAIS terms: must include
all the descriptive, administrative,
structural, representation, and
preservation metadata relevant to the
object.
● Optionally include copy of spec in top level
of OCFL storage root
● More complete option would be a specific
OCFL object that contains this
documentation and to have a pointer to its
location in the storage root.
Filesystem metadata
e.g. permissions, access, and
creation times
● not portable between filesystems
● not preservable through file
transfer operations
● ill-defined fixity
⇒ out-of-scope
If important, use filesystem image
format or extract as metadata
Empty Directories
● OCFL preserves files and their
content
● Directories serve as an
organizational convention
● Empty directories not directly
supported
⇒ Use zero-length `.keep` file as
necessary (à la `git`, BagIt)
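As a rough illustration of the `.keep` convention, the following sketch drops a zero-length marker file into every empty directory of a content tree before it is stored. The function name `add_keep_files` and the demo layout are hypothetical.

```python
import os
import tempfile
from pathlib import Path


def add_keep_files(content_root, name=".keep"):
    """Create a zero-length marker file in every directory that is empty."""
    added = []
    for dirpath, dirnames, filenames in os.walk(content_root):
        if not dirnames and not filenames:
            keep = Path(dirpath) / name
            keep.touch()
            added.append(keep)
    return sorted(added)


def demo_keep():
    """One empty directory (a/b) and one directory that already has a file (c)."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "a" / "b").mkdir(parents=True)
        (root / "c").mkdir()
        (root / "c" / "file.txt").write_text("data")
        return [p.relative_to(root).as_posix() for p in add_keep_files(root)]
```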
Data and Metadata
Only special files are the inventory,
its digest file, and conformance
declaration files
Otherwise OCFL makes no
distinction between different types of
files.
⇒ Use local conventions as
needed
Storage
● Filesystem or Object Store -- you choose
● Original filename or Normalized filename -- you choose
● Deduplication & Forward delta differencing (at file level) --
optional but likely desirable/normal
"logical file path" - path of file in content as part of state for a particular version
"existing file path" - path of file in OCFL object
content addressing ties these two together
Storage Root Hierarchy - flat, pairtree, ex-wye-zee
[storage_root]
├── 0=ocfl_1.0
├── ocfl_1.0.txt (optional)
├── d45be626e024
│   ├── 0=ocfl_object_1.0
│   ├── inventory.json
│   ├── inventory.json.sha512
│   └── v1...
├── d45be626e036
│   ├── 0=ocfl_object_1.0
│   ├── inventory.json
│   ├── inventory.json.sha512
│   └── v1...
├── 3104edf0363a
│   ├── 0=ocfl_object_1.0
│   ├── inventory.json
│   ├── inventory.json.sha512
│   └── v1...
[storage_root]
├── 0=ocfl_1.0
├── ocfl_1.0.txt (optional)
├── d4
│   └── 5b
│       └── e6
│           └── 26
│               └── e0
│                   ├── 24
│                   │   └── d45be626e024
│                   │       ├── 0=ocfl_object_1.0
│                   │       └── ...
│                   └── 36
│                       └── d45be626e036
│                           ├── 0=ocfl_object_1.0
│                           └── ...
File operations
(mungification?)
● Inheritance
● Addition
● Updating
● Renaming
● Deletion
● Reinstatement
● Purging ⇒ choices:
a. rebuild new object
b. break immutability and
rewrite (not recommended)
Yes - OCFL supports that...
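Several of these operations can be expressed purely as edits to a version's state (digest to logical paths), with no content files touched. The sketch below illustrates deletion, renaming and reinstatement on the v1 state from the inventory example. All function names are hypothetical, and the abbreviated digests are opaque keys copied from the slide.

```python
V1_STATE = {
    "7dcc35...c31": ["foo/bar.xml"],
    "cf83e1...a3e": ["empty.txt"],
    "ffccf6...62e": ["image.tiff"],
}


def _copy(state):
    return {digest: list(paths) for digest, paths in state.items()}


def digest_of(state, logical_path):
    """Find the digest that a logical path refers to in a given state."""
    for digest, paths in state.items():
        if logical_path in paths:
            return digest
    raise KeyError(logical_path)


def add_reference(state, digest, logical_path):
    new_state = _copy(state)
    new_state.setdefault(digest, []).append(logical_path)
    return new_state


def delete(state, logical_path):
    """Drop a logical path from the new state; stored content is left alone."""
    digest = digest_of(state, logical_path)
    new_state = _copy(state)
    new_state[digest].remove(logical_path)
    if not new_state[digest]:
        del new_state[digest]
    return new_state


def rename(state, old_path, new_path):
    digest = digest_of(state, old_path)
    return add_reference(delete(state, old_path), digest, new_path)


def reinstate(state, earlier_state, logical_path):
    """Bring back a path by reusing the digest recorded in an earlier version."""
    return add_reference(state, digest_of(earlier_state, logical_path), logical_path)
```

Each function returns a new state and leaves its input unchanged, which mirrors the idea that earlier versions are not rewritten.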
Version Immutability
OCFL supports systems where
versions (everything in a given
version directory) are immutable once
written.
● It is recommended to follow this
practice
● BUT you can rewrite objects if
you really want to, but
Deduplication
OCFL supports (in fact, enforces for
internal references) deduplication
through digests
● Only within an object
● File level
● sha512 digest recommended
Forward Delta
Each version need only include new
and changed files
● Files from previous version
included by reference
● Reference by content (digest)
supports renaming without
duplicating
(You can avoid this and include files again if you
really want. But why?)
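A forward-delta update can be sketched as follows: hash every file in the incoming version, record all of them in the version state, and store only those whose digest is not yet in the manifest. This is an illustration under assumptions. Content paths follow the `v2/foo/bar.xml` pattern shown in the slide's example, and `plan_version`, `FILES_V1` and `FILES_V2` are hypothetical.

```python
import hashlib

# Hypothetical file sets: logical path -> bytes.
FILES_V1 = {"empty.txt": b"", "foo/bar.xml": b"<a/>", "image.tiff": b"II*"}
# v2 changes bar.xml and renames image.tiff without changing its bytes.
FILES_V2 = {"empty.txt": b"", "foo/bar.xml": b"<b/>", "renamed.tiff": b"II*"}


def plan_version(manifest, version, files):
    """Return (new_manifest, state, to_write) for a new version of an object."""
    manifest = {digest: list(paths) for digest, paths in manifest.items()}
    state, to_write = {}, {}
    for logical, data in sorted(files.items()):
        digest = hashlib.sha512(data).hexdigest()
        state.setdefault(digest, []).append(logical)
        if digest not in manifest:
            # New content: store it once under this version's directory.
            content_path = f"{version}/{logical}"
            manifest[digest] = [content_path]
            to_write[content_path] = data
    return manifest, state, to_write


M1, S1, W1 = plan_version({}, "v1", FILES_V1)
M2, S2, W2 = plan_version(M1, "v2", FILES_V2)
```

In this example only the changed `foo/bar.xml` ends up in `W2`. The renamed TIFF and the unchanged empty file are carried forward by digest.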
Fixity
1. Digests used for reference
already provide basis for strong
fixity checks (pref. sha512)
2. Additional digests may be
included to support legacy fixity
information (e.g. md5)
(Fixity of inventory files themselves handled by
sidecar file, e.g. inventory.json.sha512)
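Because the manifest already maps digests to stored paths, a fixity pass can be little more than re-hashing each file and comparing. The sketch below assumes a filesystem object root and a manifest dictionary shaped like the one in the inventory example. `file_digest`, `check_fixity` and `demo_fixity` are hypothetical names.

```python
import hashlib
import tempfile
from pathlib import Path


def file_digest(path, algorithm="sha512", chunk_size=1 << 20):
    """Hash a file in chunks so large files need not fit in memory."""
    hasher = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.hexdigest()


def check_fixity(object_root, manifest, algorithm="sha512"):
    """Return a list of (path, problem) pairs; an empty list means all files match."""
    failures = []
    for digest, paths in manifest.items():
        for rel in paths:
            target = Path(object_root) / rel
            if not target.is_file():
                failures.append((rel, "missing"))
            elif file_digest(target, algorithm) != digest.lower():
                failures.append((rel, "digest mismatch"))
    return failures


def demo_fixity():
    """Check a tiny object, then corrupt one file and check again."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "v1").mkdir()
        (root / "v1" / "a.txt").write_bytes(b"hello")
        manifest = {
            hashlib.sha512(b"hello").hexdigest(): ["v1/a.txt"],
            hashlib.sha512(b"gone").hexdigest(): ["v1/b.txt"],  # never written
        }
        first = check_fixity(root, manifest)
        (root / "v1" / "a.txt").write_bytes(b"hellp")  # simulate corruption
        second = check_fixity(root, manifest)
        return first, second
```

Passing a different `algorithm` (for example `"md5"`) would let the same loop check a table of legacy digests, assuming that table has the same digest-to-paths shape.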
Log Information
logs directory in OCFL object
available for information not in
the object's content and not versioned
● form not specified
● will be ignored in object
validation
Small Files
Objects with many small files may
cause problems with some storage
infrastructures and may make
validation/fixity time consuming
● package in single file (ZIP
recommended)
(Options for a later version of the OCFL spec
are ZIPped objects and/or ZIP by version)
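Packaging many small files into one ZIP before storing them could look roughly like this. The sketch works in memory and stores entries uncompressed, which is a design choice made here for simplicity rather than anything the slides prescribe. `pack_small_files`, `unpack` and `SMALL_FILES` are hypothetical.

```python
import io
import zipfile

# Hypothetical set of small files: name -> bytes.
SMALL_FILES = {"page1.txt": b"one", "page2.txt": b"two", "notes/readme.txt": b""}


def pack_small_files(files):
    """Bundle a mapping of name -> bytes into a single ZIP, returned as bytes."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", compression=zipfile.ZIP_STORED) as archive:
        for name in sorted(files):
            archive.writestr(name, files[name])
    return buffer.getvalue()


def unpack(data):
    """Read every entry back out of a ZIP given as bytes."""
    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        return {name: archive.read(name) for name in archive.namelist()}
```

The resulting single file would then be stored in the object like any other content file, with one digest covering the whole bundle.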
Roadmap
Alpha (yesterday)
● Released(ish) on October 10 community call
(OCFL Editors and PASIG Discuss)
● Feedback for November community call
Beta (date based on feedback)
● Experimental validation tool
● Determine what other groups and communities to
seek input from
Release 1.0 (2019)
● One production-ready validator
● Test suite and fixture objects
● Two institutions committed to backing the
initiative (should define that)
Thank You
https://ocfl.io
https://github.com/OCFL
ocfl-community@googlegroups.com

More Related Content

What's hot

Ontology Access Kit_ Workshop Intro Slides.pptx
Ontology Access Kit_ Workshop Intro Slides.pptxOntology Access Kit_ Workshop Intro Slides.pptx
Ontology Access Kit_ Workshop Intro Slides.pptxChris Mungall
 
비즈니스 혁신 가속화와 효과적 규정 준수를 위한 AWS ISMS 소개::신종회::AWS Summit Seoul 2018
비즈니스 혁신 가속화와 효과적 규정 준수를 위한 AWS ISMS 소개::신종회::AWS Summit Seoul 2018 비즈니스 혁신 가속화와 효과적 규정 준수를 위한 AWS ISMS 소개::신종회::AWS Summit Seoul 2018
비즈니스 혁신 가속화와 효과적 규정 준수를 위한 AWS ISMS 소개::신종회::AWS Summit Seoul 2018 Amazon Web Services Korea
 
AWS Summit Seoul 2023 |투자를 모두에게, 토스증권의 MTS 구축 사례
AWS Summit Seoul 2023 |투자를 모두에게, 토스증권의 MTS 구축 사례AWS Summit Seoul 2023 |투자를 모두에게, 토스증권의 MTS 구축 사례
AWS Summit Seoul 2023 |투자를 모두에게, 토스증권의 MTS 구축 사례Amazon Web Services Korea
 
Amazon Redshift로 데이터웨어하우스(DW) 구축하기
Amazon Redshift로 데이터웨어하우스(DW) 구축하기Amazon Redshift로 데이터웨어하우스(DW) 구축하기
Amazon Redshift로 데이터웨어하우스(DW) 구축하기Amazon Web Services Korea
 
Docker Networking - Common Issues and Troubleshooting Techniques
Docker Networking - Common Issues and Troubleshooting TechniquesDocker Networking - Common Issues and Troubleshooting Techniques
Docker Networking - Common Issues and Troubleshooting TechniquesSreenivas Makam
 
CloudFront로 동적 컨텐츠를 전송하는 네가지 이유 - 김일호 솔루션즈 아키텍트:: AWS Cloud Track 3 Gaming
CloudFront로 동적 컨텐츠를 전송하는 네가지 이유 - 김일호 솔루션즈 아키텍트:: AWS Cloud Track 3 GamingCloudFront로 동적 컨텐츠를 전송하는 네가지 이유 - 김일호 솔루션즈 아키텍트:: AWS Cloud Track 3 Gaming
CloudFront로 동적 컨텐츠를 전송하는 네가지 이유 - 김일호 솔루션즈 아키텍트:: AWS Cloud Track 3 GamingAmazon Web Services Korea
 
[AWS Builders] AWS 스토리지 서비스 소개 및 사용 방법
[AWS Builders] AWS 스토리지 서비스 소개 및 사용 방법[AWS Builders] AWS 스토리지 서비스 소개 및 사용 방법
[AWS Builders] AWS 스토리지 서비스 소개 및 사용 방법Amazon Web Services Korea
 
Common Workloads on the AWS Cloud
Common Workloads on the AWS CloudCommon Workloads on the AWS Cloud
Common Workloads on the AWS CloudAmazon Web Services
 
KOCOON – KAKAO Automatic K8S Monitoring
KOCOON – KAKAO Automatic K8S MonitoringKOCOON – KAKAO Automatic K8S Monitoring
KOCOON – KAKAO Automatic K8S Monitoringissac lim
 
202112 AWS Black Belt Online Seminar 店内の「今」をお届けする小売業向けリアルタイム配信基盤のレシピ
202112 AWS Black Belt Online Seminar 店内の「今」をお届けする小売業向けリアルタイム配信基盤のレシピ202112 AWS Black Belt Online Seminar 店内の「今」をお届けする小売業向けリアルタイム配信基盤のレシピ
202112 AWS Black Belt Online Seminar 店内の「今」をお届けする小売業向けリアルタイム配信基盤のレシピAmazon Web Services Japan
 
AWS Summit Seoul 2023 | "이봐, 해봤어?" 해본! 사람의 Modern Data Architecture 비밀 노트
AWS Summit Seoul 2023 | "이봐, 해봤어?" 해본! 사람의 Modern Data Architecture 비밀 노트AWS Summit Seoul 2023 | "이봐, 해봤어?" 해본! 사람의 Modern Data Architecture 비밀 노트
AWS Summit Seoul 2023 | "이봐, 해봤어?" 해본! 사람의 Modern Data Architecture 비밀 노트Amazon Web Services Korea
 
워크로드 특성에 따른 안전하고 효율적인 Data Lake 운영 방안
워크로드 특성에 따른 안전하고 효율적인 Data Lake 운영 방안워크로드 특성에 따른 안전하고 효율적인 Data Lake 운영 방안
워크로드 특성에 따른 안전하고 효율적인 Data Lake 운영 방안Amazon Web Services Korea
 
Taking advantage of Prometheus relabeling
Taking advantage of Prometheus relabelingTaking advantage of Prometheus relabeling
Taking advantage of Prometheus relabelingJulien Pivotto
 
DSpace 7 - The Power of Configurable Entities
DSpace 7 - The Power of Configurable EntitiesDSpace 7 - The Power of Configurable Entities
DSpace 7 - The Power of Configurable EntitiesAtmire
 
ELK, a real case study
ELK,  a real case studyELK,  a real case study
ELK, a real case studyPaolo Tonin
 
AWS Summit Seoul 2023 | Amazon Redshift Serverless를 활용한 LG 이노텍의 데이터 분석 플랫폼 혁신 과정
AWS Summit Seoul 2023 | Amazon Redshift Serverless를 활용한 LG 이노텍의 데이터 분석 플랫폼 혁신 과정AWS Summit Seoul 2023 | Amazon Redshift Serverless를 활용한 LG 이노텍의 데이터 분석 플랫폼 혁신 과정
AWS Summit Seoul 2023 | Amazon Redshift Serverless를 활용한 LG 이노텍의 데이터 분석 플랫폼 혁신 과정Amazon Web Services Korea
 
Route53 및 CloudFront를 이용한 CDN 활용기 - AWS Summit Seoul 2017
Route53 및 CloudFront를 이용한 CDN 활용기 - AWS Summit Seoul 2017Route53 및 CloudFront를 이용한 CDN 활용기 - AWS Summit Seoul 2017
Route53 및 CloudFront를 이용한 CDN 활용기 - AWS Summit Seoul 2017Amazon Web Services Korea
 
Vector databases and neural search
Vector databases and neural searchVector databases and neural search
Vector databases and neural searchDmitry Kan
 

What's hot (20)

Ontology Access Kit_ Workshop Intro Slides.pptx
Ontology Access Kit_ Workshop Intro Slides.pptxOntology Access Kit_ Workshop Intro Slides.pptx
Ontology Access Kit_ Workshop Intro Slides.pptx
 
Amazon DynamoDB 키 디자인 패턴
Amazon DynamoDB 키 디자인 패턴Amazon DynamoDB 키 디자인 패턴
Amazon DynamoDB 키 디자인 패턴
 
비즈니스 혁신 가속화와 효과적 규정 준수를 위한 AWS ISMS 소개::신종회::AWS Summit Seoul 2018
비즈니스 혁신 가속화와 효과적 규정 준수를 위한 AWS ISMS 소개::신종회::AWS Summit Seoul 2018 비즈니스 혁신 가속화와 효과적 규정 준수를 위한 AWS ISMS 소개::신종회::AWS Summit Seoul 2018
비즈니스 혁신 가속화와 효과적 규정 준수를 위한 AWS ISMS 소개::신종회::AWS Summit Seoul 2018
 
AWS Summit Seoul 2023 |투자를 모두에게, 토스증권의 MTS 구축 사례
AWS Summit Seoul 2023 |투자를 모두에게, 토스증권의 MTS 구축 사례AWS Summit Seoul 2023 |투자를 모두에게, 토스증권의 MTS 구축 사례
AWS Summit Seoul 2023 |투자를 모두에게, 토스증권의 MTS 구축 사례
 
Amazon Redshift로 데이터웨어하우스(DW) 구축하기
Amazon Redshift로 데이터웨어하우스(DW) 구축하기Amazon Redshift로 데이터웨어하우스(DW) 구축하기
Amazon Redshift로 데이터웨어하우스(DW) 구축하기
 
Docker Networking - Common Issues and Troubleshooting Techniques
Docker Networking - Common Issues and Troubleshooting TechniquesDocker Networking - Common Issues and Troubleshooting Techniques
Docker Networking - Common Issues and Troubleshooting Techniques
 
Elasticsearch
ElasticsearchElasticsearch
Elasticsearch
 
CloudFront로 동적 컨텐츠를 전송하는 네가지 이유 - 김일호 솔루션즈 아키텍트:: AWS Cloud Track 3 Gaming
CloudFront로 동적 컨텐츠를 전송하는 네가지 이유 - 김일호 솔루션즈 아키텍트:: AWS Cloud Track 3 GamingCloudFront로 동적 컨텐츠를 전송하는 네가지 이유 - 김일호 솔루션즈 아키텍트:: AWS Cloud Track 3 Gaming
CloudFront로 동적 컨텐츠를 전송하는 네가지 이유 - 김일호 솔루션즈 아키텍트:: AWS Cloud Track 3 Gaming
 
[AWS Builders] AWS 스토리지 서비스 소개 및 사용 방법
[AWS Builders] AWS 스토리지 서비스 소개 및 사용 방법[AWS Builders] AWS 스토리지 서비스 소개 및 사용 방법
[AWS Builders] AWS 스토리지 서비스 소개 및 사용 방법
 
Common Workloads on the AWS Cloud
Common Workloads on the AWS CloudCommon Workloads on the AWS Cloud
Common Workloads on the AWS Cloud
 
KOCOON – KAKAO Automatic K8S Monitoring
KOCOON – KAKAO Automatic K8S MonitoringKOCOON – KAKAO Automatic K8S Monitoring
KOCOON – KAKAO Automatic K8S Monitoring
 
202112 AWS Black Belt Online Seminar 店内の「今」をお届けする小売業向けリアルタイム配信基盤のレシピ
202112 AWS Black Belt Online Seminar 店内の「今」をお届けする小売業向けリアルタイム配信基盤のレシピ202112 AWS Black Belt Online Seminar 店内の「今」をお届けする小売業向けリアルタイム配信基盤のレシピ
202112 AWS Black Belt Online Seminar 店内の「今」をお届けする小売業向けリアルタイム配信基盤のレシピ
 
AWS Summit Seoul 2023 | "이봐, 해봤어?" 해본! 사람의 Modern Data Architecture 비밀 노트
AWS Summit Seoul 2023 | "이봐, 해봤어?" 해본! 사람의 Modern Data Architecture 비밀 노트AWS Summit Seoul 2023 | "이봐, 해봤어?" 해본! 사람의 Modern Data Architecture 비밀 노트
AWS Summit Seoul 2023 | "이봐, 해봤어?" 해본! 사람의 Modern Data Architecture 비밀 노트
 
워크로드 특성에 따른 안전하고 효율적인 Data Lake 운영 방안
워크로드 특성에 따른 안전하고 효율적인 Data Lake 운영 방안워크로드 특성에 따른 안전하고 효율적인 Data Lake 운영 방안
워크로드 특성에 따른 안전하고 효율적인 Data Lake 운영 방안
 
Taking advantage of Prometheus relabeling
Taking advantage of Prometheus relabelingTaking advantage of Prometheus relabeling
Taking advantage of Prometheus relabeling
 
DSpace 7 - The Power of Configurable Entities
DSpace 7 - The Power of Configurable EntitiesDSpace 7 - The Power of Configurable Entities
DSpace 7 - The Power of Configurable Entities
 
ELK, a real case study
ELK,  a real case studyELK,  a real case study
ELK, a real case study
 
AWS Summit Seoul 2023 | Amazon Redshift Serverless를 활용한 LG 이노텍의 데이터 분석 플랫폼 혁신 과정
AWS Summit Seoul 2023 | Amazon Redshift Serverless를 활용한 LG 이노텍의 데이터 분석 플랫폼 혁신 과정AWS Summit Seoul 2023 | Amazon Redshift Serverless를 활용한 LG 이노텍의 데이터 분석 플랫폼 혁신 과정
AWS Summit Seoul 2023 | Amazon Redshift Serverless를 활용한 LG 이노텍의 데이터 분석 플랫폼 혁신 과정
 
Route53 및 CloudFront를 이용한 CDN 활용기 - AWS Summit Seoul 2017
Route53 및 CloudFront를 이용한 CDN 활용기 - AWS Summit Seoul 2017Route53 및 CloudFront를 이용한 CDN 활용기 - AWS Summit Seoul 2017
Route53 및 CloudFront를 이용한 CDN 활용기 - AWS Summit Seoul 2017
 
Vector databases and neural search
Vector databases and neural searchVector databases and neural search
Vector databases and neural search
 

Similar to Oxford Common File Layout (OCFL)

The Oxford Common File Layout: A common approach to digital preservation
The Oxford Common File Layout: A common approach to digital preservationThe Oxford Common File Layout: A common approach to digital preservation
The Oxford Common File Layout: A common approach to digital preservationSimeon Warner
 
Alphabet soup: CDM, VRA, CCO, METS, MODS, RDF - Why Metadata Matters
Alphabet soup: CDM, VRA, CCO, METS, MODS, RDF - Why Metadata MattersAlphabet soup: CDM, VRA, CCO, METS, MODS, RDF - Why Metadata Matters
Alphabet soup: CDM, VRA, CCO, METS, MODS, RDF - Why Metadata MattersNew York University
 
Hypatia for dlf 2011
Hypatia for dlf 2011Hypatia for dlf 2011
Hypatia for dlf 2011DLFCLIR
 
Fedora Overview
Fedora OverviewFedora Overview
Fedora Overvieweposthumus
 
Memory Analysis of the Dalvik (Android) Virtual Machine
Memory Analysis of the Dalvik (Android) Virtual MachineMemory Analysis of the Dalvik (Android) Virtual Machine
Memory Analysis of the Dalvik (Android) Virtual MachineAndrew Case
 
Fedora Commons in the CLARIN Infrastructure
Fedora Commons in the CLARIN InfrastructureFedora Commons in the CLARIN Infrastructure
Fedora Commons in the CLARIN InfrastructureMenzo Windhouwer
 
Using Fedora Commons To Create A Persistent Archive
Using Fedora Commons To Create A Persistent ArchiveUsing Fedora Commons To Create A Persistent Archive
Using Fedora Commons To Create A Persistent ArchivePhil Cryer
 
Archival Technologies
Archival TechnologiesArchival Technologies
Archival TechnologiesCliff Landis
 
RO-Crate: A framework for packaging research products into FAIR Research Objects
RO-Crate: A framework for packaging research products into FAIR Research ObjectsRO-Crate: A framework for packaging research products into FAIR Research Objects
RO-Crate: A framework for packaging research products into FAIR Research ObjectsCarole Goble
 
Sword Cetis 2007 06 29
Sword Cetis 2007 06 29Sword Cetis 2007 06 29
Sword Cetis 2007 06 29Julie Allinson
 
2.28.18 Getting Started with Fedora presentation slides
2.28.18 Getting Started with Fedora presentation slides2.28.18 Getting Started with Fedora presentation slides
2.28.18 Getting Started with Fedora presentation slidesDuraSpace
 
Audio MD Metadata Scheme
Audio MD Metadata SchemeAudio MD Metadata Scheme
Audio MD Metadata SchemeAriel Hess
 
FAIR Workflows and Research Objects get a Workout
FAIR Workflows and Research Objects get a Workout FAIR Workflows and Research Objects get a Workout
FAIR Workflows and Research Objects get a Workout Carole Goble
 
OpenAIRE webinar: Principles of Research Data Management, with S. Venkatarama...
OpenAIRE webinar: Principles of Research Data Management, with S. Venkatarama...OpenAIRE webinar: Principles of Research Data Management, with S. Venkatarama...
OpenAIRE webinar: Principles of Research Data Management, with S. Venkatarama...OpenAIRE
 
Research Object Composer: A Tool for Publishing Complex Data Objects in the C...
Research Object Composer: A Tool for Publishing Complex Data Objects in the C...Research Object Composer: A Tool for Publishing Complex Data Objects in the C...
Research Object Composer: A Tool for Publishing Complex Data Objects in the C...Anita de Waard
 
Eclipse Memory Analyzer
Eclipse Memory AnalyzerEclipse Memory Analyzer
Eclipse Memory Analyzernayashkova
 
Digital Libraries
Digital LibrariesDigital Libraries
Digital LibrariesJack Eapen
 
Digital Libraries
Digital LibrariesDigital Libraries
Digital LibrariesJack Eapen
 

Similar to Oxford Common File Layout (OCFL) (20)

The Oxford Common File Layout: A common approach to digital preservation
The Oxford Common File Layout: A common approach to digital preservationThe Oxford Common File Layout: A common approach to digital preservation
The Oxford Common File Layout: A common approach to digital preservation
 
Alphabet soup: CDM, VRA, CCO, METS, MODS, RDF - Why Metadata Matters
Alphabet soup: CDM, VRA, CCO, METS, MODS, RDF - Why Metadata MattersAlphabet soup: CDM, VRA, CCO, METS, MODS, RDF - Why Metadata Matters
Alphabet soup: CDM, VRA, CCO, METS, MODS, RDF - Why Metadata Matters
 
Hypatia for dlf 2011
Hypatia for dlf 2011Hypatia for dlf 2011
Hypatia for dlf 2011
 
Fedora Overview
Fedora OverviewFedora Overview
Fedora Overview
 
Memory Analysis of the Dalvik (Android) Virtual Machine
Memory Analysis of the Dalvik (Android) Virtual MachineMemory Analysis of the Dalvik (Android) Virtual Machine
Memory Analysis of the Dalvik (Android) Virtual Machine
 
Fedora Commons in the CLARIN Infrastructure
Fedora Commons in the CLARIN InfrastructureFedora Commons in the CLARIN Infrastructure
Fedora Commons in the CLARIN Infrastructure
 
Using Fedora Commons To Create A Persistent Archive
Using Fedora Commons To Create A Persistent ArchiveUsing Fedora Commons To Create A Persistent Archive
Using Fedora Commons To Create A Persistent Archive
 
Archival Technologies
Archival TechnologiesArchival Technologies
Archival Technologies
 
RO-Crate: A framework for packaging research products into FAIR Research Objects
RO-Crate: A framework for packaging research products into FAIR Research ObjectsRO-Crate: A framework for packaging research products into FAIR Research Objects
RO-Crate: A framework for packaging research products into FAIR Research Objects
 
Sword Cetis 2007 06 29
Sword Cetis 2007 06 29Sword Cetis 2007 06 29
Sword Cetis 2007 06 29
 
Sword Cetis 2007 06 29
Sword Cetis 2007 06 29Sword Cetis 2007 06 29
Sword Cetis 2007 06 29
 
2.28.18 Getting Started with Fedora presentation slides
2.28.18 Getting Started with Fedora presentation slides2.28.18 Getting Started with Fedora presentation slides
2.28.18 Getting Started with Fedora presentation slides
 
Audio MD Metadata Scheme
Audio MD Metadata SchemeAudio MD Metadata Scheme
Audio MD Metadata Scheme
 
FAIR Workflows and Research Objects get a Workout
FAIR Workflows and Research Objects get a Workout FAIR Workflows and Research Objects get a Workout
FAIR Workflows and Research Objects get a Workout
 
OCFL v1.0
OCFL v1.0OCFL v1.0
OCFL v1.0
 
OpenAIRE webinar: Principles of Research Data Management, with S. Venkatarama...
OpenAIRE webinar: Principles of Research Data Management, with S. Venkatarama...OpenAIRE webinar: Principles of Research Data Management, with S. Venkatarama...
OpenAIRE webinar: Principles of Research Data Management, with S. Venkatarama...
 
Research Object Composer: A Tool for Publishing Complex Data Objects in the C...
Research Object Composer: A Tool for Publishing Complex Data Objects in the C...Research Object Composer: A Tool for Publishing Complex Data Objects in the C...
Research Object Composer: A Tool for Publishing Complex Data Objects in the C...
 
Eclipse Memory Analyzer
Eclipse Memory AnalyzerEclipse Memory Analyzer
Eclipse Memory Analyzer
 
Digital Libraries
Digital LibrariesDigital Libraries
Digital Libraries
 
Digital Libraries
Digital LibrariesDigital Libraries
Digital Libraries
 

More from Simeon Warner

Questioning Authority Lookup Service: Linking the Data
Questioning Authority Lookup Service: Linking the DataQuestioning Authority Lookup Service: Linking the Data
Oxford Common File Layout (OCFL)

  • 1. Oxford Common File Layout Rosalyn Metz (Emory), Simeon Warner (Cornell) Samvera Connect 2018 http://bit.ly/ocfl-samcon2018
  • 2. Not just us... OCFL Editorial Group ● Andrew Hankinson (Oxford) ● Neil Jefferies (Oxford) ● Julian Morley (Stanford) ● Andrew Woods (DuraSpace) ● and us (Rosalyn and Simeon) Community input from pasig-discuss and ocfl-community groups, and from others
  • 4. BagIt Well established and implemented specification for handling sets of files ● Being formally standardized as RFC: https://tools.ietf.org/html/draft-kunze-bagit-17 ● Used for transfer and (somewhat less) for files at rest ● Good fixity support ● No explicit versioning support ○ Could use local conventions for version inside a bag ○ Could use bag-per-version
  • 5. Moab: A Brief History Slides adapted from Julian Morley's in the OR2018 OCFL presentation ● Moab is the closest ancestor of OCFL ● Developed at Stanford Libraries by Richard Anderson ○ Article: http://journal.code4lib.org/articles/8482 ● Named after Moab, UT
  • 6. Moab: A Brief History ● Moab is a versioned, forward delta file structure that supports fixity and file de-duplication. ● You can preserve anything with it (even cat pictures found on the internet) ● The tools to manage and create Moabs are available as an open source Ruby gem ○ https://github.com/sul-dlss/moab-versioning
  • 7. Moab is part of the Stanford Digital Repository Here be Moabs!
  • 8. Moab in Practice @ Stanford We have many Moabs in the SDR ● 1.6 million Moab objects ● 5 million version directories ● 50+ million files ● 500+ TB of data (25 TB added last month) ● Spread across 15 NFS volumes on NetApp filers ● Backed up by IBM Spectrum Protect (formerly TSM) ○ 1 tape copy kept in local tape frame; 1 sent to Iron Mountain
  • 9. ab123cd4567 v0001 data content title.jpg intro.jpg page1.jpg page2.jpg page3.jpg metadata versionMetadata.xml descMetadata.xml identityMetadata.xml manifests versionInventory.xml signatureCatalog.xml versionAdditions.xml fileInventoryDifference.xml manifestInventory.xml v0002 data content page2.jpg metadata versionMetadata.xml technicalMetadata.xml manifests versionInventory.xml signatureCatalog.xml versionAdditions.xml fileInventoryDifference.xml manifestInventory.xml two version directories; /v0001 & /v0002 A sample Moab object on disk /data content comes from upstream and could be anything, but our systems create data in /content and /metadata directories. /manifests directories are for Moab metadata. This is where we store all the checksums and change information for deduplication and forward deltas.
  • 12. CULAR @ 2017 It worked, what now? ● Fedora 3 no longer being developed, Fedora 4 not an appropriate option ● Decision not to buy "preservation services", primarily on cost grounds ● Decision that we want one local copy for legal access reasons Short term ⇒ use local disk and AWS S3. Build tools over filesystem and object stores
  • 14. Those files sure are piling up! Nearly 100TB now, planning 100TB/year digitization ● Plan to purchase a scalable local (object) storage system for 1 copy ● Two more copies in cloud (perhaps tape) ● Content will outlast any application or software system ● Content will outlast any storage system ● Expect change and hence migration ⇒ KISS
  • 18. Shared Cornell and OCFL Goals ● Provide an application and vendor neutral storage arrangement that can be used with filesystems and object stores ○ Allow easy replication between multiple storage environments ○ Allow easy migration between storage systems (modulo the inherent burdens) ○ Allow use with multiple and changing applications ● Support package versioning at low cost (complexity and storage use) ● Support internal package validation for completeness and fixity ● Support audit and self-description of entire store ● Have an easy migration path from current archival storage arrangements ● Develop a shared model that is useful at multiple institutions so that all benefit from community developed tools and expertise.
  • 20. Lessons from Emory: Deliverables Actively engaged in a multi-year effort to gather requirements, design, and develop a digital repository based on the Samvera framework. Selected deliverables included... Develop object definitions/types (e.g. collections, objects, other entities) and their relationships to one another; determine preservation objects inside and outside of Fedora. Identify needs for AIP structure. Identify storage requirements (e.g. number of copies, file access scenarios)
  • 21. Lessons from Emory: Identified requirements The means to distribute digital objects to third-party preservation services. A well understood and well documented model for storing digital objects. Ability to place multiple copies of digital objects into diverse storage services (AWS, local storage, etc.). Easily allow for fixity checking of digital objects.
  • 22. Digital Object Content Files (Primary or Supplemental) Content file 1 Content file 2 Content file 3... … + additional … + additional The content itself: relationships provided in structural metadata Metadata (Actionable/Indexed) Desc. metadata Technical metadata (File-level) Preservation Events/Audits Administrative metadata Structural metadata (PCDM) Metadata converted to RDF for Hyrax/Fedora - editable and/or searchable Supplemental Preservation Files (Metadata/Administrative Files) Source Metadata (binary file) Desc. Metadata record (binary file) METS (binary file) License/agreement (binary file) Supplemental PREMIS (binary file) Variable supplemental info stored as files (not directly system-readable): staff can view or download file to read it
  • 23. Collection Ancient Egyptian Collection Administrative Collection Carlos Museum Administrative Collections reflect the process the libraries followed when deciding to collect materials. Digital Objects must be a part of an Administrative Collection and optionally in one or more Collections Digital Objects may contain one or more files Digital Objects, Collections receive Emory-defined metadata and relationships Major Emory Entities PCDM Context - Simple Example Individual Agreements contain information about the Administrative Collection. Individual Agreements may contain one or more files Individual Agreements are assigned to objects through their parent Collection Is a member of Is a member of Individual Agreement Carlos Museum Agreement Digital Object Statuette of a Cat. Collection Divine Felines Exhibition Is a member of Is a member of
  • 25. OCFL Requirements 1) Completeness, so that a repository can be rebuilt from the files it stores, 2) Parsability, both by humans and machines, most importantly in the absence of original software, 3) Robustness, against errors, corruption, and migration between storage technologies, and 4) Storage, on a variety of infrastructures including cloud object stores. Many existing digital preservation standards, such as ● TDR (ISO 16363) ● OAIS (ISO 14721) ● NDSA Levels of Preservation ● BagIt, discuss the need for these requirements, but none provides a standardized way to meet them.
  • 27. OCFL Object A group of one or more content files and administrative information identified by a URI. The object may contain a sequence of versions of the files organized into version directories. The base directory of the object may contain a logs directory. A NAMASTE file indicating conformance. An object contains an inventory digest file which provides a digest for the inventory.json file.
      [object root]
      ├── 0=ocfl_object_1.0
      ├── inventory.json
      ├── inventory.json.sha512
      ├── v1
      │   ├── empty.txt
      │   ├── foo
      │   │   └── bar.xml
      │   ├── image.tiff
      │   ├── inventory.json
      │   └── inventory.json.sha512
      ├── v2
      │   ├── foo
      │   │   └── bar.xml
      │   ├── inventory.json
      │   └── inventory.json.sha512
      └── v3
          ├── inventory.json
          └── inventory.json.sha512
  • 28. OCFL Object An object contains an inventory.json file which inventories the contents of an object. The manifest block lists all the digests and existing file paths for all of the object’s content. The versions block identifies the logical file path and the digest for each version of the object’s content. Separating the logical file path from the existing file path and using digests to refer to files allows for deduplication of content. { "head": "v3", "id": "ark:/12345/bcd987", "manifest": { "4d27c8...b53": [ "v2/foo/bar.xml" ], "7dcc35...c31": [ "v1/foo/bar.xml" ], "cf83e1...a3e": [ "v1/empty.txt" ], "ffccf6...62e": [ "v1/image.tiff" ] }, "type": "Object", "versions": [ { "created": "2018-01-01T01:01:01Z", "message": "Initial import", "state": { "7dcc35...c31": [ "foo/bar.xml" ], "cf83e1...a3e": [ "empty.txt" ], "ffccf6...62e": [ "image.tiff" ] }, "type": "Version", "user": { "address": "alice@example.com", "name": "Alice" }, "version": "v1" }, { "created": "2018-02-02T02:02:02Z", "message": "Fix bar.xml, remove image.tiff,
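The link between a logical file path and an existing file path can be sketched in a few lines of Python. This is only an illustration, not part of the specification: the `resolve` helper and the `INVENTORY` constant are hypothetical names, the inventory is trimmed to the `v1` entry shown on the slide, the digests are the slide's abbreviated placeholders, and `versions` is treated as a list because that is how the slide's example writes it.

```python
from typing import Optional

# Trimmed copy of the slide's example inventory (v1 only). The digest keys
# are the slide's abbreviated placeholders, not real sha512 values.
INVENTORY = {
    "id": "ark:/12345/bcd987",
    "manifest": {
        "7dcc35...c31": ["v1/foo/bar.xml"],
        "cf83e1...a3e": ["v1/empty.txt"],
        "ffccf6...62e": ["v1/image.tiff"],
    },
    "versions": [
        {
            "version": "v1",
            "state": {
                "7dcc35...c31": ["foo/bar.xml"],
                "cf83e1...a3e": ["empty.txt"],
                "ffccf6...62e": ["image.tiff"],
            },
        }
    ],
}


def resolve(inventory: dict, version: str, logical_path: str) -> Optional[str]:
    """Return an existing file path holding `logical_path` as of `version`.

    Hypothetical helper: looks up the digest in the version's state, then
    follows that digest into the manifest. Returns None if nothing matches.
    """
    for entry in inventory["versions"]:
        if entry["version"] != version:
            continue
        for digest, logical_paths in entry["state"].items():
            if logical_path in logical_paths:
                # Every manifest path listed under one digest has the same
                # content, so the first one is as good as any.
                return inventory["manifest"][digest][0]
    return None
```

Because the lookup goes through the digest, a later version can list the same digest under a different logical path without storing the file again.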
  • 29. OCFL Storage Root The base directory of an OCFL storage layout. Should also contain the OCFL specification in human-readable plain-text format. Should contain the conformance declaration. OCFL Objects may conform to the same or earlier version of the specification. The storage hierarchy must terminate with an OCFL Object Root.
      [storage root]
      ├── 0=ocfl_1.0
      ├── ocfl_1.0.txt (optional)
      ├── ab12cd34
      │   ├── 0=ocfl_object_1.0
      │   ├── inventory.json
      │   ├── inventory.json.sha512
      │   └── v1
      │       ├── file.txt
      │       ├── inventory.json
      │       └── inventory.json.sha512
      └── ef56gh78
          ├── 0=ocfl_object_1.0
          ├── inventory.json
          ├── inventory.json.sha512
          ├── v1
          │   ├── empty.txt
          │   ├── foo
          │   │   └── bar.xml
          │   ├── image.tiff
          │   ├── inventory.json
          │   └── inventory.json.sha512
          └── v2
              ├── foo
              │   └── bar.xml
              ├── inventory.json
              └── inventory.json.sha512
  • 30. OCFL Storage Root Storage hierarchies must not include files within intermediate directories. Storage hierarchies must be terminated by OCFL Object Roots. Storage hierarchies within the same OCFL Storage Root should use just one layout pattern. Storage hierarchies within the same OCFL Storage Root should consistently use either a directory hierarchy of OCFL Objects or top-level OCFL Objects.
      [storage root]
      ├── 0=ocfl_1.0
      ├── ocfl_1.0.txt (optional)
      └── ab
          └── 12
              └── cd
                  └── 34
                      └── ab12cd34
                          ├── 0=ocfl_object_1.0
                          ├── inventory.json
                          ├── inventory.json.sha512
                          ├── v1
                          │   ├── empty.txt
                          │   ├── foo
                          │   │   └── bar.xml
                          │   ├── image.tiff
                          │   ├── inventory.json
                          │   └── inventory.json.sha512
                          └── v2
                              ├── foo
                              │   └── bar.xml
                              ├── inventory.json
                              └── inventory.json.sha512
  • 32. Rebuildability ● Key OCFL goal -- be able to rebuild repo from an OCFL storage root ● Therefore, in OAIS terms: must include all the descriptive, administrative, structural, representation, and preservation metadata relevant to the object. ● Optionally include copy of spec in top level of OCFL storage root ● A more complete option would be a dedicated OCFL object that contains this documentation, with a pointer to its location in the storage root. Filesystem metadata (e.g. permissions, access, and creation times) ● not portable between filesystems ● not preservable through file transfer operations ● ill-defined fixity ⇒ out-of-scope. If important, use filesystem image format or extract as metadata
  • 33. Empty Directories ● OCFL preserves files and their content ● Directories serve as an organizational convention ● Empty directories not directly supported ⇒ Use zero-length `.keep` file as necessary (à la `git`, BagIt). Data and Metadata: The only special files are the inventory, its digest file, and conformance declaration files. Otherwise OCFL makes no distinction between different types of files. ⇒ Use local conventions as needed
  • 34. Storage ● Filesystem or Object Store -- you choose ● Original filename or Normalized filename -- you choose ● Deduplication & Forward delta differencing (at file level) -- optional but likely desirable/normal "logical file path" - path of file in content as part of state for a particular version "existing file path" - path of file in OCFL object content addressing ties these two together
  • 35. Storage Root Hierarchy - flat, pairtree, ex-wye-zee
      [storage_root]
      ├── 0=ocfl_1.0
      ├── ocfl_1.0.txt (optional)
      ├── d45be626e024
      |   ├── 0=ocfl_object_1.0
      |   ├── inventory.json
      |   ├── inventory.json.sha512
      |   └── v1...
      ├── d45be626e036
      |   ├── 0=ocfl_object_1.0
      |   ├── inventory.json
      |   ├── inventory.json.sha512
      |   └── v1...
      ├── 3104edf0363a
      |   ├── 0=ocfl_object_1.0
      |   ├── inventory.json
      |   ├── inventory.json.sha512
      |   └── v1...

      [storage_root]
      ├── 0=ocfl_1.0
      ├── ocfl_1.0.txt (optional)
      ├── d4
      |   └── 5b
      |       └── e6
      |           └── 26
      |               └── e0
      |                   ├── 24
      |                   |   └── d45be626e024
      |                   |       ├── 0=ocfl_object_1.0
      |                   |       └── ...
      |                   └── 36
      |                       └── d45be626e036
      |                           ├── 0=ocfl_object_1.0
      |                           └── ...
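The flat and pair-split layouts in the two trees above can be reproduced with a small path-mapping function. The sketch below is a simplification written for illustration: `object_root` and its `layout` and `width` parameters are hypothetical names, and it does none of the character escaping that the full Pairtree specification defines. It only splits an identifier into fixed-width segments and ends the path with the full identifier, as the slide's example does.

```python
from pathlib import PurePosixPath


def object_root(object_id: str, layout: str = "pairs", width: int = 2) -> PurePosixPath:
    """Map an object identifier to its path relative to the storage root.

    Hypothetical helper covering two of the layouts on the slide:
      - "flat":  the object root sits directly under the storage root
      - "pairs": the id is split into `width`-character directories and the
                 full id is used as the final (object root) directory
    """
    if layout == "flat":
        return PurePosixPath(object_id)
    if layout == "pairs":
        segments = [object_id[i:i + width] for i in range(0, len(object_id), width)]
        return PurePosixPath(*segments, object_id)
    raise ValueError(f"unknown layout: {layout!r}")
```

Whichever mapping is chosen, the slides recommend using a single layout pattern consistently within one storage root.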
  • 36. File operations (mungification?) ● Inheritance ● Addition ● Updating ● Renaming ● Deletion ● Reinstatement ● Purging ⇒ choices: a. rebuild new object b. break immutability and rewrite (not recommended) Yes - OCFL supports that...
  • 37. Version Immutability OCFL supports systems where versions (everything in a given version directory) are immutable once written. ● It is recommended to follow this practice ● BUT you can rewrite objects if you really want to. Deduplication: OCFL supports (in fact, enforces for internal references) deduplication through digests ● Only within an object ● File level ● sha512 digest recommended
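File-level deduplication within one object can be sketched as follows. This is an assumption-laden illustration, not a reference implementation: `add_version` is a hypothetical helper, the inventory is a plain dictionary shaped like the slide's example (a `manifest` keyed by digest and a list of `versions`, each with a `state`), and file content is passed in memory for brevity.

```python
import hashlib


def add_version(inventory: dict, version: str, files: dict) -> dict:
    """Record `files` (logical path -> bytes) as a new version.

    Returns a mapping of existing file path -> bytes for the content that
    actually has to be written into the new version directory. Content whose
    sha512 digest is already in the manifest is only referenced, not stored
    again. Earlier version entries are never modified.
    """
    manifest = inventory.setdefault("manifest", {})
    state = {}
    to_write = {}
    for logical_path, content in sorted(files.items()):
        digest = hashlib.sha512(content).hexdigest()
        state.setdefault(digest, []).append(logical_path)
        if digest not in manifest:
            existing_path = f"{version}/{logical_path}"
            manifest[digest] = [existing_path]
            to_write[existing_path] = content
    inventory.setdefault("versions", []).append({"version": version, "state": state})
    inventory["head"] = version
    return to_write
```

In this sketch a renamed file and an in-version duplicate both resolve to a digest that is already in the manifest, so neither adds a second stored copy.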
  • 38. Forward Delta Each version need only include new and changed files ● Files from previous version included by reference ● Reference by content (digest) supports renaming without duplicating (You can avoid this and include files again if you really want. But why?) Fixity: 1. Digests used for reference already provide basis for strong fixity checks (pref. sha512) 2. Additional digests may be included to support legacy fixity information (e.g. md5) (Fixity of inventory files themselves handled by sidecar file, e.g. inventory.json.sha512)
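Since the manifest digests double as fixity information, a basic fixity check needs little more than a hash function. The sketch below assumes sha512 digests and assumes the sidecar file starts with the hex digest, followed by whitespace and the file name; the slides do not state the sidecar format, so treat that as a guess. `sha512_file`, `check_sidecar` and `check_manifest` are hypothetical helper names.

```python
import hashlib
from pathlib import Path


def sha512_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hex sha512 of a file, read in chunks so large files are not loaded whole."""
    h = hashlib.sha512()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def check_sidecar(object_root: Path) -> bool:
    """Compare inventory.json with its sidecar digest file.

    Assumption: the first whitespace-separated token of the sidecar is the
    hex digest of inventory.json.
    """
    tokens = (object_root / "inventory.json.sha512").read_text().split()
    return bool(tokens) and tokens[0].lower() == sha512_file(object_root / "inventory.json")


def check_manifest(object_root: Path, manifest: dict) -> list:
    """Return the existing file paths that are missing or fail their digest."""
    failures = []
    for digest, paths in manifest.items():
        for rel in paths:
            target = object_root / rel
            if not target.is_file() or sha512_file(target) != digest.lower():
                failures.append(rel)
    return failures
```

A legacy digest such as md5 could be checked the same way by swapping the hash constructor, though where such extra digests are stored in the inventory is not shown on the slides.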
  • 39. Log Information: log directory in OCFL object available for information not in the object's content and not versioned ● form not specified ● will be ignored in object validation. Small Files: Objects with many small files may cause problems with some storage infrastructures and may make validation/fixity time consuming ● package in single file (ZIP recommended) (Options for a later version of the OCFL spec are ZIPped objects and/or ZIP by version)
  • 40. Roadmap Alpha (yesterday) ● Released(ish) on October 10 community call (OCFL Editors and PASIG Discuss) ● Feedback for November community call Beta (date based on feedback) ● Experimental validation tool ● Determine what other groups and communities to seek input from Release 1.0 (2019) ● One production-ready validator ● Test suite and fixture objects ● Two institutions committed to backing the initiative (should define that)