The explosion of data creation across all scholarly disciplines necessitates corresponding efforts to create new solutions for its management and use. Ever-growing repositories, and the datasets within them, require organization, identification, description, publication, discovery, citation, preservation, and curation to allow these materials to realize their potential in support of data-driven, often interdisciplinary research. What infrastructures and technical environments are required for this work? Can new approaches, specifications, standards and best practices be created? Are there partnerships and collaborations that exist or can be pursued? This webinar, Part 2 of a two-part NISO series on data, will explore these and other questions.
2. Dataset Identification & Citation: DataCite and EZID
Joan Starr
California Digital Library
October 2011
3. Dataset Identification & Citation
• Introduction
– The Researchers’ Challenge
– Identifiers are a tool for researchers
• DataCite
– “Helping you find, access and reuse data.”
• EZID
– Easy creation and management of DataCite DOIs and other identifiers.
• Next steps
– For DataCite, EZID and you!
7. Early in the research life cycle
Data-intensive research + Writing up the results
Where’s the data?
What if I move it?
PERSISTENT IDENTIFIERS make the difference
by Dave Rogers, http://www.flickr.com/photos/dave-rogers/2815036285/
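The idea that a persistent identifier keeps working after the data moves can be illustrated with a toy resolver. This is only a minimal sketch of the indirection, not how DataCite or EZID is implemented; the `Resolver` class, the identifier string, and both URLs are hypothetical.

```python
class Resolver:
    """Toy in-memory resolver: maps a persistent identifier to a current location."""

    def __init__(self):
        self._locations = {}

    def register(self, identifier, url):
        # First registration of an identifier; refuse to overwrite silently.
        if identifier in self._locations:
            raise ValueError(f"{identifier} is already registered")
        self._locations[identifier] = url

    def update(self, identifier, new_url):
        # The data moved: only the target changes, never the identifier.
        if identifier not in self._locations:
            raise KeyError(identifier)
        self._locations[identifier] = new_url

    def resolve(self, identifier):
        # Look up where the identified data currently lives.
        return self._locations[identifier]


resolver = Resolver()
pid = "doi:10.9999/EXAMPLE"  # hypothetical identifier
resolver.register(pid, "http://old-server.example.org/dataset.csv")
resolver.update(pid, "http://new-server.example.org/dataset.csv")
current = resolver.resolve(pid)
```

A paper that cites `pid` never needs to change; only the resolver's record is updated when the dataset is relocated.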
10. Meeting funder requirements
• Data-intensive research +
• Grantor requirements for data management
What do we plan to put here?
How do we track the data?
PERSISTENT IDENTIFIERS make the difference
By David Mellis, http://www.flickr.com/photos/mellis/7675610/
12. DataCite
German National Library of Economics (ZBW)
Canada Institute for Scientific and Technical Information (CISTI)
German National Library of Science and Technology (TIB)
German National Library of Medicine (ZB MED)
Technical Information Center of Denmark
GESIS - Leibniz Institute for the Social Sciences, Germany
Institute for Scientific & Technical Information (INIST-CNRS), France
Australian National Data Service (ANDS)
ETH Zurich, Switzerland
TU Delft Library, The Netherlands
The Swedish National Data Service (SNDS)
The British Library, UK
California Digital Library (CDL), USA
Office of Scientific & Technical Information (OSTI), USA
Purdue University Library
13. DataCite Metadata V. 2.2
• Small required set = citation elements
• Optional descriptive set:
– extendable lists
– can refer to other standards, schemes
– domain-neutral
– rich ability to describe relationships to other digital objects
• Metadata Search (MDS) is full-text indexed
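Because the required set doubles as the citation elements, a citation string can be assembled directly from it. The sketch below assumes one DataCite-style arrangement, Creator (PublicationYear): Title. Publisher. Identifier; the function name and all sample values are hypothetical.

```python
def format_citation(creators, year, title, publisher, identifier):
    """Join the five required elements into a single citation string."""
    # Multiple creators are separated by semicolons (an assumption of this sketch).
    names = "; ".join(creators)
    return f"{names} ({year}): {title}. {publisher}. {identifier}"


citation = format_citation(
    creators=["Doe, Jane", "Roe, Richard"],  # hypothetical sample values
    year=2011,
    title="Example Survey Dataset",
    publisher="Example Data Repository",
    identifier="doi:10.9999/EXAMPLE",
)
print(citation)
```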
14. DataCite Metadata V. 2.2
Required properties
  1. Identifier (with type attribute)
  2. Creator (with name identifier attributes)
  3. Title (with optional type attribute)
  4. Publisher
  5. PublicationYear
Optional properties
  6. Subject (with schema attribute)
  7. Contributor (with type & name identifier attributes)
  8. Date (with type attribute)
  9. Language
  10. ResourceType (with description attribute)
  11. AlternateIdentifier (with type attribute)
  12. RelatedIdentifier (with type & relation type attributes)
  13. Size
  14. Format
  15. Version
  16. Rights
  17. Description (with type attribute)
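The split between required and optional properties lends itself to a simple completeness check before a record is submitted. This is a minimal sketch that uses a plain dictionary keyed by the property names on the slide; it is not the DataCite XML schema or any EZID API, and the helper name and sample record are hypothetical.

```python
# The five required properties, named as on the slide.
REQUIRED = ("Identifier", "Creator", "Title", "Publisher", "PublicationYear")


def missing_required(record):
    """Return the required property names that are absent or empty in record."""
    return [name for name in REQUIRED if not record.get(name)]


record = {  # hypothetical sample record
    "Identifier": "doi:10.9999/EXAMPLE",
    "Creator": ["Doe, Jane"],
    "Title": "Example Survey Dataset",
    "Publisher": "Example Data Repository",
    "Subject": ["surveys"],  # optional property; never reported as missing
}
missing = missing_required(record)
print(missing)
```

Optional properties are ignored by the check, so a record can be as sparse or as rich as the depositor likes once the five required values are present.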
30. Outline
• Who: Texas Digital Library
• Where: on the cloud
• Why: motivations
• When: late 2010
• What: lessons learned
June 2011
31. Who: Texas Digital Library
• Consortium of higher education institutions in Texas
• Current services include:
– Institution: IR (DSpace), ETD system
– Faculty: OJS, OCS, blogs, wikis
– Approximately 70 customer-facing service instances
• Legacy hardware included
– Compute servers
– Storage servers
– Network support devices
32. Where: on the cloud
• Migrated customer-facing services to AWS
– 50 AWS VM instances
• Maintained some services on local hardware
• Simplified and consolidated system architecture
33. Why: motivations / When: late 2010
• Disaster recovery plan
– Prepare for data center move
• Elastic capacity
– New members, collections
• Personnel savings
– Fewer competencies, responsibilities
• Began Oct 2010
34. What: lessons learned
• The Good
– Elastic capacity; customers did not notice change
– No hardware purchase cycle
• The Mixed
– Lower personnel costs; failover
• The Unexpected
– Development tools; concerns about AWS being in U.S.; excellent management console
35. Future
• Preservation
– DuraCloud
• Continue to evaluate
– AWS is flexible and feature rich, but may still not be cost effective
36. For more information about the TDL, please visit the Texas Digital Library website at http://www.tdl.org or contact us at info@tdl.org.
38. Why Data Sharing is Good
• research reproducibility
• fiscal responsibility
• broadest possible impact
• large-scale data interoperability
– Includes technical, social, legal and policy aspects
– usual focus on technical/social
– focus here on legal/policy aspects
39. Why Data Sharing is Hard
• No incentives to improve data quality, provide missing documentation
• Confidentiality and privacy concerns (e.g. HIPAA, endangered species)
• Patents and commercial potential
• Closed Access to journal articles (i.e. results)
• IP issues very complicated
40. Definitions
Data governance is the system of decision rights and accountabilities that describe who can take what actions with what data, and when, under what circumstances, using what methods
• strategies for data quality control and management, and processes that ensure important data assets are formally managed throughout an organization;
– organizations can be legal entities like universities, or virtual organizations (e.g. distributed research collaborations)
– Includes business processes and risk management;
• laws and policies associated with data;
• ensures that data can be trusted and that people are accountable for actions affecting the data
41. Definitions
• Attribution is legally-imposed, remedy is lawsuit
• Credit is what researchers want
• Citation is the norm in scholarly communication, to provide supporting evidence, now proxy for credit
Attribution does not ensure credit or citation.
42. Legal Mechanisms for Sharing Data
1. licenses
Require attribution
2. contracts
3. waivers
No attribution requirement
43. Copyright for Data
• Does not apply to facts, e.g., most scientific data
• Can apply to a collection of facts, but only to original aspects, not facts themselves
• Can extract facts from a copyrighted database without infringing
44. Licenses
• Licenses are not contracts
– depend on underlying rights, e.g. copyright or sui generis rights
– Copyright is a bundle of rights, automatic when fixed, limited in scope and duration
• US and EU differ (EU has sui generis data rights) so different licenses cover copyright, sui generis rights, or both
45. Licenses
• Creative Commons (CC-BY) example
– applies to data and databases to the extent they’re copyrightable
– Only data uses that implicate copyright trigger attribution requirement
– uses of data that do not implicate copyright, e.g. data in the public domain, do not trigger attribution
46. Licenses
• Hard to assess copyright for particular data and databases
• Hard to know when license applies, creates risks:
– data provider may be misled
– data user will under- or over-comply
47. Licenses
• Attribution requirements are inflexible, causing absurd situations
– e.g. providing attribution to 1,000 providers in 1,000 different ways
– known as ‘attribution stacking’
• Could provide attribution and still not satisfy norms or expectations
49. Contracts
• Do not require underlying right
– rely on offer/acceptance, click through, terms of use
– require formalities, e.g. attribution
• Downsides
– confusing obligations, no standardization, each user agreement can have different requirements
• Researchers may avoid data if they can’t understand the terms of use
50. Contracts
Unlike licenses, contracts only bind parties
• If someone obtains licensed data and shares it, anyone who obtains data from that user is still bound by the license
• If data had been shared by contract, anyone obtaining data from the second party is not bound by the contract since they aren’t a party to the contract
• In this respect, contracts are more limited than licenses
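The difference in reach described on this slide can be sketched as a small function over a sharing chain. This is a deliberately simplified model of the slide's two bullet points, not legal advice or a statement of any jurisdiction's law; the function name and the lab names are hypothetical.

```python
def bound_recipients(chain, mechanism):
    """Return who in a sharing chain is bound by the provider's terms.

    chain lists recipients in order: chain[0] got the data from the provider,
    chain[1] got it from chain[0], and so on.
    """
    if mechanism == "license":
        # Per the slide: the license follows the data to every downstream user.
        return list(chain)
    if mechanism == "contract":
        # Per the slide: only the party who agreed with the provider is bound.
        return list(chain[:1])
    raise ValueError(f"unknown mechanism: {mechanism}")


chain = ["lab_a", "lab_b", "lab_c"]  # hypothetical recipients
under_license = bound_recipients(chain, "license")
under_contract = bound_recipients(chain, "contract")
```

Under this model every lab in the chain is bound when the data is licensed, while only `lab_a` is bound when the data is shared by contract.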
51. Contracts
• Have broader reach than licenses
– not tied to a legal right
– can take away rights of public
53. Waivers
• Provide legal certainty
– No need to decipher copyright protection or sift through confusing legalese
– Better than silence, to avoid forcing people to guess what their risks are
• Mean loss of control
– Can’t require attribution or other terms
• Avoid problems and rely on scholarly norms
– no attribution stacking or inappropriate obligations
55. Summary
• Law is messy, each approach has consequences
• Licenses – (1) legal uncertainty about scope, (2) requirements can be inconsistent with norms
• Contracts – (1) burdensome requirements with custom terms, (2) exceed scope of rights with requirements that take away normal rights
• Waivers – (1) avoid problems, but (2) lose control and rely on norms
56. Summary
• Each approach requires loss of control
• No mechanism imposes legally-binding obligations in a way that perfectly maps to scholarly credit, e.g. citation
• Ideal solution creates the least friction to scientific progress while giving credit where due, i.e., waivers and norms (the community governs itself)