What Your Tweets Tell Us About You, Speaker Notes
• Introduce paper title
• Ask people to interact, comment, respond to our questions during presentation using #tweetprivacy
• Credits
o Charlesworth, whose Digital Lives Report was one of the only papers that provided any analysis and guidance in the area of social media archiving.
Interest in social media data is multidisciplinary, resulting in conflicting views regarding the ethical management of captured datasets. Curators will be required to navigate these conflicting views as they work to provide appropriate mechanisms for access and reuse of these data.
We hope to encourage researchers and library, archive, or repository staff to engage in a cross-disciplinary conversation about the privacy issues (as well as the host of other issues) inherent in using social media as a primary source for research.
We’re going to show you a clip from Laila Sakr’s presentation at the Tech@state Data Visualization Conference in Washington DC. The clip provides a good example of how researchers are using Twitter and other social media data.
[Play Clip]
There are two key things I want to point out:
1. Long-term archiving of this data and other curatorial issues like value, authenticity, and significant properties are absent from this talk, which is not surprising. They were also absent in many of the papers we read that utilized Twitter data. This demonstrates that, at this point, the overall emphasis among researchers is on collection and analysis rather than on preservation.
2. Sakr makes sure to say that she is downloading only the publicly available tweets using the search API, and explains how this could potentially affect her sample and its validity. She’s not talking about it in terms of privacy issues, which further illustrates that the focus is on analysis rather than privacy or ethics.
We’d like to take an informal poll similar to last night’s poll of the audience’s willingness to have their genome sequenced.
Who among those of you who use Twitter as a communication tool is completely fine with having your tweets, profile information, images, and location data downloaded, analyzed, archived, and preserved?
Of those of you with your hands raised, how many of you have tweeted something of a more personal nature that you might not want archived?
And who here is actively involved with the collection of Twitter data? Any social media data?
[What do you do with it? Tweet here]
The reason I ask is that we found, through our work with the HyperCities Egypt Twitter data, that the issue of whether or not there are privacy concerns with a data source like Twitter is essentially a research ethics issue, one which varies depending on the role and/or subject background of the researcher and how they view the context of the data creation.
(Refer to Confounding Relationships to point out various roles.)
So, our central thesis is that perceptions of privacy in social media platforms are formed by disciplinary culture, the capabilities and constraints of the platform, and the community norms of the platform itself.
Does analyzing a person's Tweets constitute researching a human subject? Or are Tweets a creative text which requires proper citation and credit to the authors or tweeters? Or are Tweets part of the open public record?
Social scientists tend to view the data as Human Subject research, while Humanists tend to view the data as a form of publication. These very different ways of viewing the data require different methods for dealing with privacy.
We feel it is important to state that social media data are not homogeneous; each platform has its own unique constraints for the creation/inclusion of content, as well as constraints on how users may engage in the space and on their expectations and norms of interaction.
Our case study focuses on Twitter, so while we provide a general framework for assessing privacy issues with social media, it must be understood that, because of the uniqueness of Twitter’s Privacy Policies, Terms of Service, and Developers Rules of the Road, the analysis and interpretation are not necessarily generalizable to other platforms, such as Facebook.
Like many data curation activities, there will be some facets which can be generalized, while others may be platform or subject specific. Part of determining the curation needs of social media data will be to determine these boundaries.
What can we learn about you from Twitter?
[Show different visualizations, then tweet map, tweet image]
Depending on how the data are visualized, we can learn about you as an individual, about your internet relations, about you as part of a huge collective, or nothing about you as an individual (r-shief image). Some visualizations will enable better anonymization than others.
However, if your account is unprotected, the underlying dataset used to generate the visualizations will still contain your name, location, photos, etc., and anything else you decide to share in your timeline. So if you include other personal info, like an email address or some such thing, we can find it out about you.
But what else can we find out about you?
[Show the Alyaa Gad slide, then the Google Search]
Thanks to the power of search engines like Google, we can get a lot more information, which may be collected and archived as well.
Our Case Study, or what I like to call “we’ve got tweets, now what?”
Todd Presner, a UCLA faculty member, and two researchers collected a subset of the overall Twitter data available. He asked the library to archive it. Before we could do anything with it, we had to assess what he had collected.
The HyperCities team used the Twitter Search API to pull data based on the location parameter (within 200 km of the center of Cairo), time period (January 30, 2011 through February 24, 2011), AND one of three hashtags (#jan25 OR #egypt OR #tahrir). They downloaded approximately 420,000 public Tweets during the initial phase of this analysis and continue to feed their site with live feeds.
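To make the shape of that query concrete, here is a minimal sketch of the same three-part rule applied to tweets that have already been downloaded. It does not call Twitter at all. The Cairo coordinates, the dictionary keys (coords, day, text), and the function names are our own assumptions for illustration; they are not the HyperCities team's code or Twitter's field names.

```python
import math
from datetime import date

# Assumed coordinates for "the center of Cairo"; the notes do not give them.
CAIRO = (30.0444, 31.2357)
RADIUS_KM = 200.0
START, END = date(2011, 1, 30), date(2011, 2, 24)
HASHTAGS = {"#jan25", "#egypt", "#tahrir"}


def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) pairs in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*a, *b))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))


def matches_capture_criteria(tweet):
    """True if a tweet dict meets location AND period AND (any one hashtag)."""
    in_range = haversine_km(CAIRO, tweet["coords"]) <= RADIUS_KM
    in_period = START <= tweet["day"] <= END
    # Whitespace split only; a hashtag followed by punctuation is not matched.
    words = {w.lower() for w in tweet["text"].split()}
    return in_range and in_period and bool(words & HASHTAGS)
```

Even in this toy form, each constant is a curatorial decision: change the radius, the dates, or the hashtag list and a different set of voices ends up in the archive.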
As with Sakr, the data capture was motivated by the fact that significant events were taking place using Twitter, and because Twitter data disappears quickly (10 days), they decided to start downloading it and make it available to as many people as possible for future reference and study. There wasn’t necessarily any research question or overarching thesis behind the collection other than to provide a glimpse back to the Egyptian Revolution Twitterverse.
As Dr. Charlesworth pointed out yesterday morning, legal issues with gathering this type of data won’t be at the forefront of the researcher’s mind.
Based on the search parameters, the data set captured eight out of approximately forty possible Twitter data fields, revealing how the method of capture and the search parameters profoundly shape the resultant data.
The data is sitting on Prof. Presner’s personal server as JSON files, but it will soon be converted into XML for ease in depositing and managing the data in Islandora.
These facts must be documented in order for future users to have a clear understanding of the data set.
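As a rough illustration of what a JSON-to-XML conversion could involve, here is a minimal sketch using only the Python standard library. It is not the conversion actually used for this data set. The element names (tweets, tweet) and the function name are hypothetical, and it assumes each record is a flat JSON object whose keys are valid XML element names.

```python
import json
import xml.etree.ElementTree as ET


def tweets_json_to_xml(json_text):
    """Convert a JSON array of flat tweet objects into an XML string."""
    root = ET.Element("tweets")
    for record in json.loads(json_text):
        tweet = ET.SubElement(root, "tweet")
        for field_name, value in record.items():
            # One child element per captured field; every value becomes text.
            ET.SubElement(tweet, field_name).text = str(value)
    return ET.tostring(root, encoding="unicode")
```

Note that a conversion like this flattens every value to text, which is itself a change to the data that belongs in the documentation.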
“But the data are already public…”
So if the general understanding is that your Twitter data is open and public, and that people using these platforms want to be seen AND heard, why should we be concerned about privacy?
The Privacy Policy of Twitter stipulates that while you “own” your content, anyone, including Twitter or any third party, is given the right to access your data and re-use it. (our reading of the privacy policy)
Those who see Twitter data as data that contains potentially identifying information about human subjects may
want to anonymize the data for the authors' protection, and may see displaying user names as unethical. This
runs contrary to Twitter's Rules of the Road, which require the display of a user id to give credit to the person
who tweeted.
Yet Twitter also acknowledges this public/private tension in their own policies by suggesting that if making a
user id or other information available raises a concern over privacy or security risks, the individual or
media should get in touch with them.
The debate about the capture, reuse, and display of Twitter data concerns the line between the legality of collecting this content and the ethics of doing so. To date, as far as we are aware, there haven’t been any formal legal challenges about the downloading, use, and archiving of Twitter data.
Thus ensues a wide-ranging debate by scholars who characterize privacy issues with social media data in the following ways:
Most researchers take a harm-based view of privacy, in which the goal is to protect users’ information from negative actors. This includes concern for security issues (use by government agencies to track and arrest; use as evidence).
Recognizing there are loopholes in the data, which enable someone to get a lot of information about an individual even if all they have is a username; even after an account is deleted or changed from public to private, content already captured will remain available.
Finally, (Buyer beware) those users who have opted to make their accounts public have no grounds for complaint about the collection and reuse of their content, even if they did not anticipate reuse by researchers or commercial firms (Thelwall, 2010; Vieweg, 2010).
Danah boyd still asks: Just because we can collect it, should we?
Michael Zimmer, an Internet Privacy scholar, argues instead for a dignity-based view of privacy, which holds that the act of another person taking one’s personal information from the social networking sphere, amassing it into a database, and making it available for use and scrutiny is an affront to the users’/subjects’ human dignity and their ability to control the flow of their personal information.
Finally, what are the user’s expectations of how their tweets will be used? How many here have actually read Twitter’s privacy policy? FB? Do you understand the implications of re-use?
Schmidt, Trepte, and Reinecke (2011) observe that users develop shared routines and expectations of self-disclosure, noting that privacy management is performed for a specific audience. Facebook, for example, enables users to select privacy settings on a post-by-post basis, choosing who is able to read, comment, and interact with specific content, and allowing the user fairly granular control over the flow of their information. Twitter allows only binary control; users can designate their account as “protected” (i.e. Tweets are only visible to approved followers), or “public” (enabled by default), which makes a user’s profile and timeline accessible to anyone, even those without a Twitter account.
The ethical jury is going to be out on this for a while; at least until scholarly communities work out parameters and provide guidance for acceptable use of social media data. In the meantime, what are we to do?
Legal and ethical policy related to privacy and social media data is still in flux and almost always lags behind the pace of research. Yet libraries, etc. are pressured to act now to archive the data.
Data repositories will be caught in the middle of these divergent viewpoints when trying to determine the best methods of providing access to the data. The norms of individual research disciplines often provide guidance for curators, but when researchers with divergent norms seek access to the same data, it can be difficult to determine how best to serve the broadest number of users.
Experience with this data set and the literature review led us to the following recommendations:
Libraries or other data repositories will need to decide if archiving social media data fits with their overall institutional mission and goals.
Libraries should determine the overall risks associated with collecting and archiving social media data and design strategies to mitigate those risks.
Because of the significance scholars are placing on the need to collect and now, in our case, archive Twitter data, we are convinced that providing for the collection, preservation, and reuse of social media data requires at the very least conversations among researchers, libraries, archives, institutional review boards, scholarly societies, and other national and international organizations concerned with the production and preservation of scholarship.
Part of the discussion will need to include the context or conditions under which the data have been collected. One important aspect of this, as Dr. Charlesworth mentioned in his talk yesterday, is a way to gather “legal metadata” so that going forward the archive or repository will have the necessary privacy i’s dotted and t’s crossed, insofar as that is possible.
Libraries should engage researchers as early as possible in the research process. Curators are presented with a golden opportunity to collaborate with researchers as close as possible to the beginning of the research lifecycle.
Through a collaborative process, we can ideally facilitate the creation of collections that balance openness with privacy concerns, and encourage broad reuse.
While that early intervention may not happen, we can employ curatorial strategies on the back end of the data gathering that will hopefully push the issue (one of which will be discussed in our next recommendation).
Here we start to address, somewhat, the question that was asked yesterday at Dr. Charlesworth’s presentation about educating the researchers.
Libraries choosing to archive social media data should develop clear and easy-to-use collection and deposit policies, forms, and tools.
It has been our argument since first working with Twitter data that a way to both educate researchers and create data that can be ingested into a repository and reused is to create a workflow that asks the necessary questions of researchers, which would aid in the creation of a codebook and documentation for the data.
We created a Twitter deposit form that is geared toward raising the privacy issues with this platform and educating the researcher, as well as providing a way to record the basic legal and descriptive metadata necessary for contextualizing the data for re-use.
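For anyone wanting to prototype such a workflow, a deposit record could be sketched as a small data structure that makes unanswered questions visible. This is a hypothetical illustration, not the form we built: every field name below is our own assumption, chosen to echo the facts discussed earlier (capture method, search parameters, fields captured, file format).

```python
from dataclasses import dataclass, field, asdict
from typing import List, Optional


@dataclass
class TwitterDeposit:
    """Hypothetical deposit record; all field names are illustrative only."""
    depositor: str = ""
    capture_method: str = ""       # e.g. "Twitter Search API"
    search_parameters: str = ""    # location, dates, hashtags used
    capture_period: str = ""
    file_format: str = ""          # e.g. "JSON"
    public_accounts_only: Optional[bool] = None
    fields_captured: List[str] = field(default_factory=list)


def unanswered(deposit):
    """Names of the questions the depositor has not yet answered."""
    return [name for name, value in asdict(deposit).items()
            if value in ("", None, [])]
```

A curator could run unanswered() on a partially completed record and use the result as the agenda for a follow-up conversation with the researcher.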
Teachable moments for information literacy librarians. Understand and know the source of information.
Twitter adds language to their privacy policy that more explicitly states use. (Gain consent, and then the data truly become open.)
Ideally, Twitter would take a different approach to releasing the data for research: partnering directly with researchers rather than with a third party like GNIP, which charges for the data and isn’t clear about what can be done with it once it’s been purchased.
Lastly, thanks to all who have been tweeting during the session; we wanted to let you know that we’ve archived them in TwapperKeeper.