What Your Tweets Tell Us About You, Speaker Notes

• Introduce paper title

• Ask people to interact, comment, respond to our questions during presentation using #tweetprivacy

• Credits

o Charlesworth, whose Digital Lives Report was one of the few papers that provided any analysis and guidance in the area of social media archiving.

Interest in social media data is multidisciplinary, resulting in conflicting views regarding the ethical management of captured datasets. Curators will be required to navigate these conflicting views as they work to provide appropriate mechanisms for access and reuse of these data.

We hope to encourage researchers and library, archive, or repository staff to engage in a cross-disciplinary conversation about the privacy issues (as well as the host of other issues) inherent in using social media as a primary source for research.

We’re going to show you a clip from Laila Sakr’s presentation at the Tech@state Data Visualization Conference in Washington DC. The clip provides a good example of how researchers are using Twitter and other social media data.

[Play Clip]

There are two key things I want to point out:

1. Long-term archiving of this data and other curatorial issues like value, authenticity, and significant properties are absent from this talk, which is not surprising. They were also absent in many of the papers we read that utilized Twitter data. This demonstrates that, at this point, researchers place an overall emphasis on collection and analysis rather than on preservation.

2. Sakr makes sure to say that she is downloading only the publicly available tweets using the search API, and how this could potentially affect her sample and its validity. She’s not talking about it in terms of privacy issues, which further illustrates that the focus is on analysis rather than privacy or ethics.

We’d like to take an informal poll similar to last night’s poll of the audience’s willingness to have their genome sequenced.

Who among those of you who use Twitter as a communication tool is completely fine with having your tweets, profile information, images, and location data downloaded, analyzed, archived, and preserved?

- Of those of you with your hands raised, how many of you have tweeted something of a more personal nature that you might not want archived?

And who here is actively involved with the collection of Twitter data? Any social media data?

[What do you do with it? Tweet here]

The reason I ask is that we found, through our work with the HyperCities Egypt Twitter data, that the issue of whether or not there are privacy concerns with a data source like Twitter is essentially a research ethics issue, which varies depending on the role and/or subject background of the researcher and how they view the context of the data creation.

(refer to Confounding Relationships to point out various roles)

So, our central thesis is that perceptions of privacy in social media platforms are formed by disciplinary culture, the capabilities and constraints of the platform, and the community norms of the platform itself.

Does analyzing a person's Tweets constitute researching a human subject? Or are Tweets a creative text which requires proper citation and credit to the authors or tweeters? Or are Tweets part of the open public record? Social scientists tend to view the data as Human Subject research, while Humanists tend to view the data as a form of publication. These very different ways of viewing the data require different methods for dealing with privacy.

We feel it is important to state that social media data are not homogeneous; each platform has its own unique constraints for the creation/inclusion of content, as well as constraints on how users may engage in the space and on their expectations and norms of interaction.

Our case study focuses on Twitter, so while we provide a general framework for assessing privacy issues with social media, it must be understood that, because of the uniqueness of Twitter’s Privacy Policies, Terms of Service, and Developers Rules of the Road, the analysis and interpretation are not necessarily generalizable to other platforms, such as Facebook. As with many data curation activities, there will be some facets which can be generalized, while others may be platform or subject specific. Part of determining the curation needs of social media data will be to determine these boundaries.

What can we learn about you from Twitter?

[Show different visualizations, then tweet map, tweet image]

Depending on how the data are visualized, we can learn about you as an individual, about your internet relations, about you as part of a huge collective, or nothing about you as an individual (r-shief image). Some visualizations will enable better anonymization than others.

However, if your account is unprotected, the underlying dataset used to generate the visualizations will still contain your name, location, photos, etc.: anything you decide to share in your timeline. So if you include other personal info, like an email or some such thing, we can find it out about you.

But what else can we find out about you? [show the Alyaa Gad slide, then the Google Search] Thanks to the power of search engines like Google, we can get a lot more information, which may be collected and archived as well.

Our Case Study, or what I like to call “we’ve got tweets, now what?”

Todd Presner, a UCLA Faculty member, and two researchers collected a subset of the overall Twitter data available. He asked the library to archive it. Before we could do anything with it, we had to assess what he had collected.

The HyperCities team used the Twitter Search API to pull data based on the location parameter (within 200 km of the center of Cairo), time period (January 30, 2011 through February 24, 2011), AND one of three hashtags (#jan25 OR #egypt OR #tahrir).
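
The capture criteria above can be sketched as a local check that a curator might run against records already downloaded. This is a minimal sketch, not the HyperCities team's code and not a call to the Twitter API: the record keys (`lat`, `lon`, `created_at`, `text`), the approximate coordinates for the center of Cairo, and the choice of UTC for the date range are all assumptions made for illustration.

```python
from datetime import datetime, timezone
from math import asin, cos, radians, sin, sqrt

# Assumed values for illustration only; the notes give the radius, dates,
# and hashtags, but not the exact center point or time zone used.
CAIRO_CENTER = (30.0444, 31.2357)  # approximate latitude, longitude
RADIUS_KM = 200.0
START = datetime(2011, 1, 30, tzinfo=timezone.utc)
END = datetime(2011, 2, 24, 23, 59, 59, tzinfo=timezone.utc)
HASHTAGS = {"#jan25", "#egypt", "#tahrir"}


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two points."""
    lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
    a = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * 6371.0 * asin(sqrt(a))


def matches_capture_criteria(record):
    """True if a record satisfies location AND time period AND any one hashtag.

    `record` is a hypothetical dict with keys lat, lon, created_at, text.
    """
    in_radius = haversine_km(record["lat"], record["lon"], *CAIRO_CENTER) <= RADIUS_KM
    in_period = START <= record["created_at"] <= END
    # Lowercase each word and strip trailing punctuation before comparing.
    words = {w.lower().rstrip(".,!?:;") for w in record["text"].split()}
    has_tag = bool(words & HASHTAGS)
    return in_radius and in_period and has_tag


example = {
    "lat": 30.05,
    "lon": 31.24,
    "created_at": datetime(2011, 2, 1, 12, 0, tzinfo=timezone.utc),
    "text": "In the square #Jan25",
}
print(matches_capture_criteria(example))
```

Writing the criteria down in this form also makes the AND/OR logic explicit for future users of the data set: a Tweet with the right hashtag but no location inside the radius would not have been captured.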

They downloaded approximately 420,000 public Tweets during the initial phase of this analysis and continue to supply their site with live feeds.

As with Sakr, the data capture was motivated by the fact that significant events were taking place using Twitter, and because Twitter data disappears quickly (10 days), they decided to start downloading and make it available to as many people as possible for future reference and study.

There wasn’t necessarily any research question or overarching thesis behind the collection other than to provide a glimpse back to the Egyptian Revolution Twitterverse. As Dr. Charlesworth pointed out yesterday morning, legal issues with gathering this type of data won’t be at the forefront of the researcher’s mind.

Based on the search parameters, the data set captured eight out of approximately forty possible Twitter data fields, revealing how the method of capture and the search parameters profoundly shape the resultant data.
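
One way to document which fields a capture actually contains is to inventory the keys present in the downloaded records. The sketch below assumes each Tweet is available as a JSON object; the sample records and their field names (`id`, `text`, `from_user`, `geo`) are made up for illustration and are not the eight fields in the HyperCities data set.

```python
import json
from collections import Counter


def field_coverage(records):
    """Count how many records carry each top-level field."""
    counts = Counter()
    for record in records:
        counts.update(record.keys())
    return counts


# Hypothetical sample records, not taken from the actual data set.
sample = [
    json.loads('{"id": 1, "text": "hello #egypt", "from_user": "a"}'),
    json.loads('{"id": 2, "text": "hi #jan25", "from_user": "b", "geo": null}'),
]
coverage = field_coverage(sample)
for name in sorted(coverage):
    print(f"{name}: present in {coverage[name]} of {len(sample)} records")
```

A listing like this could go straight into the codebook, so that future users can see which of the possible fields are missing from the capture.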

The data is sitting on Prof. Presner’s personal server as JSON files, but the data will soon be converted into XML for ease in depositing and managing the data in Islandora. These facts must be documented in order for future users to have a clear understanding of the data set.
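
As a rough illustration of the JSON-to-XML step, the sketch below turns one flat JSON record into an XML element using only the Python standard library. It is an assumption-laden example, not the conversion actually planned for the data set: the element name `tweet` and the sample field names are hypothetical, and it only handles flat records whose keys are valid XML element names.

```python
import json
import xml.etree.ElementTree as ET


def tweet_to_xml(record):
    """Build a <tweet> element with one child element per top-level field.

    Assumes a flat dict whose keys are valid XML names; nested values
    are simply converted to their string form.
    """
    tweet = ET.Element("tweet")
    for key, value in record.items():
        child = ET.SubElement(tweet, key)
        child.text = "" if value is None else str(value)
    return tweet


# Hypothetical record for illustration.
record = json.loads(
    '{"id": 1, "from_user": "example_user", "text": "Tahrir now #jan25 & #egypt"}'
)
xml_text = ET.tostring(tweet_to_xml(record), encoding="unicode")
print(xml_text)
```

Note that the serializer escapes characters such as `&` on output, which is one reason the conversion itself should be recorded alongside the data: the XML is a derived form of what was captured.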

“But the data are already public…”

So if the general understanding is that your Twitter data is open and public, and that people using these platforms want to be seen AND heard, why should we be concerned about privacy?

The Privacy Policy of Twitter stipulates that while you “own” your content, anyone, including Twitter or any third party, is given the right to access your data and re-use it. (our reading of the privacy policy)

Those who see Twitter data as data that contains potentially identifying information about human subjects may want to anonymize the data for the authors' protection, and may see displaying user names as unethical. This runs contrary to Twitter's Rules of the Road, which require the display of a user id to give credit to the person who tweeted.

Yet Twitter also acknowledges this public/private tension in its own policies by suggesting that, if making a user id or other information available raises a concern over privacy or security risks, the individual or media should get in touch with Twitter.

The debate about the capture, reuse, and display of Twitter data turns on the line between the legality of collecting this content and the ethics of doing so.

To date, there haven’t been any formal legal challenges that we are aware of about the downloading, use, and archiving of Twitter data.

Thus ensues a wide-ranging debate by scholars who characterize privacy issues with social media data in the following ways:

Most researchers take a harm-based view of privacy, in which the goal is to protect users’ information from negative actors.

This includes concern for security issues (use by government agencies to track and arrest; use as evidence).

It also includes recognizing that there are loopholes in the data which enable someone to get a lot of information about an individual, even if all they have is a username; and that after an account is deleted or changed from public to private, content already captured will still be available.

Finally (buyer beware), those users who have opted to make their accounts public have no grounds for complaint about the collection and reuse of their content, even if they did not anticipate reuse by researchers or commercial firms (Thelwall, 2010; Vieweg, 2010).

Danah boyd still asks: Just because we can collect it, should we?

Michael Zimmer, an Internet Privacy scholar, argues instead for a dignity-based view of privacy, in which the act of another person taking one’s personal information from the social networking sphere, amassing it into a database, and making it available for use and scrutiny is an affront to the users’/subjects’ human dignity and their ability to control the flow of their personal information.

Finally, what are the user’s expectations of how their tweets will be used?

How many here have actually read Twitter’s privacy policy? FB? Do you understand the implications of re-use?

Schmidt, Trepte, and Reinecke (2011) observe that users develop shared routines and expectations of self-disclosure, noting that privacy management is performed for a specific audience.

Facebook, for example, enables users to select privacy settings on a post-by-post basis, choosing who is able to read, comment on, and interact with specific content, and allowing the user fairly granular control over the flow of their information. Twitter allows only binary control; users can designate their account as “protected” (i.e. Tweets are only visible to approved followers), or “public” (enabled by default), which makes a user’s profile and timeline accessible to anyone, even those without a Twitter account.

The ethical jury is going to be out on this for a while, at least until scholarly communities work out parameters and provide guidance for acceptable use of social media data. In the meantime, what are we to do? Legal and ethical policy related to privacy and social media data is still in flux and almost always lags behind the pace of research. Yet libraries, etc. are pressured to act now to archive the data.

Data repositories will be caught in the middle of these divergent viewpoints when trying to determine the best methods of providing access to the data.

The norms of individual research disciplines often provide guidance for curators, but when researchers with divergent norms seek access to the same data, it can be difficult to determine how best to serve the broadest number of users.

Experience with this data set and the literature review led us to the following recommendations:

Libraries or other data repositories will need to decide if archiving social media data fits with their overall institutional mission and goals.

Libraries should determine the overall risks associated with collecting and archiving social media data and design strategies to mitigate those risks.

Because of the significance scholars are placing on the need to collect and, now in our case, archive Twitter data, we are convinced that providing for the collection, preservation, and reuse of social media data requires, at the very least, conversations among researchers, libraries, archives, institutional review boards, scholarly societies, and other national and international organizations concerned with the production and preservation of scholarship.

Part of the discussion will need to include the context or conditions under which the data have been collected.

One important aspect of this, as Dr. Charlesworth mentioned in his talk yesterday, is a way to gather “legal metadata” so that going forward the archive or repository will have the necessary privacy i’s dotted and t’s crossed, insofar as it is possible.

Libraries should engage researchers as early as possible in the research process.

Curators are presented with a golden opportunity to collaborate with researchers as close to the beginning of the research lifecycle as possible. Through a collaborative process, we can ideally facilitate the creation of collections that balance openness with privacy concerns, and encourage broad reuse.

While that early intervention may not happen, we can employ curatorial strategies on the backend of the data gathering that will hopefully push the issue (one of which will be discussed in our next recommendation).

Here we start to address, somewhat, the question that was asked yesterday at Dr. Charlesworth’s presentation about educating the researchers.

Libraries choosing to archive social media data should develop clear and easy to use collection and deposit policies, forms and tools.

It has been our argument since first working with Twitter data that a way to both educate researchers and create ingestible, reusable data for a repository is to create a workflow that asks the necessary questions of researchers, which would aid in the creation of a codebook and documentation for the data.

We created a Twitter deposit form that is geared toward raising the privacy issues with this platform, educating the researcher, as well as providing a way to record the basic legal and descriptive metadata necessary for contextualizing the data for re-use.
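
To make the idea of a deposit record concrete, here is a minimal sketch of the kind of structure such a form might fill in. It is not the deposit form we created: every field name and the `open_questions` helper are hypothetical, chosen only to show how capture context and basic legal metadata could be recorded together and checked at deposit time. The example values come from the case study described earlier in these notes.

```python
from dataclasses import dataclass, asdict
import json


@dataclass
class TwitterDeposit:
    """Hypothetical deposit record; all field names are illustrative."""
    depositor: str
    capture_method: str
    search_parameters: dict
    capture_start: str
    capture_end: str
    fields_captured_count: int
    terms_of_service_reviewed: bool
    contains_user_identifiers: bool
    notes: str = ""

    def open_questions(self):
        """Return prompts a curator might raise with the depositor."""
        questions = []
        if not self.terms_of_service_reviewed:
            questions.append("Have the platform's terms and privacy policy been reviewed?")
        if self.contains_user_identifiers:
            questions.append("How should user identifiers be handled on access and display?")
        return questions


deposit = TwitterDeposit(
    depositor="HyperCities team",
    capture_method="Twitter Search API",
    search_parameters={
        "location": "within 200 km of the center of Cairo",
        "hashtags": ["#jan25", "#egypt", "#tahrir"],
    },
    capture_start="2011-01-30",
    capture_end="2011-02-24",
    fields_captured_count=8,
    terms_of_service_reviewed=False,  # placeholder value for illustration
    contains_user_identifiers=True,   # placeholder value for illustration
)
print(json.dumps(asdict(deposit), indent=2))
for question in deposit.open_questions():
    print("-", question)
```

The point of the design is that the questions are asked at deposit, so the answers become part of the documentation that travels with the data.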

Teachable moments for information literacy librarians.

Understand and know the source of information.

Twitter adds language to its privacy policy that more explicitly states use.

(gain consent, and then the data truly become open)

Ideally, Twitter would take a different approach to releasing the data for research: partner directly with researchers, rather than with a third party like GNIP, which charges for the data and isn’t clear about what can be done with it once it’s been purchased.

Lastly, thanks to all who have been tweeting during the session; we wanted to let you know that we’ve archived those tweets in TwapperKeeper.
