National Forum on Ethics and Archiving the Web
2018-03-23, #eaw18, @phonedude_mln
Weaponized Web Archives:
Provenance Laundering of Short Order Evidence
Michael L. Nelson
Old Dominion University
Web Science & Digital Libraries Research Group
@WebSciDL, @phonedude_mln
With:
ODU: Michele C. Weigle, Mohamed Aturban, John Berlin, Sawood Alam, Plinio Vargas
Los Alamos National Laboratory: Herbert Van de Sompel, Martin Klein
Supported in part by The Andrew W. Mellon Foundation.
Opinions expressed are those of the presenter.
TL;DR
• We are on the cusp of a “Photoshop” moment for synthesizing convincing audio/video
• Web archives will be weaponized to:
– alter trustworthy content
– obfuscate provenance of untrustworthy content
web archives
https://imgur.com/gallery/akeVeiq
as a community, we constantly are asking ourselves:
“Are we creating tools that aid the
surveillance state?”
Spoiler alert: Yes.
Our attitude about the
surveillance state is contextual.
https://www.cbsnews.com/pictures/boston-marathon-bombing-iconic-images/
http://www.boston.com/metrodesk/2013/04/20/boston-police-commissioner-edward-davis-says-releasing-photos-was-turning-point-boston-marathon-bomb-probe/sojcZNcTCGah8UYBnRuk9O/story.html
Boston Marathon Bombing, 2013
https://twitter.com/charliespiering/status/976430395964215296
Given enough time, it becomes art
https://archive.org/details/prelingerhomemovies
https://genius.com/Dj-shadow-letter-from-home-lyrics
https://www.youtube.com/watch?v=MIR62rreRKY
personally
identifiable
information!
Meanwhile, we happily pay monthly
service fees to be surveilled!
https://www.citiusminds.com/blog/home-automation-with-smart-speakers-amazon-echo-vs-google-home-vs-apple-homepod/
https://twitter.com/mtdukes/status/974281625348558848
“Quis custodiet ipsos custodes?”
A: Social media.
https://twitter.com/WIRED/status/958350367468683267 https://twitter.com/vicenews/status/670059493581959168
We don’t feel too bad when we archive accounts that
later prove to be trolls / sockpuppets / sybils
https://twitter.com/safety_refinery/status/934982022078042112
https://ws-dl.blogspot.com/2018/01/2018-01-02-link-to-web-archives-not.html
https://twitter.com/documentnow/status/964882665982722048
https://news.docnow.io/blacktivists-in-the-archive-71c807aa247e
Nor do we feel bad for holding
public figures / organizations accountable
https://twitter.com/landlibrarian/status/975910915135754240
https://twitter.com/IEEEhistory/status/960358528987942912
http://archive.is/xh58B
We can & should discuss our role in surveillance,
but realize Facebook is operating as designed
(and before you say “I’m too cool for Facebook”, remember that Facebook owns Instagram)
https://twitter.com/zeynep/status/975076957485457408
https://twitter.com/Pinboard/status/975013825010458624
see also: data as toxic waste http://idlewords.com/talks/haunted_by_data.htm
as a community, we should be asking ourselves:
“Can we authenticate web content?”
Spoiler alert: Yes. A bit.
Stewart Brand, The Media Lab: Inventing the Future at MIT, 1987, p. 201
Granted, we’ve had obvious, cut-n-paste /
mashup “evidence” for a long time…
Victorian Photo Collage
https://www.metmuseum.org/exhibitions/listings/2010/victorian-photocollage
“The Flying Saucer” (1956)
https://en.wikipedia.org/wiki/The_Flying_Saucer_(song)
https://www.youtube.com/watch?v=XCrn6QXvHLg
Brian Williams Raps ‘Gin & Juice’
https://www.youtube.com/watch?v=XlGLhYFrv6w
Crude techniques = humor,
sophisticated techniques = deception;
Brand’s prediction of “any day now” is now
Synthesizing Obama: Learning Lip Sync from Audio
SIGGRAPH 2017
https://grail.cs.washington.edu/projects/AudioToObama/
Face2Face: Real-time Face Capture and Reenactment
of RGB Videos, CVPR 2016
http://niessnerlab.org/projects/thies2016face.html
see also: https://www.youtube.com/watch?v=pkkph4JhrCg
What does this have to do with the web?
Clumsy, “collage/flying saucer/gin & juice” techniques
are already effective on social media
We are completely unprepared for
advanced, SIGGRAPH/CVPR techniques
Neo-Nazis and “Black Panther”
Relationship Status: It’s Complicated
http://knowyourmeme.com/photos/1338390-black-panther
https://twitter.com/TamikaDMallory/status/964701120194019328
nydailynews.com provides screenshots,
but not links to the tweets…
http://www.nydailynews.com/entertainment/movies/trolls-lying-assaults-black-panther-showings-article-1.3824901
@AsianWifeHaver and @DSA_Boi_Pucci
are not on the live web…
$ curl -I https://twitter.com/AsianWifeHaver
HTTP/1.1 302 Found
cache-control: no-cache, no-store, must-revalidate, pre-check=0, post-check=0
content-length: 103
content-type: text/html;charset=utf-8
date: Sat, 17 Mar 2018 22:09:27 GMT
expires: Tue, 31 Mar 1981 05:00:00 GMT
last-modified: Sat, 17 Mar 2018 22:09:27 GMT
location: https://twitter.com/account/suspended
$ curl -I https://twitter.com/DSA_Boi_Pucci
HTTP/1.1 404 Not Found
cache-control: no-cache, no-store, must-revalidate, pre-check=0, post-check=0
content-length: 6329
content-security-policy: [deletia]
content-type: text/html;charset=utf-8
date: Sat, 17 Mar 2018 22:14:22 GMT
expires: Tue, 31 Mar 1981 05:00:00 GMT
last-modified: Sat, 17 Mar 2018 22:14:22 GMT
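The header dumps above can be reduced to a coarse check. This is a minimal sketch, assuming the status codes and the /account/suspended redirect behave as they did in these March 2018 responses; classify_profile and its labels are hypothetical names, not part of any Twitter API.

```python
from typing import Optional


def classify_profile(status: int, location: Optional[str] = None) -> str:
    """Map the status line and Location header of a HEAD response to a coarse state."""
    redirect = status in (301, 302, 303, 307, 308)
    # Assumption: a redirect to .../account/suspended means the account was suspended.
    if redirect and location and location.rstrip("/").endswith("/account/suspended"):
        return "suspended"
    if status in (404, 410):
        return "missing"
    if 200 <= status < 300:
        return "live"
    return "unknown"


# The two responses shown in the slide:
print(classify_profile(302, "https://twitter.com/account/suspended"))
print(classify_profile(404))
```

Neither outcome says anything about what the account once tweeted, which is why the next stop is the archives.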
…nor are they in the Internet Archive
note: this exists only because of the redirection to the “suspended” page
http://web.archive.org/web/*/https://twitter.com/AsianWifeHaver
http://web.archive.org/web/*/https://twitter.com/DSA_Boi_Pucci
Can’t find @DSA_Boi_Pucci in any archive
Typical archive URI construction:
archive.example.org/SomeString/CNN.com/travel
web.archive.org/web/*/twitter.com/DSA_Boi_Pucci
wayback.archive-it.org/all/*/twitter.com/DSA_Boi_Pucci
perma-archives.org/warc/twitter.com/DSA_Boi_Pucci
archive.is/twitter.com/DSA_Boi_Pucci
www.webarchive.org.uk/wayback/archive/twitter.com/DSA_Boi_Pucci
wayback.vefsafn.is/wayback/twitter.com/DSA_Boi_Pucci
arquivo.pt/wayback/twitter.com/DSA_Boi_Pucci
for a full list of public web archives, see: http://labs.mementoweb.org/aggregator_config/archivelist.xml
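The construction pattern above is mechanical, so the lookups can be generated. This is a minimal sketch: the prefixes are copied from the slide, while ARCHIVE_PREFIXES and lookup_uris are hypothetical names, and the list is illustrative rather than the full set of public archives.

```python
from typing import List

# Prefixes as they appear on the slide; each archive has its own "SomeString".
ARCHIVE_PREFIXES = [
    "web.archive.org/web/*/",
    "wayback.archive-it.org/all/*/",
    "perma-archives.org/warc/",
    "archive.is/",
    "www.webarchive.org.uk/wayback/archive/",
    "wayback.vefsafn.is/wayback/",
    "arquivo.pt/wayback/",
]


def lookup_uris(original: str) -> List[str]:
    """Build one lookup URI per archive for an original resource."""
    # Drop the scheme, matching the scheme-less form used on the slide.
    target = original.split("://", 1)[-1]
    return [prefix + target for prefix in ARCHIVE_PREFIXES]


for uri in lookup_uris("https://twitter.com/DSA_Boi_Pucci"):
    print(uri)
```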
What if we checked these archives?
What if they all agreed?
breitbart.com/wayback/*/twitter.com/DSA_Boi_Pucci
infowars.com/web/*/twitter.com/DSA_Boi_Pucci
iluv.aynrand.org/*/twitter.com/DSA_Boi_Pucci
InternetResearchAgency.ru/twitter.com/DSA_Boi_Pucci
Would you trust the results?
Our entire national digital preservation
strategy is predicated on
Brewster Kahle “not being evil”™
If he is leading a 20+ year sleeper cell, we’re doomed.
Segal’s Law, restated for web archives:
The person with an archive knows what the page looked like.
The person with two archives is never sure.
However, even with a single web archive,
there can be problems:
zombies, temporal violations, and attacks
Zombies: live web “leaking” into an archived page
http://ws-dl.blogspot.com/2012/10/2012-10-10-zombies-in-archives.html
this page is from 2008
this ad is from 2012 (when this screenshot was taken)
Temporal violations: reconstructing legitimately
archived resources into a page that never existed
http://ws-dl.blogspot.com/2015/12/2015-12-08-evaluating-temporal.html
text (2004-12) says rain, image (2005-09) is clear
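One rough way to flag a candidate temporal violation is to compare the capture datetimes embedded in the URI-Ms of a page and its embedded resources. This is a simplified sketch, not the evaluation method from the linked post: it assumes Wayback-style 14-digit datetimes in the URI, the example.com URIs are made up to mirror the dates above, and a large gap only flags a page for inspection rather than proving the composite never existed.

```python
import re
from datetime import datetime, timedelta
from typing import List, Optional, Tuple

# Matches /YYYYMMDDhhmmss/ with an optional two-letter modifier such as im_ or id_.
TIMESTAMP = re.compile(r"/(\d{14})(?:[a-z]{2}_)?/")


def memento_datetime(uri_m: str) -> Optional[datetime]:
    """Extract the capture datetime from a Wayback-style URI-M, if present."""
    match = TIMESTAMP.search(uri_m)
    return datetime.strptime(match.group(1), "%Y%m%d%H%M%S") if match else None


def flag_spread(root: str, embedded: List[str],
                tolerance: timedelta = timedelta(days=1)) -> List[Tuple[str, timedelta]]:
    """Return embedded URI-Ms captured more than `tolerance` away from the root."""
    root_dt = memento_datetime(root)
    if root_dt is None:
        raise ValueError("root URI-M has no 14-digit datetime")
    flagged = []
    for uri in embedded:
        dt = memento_datetime(uri)
        if dt is not None and abs(dt - root_dt) > tolerance:
            flagged.append((uri, dt - root_dt))
    return flagged


# Hypothetical URI-Ms: text captured 2004-12, image captured 2005-09.
page = "web.archive.org/web/20041201000000/http://example.com/weather"
image = "web.archive.org/web/20050901000000im_/http://example.com/sky.jpg"
print(flag_spread(page, [image]))
```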
Directly attacking the archive
(in this case, via orphaned live web resources; “zombie attack”)
Lerner, Kohno, Roesner, 2017
https://doi.org/10.1145/3133956.3134042
see also: Cushman & Kreymer http://labs.rhizome.org/presentations/security.html
page is from 2011,
iframe content is from 2017
(when screenshot was taken)
Based on feedback from Lerner et al.,
IA has changed their playback
(specifically, with a Content-Security-Policy HTTP response header)
But playback remains problematic…
(apologies to Peter Arnett)
“In order to save the page, we had to completely change it”
let’s look at four common scenarios
1) JavaScript does not run correctly from the archive
http://ws-dl.blogspot.com/2017/01/2017-01-20-cnncom-has-been-unarchivable.html
This is cnn.com not replaying;
it hasn’t replayed correctly since
November 1, 2016
2) Archived page renders differently each time
Mohamed Aturban, unpublished, memento:
http://web.archive.org/web/20130724144801/http://www.cnn.com/
3) Archive modifies pages that should stay the same –
goodbye conventional fixity checks!
Mohamed Aturban, unpublished, embedding memento:
http://perma-archives.org/warc/20170101182813/http://umich.edu/
http://perma-archives.org/warc/20170101182814id_/http://umich.edu/includes/image/type/gallery/id/113/name/ResearchDIL-19Aug14_DM%28136%29.jpg/width/152/height/152/mode/minfit/
4) Archived page doesn’t match live web experience
https://web.archive.org/web/20180302184025/https:/twitter.com/Emma4Change
http://ws-dl.blogspot.com/2018/03/2018-03-21-cookies-are-why-your.html
“only a ‘crisis actor’ would
tweet in Slovak!”
Now imagine she gets fed up,
deletes her account, and then
someone applies the
“abandoned acct / archive” attack
Justin Littman described:
https://gwu-libraries.github.io/sfm-ui/posts/2017-11-06-vulnerabilities
How can we differentiate between “normal”
modification for playback vs. deception?
These might have been swapped -- but how can you tell for sure?
If the tweets or accts are deleted, we don’t know.
If I embed fake tweets in another page, it’s even more confusing.
And it is not in Twitter’s (perceived) self-interest to help, cf.:
https://techcrunch.com/2018/01/03/why-twitter-wont-remove-trumps-nuclear-war-tweet/
You cannot trust the URL in your browser!
Here’s an actual page in the IA “proving”
Brian Williams released “Gin and Juice” in 1992, a full year before Snoop Dogg.
John Berlin, MS Thesis, 2018
https://www.youtube.com/watch?v=k3QTcJZdFfs
(actual URI-R & URI-M have also been faked in video)
The content is clearly fake, but imagine replacing:
1) “1992” with a more believable “2016”,
2) the fake domain with “bbc.com”, and
3) Brian Williams rapping with a synthesized Trump or Obama speech.
Blockchain to the rescue!!!
<lasers>
<sirens>
<disco-thumping-soundtrack>
nope.
https://www.multichain.com/blog/2015/11/avoiding-pointless-blockchain-project/
https://eprint.iacr.org/2017/375.pdf
Instead, let’s use web archives
to monitor web archives.
Step 1: Push to multiple archives
web.archive.org/web/20180321/eaw.rhizome.org
wayback.archive-it.org/all/20180321/eaw.rhizome.org
arquivo.pt/wayback/20180321/eaw.rhizome.org
archive.is/20180321/eaw.rhizome.org
eaw.rhizome.org
Step 2: Compute fixity,
publish fixity “manifest” at a well-known location
manifest.org/20180322/web.archive.org/web/20180321/eaw.rhizome.org
manifest.org/20180322/wayback.archive-it.org/all/20180321/eaw.rhizome.org
manifest.org/20180322/arquivo.pt/wayback/20180321/eaw.rhizome.org
manifest.org/20180322/archive.is/20180321/eaw.rhizome.org
It’s understood that archived HTML is continuously rewritten, so only compute fixity on things that
should not change, like JPEGs and certain original HTTP response headers.
This example assumes the existence of a well-known server manifest.org.
Actual URIs can be a bit more complex using “Trusty URIs”:
http://ws-dl.blogspot.com/2017/01/2017-01-15-summary-of-trusty-uris.html
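Step 2 can be sketched as hashing only the parts of a memento that should not change. This is a minimal sketch under stated assumptions: the manifest fields, the STABLE_HEADERS selection, and the function names are hypothetical, SHA-256 is one reasonable digest choice rather than something the slides specify, and manifest.org is the slide's assumed server.

```python
import hashlib
import json
from typing import Dict

# Assumption: which original response headers count as "should not change".
STABLE_HEADERS = ("content-type", "last-modified")


def fixity_manifest(uri_m: str, body: bytes, headers: Dict[str, str],
                    observed: str) -> dict:
    """Describe one memento by digests of its raw body and selected headers."""
    kept = {k.lower(): v for k, v in headers.items() if k.lower() in STABLE_HEADERS}
    # Serialize with sorted keys so the header digest does not depend on ordering.
    header_blob = json.dumps(kept, sort_keys=True).encode("utf-8")
    return {
        "uri_m": uri_m,
        "observed": observed,
        "body_sha256": hashlib.sha256(body).hexdigest(),
        "headers_sha256": hashlib.sha256(header_blob).hexdigest(),
    }


def manifest_uri(observed: str, uri_m: str, host: str = "manifest.org") -> str:
    """Well-known location for the manifest, following the slide's URI pattern."""
    return f"{host}/{observed}/{uri_m}"


uri_m = "web.archive.org/web/20180321/eaw.rhizome.org"
manifest = fixity_manifest(uri_m, b"raw image bytes",
                           {"Content-Type": "image/jpeg", "Date": "ignored"},
                           "20180322")
print(manifest_uri("20180322", uri_m))
print(json.dumps(manifest, indent=2))
```

The body passed in should be the unrewritten payload (for example a JPEG), not playback HTML, since the archive rewrites the latter on every request.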
Wondering about veracity of an archived page?
Check manifest.org and recompute fixity.
manifest.org/20180322/web.archive.org/web/20180321/eaw.rhizome.org
web.archive.org/web/20180321/eaw.rhizome.org
what if manifest.org is down?
or possibly hacked?
We can’t know archive.org did not alter contents on ingest (20180321),
but we can verify that it has not changed since our observation (20180322)
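The check itself is a recompute-and-compare. This is a minimal sketch, assuming a manifest that stores a SHA-256 hex digest under a hypothetical body_sha256 key; fetching the memento and the manifest is left out.

```python
import hashlib


def verify_fixity(body: bytes, manifest: dict) -> bool:
    """True if the memento body fetched now matches the digest recorded earlier."""
    return hashlib.sha256(body).hexdigest() == manifest["body_sha256"]


# Digest recorded on 20180322 (here: the SHA-256 of b"abc", for illustration).
recorded = {"body_sha256": "ba7816bf8f01cfea414140de5dae2223b00361a396177a9cb410ff61f20015ad"}
print(verify_fixity(b"abc", recorded))
print(verify_fixity(b"abd", recorded))
```

A match only shows the content is unchanged since the manifest was written; it says nothing about what happened at ingest.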
Step 4: Push manifest to multiple archives
web.archive.org/web/20180323/manifest.org/20180322/web.archive.org/web/20180321/eaw.rhizome.org
wayback.archive-it.org/all/20180323/manifest.org/20180322/web.archive.org/web/20180321/eaw.rhizome.org
arquivo.pt/wayback/20180323/manifest.org/20180322/web.archive.org/web/20180321/eaw.rhizome.org
archive.is/20180323/manifest.org/20180322/web.archive.org/web/20180321/eaw.rhizome.org
manifest.org/20180322/web.archive.org/web/20180321/eaw.rhizome.org
Now the 20180322 version of the manifest of archive.org’s
memento of rhizome.org is in four different archives.
The URIs are ugly, but the bottom line is an attacker would have to hack a
majority of 5 domains (manifest.org + 4 archives)
Can repeat for manifests of mementos of rhizome.org in archive-it.org, arquivo.pt, archive.is, etc.
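The nesting is easier to follow when built up in layers: archive prefix, then push date, then the manifest URI, which itself contains the URI-M. This is a minimal sketch following the prefix/datetime/URI pattern; the prefixes are taken from the slides, and ARCHIVES and archived_manifest_uris are hypothetical names.

```python
from typing import List

# Playback prefixes for the four archives used in this example.
ARCHIVES = [
    "web.archive.org/web/",
    "wayback.archive-it.org/all/",
    "arquivo.pt/wayback/",
    "archive.is/",
]


def archived_manifest_uris(manifest_uri: str, pushed: str) -> List[str]:
    """URI-Ms of one manifest after it has been pushed to each archive."""
    return [f"{prefix}{pushed}/{manifest_uri}" for prefix in ARCHIVES]


manifest = "manifest.org/20180322/web.archive.org/web/20180321/eaw.rhizome.org"
for uri in archived_manifest_uris(manifest, "20180323"):
    print(uri)
```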
Wondering about veracity of an archived page?
Check all copies of manifest.org and take a majority vote
manifest.org/20180322/web.archive.org/web/20180321/eaw.rhizome.org
web.archive.org/web/20180321/eaw.rhizome.org
Caveat 1: If I can hack rhizome.org page at archive.org, I can probably hack the
fixity info there too, so we really have 4 copies not 5.
web.archive.org/web/20180323/manifest.org/20180322/web.archive.org/web/20180321/eaw.rhizome.org
wayback.archive-it.org/all/20180323/manifest.org/20180322/web.archive.org/web/20180321/eaw.rhizome.org
arquivo.pt/wayback/20180323/manifest.org/20180322/web.archive.org/web/20180321/eaw.rhizome.org
archive.is/20180323/manifest.org/20180322/web.archive.org/web/20180321/eaw.rhizome.org
Caveat 2: archive.org and archive-it.org are not independent,
so we really have 3 copies not 5.
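The vote and both caveats can be sketched together: give each independent operator one vote, and drop the operator whose memento is under suspicion. The OPERATOR map and majority_digest are hypothetical; the grouping of archive.org with archive-it.org follows Caveat 2, and the digests are placeholders.

```python
from collections import Counter
from typing import Dict, Optional

# Hypothetical host-to-operator map; web.archive.org and archive-it.org share one.
OPERATOR = {
    "manifest.org": "manifest.org",
    "web.archive.org": "archive.org",
    "wayback.archive-it.org": "archive.org",
    "arquivo.pt": "arquivo.pt",
    "archive.is": "archive.is",
}


def majority_digest(copies: Dict[str, str], suspect_operator: str) -> Optional[str]:
    """Return the digest backed by a strict majority of independent operators."""
    votes: Dict[str, str] = {}
    for host, digest in copies.items():
        operator = OPERATOR.get(host, host)
        if operator == suspect_operator:
            continue  # Caveat 1: the archive being checked cannot vouch for itself
        votes.setdefault(operator, digest)  # first copy seen speaks for its operator
    if not votes:
        return None
    digest, count = Counter(votes.values()).most_common(1)[0]
    return digest if count > len(votes) / 2 else None


copies = {
    "manifest.org": "aaa",
    "web.archive.org": "bbb",         # possibly altered along with the memento
    "wayback.archive-it.org": "bbb",  # not independent of web.archive.org
    "arquivo.pt": "aaa",
    "archive.is": "aaa",
}
print(majority_digest(copies, suspect_operator="archive.org"))
```

With the archive.org operator excluded, three voters remain, matching the "3 copies not 5" count above.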
No fixity information?
Maybe it’s ok, maybe it’s not.
infowars.com/web/20180321/eaw.rhizome.org
404
404
404
404
404
or perhaps fixity was computed and stored at freedomfries.org;
you have to decide if you trust that site.
see also: https://www.youtube.com/watch?v=EY15lj-7_lc
http://ws-dl.blogspot.com/2017/12/2017-12-11-difficulties-in-timestamping.html
Conclusions
• Bad news:
– The web will be the primary vector for increasingly sophisticated disinformation
– Web archives can be used to forge or obscure the provenance of this information
– Brian Williams predates Snoop Dogg
• Good news:
– Web archives have a role in authenticating who said what, and when
– We should have a web archiving presence at: June 7-8, 2018, NYC: https://www.fakenewshorrorshow.org/
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Developing An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of BrazilDeveloping An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of Brazil
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...
 
HTML Injection Attacks: Impact and Mitigation Strategies
HTML Injection Attacks: Impact and Mitigation StrategiesHTML Injection Attacks: Impact and Mitigation Strategies
HTML Injection Attacks: Impact and Mitigation Strategies
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
 

Weaponized Web Archives: Provenance Laundering of Short Order Evidence

  • 1. National Forum on Ethics and Archiving the Web 2018-03-23, #eaw18, @phonedude_mln Weaponized Web Archives: Provenance Laundering of Short Order Evidence Michael L. Nelson Old Dominion University Web Science & Digital Libraries Research Group @WebSciDL, @phonedude_mln With: ODU: Michele C. Weigle, Mohamed Aturban, John Berlin, Sawood Alam, Plinio Vargas Los Alamos National Laboratory: Herbert Van de Sompel, Martin Klein Supported in part by The Andrew Mellon Foundation. Opinions expressed are those of the presenter.
  • 2. TL;DR • We are on the cusp of a “Photoshop” moment for synthesizing convincing audio/video • Web archives will be weaponized to: – alter trustworthy content – obfuscate provenance of untrustworthy content. Image (“web archives”): https://imgur.com/gallery/akeVeiq
  • 3. As a community, we are constantly asking ourselves: “Are we creating tools that aid the surveillance state?” Spoiler alert: Yes.
  • 4. Our attitude about the surveillance state is contextual. https://www.cbsnews.com/pictures/boston-marathon-bombing-iconic-images/ http://www.boston.com/metrodesk/2013/04/20/boston-police-commissioner-edward-davis-says-releasing-photos-was-turning-point-boston-marathon-bomb-probe/sojcZNcTCGah8UYBnRuk9O/story.html Boston Marathon Bombing, 2013 https://twitter.com/charliespiering/status/976430395964215296
  • 5. Given enough time, it becomes art. https://archive.org/details/prelingerhomemovies https://genius.com/Dj-shadow-letter-from-home-lyrics https://www.youtube.com/watch?v=MIR62rreRKY personally identifiable information!
  • 6. Meanwhile, we happily pay monthly service fees to be surveilled! https://www.citiusminds.com/blog/home-automation-with-smart-speakers-amazon-echo-vs-google-home-vs-apple-homepod/ https://twitter.com/mtdukes/status/974281625348558848
  • 7. “Quis custodiet ipsos custodes?” A: Social media. https://twitter.com/WIRED/status/958350367468683267 https://twitter.com/vicenews/status/670059493581959168
  • 8. We don’t feel too bad when we archive accounts that later prove to be trolls / sockpuppets / sybils. https://twitter.com/safety_refinery/status/934982022078042112 https://ws-dl.blogspot.com/2018/01/2018-01-02-link-to-web-archives-not.html https://twitter.com/documentnow/status/964882665982722048 https://news.docnow.io/blacktivists-in-the-archive-71c807aa247e
  • 9. Nor do we feel bad for holding public figures / organizations accountable. https://twitter.com/landlibrarian/status/975910915135754240 https://twitter.com/IEEEhistory/status/960358528987942912 http://archive.is/xh58B
  • 10. We can & should discuss our role in surveillance, but realize Facebook is operating as designed (and before you say “I’m too cool for Facebook”, remember that Facebook owns Instagram). https://twitter.com/zeynep/status/975076957485457408 https://twitter.com/Pinboard/status/975013825010458624 See also: data as toxic waste, http://idlewords.com/talks/haunted_by_data.htm
  • 11. As a community, we should be asking ourselves: “Can we authenticate web content?” Spoiler alert: Yes. A bit.
  • 12. Stewart Brand, The Media Lab: Inventing the Future at MIT, 1987, p. 201
  • 13. Granted, we’ve had obvious, cut-n-paste / mashup “evidence” for a long time… Victorian Photo Collage: https://www.metmuseum.org/exhibitions/listings/2010/victorian-photocollage “The Flying Saucer” (1956): https://en.wikipedia.org/wiki/The_Flying_Saucer_(song) https://www.youtube.com/watch?v=XCrn6QXvHLg Brian Williams Raps ‘Gin & Juice’: https://www.youtube.com/watch?v=XlGLhYFrv6w
  • 14. Crude techniques = humor, sophisticated techniques = deception; Brand’s prediction of “any day now” is now. Synthesizing Obama: Learning Lip Sync from Audio, SIGGRAPH 2017, https://grail.cs.washington.edu/projects/AudioToObama/ Face2Face: Real-time Face Capture and Reenactment of RGB Videos, CVPR 2016, http://niessnerlab.org/projects/thies2016face.html See also: https://www.youtube.com/watch?v=pkkph4JhrCg
  • 15. What does this have to do with the web? Clumsy, “collage/flying saucer/gin & juice” techniques are already effective on social media. We are completely unprepared for advanced, SIGGRAPH/CVPR techniques.
  • 16. Neo-Nazis and “Black Panther” Relationship Status: It’s Complicated. http://knowyourmeme.com/photos/1338390-black-panther https://twitter.com/TamikaDMallory/status/964701120194019328
  • 17. nydailynews.com provides screenshots, but not links to the tweets… http://www.nydailynews.com/entertainment/movies/trolls-lying-assaults-black-panther-showings-article-1.3824901
  • 18. @AsianWifeHaver and @DSA_Boi_Pucci are not on the live web…
      $ curl -I https://twitter.com/AsianWifeHaver
      HTTP/1.1 302 Found
      cache-control: no-cache, no-store, must-revalidate, pre-check=0, post-check=0
      content-length: 103
      content-type: text/html;charset=utf-8
      date: Sat, 17 Mar 2018 22:09:27 GMT
      expires: Tue, 31 Mar 1981 05:00:00 GMT
      last-modified: Sat, 17 Mar 2018 22:09:27 GMT
      location: https://twitter.com/account/suspended
      $ curl -I https://twitter.com/DSA_Boi_Pucci
      HTTP/1.1 404 Not Found
      cache-control: no-cache, no-store, must-revalidate, pre-check=0, post-check=0
      content-length: 6329
      content-security-policy: [deletia]
      content-type: text/html;charset=utf-8
      date: Sat, 17 Mar 2018 22:14:22 GMT
      expires: Tue, 31 Mar 1981 05:00:00 GMT
      last-modified: Sat, 17 Mar 2018 22:14:22 GMT
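The two responses on slide 18 differ in a way that is easy to check mechanically: a 302 whose Location ends in /account/suspended versus a plain 404. The sketch below is only an illustration of that reading. The function name and the returned labels are assumptions made here, and the rules mirror what Twitter returned in March 2018 as shown on the slide, not any documented API.

```python
def classify_account(status, headers):
    """Label an account from the status line and headers of a HEAD response.

    Hypothetical helper: the labels and rules are assumptions based only on
    the two curl transcripts on slide 18.
    """
    # Header names are case-insensitive, so normalize before the lookup.
    lowered = {name.lower(): value for name, value in headers.items()}
    location = lowered.get("location", "")
    if status == 302 and location.endswith("/account/suspended"):
        return "suspended"
    if status == 404:
        return "not found"
    if 200 <= status < 300:
        return "live"
    return "unknown"


if __name__ == "__main__":
    print(classify_account(302, {"location": "https://twitter.com/account/suspended"}))
    print(classify_account(404, {"content-length": "6329"}))
```

Neither answer says whether the account ever existed or what it posted, which is why the slides turn to the archives next.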
  • 19. …nor are they in the Internet Archive. Note: this exists only because of the redirection to the “suspended” page. http://web.archive.org/web/*/https://twitter.com/AsianWifeHaver http://web.archive.org/web/*/https://twitter.com/DSA_Boi_Pucci
  • 20. [no text on this slide]
  • 21. Can’t find @DSA_Boi_Pucci in any archive. Typical archive URI construction: archive.example.org/SomeString/CNN.com/travel
      web.archive.org/web/*/twitter.com/DSA_Boi_Pucci
      wayback.archive-it.org/all/*/twitter.com/DSA_Boi_Pucci
      perma-archives.org/warc/twitter.com/DSA_Boi_Pucci
      archive.is/twitter.com/DSA_Boi_Pucci
      www.webarchive.org.uk/wayback/archive/twitter.com/DSA_Boi_Pucci
      wayback.vefsafn.is/wayback/twitter.com/DSA_Boi_Pucci
      arquivo.pt/wayback/twitter.com/DSA_Boi_Pucci
      For a full list of public web archives, see: http://labs.mementoweb.org/aggregator_config/archivelist.xml
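The “archive prefix + original URI” pattern on slide 21 can be sketched as a small lookup generator. The prefix strings are copied from the slide; the function name and the decision to drop the scheme are assumptions for illustration, and real archives may accept other URI forms.

```python
# Prefixes copied from slide 21; the list is illustrative, not exhaustive.
ARCHIVE_PREFIXES = [
    "web.archive.org/web/*/",
    "wayback.archive-it.org/all/*/",
    "perma-archives.org/warc/",
    "archive.is/",
    "www.webarchive.org.uk/wayback/archive/",
    "wayback.vefsafn.is/wayback/",
    "arquivo.pt/wayback/",
]


def candidate_uris(target, prefixes=ARCHIVE_PREFIXES):
    """Build one lookup URI per archive for the original resource `target`."""
    # Drop a leading scheme so the result matches the scheme-less slide examples.
    target = target.split("://", 1)[-1]
    return [prefix + target for prefix in prefixes]


if __name__ == "__main__":
    for uri in candidate_uris("https://twitter.com/DSA_Boi_Pucci"):
        print(uri)
```

Checking archives one by one like this does not scale; the archive list linked on the slide is what a Memento aggregator would use to do the same lookup across many archives.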
  • 22. What if we checked these archives? What if they all agreed?
      breitbart.com/wayback/*/twitter.com/DSA_Boi_Pucci
      infowars.com/web/*/twitter.com/DSA_Boi_Pucci
      iluv.aynrand.org/*/twitter.com/DSA_Boi_Pucci
      InternetResearchAgency.ru/twitter.com/DSA_Boi_Pucci
      Would you trust the results?
  • 23. Our entire national digital preservation strategy is predicated on Brewster Kahle “not being evil”™. If he is leading a 20+ year sleeper cell, we’re doomed.
  • 24. Segal’s Law, restated for web archives: The person with an archive knows what the page looked like. The person with two archives is never sure.
  • 25. However, even with a single web archive, there can be problems: zombies, temporal violations, and attacks.
  • 26. Zombies: live web “leaking” into an archived page. http://ws-dl.blogspot.com/2012/10/2012-10-10-zombies-in-archives.html This page is from 2008; this ad is from 2012 (when this screen shot was taken).
  • 27. Temporal violations: reconstructing legitimately archived resources into a page that never existed. http://ws-dl.blogspot.com/2015/12/2015-12-08-evaluating-temporal.html Text (2004-12) says rain, image (2005-09) is clear.
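A rough way to spot the mismatch on slide 27 is to compare the capture times of a page and its embedded resources. The sketch below assumes Wayback-style URI-Ms that carry a 14-digit timestamp (optionally followed by a two-letter flag such as im_). It only measures the spread; it is a simplification, not the coherence evaluation described in the linked blog post, and the example URIs are hypothetical.

```python
import re
from datetime import datetime, timedelta

# Matches /YYYYMMDDhhmmss/ or /YYYYMMDDhhmmssim_/ inside a Wayback-style URI-M.
TIMESTAMP = re.compile(r"/(\d{14})(?:[a-z]{2}_)?/")


def memento_datetime(uri_m):
    """Read the capture time encoded in a URI-M (an approximation of Memento-Datetime)."""
    match = TIMESTAMP.search(uri_m)
    if match is None:
        raise ValueError("no 14-digit timestamp in %r" % uri_m)
    return datetime.strptime(match.group(1), "%Y%m%d%H%M%S")


def temporal_spread(root_uri_m, embedded_uri_ms):
    """Largest gap between the root page's capture time and any embedded capture."""
    root = memento_datetime(root_uri_m)
    gaps = [abs(memento_datetime(uri) - root) for uri in embedded_uri_ms]
    return max(gaps, default=timedelta(0))


if __name__ == "__main__":
    # Hypothetical URI-Ms chosen to mimic the 2004-12 text / 2005-09 image case.
    page = "http://web.archive.org/web/20041201000000/http://example.com/"
    image = "http://web.archive.org/web/20050901000000im_/http://example.com/sky.jpg"
    print(temporal_spread(page, [image]).days)
```

A large spread does not prove the replayed page never existed, but it flags compositions that deserve a closer look.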
  • 28. Directly attacking the archive (in this case, via orphaned live web resources; “zombie attack”). Lerner, Kohno, Roesner, 2017, https://doi.org/10.1145/3133956.3134042 See also: Cushman & Kreymer, http://labs.rhizome.org/presentations/security.html Page is from 2011, iframe content is from 2017 (when screenshot was taken).
  • 29. Based on feedback from Lerner et al., IA has changed their playback (specifically, with a Content-Security-Policy HTTP response header). But playback remains problematic… (apologies to Peter Arnett) “In order to save the page, we had to completely change it.” Let’s look at four common scenarios.
  • 30. 1) JavaScript does not run correctly from the archive. http://ws-dl.blogspot.com/2017/01/2017-01-20-cnncom-has-been-unarchivable.html This is cnn.com not replaying; it hasn’t replayed correctly since November 1, 2016.
  • 31. 2) Archived page renders differently each time. Mohamed Aturban, unpublished; memento: http://web.archive.org/web/20130724144801/http://www.cnn.com/
  • 32. 3) Archive modifies pages that should stay the same – goodbye conventional fixity checks! Mohamed Aturban, unpublished; embedding memento: http://perma-archives.org/warc/20170101182813/http://umich.edu/ http://perma-archives.org/warc/20170101182814id_/http://umich.edu/includes/image/type/gallery/id/113/name/ResearchDIL-19Aug14_DM%28136%29.jpg/width/152/height/152/mode/minfit/
  • 33. 4) Archived page doesn’t match live web experience. https://web.archive.org/web/20180302184025/https:/twitter.com/Emma4Change http://ws-dl.blogspot.com/2018/03/2018-03-21-cookies-are-why-your.html “only a ‘crisis actor’ would tweet in Slovak!” Now imagine she gets fed up, deletes her account, and then someone applies the “abandoned acct / archive” attack Justin Littman described: https://gwu-libraries.github.io/sfm-ui/posts/2017-11-06-vulnerabilities
  • 34. How can we differentiate between “normal” modification for playback vs. deception? These might have been swapped -- but how can you tell for sure? If the tweets or accts are deleted, we don’t know. If I embed fake tweets in another page, it’s even more confusing. And it is not in Twitter’s (perceived) self-interest to help, cf.: https://techcrunch.com/2018/01/03/why-twitter-wont-remove-trumps-nuclear-war-tweet/
  • 35. You cannot trust the URL in your browser! Here’s an actual page in the IA “proving” Brian Williams released “Gin and Juice” in 1992, a full year before Snoop Dogg. John Berlin, MS Thesis, 2018, https://www.youtube.com/watch?v=k3QTcJZdFfs (actual URI-R & URI-M have also been faked in video). The content is clearly fake, but imagine replacing: 1) “1992” with a more believable “2016”, 2) the fake domain with “bbc.com”, and 3) Brian Williams rapping with a synthesized Trump or Obama speech.
  • 36. Blockchain to the rescue!!! <lasers> <sirens> <disco-thumping-soundtrack> nope. https://www.multichain.com/blog/2015/11/avoiding-pointless-blockchain-project/ https://eprint.iacr.org/2017/375.pdf
  • 37. Instead, let’s use web archives to monitor web archives.
  • 38. Step 1: Push to multiple archives. Source page: eaw.rhizome.org
      web.archive.org/web/20180321/eaw.rhizome.org
      wayback.archive-it.org/all/20180321/eaw.rhizome.org
      arquivo.pt/wayback/20180321/eaw.rhizome.org
      archive.is/20180321/eaw.rhizome.org
  • 39. Step 2: Compute fixity, publish fixity “manifest” at a well-known location.
      manifest.org/20180322/web.archive.org/web/20180321/eaw.rhizome.org
      manifest.org/20180322/wayback.archive-it.org/all/20180321/eaw.rhizome.org
      manifest.org/20180322/arquivo.pt/wayback/20180321/eaw.rhizome.org
      manifest.org/20180322/archive.is/20180321/eaw.rhizome.org
      It’s understood that archived HTML is continuously rewritten, so only compute fixity on things that should not change, like JPEGs and certain original HTTP response headers. This example assumes the existence of a well-known server manifest.org. Actual URIs can be a bit more complex using “Trusty URIs”: http://ws-dl.blogspot.com/2017/01/2017-01-15-summary-of-trusty-uris.html
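Step 2 can be sketched as hashing only the parts of a memento that should not change and filing the result under a manifest URI shaped like the ones on slide 39. The choice of SHA-256, the header whitelist, and the record layout are all assumptions made for illustration; the slides do not specify a manifest format, and manifest.org is the hypothetical server the slide itself assumes.

```python
import hashlib
import json

# Assumption: which original response headers count as stable is a judgment call.
STABLE_HEADERS = ("content-type", "last-modified", "etag")


def fixity_record(uri_m, payload, headers):
    """Hash a memento's raw payload bytes and a whitelist of its original headers."""
    lowered = {name.lower(): value for name, value in headers.items()}
    kept = {name: lowered[name] for name in STABLE_HEADERS if name in lowered}
    # Serialize with sorted keys so the header hash does not depend on ordering.
    header_bytes = json.dumps(kept, sort_keys=True).encode("utf-8")
    return {
        "uri_m": uri_m,
        "payload_sha256": hashlib.sha256(payload).hexdigest(),
        "headers": kept,
        "headers_sha256": hashlib.sha256(header_bytes).hexdigest(),
    }


def manifest_uri(observed, uri_m, server="manifest.org"):
    """Compose server/observation-date/URI-M, following the slide's URI pattern."""
    return "%s/%s/%s" % (server, observed, uri_m)


if __name__ == "__main__":
    uri_m = "web.archive.org/web/20180321/eaw.rhizome.org"
    record = fixity_record(uri_m, b"<raw image bytes>", {"Content-Type": "image/jpeg"})
    print(manifest_uri("20180322", uri_m))
    print(json.dumps(record, indent=2))
```

Recomputing the same record later and comparing digests is the verification step; it only works if the payload is fetched without the archive's playback rewriting, which is why the slide limits fixity to resources such as JPEGs.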
  • 40. Wondering about veracity of an archived page? Check manifest.org and recompute fixity.
      manifest.org/20180322/web.archive.org/web/20180321/eaw.rhizome.org
      web.archive.org/web/20180321/eaw.rhizome.org
      What if manifest.org is down? Or possibly hacked? We can’t know archive.org did not alter contents on ingest (20180321), but we can verify that it has not changed since our observation (20180322).
  • 41. Step 4: Push manifest to multiple archives.
      web.archive.org/web/20180323/manifest.org/20180322/web.archive.org/web/20180321/eaw.rhizome.org
      wayback.archive-it.org/all/20180323/manifest.org/20180322/web.archive.org/web/20180321/eaw.rhizome.org
      arquivo.pt/wayback/20180323/eaw.rhizome.org/manifest.org/20180322/web.archive.org/web/20180321/eaw.rhizome.org
      archive.is/20180323/eaw.rhizome.org/manifest.org/20180322/web.archive.org/web/20180321/eaw.rhizome.org
      manifest.org/20180322/web.archive.org/web/20180321/eaw.rhizome.org
      Now the 20180322 version of the manifest of archive.org’s memento of rhizome.org is in four different archives. The URIs are ugly, but the bottom line is an attacker would have to hack a majority of 5 domains (manifest.org + 4 archives). Can repeat for manifests of mementos of rhizome.org in archive-it.org, arquivo.pt, archive.is, etc.
  • 42. Wondering about veracity of an archived page? Check all copies of manifest.org and take a majority vote.
      manifest.org/20180322/web.archive.org/web/20180321/eaw.rhizome.org
      web.archive.org/web/20180321/eaw.rhizome.org
      web.archive.org/web/20180323/manifest.org/20180322/web.archive.org/web/20180321/eaw.rhizome.org
      wayback.archive-it.org/all/20180323/manifest.org/20180322/web.archive.org/web/20180321/eaw.rhizome.org
      arquivo.pt/wayback/20180323/eaw.rhizome.org/manifest.org/20180322/web.archive.org/web/20180321/eaw.rhizome.org
      archive.is/20180323/eaw.rhizome.org/manifest.org/20180322/web.archive.org/web/20180321/eaw.rhizome.org
      Caveat 1: If I can hack rhizome.org page at archive.org, I can probably hack the fixity info there too, so we really have 4 copies not 5.
      Caveat 2: archive.org and archive-it.org are not independent, so we really have 3 copies not 5.
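The majority vote on slide 42, including both caveats, can be sketched as follows. The operator grouping table and the strict-majority rule are assumptions chosen to match the slide's arithmetic (5 copies, then 4, then 3); the digests in the example are placeholders, not real fixity values.

```python
from collections import Counter
from typing import Dict, Optional

# Caveat 2: hosts run by the same operator count as one voter.
SAME_OPERATOR = {"wayback.archive-it.org": "web.archive.org"}


def majority_fixity(copies: Dict[str, str], subject_host: str) -> Optional[str]:
    """Return the digest backed by a strict majority of independent manifest copies.

    `copies` maps the host serving a manifest copy to the digest it reports;
    `subject_host` is the archive whose memento is being verified.
    """
    subject_operator = SAME_OPERATOR.get(subject_host, subject_host)
    votes = {}
    for host, digest in copies.items():
        operator = SAME_OPERATOR.get(host, host)
        if operator == subject_operator:
            continue  # Caveat 1: the archive under test cannot vouch for itself.
        votes.setdefault(operator, digest)
    if not votes:
        return None
    digest, count = Counter(votes.values()).most_common(1)[0]
    return digest if count > len(votes) / 2 else None


if __name__ == "__main__":
    copies = {
        "manifest.org": "aaa",
        "web.archive.org": "bbb",
        "wayback.archive-it.org": "bbb",
        "arquivo.pt": "aaa",
        "archive.is": "aaa",
    }
    print(majority_fixity(copies, "web.archive.org"))
```

With the example input, only manifest.org, arquivo.pt, and archive.is vote, which is the “3 copies not 5” outcome the slide describes.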
  • 43. No fixity information? Maybe it’s ok, maybe it’s not. infowars.com/web/20180321/eaw.rhizome.org 404 404 404 404 404 Or perhaps fixity was computed and stored at freedomfries.org; you have to decide if you trust that site. See also: https://www.youtube.com/watch?v=EY15lj-7_lc http://ws-dl.blogspot.com/2017/12/2017-12-11-difficulties-in-timestamping.html
  • 44. Conclusions • Bad news: – The web will be the primary vector for increasingly sophisticated disinformation – Web archives can be used to forge or obscure the provenance of this information – Brian Williams predates Snoop Dogg • Good news: – Web archives have a role in authenticating who said what, and when – We should have a web archiving presence at: June 7-8, 2018, NYC: https://www.fakenewshorrorshow.org/