This document summarizes Princeton University Press's experience with digital piracy of its publications. It details how, in the first five months of manual search and takedown, the press issued 415 takedown notices to 45 piracy sites covering 396 illegally hosted files and 206 book titles. Most of the pirated files were hosted on cyberlockers such as Rapidshare and Filefactory. After moving to an automated service, the press issued over 11,000 takedown notices over three years, with an 82% success rate at removing infringing content. The document also discusses the anti-piracy strategies the press employs, such as escalating enforcement against non-compliant sites and invisible watermarking.
13. • DMCA notices sent to infringement site
• Link submission to Google for removal from SERP
• Escalation to hosts: “legally obligated to remove content if they wish to retain safe harbor under DMCA”
• Can improve overall responsiveness of otherwise non-responsive site
• Site enablers
• Payment services
• Ad networks
• Domain registrars
• Certificate authorities
• Each of these providers has language in their terms of service agreement that requires the customer (site) not to engage in illegal activity
• Failure to comply can result in suspension of account
14. • Over three years:
• 11,683 notices
• 9,670 successful
• 82%
17. • Watermarking for text
• “Social” Watermark
• Invisible Text Watermarking
• Embedded into publication
• Pre-production
• Point of sale
• Both
• Embedded into Guardian
• Source of pirated material
18. Mike Schwartz
Contracts Copyright and Permissions Supervisor
Princeton University Press
P: (503) 894-8934
E: mike_schwartz@press.princeton.edu
Editor's Notes
Good afternoon, everyone. My name is Mike Schwartz; I’m the Contracts, Copyright, and Permissions Supervisor at Princeton University Press, and I would like to welcome you to this little journey through the murky waters of search and takedown… so please prepare your spreadsheets, bring plenty of DMCA takedown notices, and, oh yeah, don’t forget to disable your firewalls! (Click through)
So, our journey begins in August of 2008. I was an assistant at the time, mere weeks into my tenure at PUP, when I was informed that one of our authors, Timothy Gowers, had discovered a pirated copy of his (and our) much-anticipated Princeton Companion to Mathematics (click). Now, we take all of our authors’ concerns seriously, but this particular case gave us extra cause for concern: not only was the book available for free download, it wasn’t even published yet. We hadn’t really seen anything like this before, and as the Intellectual Property Assistant, I was tasked with determining just how big a problem we had on our hands. (Click through)
To begin, let’s take a look at just how easy it is to search for and download a pirated eBook. I had little to no experience with this realm myself (as a product of the late ’90s and early 2000s, my idea of piracy was downloading music on Napster and LimeWire), but I quickly discovered that the process of downloading free books was incredibly simple (click). The indexing site I chose for my example is Avaxhome, and as you can see, it looks not unlike a regular ol’ search engine, with user-friendly parameters like (click) title (click) and author. And the results (click) are as simple as clicking through a Google search. They even earn ad revenue! (Click) (Click through)
Once we click through a link, it brings up what I call the “book page”, which acts as an interface between the indexing site and the cyberlocker, and these pages are often pretty detailed (click). Here’s the cover (click), the size of the file (click), and a myriad of download links (click). You can see that nobody paid attention to the “no mirrors” instruction (click), because, you know, when you’re breaking the rules, you still have to follow the rules, right? And look (click), they even have a sad kitten asking for donations.
Ironically, it doesn’t take a genius to download scholarly work. We’re at least smart enough not to pay for premium access (click), so we’ll stick with the free option (click). It may be slower, and you can only download a limited quantity in one 24-hour period, but it’s free, and that’s the whole point. (Click through) (15s)
The result? Crisp, clean pirated books that need to be removed. (Click through)
The next step is to send a DMCA takedown notice to the conveniently provided DMCA agent that so many of the major cyberlockers make available (click). In essence, the sites know they’re harboring abuse, but instead of dealing with it internally, they leave the search and takedown to everyone else, and it’s our job (click) to get in touch via their abuse@ email address (click).
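The talk doesn’t show the notice itself. As a rough sketch, assuming a U.S. notice under 17 U.S.C. § 512(c)(3)(A), the elements a cyberlocker’s DMCA agent expects (signature, the copyrighted work, the infringing material and where to find it, contact details, a good-faith statement, and an accuracy statement under penalty of perjury) can be collected and rendered as below. The `TakedownNotice` class, its field names, the wording, and the example addresses are hypothetical illustrations, not PUP’s or Attributor’s actual template.

```python
from dataclasses import dataclass, fields


@dataclass
class TakedownNotice:
    """Hypothetical container for the elements listed in 17 U.S.C. § 512(c)(3)(A)."""
    work_title: str             # the copyrighted work claimed to be infringed
    infringing_urls: list[str]  # where the infringing copies can be found
    claimant_name: str          # contact information for the complaining party
    claimant_email: str
    signature: str              # physical or electronic signature of an authorized person

    def missing_elements(self) -> list[str]:
        """Names of required fields that are still empty."""
        return [f.name for f in fields(self) if not getattr(self, f.name)]

    def render(self) -> str:
        """Plain-text notice body, including the two required statements."""
        urls = "\n".join(f"  - {url}" for url in self.infringing_urls)
        return (
            f"Copyrighted work: {self.work_title}\n"
            f"Infringing material:\n{urls}\n"
            f"Contact: {self.claimant_name} <{self.claimant_email}>\n"
            "I have a good faith belief that use of the material in the manner "
            "complained of is not authorized by the copyright owner, its agent, "
            "or the law.\n"
            "The information in this notice is accurate, and under penalty of "
            "perjury, I am authorized to act on behalf of the owner of an "
            "exclusive right that is allegedly infringed.\n"
            f"Signature: {self.signature}\n"
        )


notice = TakedownNotice(
    work_title="The Princeton Companion to Mathematics",
    infringing_urls=["https://cyberlocker.example/file/abc123"],  # hypothetical URL
    claimant_name="Rights Department",
    claimant_email="rights@publisher.example",                   # hypothetical address
    signature="/Rights Department/",
)
assert notice.missing_elements() == []
print(notice.render())
```

Checking `missing_elements()` before sending is a cheap guard: an incomplete notice gives the host an easy reason to ignore it.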
Roughly five months into my search and takedown odyssey, in which I devoted at least two hours a week exclusively to search and takedown, the statistics started to pile up. Most notably (click), the 415 notices sent resulted in 378 takedowns, an efficiency rate of roughly 91%. In spite of that, the overall results weren’t pretty (click through).
This is but a small sampling of the 200-line spreadsheet I created to house all of my search and takedown information. Everything was just added as it was found, and trust me, you’re not alone if you think this is tough to read. So let’s try to distill this a bit (click). Here’s our friend “The Companion to Mathematics” again… and (click) look at the list of sites on which I found it (click). The indexing sites are underlined, and the cyberlockers are in bold. All told, that’s ten successful downloads in five months. The funny thing is, each of these downloads yielded the exact same pirated file. See, that’s the brilliance, if you will, of these pirates: it’s all just one file being passed around in a big game of cyber keep-away. Or, as we started calling it, “Whack-a-Mole”, because as you well know, if you knock one down, two more pop up. And in the world of piracy, they pop up in the strangest places, like (click) .cd, the Democratic Republic of the Congo, or (click) .ws, Western Samoa. (Click through)
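Confirming that ten downloads are really one file can be done mechanically by comparing cryptographic digests. A minimal sketch, assuming the downloaded files are already in memory as bytes keyed by the site they came from; the `group_by_content` helper and the sample sources are hypothetical. Exact hashing only catches byte-identical copies, so a re-compressed or re-encoded version of the same book would show up as a separate group.

```python
import hashlib
from collections import defaultdict


def group_by_content(downloads: dict[str, bytes]) -> dict[str, list[str]]:
    """Group download sources by the SHA-256 digest of the file they served.

    Sources that map to the same digest served byte-identical files.
    """
    groups = defaultdict(list)
    for source, data in downloads.items():
        groups[hashlib.sha256(data).hexdigest()].append(source)
    return dict(groups)


# Hypothetical sources and file contents, standing in for real downloads.
downloads = {
    "locker-a.example": b"%PDF-1.4 same pirated file",
    "locker-b.example": b"%PDF-1.4 same pirated file",
    "locker-c.example": b"%PDF-1.4 a different scan",
}
for digest, sources in group_by_content(downloads).items():
    print(digest[:12], sources)
```

Recording the digest alongside each spreadsheet row would make the “one file, many mirrors” pattern visible without opening every download.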
But let’s backtrack for a second. Sure, there were scores, even hundreds, of titles and files. But even those numbers didn’t raise eyebrows quite like what I started finding inside, like (pause, emphasis) (click) typesetting brackets (click) and author notes (click) (click through).
Or even watermarks! (click) (click through)
So flash forward a couple of years, to a sweltering hot day in early September 2010. I made my way to the then-under-construction (and, if memory serves me, not air-conditioned) AAUP HQ in New York to meet two individuals: Jim Pitkow and Matt Robinson, respectively the CEO/co-founder and General Counsel of a Silicon Valley counter-infringement service called Attributor. PUP had since graduated from my lovely spreadsheet and was trialing the Publishers Association’s Copyright Infringement Portal, which was and still is a helpful and user-friendly tool, with only one discernible hitch: while the takedown process was automated and the notices region-specific, the search end of the equation, aka the most difficult part, was still entirely up to the user. Attributor’s primary selling point was automation, or should I say, the capacity to automatically crawl the vast and expansive internet at a rate far greater than a single human, or even a team of humans, could accomplish. At the risk of sounding like an Attributor infomercial, I’d like to share with you a few of the statistics that eventually sold PUP on signing up. Right off the bat, they have (click) 100 dedicated servers that crawl (click) the top 35 cyberlockers, like (click) Rapidshare (click) and Megaupload; (click) eight hundred to one thousand torrent sites (click), like the famous Pirate Bay; and (click) one hundred million “other” sites (click), like blogs. (Click) Using region-specific takedown notices valid in 129 copyright jurisdictions worldwide, they claim a near-100% success rate. (Click) This is because each suspected infringement is manually vetted (click), along with an automatic compliance check. (Click) Once the infringement is verified, the host is bombarded with those region-specific takedown notices (click). Data is stored and reported in real time on their “Guardian” page, which I’ll show in a minute.
(Click) But first, we need to talk about (click) escalation, because compliance isn’t always a one-and-done deal (Click through)
As a rabid Portland Timbers fan, I would be remiss if I didn’t use at least a little soccer terminology, and in the world of search and takedown, the escalation process can look a lot like a referee handling an unruly player. The first round of takedown protocol is to (click) send DMCA notices to the infringement site itself and to submit the link to Google, with whom Attributor has partnered in this endeavor, to remove infringing links from the Search Engine Results Page, or “SERP”. That’s a solid yellow card. If that doesn’t work (click), hosts are reminded that they are legally obligated to remove content if they wish to retain safe harbor under the DMCA: the old “hey buddy, if you keep this up, you’ll be outta here.” Further noncompliance leads to direct contact with the infringing site’s enablers (click): their payment services, ad networks, domain registrars, all of the services that act as the site’s lifeblood. The result can be the suspension of the site’s account, which is like having your team’s best player, the heart and soul, shown the red card and sent off: it’s very difficult to play, and almost impossible to win, afterward. (Click through)
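The three-step escalation can be modeled as a simple ladder that climbs one rung each time a stage fails to get the content removed. This is a hypothetical sketch: the `Stage` names and the one-step-per-failure rule are assumptions for illustration, and the real process involves waiting periods and judgment calls that aren’t modeled here.

```python
from enum import IntEnum
from typing import Callable, List, Optional


class Stage(IntEnum):
    """Hypothetical escalation ladder, in the order described in the talk."""
    NOTICE_AND_SERP = 1  # DMCA notice to the site + link submitted to Google (yellow card)
    HOST_REMINDER = 2    # remind the host of its DMCA safe-harbor obligations
    SITE_ENABLERS = 3    # contact payment services, ad networks, registrars, CAs (red card)


def next_stage(current: Stage, content_removed: bool) -> Optional[Stage]:
    """Stop once the content is down; otherwise climb one rung, if any remain."""
    if content_removed or current is Stage.SITE_ENABLERS:
        return None
    return Stage(current + 1)


def run_escalation(removed_after: Callable[[Stage], bool]) -> List[Stage]:
    """Apply stages in order until removed_after(stage) reports the content is gone."""
    used: List[Stage] = []
    stage: Optional[Stage] = Stage.NOTICE_AND_SERP
    while stage is not None:
        used.append(stage)
        stage = next_stage(stage, removed_after(stage))
    return used


# A site that only complies once its host is reminded about safe harbor:
print(run_escalation(lambda s: s >= Stage.HOST_REMINDER))
```

Using an `IntEnum` keeps the stages ordered, so “escalate” is just moving to the next integer value.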
As I mentioned a couple of slides ago, we did end up signing on with Attributor. (Click) And not to sneak in a spoiler or anything, but we’re now into the fourth year of our partnership, so things have actually worked out well… but let’s go back to being skeptics for a minute. A near-100% success rate is quite a claim, especially to anyone who’s ever been frustrated by an infringing site’s utter disregard for takedown letters. Taking a look at the stats (click), you can see that over a period of one month (click), 303 of the 449 notices sent out resulted in successful takedowns. For argument’s sake (click), if we count the 106 pending claims as successful, that gives us a rate of 91%. (Click) Over the three years of data sampled for this report, 9,670 of 11,683 notices have resulted in successful takedowns, a rate of 82%.
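The percentages above are plain ratios, and recomputing them shows where each figure comes from. A short sketch using only the counts quoted in the talk (`rate` is a hypothetical helper): confirmed takedowns alone come to about 67.5% for the one-month sample, counting the pending claims brings it to about 91.1%, and the three-year figure works out to about 82.8%, which the slides give as 82%.

```python
def rate(successes: int, total: int) -> float:
    """Success rate as a percentage."""
    return 100 * successes / total


figures = {
    "Manual, first five months": rate(378, 415),
    "Attributor, one month (confirmed only)": rate(303, 449),
    "Attributor, one month (pending counted)": rate(303 + 106, 449),
    "Attributor, three years": rate(9670, 11683),
}
for label, pct in figures.items():
    print(f"{label}: {pct:.1f}%")
# Manual, first five months: 91.1%
# Attributor, one month (confirmed only): 67.5%
# Attributor, one month (pending counted): 91.1%
# Attributor, three years: 82.8%
```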
Now, wait a minute. My manual takedown efficiency was 91%, equal to the best-case rate Attributor achieved in that one-month sample and well above the 82% overall return. But there’s a perfectly good explanation. (Click) Here’s a graphic representation of the percentages of delinquent sites (double click). Right off the bat (click), you can see that three of the top four delinquents are derivatives of the same site, libgen, totaling (click) 854 notices with (click) only a 9% success rate. (Click) That’s roughly 780 failed notices, or nearly 40% of the 2,013 unsuccessful attempts. One delinquent can really ruin your statistics! The remainder are torrent sites, the most notorious of the non-responders, and websites with domains registered in creative locations. You saw Congo and Western Samoa, so why not (click) Armenia too? In short, Attributor travels in much murkier waters. None of the notices behind my 91% went to torrent sites. Attributor goes after sites we couldn’t touch on our own, and even a few small victories there are significant. (Click through)
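The libgen example turns on the difference between a site’s own success rate and its share of all failed notices. A minimal sketch of that per-site breakdown, assuming notice outcomes are available as `(site, succeeded)` pairs; the `delinquency_report` function and the sample records are hypothetical, not Attributor’s data or reporting format.

```python
from collections import Counter
from typing import List, Tuple


def delinquency_report(notices: List[Tuple[str, bool]]) -> List[Tuple[str, int, float, float]]:
    """Per site: notices sent, success rate (%), and share (%) of all failed notices.

    Rows are sorted so the site responsible for the most failures comes first.
    """
    sent = Counter(site for site, _ in notices)
    failed = Counter(site for site, ok in notices if not ok)
    total_failed = sum(failed.values())
    rows = []
    for site, n in sent.items():
        success_pct = 100 * (n - failed[site]) / n
        failure_share = 100 * failed[site] / total_failed if total_failed else 0.0
        rows.append((site, n, success_pct, failure_share))
    return sorted(rows, key=lambda row: row[3], reverse=True)


# Hypothetical outcomes: one stubborn mirror and one mostly compliant blog host.
sample = (
    [("mirror-a.example", False)] * 9 + [("mirror-a.example", True)]
    + [("blog.example", True)] * 8 + [("blog.example", False)] * 2
)
for site, n, ok_pct, fail_pct in delinquency_report(sample):
    print(f"{site}: {n} notices, {ok_pct:.0f}% succeeded, {fail_pct:.1f}% of all failures")
# mirror-a.example: 10 notices, 10% succeeded, 81.8% of all failures
# blog.example: 10 notices, 80% succeeded, 18.2% of all failures
```

Both sites received the same number of notices, yet one accounts for most of the failures, which is exactly how a single delinquent drags down an overall rate.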
Another neat feature is the ability to see where piracy is being found and, by extension, where takedown notices are being sent. If you could take a second to think to yourself which region or country is the most culpable… (click) did any of you guess right? I, for one, was surprised. (Click through)
This whole experience was catalyzed by the discovery of an important not-yet-published (NYP) book (click), and if we’re left with one major unanswered question, it is: where does all of this piracy come from? Are legally purchased files being cracked, as the crisp, clean copies we find suggest, or are the leaks internal, as the typesetting brackets and author notes I showed you earlier suggest? The answers may finally be coming. In December of last year, Attributor was acquired by Digimarc, a digital watermarking company based in Beaverton, Oregon, which had previously worked primarily with copyrighted images. The plan is to adapt that system to the text-based (click) world by implementing two kinds of watermarks: (click) “social” watermarking, a visible watermark with customer data displayed on each page, or (click) invisible watermarking, which is readable only through Digimarc’s proprietary watermarking reader. Either watermark (click) can be applied during pre-production (click), at the point of sale (click), or, my favorite, (click) both. This system, which is scheduled to go live in the third or fourth quarter of this year (click), embeds data into Guardian, providing insight and analysis into, hopefully, the source of pirated material. Thank you very much! (click)
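Digimarc’s invisible text watermarking is proprietary, and the talk doesn’t describe how it works. Purely to illustrate the idea of hiding a customer identifier in text, here is a toy scheme that swaps in a different, much weaker technique: encoding the ID as zero-width Unicode characters. The `embed`/`extract` functions and the 32-bit ID width are assumptions; marks like these disappear as soon as someone strips invisible characters, so a production system would need something far more robust.

```python
from typing import Optional

ZERO = "\u200b"  # zero-width space: encodes a 0 bit
ONE = "\u200c"   # zero-width non-joiner: encodes a 1 bit


def embed(text: str, customer_id: int, bits: int = 32) -> str:
    """Hide customer_id (most significant bit first) right after the first word."""
    payload = "".join(ONE if (customer_id >> i) & 1 else ZERO for i in reversed(range(bits)))
    head, sep, tail = text.partition(" ")
    return head + payload + sep + tail


def extract(text: str) -> Optional[int]:
    """Read back the hidden ID, or None if the text carries no marks."""
    marks = [ch for ch in text if ch in (ZERO, ONE)]
    if not marks:
        return None
    return int("".join("1" if ch == ONE else "0" for ch in marks), 2)


page = "Chapter 1 begins here."
marked = embed(page, customer_id=48213)  # hypothetical purchaser ID
print(marked == page, len(marked) - len(page), extract(marked))  # False 32 48213
```

Applied at the point of sale, an ID like this would tie a leaked copy back to a purchase; applied in pre-production, it would point to an internal or review copy, which is the distinction the talk hopes watermarking will reveal.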