Last year The Guardian launched The Open Platform, a suite of services and tools that enable content partners and developers to build applications leveraging The Guardian's rich content.
This talk will cover how The Guardian opened up its content, enriched it, and reached new markets with its platform strategy.
We cover the background to the platform strategy, the technical architecture, the implementation of Solr, and how the new release of the Guardian's Open Platform, launched May 20th, 2010, has embraced disruption in the media space while accelerating revenue.
From Publisher To Platform: How The Guardian Used Content, Search, and Open Source To Build a Powerful New Business Model
1. From publisher to platform
How the Guardian used content, search, and open source to build a powerful new business model
Stephen Dunn, Guardian News and Media
Apache Lucene EuroCon 21 May 2010
3. We started a long time ago:
4. “To secure the financial and editorial independence of the Guardian in perpetuity.”
“To promote freedom in the press and liberal journalism globally.”
6. 2010
Keyword page
Live blogs
iPhone app Mobile site
Twitter updates
Swine flu Comment
Content partnerships
Newspapers
Audio
Video Data API
11. 2009
1.5M pages and counting
250M+ pages/month
30M visitors/month
4x Webby award winner (best newspaper site)
15. Part of the Web
16. 1. Permanent
http://www.flickr.com/photos/fstorr/
• “A cool URI is one that does not change” Tim Berners-Lee 1998
• 1.5 million resources redirected to new scheme
17. 2. Addressable
★ Resources are “about” something - ready for the social web.
★ We live in “the age of point-at-things” (Coates 2005)
18. 3. Discoverable
★ Multiple routes to content
★ Tagging drives discovery
29. Site traffic growth
[Chart: Unique Users, Sep 2005 to Jan 2009 (x-axis), 3,750,000 to 30,000,000 (y-axis). Annotations: Pre-project, First release, Final Release, 36M.]
38. “How I stopped worrying about my website and learned to love the whole Internet.”
Matt McAlister
39. The Open Strategy
OPEN IN: Bring in data and apps from the Internet
OPEN OUT: Enable partners to build applications using Guardian content and services for other digital platforms
43. “Our most interesting experiments lie in combining what we know with the experience, opinions and expertise of the people who want to participate rather than passively receive.”
44. The Open Platform (BETA)
45. OPEN IN: Bring in data and apps from the Internet
OPEN OUT: Allow partners to build applications using Guardian content and services for other digital platforms
47. The suite of services enabling partners to build applications with the Guardian
49. CONTENT API: A service for selecting and collecting content from the Guardian for re-use
DATA STORE: A directory of useful data curated by Guardian editors
POLITICS API: Open database of candidates, voting records, constituencies, election results, live data on election day
50. CONTENT API
A service for selecting and collecting content from the Guardian for re-use
[Diagram, top to bottom: Your App Here!, REST API, Search engine, CMS, Guardian database]
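As a rough illustration of what "Your App Here!" might send to the REST API, the sketch below only builds a request URL and makes no network call. The slide names no endpoint: the host `content.guardianapis.com`, the `/search` path, and the `q`, `tag` and `api-key` parameters are assumptions here and should be checked against the current Content API documentation. The helper name `build_search_url` and the sample tag id are hypothetical.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# Assumed base URL; the slide itself does not state one.
BASE_URL = "https://content.guardianapis.com/search"


def build_search_url(query, tag=None, api_key=None):
    """Return a search URL for a free-text query, optionally narrowed by tag."""
    params = {"q": query}
    if tag is not None:
        params["tag"] = tag          # hypothetical tag id, e.g. "politics/politics"
    if api_key is not None:
        params["api-key"] = api_key  # keyed access tiers would send a key
    return BASE_URL + "?" + urlencode(params)


url = build_search_url("general election", tag="politics/politics")
print(url)

# Round-trip the query string to show what the server would receive.
received = parse_qs(urlsplit(url).query)
print(received)
```

Keeping URL construction separate from the HTTP call makes the request easy to inspect and test before any key is issued.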
54. DATA STORE: A directory of useful data curated by Guardian editors
55. POLITICS API: Open database of candidates, voting records, constituencies, election results, live data on election day
57. Open for Business
59. 1. 3 Tiers of access, 3 Revenue models
BESPOKE: Take, reformat, augment our content. Same access as Guardian. Revenue model to be negotiated. Combination of Media, Fees, Downloads.
APPROVED: Take our full article content, with an advert. Guardian keeps ad revenue, you keep rest-of-page revenue.
KEYLESS: Take our headlines. You keep associated revenues.
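One way to picture the three tiers is as a small lookup table that an API gateway could consult. This is a minimal sketch, not the Guardian's implementation: the class names, the `full_text` flag, and the function `full_text_allowed` are all hypothetical, and the wording of each entry paraphrases the slide.

```python
from dataclasses import dataclass
from enum import Enum


class Tier(Enum):
    KEYLESS = "keyless"
    APPROVED = "approved"
    BESPOKE = "bespoke"


@dataclass(frozen=True)
class Terms:
    content: str     # what a partner may take
    revenue: str     # how revenue is split
    full_text: bool  # hypothetical flag: may the partner show article bodies?


# Entries paraphrase the slide; the full_text flag is an illustrative assumption.
TERMS = {
    Tier.KEYLESS: Terms("headlines", "partner keeps associated revenues", False),
    Tier.APPROVED: Terms(
        "full article content, with an advert",
        "Guardian keeps ad revenue, partner keeps rest-of-page revenue",
        True,
    ),
    Tier.BESPOKE: Terms(
        "content to take, reformat and augment",
        "negotiated: combination of media, fees, downloads",
        True,
    ),
}


def full_text_allowed(tier: Tier) -> bool:
    """Return whether this tier may carry full article text."""
    return TERMS[tier].full_text


for tier in Tier:
    print(tier.name, "->", TERMS[tier].content)
```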
61. What this means
OPEN OUT: Developers can now access our full content APIs on demand with keys post-approved.
We are now positioning the platform as a place to do business with us.
So rapid scalability, reliability, and performance are now core requirements.
63. 2. Open In
CONTENT API: A service for selecting and collecting content from the Guardian for re-use
DATA STORE: A directory of useful data curated by Guardian editors
POLITICS API: Open database of candidates, voting records, constituencies, election results, live data on election day
MICROAPPS: A framework for integrating 3rd party applications into guardian.co.uk.
64. OPEN IN: Bring in data and apps from the Internet
OPEN OUT: Allow partners to build applications using Guardian content and services for other digital platforms
68. What this means
Open In: Partners can now more easily integrate into our core.
The Open Platform will become key to our commercial future.
69. Evolving the architecture
70. From Publisher to Platform
★ Seeking massive growth, but no longer only broadcasting content
★ User/partner engagement & contribution on
  ★ journalism
  ★ data
  ★ software
  ★ applications
  ★ revenue and ads
★ Supporting developers and partners with data and APIs needs scalability, reliability, speed
71. [Diagram: Web server x3, App server x3, Memcached, Oracle, CMS]
72. Why RDBMS?
5 years ago, fewer alternatives
Understand operations procedures
Can easily recruit DBAs / devs
Developer/ops tools
Business critical system: a safe choice
[Diagram: Web server x3, App server x3, Memcached, Oracle, CMS, Data feeds]
75. [Chart: Unique Users, Sep 2005 to Jan 2009 (x-axis), 3,750,000 to 30,000,000 (y-axis)]
77. [Chart: Unique Users, May 2008 to Jan 2009 (x-axis), 12,250,000 to 28,000,000 (y-axis)]
78. What's going on?
★ We tag our content (multifaceted)
★ Guardian.co.uk is a faceted browse through our tag-space, with editorial teams “spotlighting” key resources on selected nodes.
★ A search-like architecture can apply multiple facets in a query faster than an RDBMS
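To make the faceting point concrete, here is a minimal in-memory sketch of the idea behind a search-like architecture: each tag maps to the set of articles carrying it, so applying several facets is a set intersection rather than a chain of relational joins. The article ids, tags, and function names are hypothetical, and a real engine such as Lucene uses far more compact structures than Python sets.

```python
from collections import Counter, defaultdict

# Hypothetical articles and tags, for illustration only.
ARTICLES = {
    1: {"politics", "uk", "comment"},
    2: {"politics", "us"},
    3: {"sport", "uk"},
    4: {"politics", "uk", "video"},
}


def build_index(articles):
    """Build an inverted index: tag -> set of article ids."""
    index = defaultdict(set)
    for doc_id, tags in articles.items():
        for tag in tags:
            index[tag].add(doc_id)
    return index


def browse(index, *facets):
    """Return the ids of articles that carry every requested tag."""
    if not facets:
        return set()
    # Intersect the smallest posting set first to keep intermediate results small.
    postings = sorted((index.get(tag, set()) for tag in facets), key=len)
    result = set(postings[0])
    for posting in postings[1:]:
        result &= posting
    return result


def facet_counts(articles, doc_ids):
    """Count the tags on the matching articles, to drive further navigation."""
    counts = Counter()
    for doc_id in doc_ids:
        counts.update(articles[doc_id])
    return counts


index = build_index(ARTICLES)
hits = browse(index, "politics", "uk")
print(sorted(hits))  # [1, 4]
print(facet_counts(ARTICLES, hits)["video"])  # 1
```

Adding a third or fourth facet costs one more intersection over an already small set, which is why query cost stays fairly flat as facets are combined.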
83. CONTENT API
A service for selecting and collecting content from the Guardian for re-use
[Diagram, top to bottom: Your App Here!, REST API, Search engine, CMS, Guardian database]
85. We used Solr/Lucene
Can perform complex queries, including full text search
We can change the schema with no downtime.
On our dataset most queries are of a similar cost
Scales very well horizontally
Replication makes it easy to work in the cloud
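A complex query of the kind listed above might combine full text search with tag filters and facet counts. The sketch below builds such a query string using standard Solr request parameters (`q`, `fq`, `rows`, `wt`, `facet`, `facet.field`); the field name `tag`, the tag values, and the host and path in the printed URL are assumptions, since the slides do not show the Guardian's schema or deployment.

```python
from urllib.parse import urlencode, parse_qs


def solr_params(text, tags, facet_field="tag", rows=10):
    """Build a Solr query string: full text in q, one filter query per tag."""
    params = [("q", text), ("rows", str(rows)), ("wt", "json")]
    # Each fq narrows the result set independently of relevance scoring.
    # Note: tag values are quoted but not escaped; real input would need escaping.
    params += [("fq", f'{facet_field}:"{tag}"') for tag in tags]
    # Ask Solr to return counts per tag for the matching documents.
    params += [("facet", "true"), ("facet.field", facet_field)]
    return urlencode(params)


query = solr_params("election", ["politics/politics", "tone/comment"])
# Hypothetical local Solr instance, for illustration only.
print("http://localhost:8983/solr/select?" + query)

parsed = parse_qs(query)
print(parsed["fq"])
```

Putting the tags in `fq` rather than in `q` keeps them out of scoring and lets Solr cache each filter separately.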
86. [Diagram: Core (Web servers, App server, Memcached, rdbms, CMS)]
87. [Diagram: Core (Web servers, App server, Memcached, rdbms, CMS); Content API (six Solr instances; Cloud, EC2)]
88. Open in?
MICROAPPS: A framework for integrating 3rd party applications into guardian.co.uk.
Simple REST/HTTP framework allows lightweight development
Applications proxied for performance
Apps generally hosted in the cloud, hot deployment into production
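The slide says only that applications are "proxied for performance". One plausible reading, sketched below as an assumption rather than a description of the Guardian's proxy, is that the core site fetches page fragments from externally hosted apps over HTTP and caches them for a short time. The class `FragmentProxy`, its parameters, and the stand-in fetch function are hypothetical.

```python
import time


class FragmentProxy:
    """Caches fragments fetched from externally hosted apps (illustrative only)."""

    def __init__(self, fetch, ttl_seconds=60.0, clock=time.monotonic):
        self._fetch = fetch      # callable: url -> fragment text
        self._ttl = ttl_seconds  # how long a cached fragment stays fresh
        self._clock = clock      # injectable clock, to make expiry testable
        self._cache = {}         # url -> (expires_at, fragment)

    def get(self, url):
        """Return a fresh cached fragment, or fetch and cache a new one."""
        now = self._clock()
        hit = self._cache.get(url)
        if hit is not None and hit[0] > now:
            return hit[1]
        fragment = self._fetch(url)
        self._cache[url] = (now + self._ttl, fragment)
        return fragment


calls = []


def fake_fetch(url):
    """Stand-in for an HTTP GET against a microapp; records each call."""
    calls.append(url)
    return f"<div>fragment from {url}</div>"


proxy = FragmentProxy(fake_fetch, ttl_seconds=30)
proxy.get("http://example.org/app")
proxy.get("http://example.org/app")
print(len(calls))  # 1
```

The second request is served from the cache, so a slow or unavailable external app is hit at most once per TTL window per URL.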
90. [Diagram: Core (Web servers, App server, Memcached, rdbms, CMS); Apps (Proxy, six Apps on external hosting, app engine etc)]
91. [Diagram: OPEN IN (Proxy, six Apps on external hosting, app engine etc); core (Web servers, App servers, Memcached, CMS, rdbms); OPEN OUT (six Solr instances; Cloud, EC2)]
92. [Diagram residue; legible fragments: CONTENT, ???????, external, Clo]