Solr in the Wild:
The Guardian’s Open Platform Content API
Graham Tackley
guardian.co.uk
Guardian journalism online: 1995
Guardian journalism online: 1999
Guardian journalism online: 2000
Guardian journalism online: 2010
• Content API
• MicroApp Framework
• Politics API
• Data Store
http://www.guardian.co.uk/open-platform
http://content.guardianapis.com
http://content.guardianapis.com/search.json?q=prague%20beer&order-by=relevance&show-fields=all&show-tags=all
http://content.guardianapis.com/search.json?q=prague%20beer&order-by=relevance&show-fields=all&show-tags=all&api-key=eurocon2010
http://content.guardianapis.com/search.json?q=prague%20beer&order-by=relevance&show-refinements=all
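The search URLs above are plain query strings against a single endpoint. As a minimal sketch (not the Guardian's own client code), a hypothetical Scala helper `ContentApiUrl.search` can assemble them from parameter pairs; only the endpoint and the parameter names come from the examples, everything else is an assumption.

```scala
import java.net.URLEncoder
import java.nio.charset.StandardCharsets

// Hypothetical helper: builds Content API search URLs like the examples above.
object ContentApiUrl {
  val SearchEndpoint = "http://content.guardianapis.com/search.json"

  // URLEncoder uses form encoding (space -> '+'); the examples use %20,
  // so swap '+' for %20 afterwards. A literal '+' is already encoded as %2B.
  private def encode(value: String): String =
    URLEncoder.encode(value, StandardCharsets.UTF_8.name()).replace("+", "%20")

  // Parameters keep their insertion order so the URL reads like the slides.
  def search(params: Seq[(String, String)]): String =
    params
      .map { case (k, v) => s"${encode(k)}=${encode(v)}" }
      .mkString(SearchEndpoint + "?", "&", "")
}
```

Passing q -> "prague beer", order-by -> "relevance" and show-refinements -> "all" reproduces the last URL above.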
Implementation

• Traffic patterns much less predictable than
  a web site
• Need to easily scale on demand...
• ... and never take down guardian.co.uk due
  to API traffic
[Diagram: the existing "Core" architecture (web servers, app server, Memcached (20 GB), RDBMS, CMS); later builds of the slide add the Content API next to the RDBMS, as a possible consumer of the database]
Why Solr?
• Database could not cope...
• ... and far too expensive to scale
• Solr ...
• ... was easy for developers to understand
• ... has a great replication model
• ... is simple to install
[Diagram: the new architecture. An Indexer populates a Solr Master, which uses Solr replication to feed several "Solr & Api" servers running in the cloud (EC2); the existing Core stack (web servers, app server, Memcached (20 GB), CMS) is shown alongside]
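Speaker note 28 explains how new replicas join the pool: base images boot with an empty index, and the EC2 load balancer only adds a server once a health-check URL returns 200. A minimal sketch of such a check, assuming the local document count is supplied as a function (in a real deployment it would come from the local Solr core) and assuming 503 as the "not ready" status; `HealthCheck` and its parameters are hypothetical.

```scala
// Hypothetical health check for an API replica behind the load balancer.
object HealthCheck {
  final case class Response(status: Int, body: String)

  // 200 once the local index holds at least `minDocs` documents, 503 otherwise,
  // so a freshly booted replica stays out of rotation until replication fills it.
  def check(numDocs: () => Long, minDocs: Long = 1L): Response = {
    val n = numDocs()
    if (n >= minDocs) Response(200, s"OK: $n documents")
    else Response(503, s"index not ready: $n documents")
  }
}
```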
Solr Schema


• 350+ tables in database schema
Content fields are just fields...
[Diagram: the content model. Content (Article, Video, Audio, Gallery, Cartoon) is linked to Tags (Keyword, Contributor, Series, Publication, Tone), Factboxes and Media]
... tags ...
     record-type: content
     id: world/picture/2010/may/14/formula-one-monaco
     tag-ids: [ world/series/eyewitness, sport/formulaone, world/monaco ...]
     tag-external-names: [ Eyewitness, Formula One, Monaco, ...]

record-type: tag
id: world/series/eyewitness
section-name: World news
web-title: Eyewitness                                        (included in search)
type: series
internal-name: Eyewitness (centrespread photo series)        (stored=false)
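The two record types above can be produced from one domain object: each tag becomes its own record, and each content record copies the ids and names of its tags so they can be searched directly. A minimal Scala sketch, assuming plain Maps as a stand-in for SolrJ's SolrInputDocument and assuming tag-external-names is filled from each tag's web-title (the example values match, but the slides don't say so); `Tag` and `TagRecords` are hypothetical names.

```scala
// Hypothetical domain object for a tag.
final case class Tag(id: String, sectionName: String, webTitle: String, tagType: String, internalName: String)

object TagRecords {
  // Stand-in for a Solr input document: field name -> value(s).
  type Record = Map[String, Any]

  // One record per tag, using the field names from the slide.
  def tagRecord(t: Tag): Record = Map(
    "record-type" -> "tag",
    "id" -> t.id,
    "section-name" -> t.sectionName,
    "web-title" -> t.webTitle,
    "type" -> t.tagType,
    "internal-name" -> t.internalName
  )

  // Content records denormalize their tags into multi-valued fields.
  def contentRecord(id: String, tags: Seq[Tag]): Record = Map(
    "record-type" -> "content",
    "id" -> id,
    "tag-ids" -> tags.map(_.id),
    "tag-external-names" -> tags.map(_.webTitle)
  )
}
```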
... factboxes ...




record-type: content
id: world/picture/2010/may/14/formula-one-monaco
factbox-data: [ 197544~|~~|~photography-tip~|~ ]
fact-data: [ 197544~|~pro-tip~|~The photographer has framed the cars between the boats and spectators and played with the scales of the components of the scene ]
fact-value: [ The photographer has framed the cars between the boats and spectators and played with the scales of the components of the scene ]
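Because a factbox has a 1-to-1 relationship with its content (speaker notes 22 and 23), it is packed into multi-valued fields on the content record rather than given its own record, with "~|~" separating the parts of each value. A minimal Scala sketch of splitting and joining those values; the meaning of each position (factbox id, fact name, fact text) is read off the example, and `FactboxCodec` and `Fact` are hypothetical names.

```scala
import java.util.regex.Pattern

// Hypothetical codec for the "~|~"-delimited factbox-data / fact-data values.
object FactboxCodec {
  val Delimiter = "~|~"

  // Pattern.quote stops '|' acting as regex alternation; limit -1 keeps
  // trailing empty parts, such as the last one in the factbox-data example.
  def split(value: String): Vector[String] =
    value.split(Pattern.quote(Delimiter), -1).toVector

  def join(parts: Seq[String]): String = parts.mkString(Delimiter)

  final case class Fact(factboxId: String, name: String, value: String)

  // A fact-data value is assumed to hold exactly three parts.
  def parseFact(value: String): Option[Fact] = split(value) match {
    case Seq(id, name, text) => Some(Fact(id, name, text))
    case _                   => None
  }
}
```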
... media ...
record-type: content
id: world/picture/2010/may/14/formula-one-monaco
media-asset-ids: [ PICTURE|362634152|IMAGE|362629791, ...]




record-type: media
id: PICTURE|362634152|IMAGE|362629791
credit: Mark Thompson/Getty Images
width: 1024
height: 768
path: /sys-images/Guardian/About/General/2010/5/14/1273823813621/66-lap-Monaco-grand-prix-002.jpg
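Unlike factboxes, media assets are stored as separate records and referenced from content by id. A minimal sketch of resolving a content record's media-asset-ids for show-media, with an in-memory Map standing in for the Solr lookup; `Media` and `MediaResolver` are hypothetical, and the slides don't say how the real code performs this join.

```scala
// Hypothetical view of a media record, using the fields from the slide.
final case class Media(id: String, credit: String, width: Int, height: Int, path: String)

object MediaResolver {
  // Keeps the order of the ids on the content record; ids with no matching
  // media record are skipped rather than failing the whole response.
  def resolve(assetIds: Seq[String], mediaById: Map[String, Media]): Seq[Media] =
    assetIds.flatMap(mediaById.get)
}
```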
The Code

• Written in Scala
• Uses SolrJ
• Plan to open source in the next few months
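The speaker notes describe the API code as mostly taking input parameters, converting them to a Solr query and transforming the result to JSON or XML. A minimal sketch of the middle step, mapping Content API parameters onto standard Solr request parameters (defType, q, q.alt, fq, sort, facet, facet.field). Per the notes, a search is a dismax query, the default order is newest first, and refinements are Solr facets; the sort field and facet field names here are assumptions, not the Guardian's schema.

```scala
// Hypothetical mapper from Content API parameters to Solr request parameters.
object SolrQueryMapper {
  // Assumed schema names; substitute the real sort and facet fields.
  val SortField = "web-publication-date"
  val RefinementFields = Seq("tag-ids", "section-id")

  def toSolrParams(api: Map[String, String]): Seq[(String, String)] = {
    // dismax handles free text in q; q.alt supplies match-all when q is absent.
    val query = api.get("q") match {
      case Some(q) => Seq("defType" -> "dismax", "q" -> q)
      case None    => Seq("defType" -> "dismax", "q.alt" -> "*:*")
    }
    // Content, tag and media records share one index, so keep only content.
    val filter = Seq("fq" -> "record-type:content")
    // Newest first by default; "relevance" falls back to Solr's score order.
    val sort =
      if (api.get("order-by").contains("relevance")) Seq.empty
      else Seq("sort" -> s"$SortField desc")
    // Refinements are plain Solr facets.
    val facets =
      if (api.get("show-refinements").contains("all"))
        ("facet" -> "true") +: RefinementFields.map(f => "facet.field" -> f)
      else Seq.empty
    query ++ filter ++ sort ++ facets
  }
}
```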
Creating the Index

• Existing search index takes 20 hours to
  build
• Solr index takes 1 hour
• Here’s how...
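1.1 million+ items of content in the database

Split into Batches
-- ROW_NUMBER() numbers rows after sorting by id; ROWNUM in the same query block is assigned before ORDER BY
SELECT id FROM (
  SELECT id, ROW_NUMBER() OVER (ORDER BY id) AS rownumber
  FROM content_live )
WHERE MOD(rownumber, 10000) = 0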
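The query returns every 10,000th id in id order, and those ids can serve as batch boundaries. A minimal sketch, assuming each batch covers the ids after one boundary up to and including the next, with open-ended first and last batches; `Batching` and `IdBatch` are hypothetical names.

```scala
// Hypothetical conversion of boundary ids into per-batch id ranges.
object Batching {
  // Covers ids in (fromExclusive, toInclusive]; None means unbounded.
  final case class IdBatch(fromExclusive: Option[Long], toInclusive: Option[Long]) {
    def contains(id: Long): Boolean =
      fromExclusive.forall(id > _) && toInclusive.forall(id <= _)
  }

  // n boundaries give n + 1 batches that together cover every id exactly once.
  def batches(boundaries: Seq[Long]): Seq[IdBatch] = {
    val sorted = boundaries.sorted
    val lowers = None +: sorted.map(Some(_))
    val uppers = sorted.map(Some(_)) :+ None
    lowers.zip(uppers).map { case (lo, hi) => IdBatch(lo, hi) }
  }
}
```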
[Diagram: the batches are shared out between Actor 1, Actor 2, Actor 3 and Actor 4]
Each actor:
1. reads data from the database
2. builds a Solr input document
3. submits it to Solr
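The original implementation used Scala actors. As a rough stand-in (not the original code), the sketch below runs the same three steps per batch with `scala.concurrent.Future` on a fixed-size thread pool, so `workers` plays the role of the number of actors. The read, build and submit steps are passed in as functions in place of the database reads and SolrJ calls; `ParallelIndexer` is a hypothetical name.

```scala
import java.util.concurrent.Executors
import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration.Duration

// Hypothetical parallel indexer; each task mirrors one actor's job.
object ParallelIndexer {
  type Doc = Map[String, Any]

  // Processes every batch on a pool of `workers` threads and returns the
  // total number of documents submitted.
  def indexAll[B, R](
      batches: Seq[B],
      workers: Int,
      readBatch: B => Seq[R],
      buildDoc: R => Doc,
      submit: Seq[Doc] => Unit
  ): Int = {
    val pool = Executors.newFixedThreadPool(workers)
    implicit val ec: ExecutionContext = ExecutionContext.fromExecutorService(pool)
    try {
      val tasks = batches.map { batch =>
        Future {
          val rows = readBatch(batch)   // 1. read data from the database
          val docs = rows.map(buildDoc) // 2. build Solr input documents
          submit(docs)                  // 3. submit to Solr
          docs.size
        }
      }
      Await.result(Future.sequence(tasks), Duration.Inf).sum
    } finally {
      pool.shutdown()
    }
  }
}
```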
Summary

• Solr made free access to our content API
  possible
• Replication rocks for scaling
• Solr just works for us (thank you!)
• NoSQL really isn’t that scary
• http://guardian.co.uk/open-platform
   • http://content.guardianapis.com
graham.tackley@guardian.co.uk · @tackers

The Guardian Open Platform Content API: Implementation

Editor's Notes

  1. As Stephen said: Very basic links to interesting content
  2. Note the registration paywall
  3. Broadcast, stories, basic community. Rebuild started in 2005.
  4. “Web 2.0”, community, (full fat) RSS, discoverability, tagging. Where do we go from here? Other newspaper sites - looking to restrict access to content via paywalls etc - we’re looking to open up
  5. We’ve spent the last 12 months experimenting with open distribution and open partnerships; four initiatives make up the open platform (right now), as Stephen said.
  6. This talk focuses on the content API - provides a way for others to re-present our content in their applications
  7. http://content.guardianapis.com
  8. http://content.guardianapis.com/search.json?q=prague%20beer&order-by=relevance (most users want the most recent content, so the default ordering is newest). This is just a dismax search.
  9. Can also retrieve extra metadata, including tags http://content.guardianapis.com/search.json?q=prague%20beer&order-by=relevance&show-fields=all&show-tags=all
  10. If you have an API key, you can get full content. (You need to apply for this and agree to some T&Cs, mostly to ensure that we can take down content for legal reasons.) This example key is only valid for this conference and will be disabled afterwards :) http://content.guardianapis.com/search.json?q=prague%20beer&order-by=relevance&show-fields=all&show-tags=all&api-key=eurocon2010
  11. Refinements give the ability to narrow down your result set (of course, these are just Solr facets). http://content.guardianapis.com/search.json?q=prague%20beer&order-by=relevance&show-refinements=all
  12-13. Our current architecture: perhaps we could feed the Content API off the database?
  14. Time to developer understanding: about 2 hours.
  15-16. We currently rebuild the index every night, with incremental updates during the day. [next] We expose the Solr master to EC2 and create hosts in EC2 that replicate from it using Solr replication; this works fantastically. Index size: 6GB. Load-balancer config. We use solr.war from the Solr 1.4 distribution totally unchanged, and run the API webapp in the same Jetty container.
  17. Lots of talk nowadays about “NoSQL” solutions.
  18. No. We designed a new logo that better reflects where we currently are.
  19. Disclaimer: the next slides describe how *we* did it, which is not necessarily best practice! We took the opportunity to simplify our domain model...
  20. Content fields are just fields, but we also need to map tags, media, and fact boxes.
  21. Here’s how we model tags & content
  22-23. Fact boxes associate arbitrary information with content. We need to search them, but they have a 1-to-1 relationship with content, so there is no separate record.
  24. show-media allows access to the non-text assets of an item of content
  25. The code mostly just takes input parameters, converts them to a Solr query, and transforms the result to JSON or XML. I’m not here to talk about Scala, but here are a couple of quick snippets.
  26. RichSolrDocument makes SolrDocument more “Scala”-ish.
  27. Scala can make writing understandable code much easier
  28. Supporting auto scaling in EC2: our base images all have an empty index. (The EC2 load balancer is configured to check this URL and add the server to the pool on a 200 response.)
  29. Thanks to Grant Ingersoll from Lucid Imagination for guiding us down this route (we were planning to do something much more complicated). Also thanks to Francis Rhys-Jones for actually implementing this. This is game changing: suddenly we’re prepared to change the index, and NoSQL solutions seem a whole lot less scary. We migrate our entire database every night!
  30-49. Effectively, the batch divisions become a work queue fed to a set of actors (we found that 8 worked best with our hardware). Each actor reads its data from the database, creates a SolrInputDocument, and submits it.
  50-51. All we wanted was a search engine... but actually we got an easy-to-work-with, fast, scalable NoSQL solution!
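Notes 8-10 build up a Content API search request one parameter at a time. The helper below is a minimal client-side sketch that assembles those URLs; the function name `search_url` is hypothetical, and it only uses the parameter names that appear in the notes (`q`, `order-by`, `show-fields`, `show-tags`, `show-refinements`, `api-key`).

```python
from urllib.parse import quote, urlencode

# Base URL taken from the notes.
BASE_URL = "http://content.guardianapis.com/search.json"


def search_url(q, order_by=None, show_fields=None, show_tags=None,
               show_refinements=None, api_key=None):
    """Build a Content API search URL (hypothetical helper).

    Parameters left as None are omitted, so the API's own defaults apply;
    per the notes, the default ordering is newest first.
    """
    params = [("q", q)]
    optional = [
        ("order-by", order_by),
        ("show-fields", show_fields),
        ("show-tags", show_tags),
        ("show-refinements", show_refinements),
        ("api-key", api_key),
    ]
    params.extend((name, value) for name, value in optional if value is not None)
    # quote (rather than urlencode's default quote_plus) encodes spaces as %20,
    # matching the example URLs in the notes.
    return BASE_URL + "?" + urlencode(params, quote_via=quote)
```

For example, `search_url("prague beer", order_by="relevance", show_fields="all", show_tags="all", api_key="eurocon2010")` reproduces the URL from note 10.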
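Note 11 says refinements are "just Solr facets". A minimal sketch of what that could look like on the server side follows: it switches on standard Solr faceting (`facet`, `facet.field`, `facet.mincount`, `facet.limit`) and reshapes Solr's `facet_counts` into refinement groups. The field names in `REFINEMENT_FIELDS` and both function names are hypothetical; the notes don't say which fields back the refinements.

```python
from typing import Dict, List, Tuple

# Hypothetical Solr field names: the notes do not say which fields back refinements.
REFINEMENT_FIELDS = ["keyword", "contributor", "section"]


def facet_params(fields: List[str] = REFINEMENT_FIELDS,
                 limit: int = 10) -> List[Tuple[str, str]]:
    """Solr request parameters that switch on faceting for each refinement field."""
    params = [("facet", "true"), ("facet.mincount", "1"), ("facet.limit", str(limit))]
    # facet.field may be repeated, so parameters are kept as (name, value) pairs.
    params.extend(("facet.field", field) for field in fields)
    return params


def refinements_from_solr(response: dict) -> Dict[str, List[dict]]:
    """Turn Solr's facet_fields into refinement groups.

    With Solr's default JSON output, each facet field is a flat list
    [value1, count1, value2, count2, ...]; this pairs up the entries.
    """
    facet_fields = response.get("facet_counts", {}).get("facet_fields", {})
    refinements = {}
    for field, flat in facet_fields.items():
        pairs = zip(flat[0::2], flat[1::2])
        refinements[field] = [{"value": value, "count": count} for value, count in pairs]
    return refinements
```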
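Notes 15-16 describe EC2 hosts pulling the index from the Solr master via Solr's built-in replication. One way to check that a replica has caught up is to compare the `indexversion` and `generation` values that Solr's ReplicationHandler returns for `command=indexversion` on master and replica. The sketch below assumes the handler is registered at `/replication` (the usual `solrconfig.xml` setup); the helper names are hypothetical and the HTTP call itself is left out.

```python
from urllib.parse import urlencode


def replication_url(solr_base: str, command: str) -> str:
    """URL for a ReplicationHandler command, assuming it is mounted at /replication."""
    return f"{solr_base.rstrip('/')}/replication?" + urlencode({"command": command, "wt": "json"})


def is_in_sync(master: dict, replica: dict) -> bool:
    """Compare parsed command=indexversion responses from master and replica.

    A replica that has pulled the master's latest replicable commit reports
    the same index version and generation.
    """
    return ((master["indexversion"], master["generation"])
            == (replica["indexversion"], replica["generation"]))
```

A replica found to be behind can be told to pull immediately with `command=fetchindex` rather than waiting for its next poll.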
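Note 21 refers to a slide showing how tags and content are modelled, but that slide's content isn't reproduced here. Notes 22-23 imply that tags, unlike fact boxes, get records of their own. The sketch below assumes exactly that: each tag is its own document, and a content document refers to its tags through a multi-valued id field. All field names (`type`, `tag_id`, `web_title`, and so on) are hypothetical.

```python
def content_doc(item: dict) -> dict:
    """Flatten a content item into a Solr document (a plain dict of fields).

    Tags are referenced by id in a multi-valued field, so content can be
    filtered by tag. Field names are hypothetical.
    """
    return {
        "id": item["id"],
        "type": "content",
        "headline": item["headline"],
        "body": item["body"],
        "tag_id": [tag["id"] for tag in item["tags"]],
    }


def tag_doc(tag: dict) -> dict:
    """Each tag becomes its own record, so tags can be searched and listed independently."""
    return {"id": tag["id"], "type": "tag", "web_title": tag["web_title"]}
```

Keeping a `type` field on every document lets one index hold both kinds of record, with a filter query such as `fq=type:tag` selecting one kind.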
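Notes 22-23 explain that fact boxes hold arbitrary information, are searchable, and relate 1-to-1 to content, so they don't get a separate record. A minimal sketch of folding a fact box into its content document follows. It assumes a prefixed field naming scheme that a Solr dynamic field such as `factbox_*` in `schema.xml` could match. The prefix and the key normalisation are illustrative assumptions.

```python
import re


def add_factbox(doc: dict, factbox: dict, prefix: str = "factbox_") -> dict:
    """Return a copy of a content document with a fact box folded in.

    Each fact box key becomes a prefixed field (e.g. matched by a
    hypothetical factbox_* dynamic field), so the fact box is searchable
    without a separate record. Keys are lower-cased and non-word
    characters replaced with underscores to give valid field names.
    """
    merged = dict(doc)  # do not mutate the caller's document
    for key, value in factbox.items():
        field = prefix + re.sub(r"\W+", "_", key.strip().lower())
        merged[field] = value
    return merged
```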
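Note 25 summarises the API code as: take input parameters, convert them to a Solr query, and render the result as JSON or XML. The original snippets were Scala and aren't reproduced here. This Python sketch shows the same shape under stated assumptions. `DATE_FIELD`, the `page`/`page-size` handling, and the output element names are hypothetical. `defType=dismax` and `q.alt` are standard Solr parameters, and note 8 says the search is "just a dismax search".

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical Solr field used for the default newest-first ordering.
DATE_FIELD = "web_publication_date"


def to_solr_params(api: dict) -> dict:
    """Translate Content API parameters into Solr request parameters."""
    page = int(api.get("page", 1))
    page_size = int(api.get("page-size", 10))
    # Default ordering is newest first; order-by=relevance sorts by score.
    sort = "score desc" if api.get("order-by") == "relevance" else f"{DATE_FIELD} desc"
    params = {
        "defType": "dismax",
        "sort": sort,
        "start": str((page - 1) * page_size),
        "rows": str(page_size),
    }
    if api.get("q"):
        params["q"] = api["q"]
    else:
        params["q.alt"] = "*:*"  # dismax fallback: match everything when no query text
    return params


def render(docs: list, fmt: str) -> str:
    """Serialise result documents (flat dicts) as JSON or XML."""
    if fmt == "json":
        return json.dumps({"response": {"results": docs}})
    if fmt == "xml":
        root = ET.Element("response")
        results = ET.SubElement(root, "results")
        for doc in docs:
            ET.SubElement(results, "content", {k: str(v) for k, v in doc.items()})
        return ET.tostring(root, encoding="unicode")
    raise ValueError(f"unsupported format: {fmt}")
```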
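Note 26 mentions RichSolrDocument, a Scala wrapper that makes SolrJ's SolrDocument more idiomatic. The Scala code isn't reproduced here. The Python sketch below illustrates only the general idea: wrap a raw document, where a field may be absent, single-valued or multi-valued, in accessors that always return a list or an Option-like value. The class and method names are hypothetical, not the talk's API.

```python
from typing import Any, List, Optional


class RichDoc:
    """Wrap a raw Solr document (a dict, e.g. from a JSON response) with
    predictable accessors, in the spirit of RichSolrDocument."""

    def __init__(self, raw: dict):
        self.raw = raw

    def values(self, field: str) -> List[Any]:
        """All values of a field as a list; empty when the field is absent."""
        value = self.raw.get(field)
        if value is None:
            return []
        return list(value) if isinstance(value, list) else [value]

    def first(self, field: str) -> Optional[Any]:
        """The first value, or None when absent (playing the role of Scala's Option)."""
        vals = self.values(field)
        return vals[0] if vals else None
```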
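Note 28 says new EC2 instances boot from base images with an empty index, and that the load balancer adds a server once its check URL returns 200. One plausible reading, and it is an assumption here, is that the check URL returns 200 only once replication has populated the local index. A minimal WSGI sketch of such a check follows. `count_docs` is a hypothetical hook, for example a function returning `numFound` for `q=*:*&rows=0` against the local Solr.

```python
from typing import Callable


def make_health_app(count_docs: Callable[[], int]):
    """Build a WSGI app for the load balancer's health-check URL.

    Returns 200 only when the local index holds documents, so a freshly
    launched instance stays out of the pool until it has replicated.
    """
    def app(environ, start_response):
        try:
            ready = count_docs() > 0
        except Exception:
            ready = False  # Solr not reachable yet: treat as not ready
        status = "200 OK" if ready else "503 Service Unavailable"
        body = b"ok" if ready else b"index empty"
        start_response(status, [("Content-Type", "text/plain"),
                                ("Content-Length", str(len(body)))])
        return [body]
    return app
```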
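Notes 30-49 describe the nightly rebuild: the database is split into batch divisions, these go onto a work queue, and a pool of 8 actors drains it. Each actor reads its batch, builds a SolrInputDocument per row, and submits. The sketch below uses Python threads in place of the Scala actors. `load_rows`, `to_document` and `submit` are hypothetical hooks standing in for the database read, document construction and Solr submission; dividing by id ranges is an assumption.

```python
import queue
import threading
from typing import Callable, Iterable, List, Tuple

Batch = Tuple[int, int]  # (first_id, last_id), inclusive: one batch division


def batches(min_id: int, max_id: int, size: int) -> List[Batch]:
    """Split an id range into consecutive inclusive batches."""
    return [(lo, min(lo + size - 1, max_id)) for lo in range(min_id, max_id + 1, size)]


def reindex(divisions: Iterable[Batch],
            load_rows: Callable[[Batch], list],
            to_document: Callable[[dict], dict],
            submit: Callable[[list], None],
            workers: int = 8) -> None:
    """Feed batch divisions through a work queue to a pool of workers."""
    work: "queue.Queue[Batch]" = queue.Queue()
    for division in divisions:
        work.put(division)  # queue is fully loaded before any worker starts
    errors: List[Exception] = []

    def worker() -> None:
        while True:
            try:
                batch = work.get_nowait()
            except queue.Empty:
                return  # no work left: this worker is done
            try:
                # Read from the database, build documents, submit them to Solr.
                submit([to_document(row) for row in load_rows(batch)])
            except Exception as exc:
                errors.append(exc)

    threads = [threading.Thread(target=worker) for _ in range(workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    if errors:
        raise errors[0]  # surface the first failure after all workers finish
```

Because each batch is independent, the worker count is just a tuning knob; the notes report that 8 suited their hardware.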