Building a relevance platform with Couchbase and Elasticsearch
Hippo GetTogether, 21 June 2013
Jeroen Reijn | @jreijn | #hgt2013
Hippo GetTogether 2013
follow the Hippo trail
About me
• Architect @ Hippo
• DevOps guy
• Blogger @ http://blog.jeroenreijn.com
OneHippo @ Goto
Relevance?
“The capability of a search
engine or function to
retrieve data appropriate
to a user's needs.”
http://www.thefreedictionary.com/relevance
How we deliver
relevant content
@Hippo
Registration
Visitor - entity making HTTP requests
Collector - records data about a visitor or his behavior
Example: location collector (GeoIPCollector)
Targeting Data - all data about a specific visitor
Example: IP address is located in Amsterdam
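A minimal sketch of the registration step, assuming a collector is simply an object that turns a request into one piece of targeting data. `Request`, `GeoCollector`, `TargetingData` and the toy lookup table are hypothetical stand-ins for illustration, not Hippo's actual API (the slide only names a `GeoIPCollector`).

```python
from dataclasses import dataclass, field


@dataclass
class Request:
    """Hypothetical stand-in for an incoming HTTP request."""
    visitor_id: str
    remote_addr: str


# Toy lookup table (assumption); a real collector would consult a GeoIP database.
GEO_TABLE = {"192.0.2.10": {"city": "Amsterdam", "country": "Netherlands"}}


class GeoCollector:
    """Records where a visitor appears to come from."""
    collector_id = "geo"

    def collect(self, request: Request) -> dict:
        return GEO_TABLE.get(request.remote_addr, {"city": "", "country": ""})


@dataclass
class TargetingData:
    """All data collected so far about one visitor, keyed by collector id."""
    visitor_id: str
    by_collector: dict = field(default_factory=dict)

    def register(self, collectors, request: Request) -> None:
        for collector in collectors:
            self.by_collector[collector.collector_id] = collector.collect(request)


request = Request(visitor_id="visitor-1", remote_addr="192.0.2.10")
data = TargetingData(visitor_id=request.visitor_id)
data.register([GeoCollector()], request)
print(data.by_collector["geo"]["city"])  # Amsterdam
```

Keying the targeting data by collector id mirrors the visitor document shown later in the deck, where each collector owns its own sub-object.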
Matching
Characteristic - a type of fact about visitors
Example: "comes from a city", "experiences a type of weather"
Target Group - the specification of a Characteristic
Example: "comes from a European city", "comes from Amsterdam"
Persona - one or more target groups that describe a certain type of visitor
Example: "Jim, the European urban consumer", "Alice, the Pet owner"
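The matching terms can be sketched as plain data structures. This is an illustration only: the scoring rule (fraction of target groups matched) and the "returning visitor" target group are assumptions, since the slides do not say how scores are computed.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class TargetGroup:
    """A concrete specification of a characteristic, e.g. city == Amsterdam."""
    characteristic: str               # e.g. "city"
    matches: Callable[[dict], bool]   # predicate over a visitor's targeting data


@dataclass(frozen=True)
class Persona:
    name: str
    target_groups: tuple

    def score(self, targeting_data: dict) -> float:
        """Assumed rule: fraction of target groups the visitor matches."""
        hits = sum(1 for group in self.target_groups if group.matches(targeting_data))
        return hits / len(self.target_groups)


from_amsterdam = TargetGroup(
    "city", lambda d: d.get("geo", {}).get("city") == "Amsterdam"
)
returning = TargetGroup(
    "returningvisitor", lambda d: d.get("returningvisitor") is True
)

jim = Persona("Jim, the European urban consumer", (from_amsterdam, returning))

visitor = {"geo": {"city": "Amsterdam"}, "returningvisitor": False}
print(jim.score(visitor))  # 0.5
```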
What do we store?
Request log
Targeting data
Statistics
Averages, e.g. how many visitors became which persona
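As a rough illustration of the kind of statistic mentioned above, the following counts how many visitors ended up as which persona. The input mapping is hypothetical sample data.

```python
from collections import Counter

# Hypothetical input: the persona each visitor ended up matching.
visitor_personas = {
    "visitor-1": "Jim",
    "visitor-2": "Alice",
    "visitor-3": "Jim",
}

counts = Counter(visitor_personas.values())
total = sum(counts.values())
shares = {persona: count / total for persona, count in counts.items()}

print(counts["Jim"], counts["Alice"])  # 2 1
```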
BIG DATA !!
Real-time analysis
Architecture
RDBMS
Hippo Delivery Tier
Hippo Repository
App server
XML, JSON, (X)HTML
Delivery Tier
Request → URL Matching → Fetch content → Compose output → Response
Delivery Tier
Request → URL Matching → Targeting Data Collection → Scoring → Fetch content → Compose output → Response
Scaling
Scaling out
App server: Hippo Delivery Tier, Hippo Repository
App server: Hippo Delivery Tier, Hippo Repository
RDBMS
Scaling out
App server: Delivery Tier, Repository
App server: Delivery Tier, Repository
RDBMS
Targeting Datastore
What kind of ‘storage’?
Question?
Distributed Cache?
We have a winner!
Requirements change!
NoSQL to the rescue
Suitable types
• Key-value store
• Document database
Assessment Criteria
• Maturity
• Data model
• Consistency model
• Performance
• Replication
• Caching model
• Query model
• Monitoring
• Scalability
• Reliability
• Support
Selection Criteria
• Performance
• Scalability
• Schema flexibility
• Simplicity
• Monitoring
• Support
Performance !!
Performance !!!!
Scalability
Schema flexibility
Request log document
{
  "visitorId": "7a1c7e75-8539-40",
  "pageUrl": "http://localhost:8080/site/news",
  "pathInfo": "/news",
  "remoteAddr": "127.0.0.1",
  "referer": "http://localhost:8080/site/",
  "timestamp": 1371419505909,
  "collectorData": {
    "geo": {
      "country": "",
      "city": "",
      "latitude": 0,
      "longitude": 0
    },
    "returningvisitor": false,
    "channel": "English Website"
  },
  "personaIdScores": [],
  "globalPersonaIdScores": []
}
Visitor document
{
  "geo": {
    "collectorId": "geo",
    "city": "",
    "country": "",
    "latitude": 0,
    "longitude": 0
  },
  "channel": {
    "collectorId": "channel",
    "channels": [
      "English Website"
    ],
    "lastVisitedChannel": "English Website"
  }
}
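To illustrate why a schema-flexible document fits this data, here is a small sketch that records a channel visit in a visitor document shaped like the one shown. The function name and the "Dutch Website" channel are hypothetical; only the field names come from the slide.

```python
import json


def record_channel_visit(visitor_doc: dict, channel: str) -> dict:
    """Return a copy of the visitor document with the channel visit recorded."""
    updated = json.loads(json.dumps(visitor_doc))  # cheap deep copy of JSON data
    entry = updated.setdefault(
        "channel",
        {"collectorId": "channel", "channels": [], "lastVisitedChannel": ""},
    )
    if channel not in entry["channels"]:
        entry["channels"].append(channel)
    entry["lastVisitedChannel"] = channel
    return updated


visitor = {
    "channel": {
        "collectorId": "channel",
        "channels": ["English Website"],
        "lastVisitedChannel": "English Website",
    }
}
after = record_channel_visit(visitor, "Dutch Website")
print(after["channel"]["channels"])  # ['English Website', 'Dutch Website']
```

A new collector can add its own top-level key in the same way, without any schema migration.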
Simplicity
Monitoring
Support
Couchbase
Why Couchbase?
• Drop-in replacement for memcached
• Read/Write-through cache
• High throughput
• Easy scalability
• Schema flexibility
• Low latency
Couchbase
• Open Source
• Document-oriented
• Easily scalable
• Consistent High Performance
• Apache license
Performance
• Object managed cache
• Write Queue to disk
• Avoids Cold Cache
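The three points above can be pictured with a toy write-behind cache. This is a simplified model of the idea, not Couchbase's implementation; the class and method names are made up for the sketch.

```python
from collections import deque


class WriteBehindCache:
    """Toy model: writes land in memory first and are persisted later."""

    def __init__(self):
        self.memory = {}            # in-memory working set, serves reads
        self.disk = {}              # stand-in for persistent storage
        self.write_queue = deque()  # keys waiting to be flushed to disk

    def set(self, key, value):
        self.memory[key] = value
        self.write_queue.append(key)  # acknowledged before it reaches disk

    def get(self, key):
        if key in self.memory:
            return self.memory[key]
        value = self.disk[key]        # cache miss: read from disk ...
        self.memory[key] = value      # ... and keep it in memory
        return value

    def flush(self):
        """Drain the queue; a real server would do this in the background."""
        while self.write_queue:
            key = self.write_queue.popleft()
            self.disk[key] = self.memory[key]

    def warm_up(self):
        """After a restart, reload persisted data so the cache does not start cold."""
        self.memory.update(self.disk)


cache = WriteBehindCache()
cache.set("visitor::1", {"city": "Amsterdam"})
queued_only = "visitor::1" not in cache.disk  # True: not yet persisted
cache.flush()

cache.memory.clear()  # simulate a restart that loses the in-memory state
cache.warm_up()
print(cache.get("visitor::1"))  # {'city': 'Amsterdam'}
```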
Source: http://www.slideshare.net/Couchbase/benchmarking-couchbase
Copyright © Altoros Systems, Inc.
Easily scalable
• Auto sharding
• Cross datacenter replication (XDCR)
• Master-master replication
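Auto sharding can be sketched as hashing each key onto a fixed number of partitions (Couchbase calls them vBuckets) and mapping partitions to nodes. The sketch below is a simplification under stated assumptions: it uses a plain CRC32 modulo, a partition count of 1024, and round-robin placement, and it ignores replicas. The node names are hypothetical.

```python
import zlib

NUM_VBUCKETS = 1024  # assumed partition count for this sketch


def vbucket_for(key: str) -> int:
    """Hash the key with CRC32 and map it onto a fixed number of vBuckets."""
    return zlib.crc32(key.encode("utf-8")) % NUM_VBUCKETS


def build_vbucket_map(nodes):
    """Assign vBuckets to nodes round-robin (a real cluster map also tracks replicas)."""
    return [nodes[i % len(nodes)] for i in range(NUM_VBUCKETS)]


def node_for(key, vbucket_map):
    return vbucket_map[vbucket_for(key)]


two_nodes = build_vbucket_map(["node-a", "node-b"])
three_nodes = build_vbucket_map(["node-a", "node-b", "node-c"])

key = "visitor::7a1c7e75"
# The key's vBucket never changes; only the vBucket-to-node map does.
print(vbucket_for(key) == vbucket_for(key))  # True
print(node_for(key, two_nodes) in {"node-a", "node-b"})  # True
```

Because clients route by vBucket rather than by node, adding a node only means moving some vBuckets and publishing a new map.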
Flexible data model
• Native JSON support
• Incremental MapReduce
• Gives power to the developer
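"Incremental" here means that a changed document only updates its own contribution to the result instead of triggering a full recomputation. The following toy model shows that idea for a simple count; it is not how Couchbase implements views (which are written in JavaScript), and the class name and sample documents are hypothetical.

```python
from collections import Counter


class IncrementalCountView:
    """Toy incremental map-reduce: count documents per emitted key."""

    def __init__(self, map_fn):
        self.map_fn = map_fn
        self.emitted = {}        # doc id -> keys emitted for the current version
        self.counts = Counter()  # reduced result: key -> count

    def upsert(self, doc_id, doc):
        # Undo the contribution of the previous version, then apply the new one.
        for key in self.emitted.get(doc_id, []):
            self.counts[key] -= 1
        keys = list(self.map_fn(doc))
        for key in keys:
            self.counts[key] += 1
        self.emitted[doc_id] = keys


def by_channel(doc):
    """Map function: emit the channel of a request log document, if present."""
    channel = doc.get("collectorData", {}).get("channel")
    if channel:
        yield channel


view = IncrementalCountView(by_channel)
view.upsert("req-1", {"collectorData": {"channel": "English Website"}})
view.upsert("req-2", {"collectorData": {"channel": "English Website"}})
view.upsert("req-2", {"collectorData": {"channel": "Dutch Website"}})
print(view.counts["English Website"], view.counts["Dutch Website"])  # 1 1
```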
How we run
Couchbase @Hippo
Load Balancer
Database cluster
Hippo Delivery Tier
Couchbase cluster
• Request log data
• Targeting data
• Statistics data
Query capabilities
• Querying via views
• Secondary indexes via views
• Views based on MapReduce
• Lacks some advanced query capabilities
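A view used as a secondary index can be thought of as a sorted list of (emitted key, document id) rows that supports key and range lookups. The sketch below models that with a sorted list and `bisect`; the helper names and all sample documents except the first timestamp are made up for illustration.

```python
import bisect

docs = {
    "req-1": {"pathInfo": "/news", "timestamp": 1371419505909},
    "req-2": {"pathInfo": "/about", "timestamp": 1371419506500},
    "req-3": {"pathInfo": "/news", "timestamp": 1371419507000},
}


def build_index(docs, map_fn):
    """Sorted list of (emitted key, doc id) pairs, like the rows of a view."""
    rows = [(key, doc_id) for doc_id, doc in docs.items() for key in map_fn(doc)]
    rows.sort()
    return rows


def query_range(index, start_key, end_key):
    """Return doc ids whose emitted key lies in [start_key, end_key]."""
    lo = bisect.bisect_left(index, (start_key,))
    # Pair end_key with the highest possible string so all its doc ids are included.
    hi = bisect.bisect_right(index, (end_key, chr(0x10FFFF)))
    return [doc_id for _, doc_id in index[lo:hi]]


by_timestamp = build_index(docs, lambda doc: [doc["timestamp"]])
print(query_range(by_timestamp, 1371419506000, 1371419507000))  # ['req-2', 'req-3']
```

Such an index answers lookups on the emitted key well, but questions like free-text search over `pageUrl` or distance from a location fall outside what it can express.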
Elasticsearch
• Apache Lucene
• Designed to be distributed
• Schema free
• Apache license
• RESTful API
Added value of ES
• Full text search
• Faceted search
• Geo spatial search
• All in (near) real-time
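As a sketch of how these three capabilities combine in one request, the function below builds an Elasticsearch search body with a full-text `match`, a `geo_distance` filter, and a `terms` aggregation (the form current Elasticsearch versions use for faceting). The field mapping is an assumption: `collectorData.geo.location` is a hypothetical `geo_point` field, whereas the request log document on the slide stores latitude and longitude as separate numbers.

```python
import json


def nearby_visits_query(text, lat, lon, distance="50km"):
    """Build a search body combining full-text, geo and facet-style parts."""
    return {
        "query": {
            "bool": {
                "must": {"match": {"pageUrl": text}},
                "filter": {
                    "geo_distance": {
                        "distance": distance,
                        # Assumed geo_point field; not present in the slide's document.
                        "collectorData.geo.location": {"lat": lat, "lon": lon},
                    }
                },
            }
        },
        "aggs": {
            # Facet-style bucket counts per channel.
            "visits_per_channel": {"terms": {"field": "collectorData.channel"}}
        },
    }


body = nearby_visits_query("news", 52.37, 4.89)  # roughly Amsterdam
print(json.dumps(body, indent=2))
```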
Replicating to ES
Hippo Delivery Tier: write and read via the Java API
Couchbase Server Cluster → XDCR → Couchbase ES Transport plugin → Elasticsearch Server Cluster
What’s Next?
Advanced analytics
Demo time!
Thank you!
Questions?
j.reijn@onehippo.com | @jreijn
ps. We’re hiring!

Hippo GetTogether: The architecture behind Hippo's relevance platform

  • 1. Building a relevance platform with Couchbase and Elasticsearch Hippo GetTogether, 21 June 2013 Jeroen Reijn | @jreijn | #hgt2013 Hippo GetTogether 2013 follow the Hippo trail
  • 2. About me • Architect @ Hippo • DevOps guy • Blogger @ http://blog.jeroenreijn.com
  • 3. Relevance?
  • 4. “The capability of a search engine or function to retrieve data appropriate to a user's needs.” http://www.thefreedictionary.com/relevance
  • 5. [Image slide; no text]
  • 6. How we deliver relevant content @Hippo
  • 7. Registration • Visitor - entity making HTTP requests • Collector - records data about a visitor or his behavior. Example: location collector (GeoIPCollector) • Targeting Data - all data about a specific visitor. Example: IP address is located in Amsterdam
  • 8. Matching • Characteristic - a type of fact about visitors. Example: "comes from a city", "experiences a type of weather" • Target Group - the specification of a Characteristic. Example: "comes from a European city", "comes from Amsterdam" • Persona - one or more target groups that describe a certain type of visitor. Example: "Jim, the European urban consumer", "Alice, the Pet owner"
  • 9. What do we store? • Request log • Targeting data • Statistics • Averages, e.g. how many visitors became which persona
  • 10. BIG DATA !!
  • 11. Real-time analysis
  • 12. Architecture
  • 13. RDBMS, Hippo Delivery Tier, Hippo Repository, App server, XML, JSON, (X)HTML
  • 14. Delivery Tier: URL Matching, Fetch content, Compose output, Request, Response
  • 15. Delivery Tier: URL Matching, Targeting Data Collection, Compose output, Request, Response, Fetch content, Scoring
  • 16. Scaling
  • 17. Scaling out: RDBMS, Hippo Delivery Tier, Hippo Repository, App server, Hippo Delivery Tier, Hippo Repository, App server
  • 18. Scaling out: RDBMS, Delivery Tier, Repository, App server, Delivery Tier, Repository, App server, Targeting Datastore
  • 19. What kind of ‘storage’?
  • 20. Question?
  • 21. Distributed Cache?
  • 22. We have a winner!
  • 23. Requirements change!
  • 24. NoSQL to the rescue
  • 25. Suitable types • Key-value store • Document database
  • 26. Assessment Criteria: Maturity, Data model, Consistency model, Performance, Replication, Caching model, Query model, Monitoring, Scalability, Reliability, Support
  • 27. Selection Criteria • Performance • Scalability • Schema flexibility • Simplicity • Monitoring • Support
  • 28. Performance !! Performance !!!!
  • 31. Request log document
      {
        "visitorId": "7a1c7e75-8539-40",
        "pageUrl": "http://localhost:8080/site/news",
        "pathInfo": "/news",
        "remoteAddr": "127.0.0.1",
        "referer": "http://localhost:8080/site/",
        "timestamp": 1371419505909,
        "collectorData": {
          "geo": { "country": "", "city": "", "latitude": 0, "longitude": 0 },
          "returningvisitor": false,
          "channel": "English Website"
        },
        "personaIdScores": [],
        "globalPersonaIdScores": []
      }
  • 32. Visitor document
      {
        "geo": {
          "collectorId": "geo",
          "city": "",
          "country": "",
          "latitude": 0,
          "longitude": 0
        },
        "channel": {
          "collectorId": "channel",
          "channels": [ "English Website" ],
          "lastVisitedChannel": "English Website"
        }
      }
  • 33. Simplicity
  • 34. Monitoring
  • 35. Support
  • 36. Couchbase
  • 37. Why Couchbase? • Drop-in replacement for memcached • Read/Write-through cache • High throughput • Easy scalability • Schema flexibility • Low latency
  • 38. Couchbase • Open Source • Document-oriented • Easily scalable • Consistent high performance • Apache license
  • 39. Performance • Object managed cache • Write queue to disk • Avoids cold cache
  • 40. Source: http://www.slideshare.net/Couchbase/benchmarking-couchbase Copyright © Altoros Systems, Inc.
  • 41. Easily scalable • Auto sharding • Cross cluster replication (XDCR) • Master-master replication
  • 42. Flexible data model • Native JSON support • Incremental Map Reduce • Gives power to the developer
  • 43. How we run Couchbase @Hippo
  • 44. Load Balancer, Database cluster, Hippo Delivery Tier, Couchbase cluster • Request log data • Targeting data • Statistics data
  • 45. Query capabilities • Querying via views • Secondary indexes via views • Views based on Map-Reduce • Lacks some advanced query capabilities
  • 46. Elasticsearch • Apache Lucene • Designed to be distributed • Schema free • Apache license • RESTful API
  • 47. Added value of ES • Full text search • Faceted search • Geospatial search • All in (near) real-time
  • 48. Replicating to ES: Couchbase Server Cluster, Elasticsearch Server Cluster, Hippo Delivery Tier, Java API, Write, Read, XDCR, Couchbase ES Transport plugin
  • 49. What’s Next?
  • 50. What’s Next?
  • 51. Advanced analytics
  • 52. Demo time!
  • 53. Thank you! Questions? j.reijn@onehippo.com | @jreijn ps. We’re hiring!
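
The matching vocabulary from slides 7 and 8 (targeting data, target group, persona) and the "how many visitors became which persona" statistic from slide 9 can be illustrated with a small sketch. This is not Hippo's implementation: the class names, the scoring rule (fraction of target groups matched), the 0.75 threshold, and the "visits the English website" target group are all assumptions made for illustration. Only the document shape (loosely following the visitor document on slide 32) and the examples "comes from Amsterdam" and "Jim, the European urban consumer" come from the slides.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Callable, Dict, Optional, Sequence

# Targeting data: everything the collectors recorded about one visitor,
# keyed by collector id (shape loosely follows the visitor document on slide 32).
TargetingData = Dict[str, dict]


@dataclass(frozen=True)
class TargetGroup:
    """A specification of a characteristic, e.g. 'comes from Amsterdam'."""
    name: str
    characteristic: str                        # e.g. "city"
    matches: Callable[[TargetingData], bool]   # hypothetical predicate


@dataclass(frozen=True)
class Persona:
    """One or more target groups that describe a certain type of visitor."""
    name: str
    target_groups: Sequence[TargetGroup]

    def score(self, data: TargetingData) -> float:
        # Assumed scoring rule: fraction of target groups the visitor matches.
        if not self.target_groups:
            return 0.0
        hits = sum(1 for group in self.target_groups if group.matches(data))
        return hits / len(self.target_groups)


def best_persona(personas: Sequence[Persona], data: TargetingData,
                 threshold: float = 0.75) -> Optional[str]:
    """Return the name of the highest-scoring persona at or above the threshold."""
    best_name, best_score = None, 0.0
    for persona in personas:
        current = persona.score(data)
        if current >= threshold and current > best_score:
            best_name, best_score = persona.name, current
    return best_name


from_amsterdam = TargetGroup(
    "comes from Amsterdam", "city",
    lambda d: d.get("geo", {}).get("city") == "Amsterdam",
)
english_site = TargetGroup(  # hypothetical target group, not from the slides
    "visits the English website", "channel",
    lambda d: "English Website" in d.get("channel", {}).get("channels", []),
)
jim = Persona("Jim, the European urban consumer", [from_amsterdam, english_site])

# Made-up visitors; real targeting data would come from the collectors.
visitors = {
    "v1": {"geo": {"collectorId": "geo", "city": "Amsterdam"},
           "channel": {"collectorId": "channel", "channels": ["English Website"]}},
    "v2": {"geo": {"collectorId": "geo", "city": ""},
           "channel": {"collectorId": "channel", "channels": ["English Website"]}},
    "v3": {"geo": {"collectorId": "geo", "city": ""},
           "channel": {"collectorId": "channel", "channels": []}},
}

# Statistic in the spirit of slide 9: how many visitors became which persona.
assigned = (best_persona([jim], data) for data in visitors.values())
persona_counts = Counter(name for name in assigned if name is not None)
```

Keeping each target group as a plain predicate over the targeting data means a new collector only adds a key to the visitor document, which fits the schema flexibility the talk lists among its selection criteria.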