These slides are from my Hippo GetTogether 2013 presentation, in which I went into detail about the architecture behind our high-performance relevance platform. The talk also covered why we chose Couchbase for storage and how Elasticsearch can be used for search and analytics. I showed how we integrated both products and leverage them full circle from within our Hippo CMS product.
Hippo GetTogether: The architecture behind Hippo's relevance platform
1. Building a relevance platform with Couchbase and Elasticsearch
Hippo GetTogether, 21 June 2013
Jeroen Reijn | @jreijn | #hgt2013
Hippo GetTogether 2013
follow the Hippo trail
2. About me
• Architect @ Hippo
• DevOps guy
• Blogger @ http://blog.jeroenreijn.com
4. OneHippo @ Goto
“The capability of a search engine or function to retrieve data appropriate to a user's needs.”
http://www.thefreedictionary.com/relevance
6. How we deliver relevant content @Hippo
7. Registration
Visitor - entity making HTTP requests
Collector - records data about a visitor or his behavior
Example: location collector (GeoIPCollector)
Targeting Data - all data about a specific visitor
Example: IP address is located in Amsterdam
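The Registration concepts above can be pictured in Java roughly as follows. This is a minimal sketch, not Hippo's actual API: the `Collector` interface, the `TargetingData` class, the `register` helper, and the hard-coded IP-to-city table are all hypothetical. Only the name GeoIPCollector comes from the slide.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of the Registration concepts; not Hippo's actual API.
class RegistrationSketch {

    /** All data collected about one specific visitor, keyed by collector id. */
    static class TargetingData {
        final Map<String, String> values = new HashMap<>();
    }

    /** A collector records one kind of data about a visitor or their behavior. */
    interface Collector {
        String id();
        void collect(String ipAddress, TargetingData data);
    }

    /** Location collector; a real one would query a GeoIP database. */
    static class GeoIPCollector implements Collector {
        // Assumed lookup table standing in for a GeoIP database.
        private final Map<String, String> cityByIp = new HashMap<>();

        GeoIPCollector() {
            cityByIp.put("192.0.2.10", "Amsterdam"); // documentation-range IP, made up
        }

        public String id() {
            return "geoip";
        }

        public void collect(String ipAddress, TargetingData data) {
            String city = cityByIp.get(ipAddress);
            if (city != null) {
                data.values.put(id(), city);
            }
        }
    }

    /** Runs the location collector for a request coming from the given IP address. */
    static TargetingData register(String ipAddress) {
        TargetingData data = new TargetingData();
        new GeoIPCollector().collect(ipAddress, data);
        return data;
    }

    public static void main(String[] args) {
        System.out.println(register("192.0.2.10").values);
    }
}
```

Keying the targeting data by collector id keeps each collector independent, so new collectors can be added without touching existing ones.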
8. Matching
Characteristic - a type of fact about visitors
Example: "comes from a city", "experiences a type of weather"
Target Group - the specification of a Characteristic
Example: "comes from a European city", "comes from Amsterdam"
Persona - one or more target groups that describe a certain type of visitor
Example: "Jim, the European urban consumer", "Alice, the Pet owner"
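One way to read the Matching terms is as a small object model. The sketch below is illustrative only: the class names, the `city` key, the list of European cities, and the rule that a persona matches only when all of its target groups match are assumptions, not Hippo's implementation.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.Predicate;

// Hypothetical sketch of the Matching concepts; not Hippo's actual API.
class MatchingSketch {

    /** A target group: a specification applied to one characteristic, e.g. "city". */
    static class TargetGroup {
        final String characteristic;
        final Predicate<String> specification;

        TargetGroup(String characteristic, Predicate<String> specification) {
            this.characteristic = characteristic;
            this.specification = specification;
        }

        boolean matches(Map<String, String> targetingData) {
            String value = targetingData.get(characteristic);
            return value != null && specification.test(value);
        }
    }

    /** A persona: one or more target groups describing a type of visitor. */
    static class Persona {
        final String name;
        final List<TargetGroup> targetGroups;

        Persona(String name, List<TargetGroup> targetGroups) {
            this.name = name;
            this.targetGroups = targetGroups;
        }

        // Assumption: a visitor matches only when every target group matches.
        boolean matches(Map<String, String> targetingData) {
            for (TargetGroup group : targetGroups) {
                if (!group.matches(targetingData)) {
                    return false;
                }
            }
            return true;
        }
    }

    // Assumed sample data for the "comes from a European city" target group.
    static final Set<String> EUROPEAN_CITIES =
            new HashSet<>(Arrays.asList("Amsterdam", "Berlin", "Paris"));

    /** Builds the "Jim, the European urban consumer" persona from the slide. */
    static Persona jim() {
        TargetGroup europeanCity = new TargetGroup("city", EUROPEAN_CITIES::contains);
        return new Persona("Jim, the European urban consumer", Arrays.asList(europeanCity));
    }

    public static void main(String[] args) {
        Map<String, String> visitor = new HashMap<>();
        visitor.put("city", "Amsterdam");
        System.out.println(jim().matches(visitor));
    }
}
```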
9. What do we store?
Request log
Targeting data
Statistics
Averages, e.g. how many visitors became which persona
10. BIG DATA !!
11. Real-time analysis
24. NoSQL to the rescue
25. Suitable types
• Key-value store
• Document database
26. Assessment Criteria
• Maturity
• Data model
• Consistency model
• Performance
• Replication
• Caching model
• Query model
• Monitoring
• Scalability
• Reliability
• Support
42. Flexible data model
• Native JSON support
• Incremental MapReduce
• Gives power to the developer
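To make "incremental" concrete: when a document changes, only that document's emitted row is re-mapped and the reduce value is adjusted, rather than recomputing the whole result. The sketch below is a plain in-memory illustration of that idea, not Couchbase's implementation. The `persona` field and all class and method names are hypothetical; the count per persona mirrors the "how many visitors became which persona" statistic from the earlier slide.

```java
import java.util.HashMap;
import java.util.Map;

// In-memory illustration of incremental map/reduce; not Couchbase's implementation.
class IncrementalViewSketch {
    // Row emitted by the map step for each document: docId -> emitted key.
    private final Map<String, String> emittedKeyByDoc = new HashMap<>();
    // Reduce result (a count per key), kept up to date incrementally.
    private final Map<String, Integer> countByKey = new HashMap<>();

    /** Map step: emit the visitor's persona as the key, or null to emit nothing. */
    static String map(Map<String, String> doc) {
        return doc.get("persona");
    }

    /** Called for every created or updated document; touches only that document's row. */
    void upsert(String docId, Map<String, String> doc) {
        String oldKey = emittedKeyByDoc.remove(docId);
        if (oldKey != null) {
            countByKey.merge(oldKey, -1, Integer::sum); // retract the old row
        }
        String newKey = map(doc);
        if (newKey != null) {
            emittedKeyByDoc.put(docId, newKey);
            countByKey.merge(newKey, 1, Integer::sum); // apply the new row
        }
    }

    /** Current reduce value for a key. */
    int count(String key) {
        return countByKey.getOrDefault(key, 0);
    }

    /** Convenience helper: a flat document with a single persona field. */
    static Map<String, String> visitorDoc(String persona) {
        Map<String, String> doc = new HashMap<>();
        doc.put("persona", persona);
        return doc;
    }
}
```

Calling `upsert("v1", visitorDoc("Jim"))` and later `upsert("v1", visitorDoc("Alice"))` moves one count from Jim to Alice without revisiting any other document.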
43. How we run Couchbase @Hippo
44. [Diagram: Load Balancer, Database cluster, Hippo Delivery Tier, Couchbase cluster]
Couchbase cluster
• Request log data
• Targeting data
• Statistics data
45. Query capabilities
• Querying via views
• Secondary indexes via views
• Views based on MapReduce
• Lacks some advanced query capabilities
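A view used as a secondary index can be thought of as a sorted map from emitted keys to document ids, which supports exact-key and key-range lookups but not, for example, free-text matching. The following is a rough in-memory sketch of that idea under those assumptions. It is not Couchbase code (Couchbase views are defined as JavaScript map and reduce functions), and the class and method names are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

// In-memory illustration of a view acting as a secondary index; not Couchbase code.
class ViewIndexSketch {
    // Sorted index: emitted key -> ids of the documents that emitted it.
    private final TreeMap<String, List<String>> index = new TreeMap<>();

    /** Records that the map step emitted this key for this document. */
    void emit(String key, String docId) {
        index.computeIfAbsent(key, k -> new ArrayList<>()).add(docId);
    }

    /** Ids of documents whose key lies in [startKey, endKey], in key order. */
    List<String> queryRange(String startKey, String endKey) {
        List<String> ids = new ArrayList<>();
        for (List<String> bucket : index.subMap(startKey, true, endKey, true).values()) {
            ids.addAll(bucket);
        }
        return ids;
    }

    /** Exact-key lookup, expressed as a range with equal bounds. */
    List<String> queryKey(String key) {
        return queryRange(key, key);
    }
}
```

Because the index is ordered by key, range queries are cheap, while anything that is not expressible as a key or key range needs a different tool.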
46. Elasticsearch
• Built on Apache Lucene
• Designed to be distributed
• Schema free
• Apache license
• RESTful API
47. Added value of ES
• Full text search
• Faceted search
• Geospatial search
• All in (near) real-time
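For readers unfamiliar with the terms, the sketch below shows in plain Java what a facet count and a geo-distance filter compute. It deliberately does not use the Elasticsearch API, and the `Visit` class, its fields, and the sample coordinates in the usage note are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java illustration of a facet count and a geo-distance filter; not the Elasticsearch API.
class SearchFeaturesSketch {

    /** Hypothetical request-log entry with the visitor's resolved location. */
    static class Visit {
        final String city;
        final double lat;
        final double lon;

        Visit(String city, double lat, double lon) {
            this.city = city;
            this.lat = lat;
            this.lon = lon;
        }
    }

    /** Facet: number of visits per city. */
    static Map<String, Integer> facetByCity(List<Visit> visits) {
        Map<String, Integer> counts = new TreeMap<>();
        for (Visit visit : visits) {
            counts.merge(visit.city, 1, Integer::sum);
        }
        return counts;
    }

    /** Great-circle distance in kilometres (haversine formula). */
    static double distanceKm(double lat1, double lon1, double lat2, double lon2) {
        double earthRadiusKm = 6371.0;
        double dLat = Math.toRadians(lat2 - lat1);
        double dLon = Math.toRadians(lon2 - lon1);
        double a = Math.sin(dLat / 2) * Math.sin(dLat / 2)
                + Math.cos(Math.toRadians(lat1)) * Math.cos(Math.toRadians(lat2))
                * Math.sin(dLon / 2) * Math.sin(dLon / 2);
        return 2 * earthRadiusKm * Math.asin(Math.sqrt(a));
    }

    /** Geo filter: visits within radiusKm of the given point. */
    static List<Visit> within(List<Visit> visits, double lat, double lon, double radiusKm) {
        List<Visit> result = new ArrayList<>();
        for (Visit visit : visits) {
            if (distanceKm(lat, lon, visit.lat, visit.lon) <= radiusKm) {
                result.add(visit);
            }
        }
        return result;
    }
}
```

For example, with visits from Amsterdam (52.3676, 4.9041), Rotterdam (51.9244, 4.4777), and Paris (48.8566, 2.3522), a 100 km filter around Amsterdam keeps the first two and drops Paris. A search engine does the same kind of work against an index instead of scanning a list.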
48. Replicating to ES
[Diagram: Hippo Delivery Tier, Java API (Write / Read), Couchbase Server Cluster, XDCR, Couchbase ES Transport plugin, Elasticsearch Server Cluster]
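The replication flow can be sketched with in-memory stand-ins. This is only an illustration under stated assumptions: it assumes writes go to the document store and searches go to the index, which the slide's Write and Read labels suggest but do not spell out. The maps and queue below merely stand in for Couchbase, the XDCR stream, and Elasticsearch, and none of the names are real client APIs.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Queue;
import java.util.TreeMap;

// In-memory stand-ins for the replication flow; no real Couchbase or Elasticsearch calls.
class ReplicationSketch {
    private final Map<String, String> documentStore = new HashMap<>();  // stands in for Couchbase
    private final Queue<String> replicationQueue = new ArrayDeque<>();  // stands in for the XDCR stream
    private final Map<String, String> searchIndex = new TreeMap<>();    // stands in for Elasticsearch

    /** Write path (assumed): store the document and queue its id for replication. */
    void write(String id, String body) {
        documentStore.put(id, body);
        replicationQueue.add(id);
    }

    /** Drains pending mutations into the search index, as the transport plugin would. */
    void replicate() {
        String id;
        while ((id = replicationQueue.poll()) != null) {
            searchIndex.put(id, documentStore.get(id));
        }
    }

    /** Read path (assumed): ids of indexed documents whose body contains the term. */
    List<String> search(String term) {
        List<String> ids = new ArrayList<>();
        for (Map.Entry<String, String> entry : searchIndex.entrySet()) {
            if (entry.getValue().contains(term)) {
                ids.add(entry.getKey());
            }
        }
        return ids;
    }
}
```

A document written with `write` only becomes searchable after `replicate` has run, which is the small delay behind the "(near) real-time" qualifier.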