SlideShare a Scribd company logo
1 of 18
IN-MEMORY + NOSQL + ACID 
RESTFUL RECOMMENDATION ENGINE 
USING SPRING, AEROSPIKE AND MONGODB 
PETER MILNE 
DIRECTOR OF APPLICATION ENGINEERING 
QCON SAN FRANCISCO 
NOVEMBER 2014 
Aerospike aer . o . spike [air-oh- spahyk] 
noun, 1. tip of a rocket that enhances speed and stability 
© 2014 Aerospike. All rights reserved. Confidential 1
Where is the wisdom we have lost in 
knowledge? Where is the knowledge we 
have lost in information? 
-T.S.Elliot 1934 
© 2014 Aerospike. All rights reserved. Confidential 2 
Sagely Advice 
Don’t let: 
■ Information oversaturate your Knowledge 
■Knowledge distract you from Wisdom
Recommendation Engines 
Recommendation engines are used in 
applications to personalize user 
experience. 
© 2014 Aerospike. All rights reserved. Confidential 3
Movie recommendation Engine 
© 2014 Aerospike. All rights reserved. Confidential 4 
■ Similar to 
■ Hulu, Netflix, YouTube, Daily Motion, etc 
■ Context free 
■ RESTful service 
■Spring Boot 
■ ~100,000 Movies 
■ ~800,000 recommendations 
■ Data store 
■Aerospike 
■MongoDB
© 2014 Aerospike. All rights reserved. Confidential 5 
RESTful web services 
■ HTTP based 
■GET 
■POST 
■PUT 
■DELETE 
■ Payload 
■XML 
■JSON 
Commonly called “APIs” 
■ “APIs are like … Everybody has 
one”
© 2014 Aerospike. All rights reserved. Confidential 6 
Spring Boot 
“Spring Boot makes it easy to create stand-alone, 
production-grade Spring based 
Applications that can you can just run.” 
-- spring.io
© 2014 Aerospike. All rights reserved. Confidential 7 
MongoDB 
■ NoSQL 
■ Schema less 
■ Data in JSON documents 
■BSON on server 
■ Seductive programmatic interface 
■Fashionable JSON 
■ In-memory a.k.a RAM 
■Fast
© 2014 Aerospike. All rights reserved. Confidential 8 
Aerospike 
1) No Hotspots 
– DHT simplifies data 
partitioning 
2) Smart Client – 1 hop to 
data, no load balancers 
3) Shared Nothing 
Architecture, 
every node is identical 4) Single row ACID 
– synch replication in 
cluster 
5) Smart Cluster, Zero Touch 
– auto-failover, 
rebalancing, rack aware, 
rolling upgrades 
6) Flash Optimized
© 2014 Aerospike. All rights reserved. Confidential 9 
Cosine Similarity 
“Cosine similarity is a measure of 
similarity between two vectors of 
an inner product space that measures 
the cosine of the angle between 
them. …. Cosine similarity is 
particularly used in positive space, 
where the outcome is neatly 
bounded in [0,1].” - Wikipedia
The recommendation algorithm 
1. Jane Doe accesses the application 
2. Retrieve Jane’s User Profile 
3. Retrieve the Movie record for each movie that Jane has watched. For 
each movie: 
■ Retrieve each of the watched user profiles 
■ See if this profile is similar to Jane’s by giving it a score 
4. Using the user profile with the highest similarity score, recommend the 
movies in this user profile that Jane has not seen. 
Ac  B = B  A 
A B 
Relative complement 
This is a very elementary 
technique and it is useful only as 
an illustration, and it does have 
several flaws 
© 2014 Aerospike. All rights reserved. Confidential 10
© 2014 Aerospike. All rights reserved. Confidential 11 
Data model
© 2014 Aerospike. All rights reserved. Confidential 12 
One to Many
© 2014 Aerospike. All rights reserved. Confidential 13 
The code 
RESTful 
List of Movies 
watched 
Make a vector
Finding a similar customer 
Cosine similarity 
© 2014 Aerospike. All rights reserved. Confidential 14 
Watched list 
For each movie 
For each 
customer
© 2014 Aerospike. All rights reserved. Confidential 15 
Conclusions 
■ MongoDB 
■Technically seductive programmer interface 
■Uses JSON ( the current fashion ) 
■Painful to scale – requires lots of servers 
■Needed a big cluster - Ran out of RAM 
■ Aerospike 
■Tiny cluster required 
■Easy to scale 
■Faster than Mongo 
■Large Data Type are different to JSON – but good
© 2014 Aerospike. All rights reserved. Confidential 16 
Books
Resources 
➤ GitHub – Recommendation engine example 
 https://github.com/aerospike/recommendation-engine-example.git 
© 2014 Aerospike. All rights reserved. Confidential 17 
➤ Spring Boot 
 http://projects.spring.io/spring-boot/ 
➤ Aerospike 
 http://www.aerospike.com/ 
➤ MongoDB 
 http://www.mongodb.org/
Questions? 
peter@aerospike.com 
helipilot50 
@helipilot50 
© 2014 Aerospike. All rights reserved. Confidential 18

More Related Content

Viewers also liked

Building an Online-Recommendation Engine with MongoDB
Building an Online-Recommendation Engine with MongoDBBuilding an Online-Recommendation Engine with MongoDB
Building an Online-Recommendation Engine with MongoDB
MongoDB
 
Single View of the Customer
Single View of the Customer Single View of the Customer
Single View of the Customer
MongoDB
 
Big Data Analytics 3: Machine Learning to Engage the Customer, with Apache Sp...
Big Data Analytics 3: Machine Learning to Engage the Customer, with Apache Sp...Big Data Analytics 3: Machine Learning to Engage the Customer, with Apache Sp...
Big Data Analytics 3: Machine Learning to Engage the Customer, with Apache Sp...
MongoDB
 
Technical Seminar PPT
Technical Seminar PPTTechnical Seminar PPT
Technical Seminar PPT
Kshitiz_Vj
 
Common MongoDB Use Cases
Common MongoDB Use Cases Common MongoDB Use Cases
Common MongoDB Use Cases
MongoDB
 
Building a Recommendation Engine - An example of a product recommendation engine
Building a Recommendation Engine - An example of a product recommendation engineBuilding a Recommendation Engine - An example of a product recommendation engine
Building a Recommendation Engine - An example of a product recommendation engine
NYC Predictive Analytics
 

Viewers also liked (15)

Building an Online-Recommendation Engine with MongoDB
Building an Online-Recommendation Engine with MongoDBBuilding an Online-Recommendation Engine with MongoDB
Building an Online-Recommendation Engine with MongoDB
 
Partner Webinar: Recommendation Engines with MongoDB and Hadoop
 Partner Webinar: Recommendation Engines with MongoDB and Hadoop Partner Webinar: Recommendation Engines with MongoDB and Hadoop
Partner Webinar: Recommendation Engines with MongoDB and Hadoop
 
Containers Aren't Just for the Cloud
Containers Aren't Just for the CloudContainers Aren't Just for the Cloud
Containers Aren't Just for the Cloud
 
Rapid Application Design in Financial Services
Rapid Application Design in Financial ServicesRapid Application Design in Financial Services
Rapid Application Design in Financial Services
 
01282016 Aerospike-Docker webinar
01282016 Aerospike-Docker webinar01282016 Aerospike-Docker webinar
01282016 Aerospike-Docker webinar
 
Configuring Aerospike - Part 1
Configuring Aerospike - Part 1Configuring Aerospike - Part 1
Configuring Aerospike - Part 1
 
Single View of the Customer
Single View of the Customer Single View of the Customer
Single View of the Customer
 
Big Data Analytics 3: Machine Learning to Engage the Customer, with Apache Sp...
Big Data Analytics 3: Machine Learning to Engage the Customer, with Apache Sp...Big Data Analytics 3: Machine Learning to Engage the Customer, with Apache Sp...
Big Data Analytics 3: Machine Learning to Engage the Customer, with Apache Sp...
 
Introduction to Aerospike
Introduction to AerospikeIntroduction to Aerospike
Introduction to Aerospike
 
Build a Recommendation Engine using Amazon Machine Learning in Real-time
Build a Recommendation Engine using Amazon Machine Learning in Real-timeBuild a Recommendation Engine using Amazon Machine Learning in Real-time
Build a Recommendation Engine using Amazon Machine Learning in Real-time
 
The Right (and Wrong) Use Cases for MongoDB
The Right (and Wrong) Use Cases for MongoDBThe Right (and Wrong) Use Cases for MongoDB
The Right (and Wrong) Use Cases for MongoDB
 
Technical Seminar PPT
Technical Seminar PPTTechnical Seminar PPT
Technical Seminar PPT
 
Common MongoDB Use Cases
Common MongoDB Use Cases Common MongoDB Use Cases
Common MongoDB Use Cases
 
Building a Recommendation Engine - An example of a product recommendation engine
Building a Recommendation Engine - An example of a product recommendation engineBuilding a Recommendation Engine - An example of a product recommendation engine
Building a Recommendation Engine - An example of a product recommendation engine
 
MongoDB Schema Design: Four Real-World Examples
MongoDB Schema Design: Four Real-World ExamplesMongoDB Schema Design: Four Real-World Examples
MongoDB Schema Design: Four Real-World Examples
 

Similar to Recommendation engine using Aerospike and/OR MongoDB

Creating an all-purpose REST API for Cloud services using OSGi and Sling - C ...
Creating an all-purpose REST API for Cloud services using OSGi and Sling - C ...Creating an all-purpose REST API for Cloud services using OSGi and Sling - C ...
Creating an all-purpose REST API for Cloud services using OSGi and Sling - C ...
mfrancis
 
Flash Economics and Lessons learned from operating low latency platforms at h...
Flash Economics and Lessons learned from operating low latency platforms at h...Flash Economics and Lessons learned from operating low latency platforms at h...
Flash Economics and Lessons learned from operating low latency platforms at h...
Aerospike, Inc.
 

Similar to Recommendation engine using Aerospike and/OR MongoDB (20)

Brian Bulkowski : what startups can learn from real-time bidding
Brian Bulkowski : what startups can learn from real-time biddingBrian Bulkowski : what startups can learn from real-time bidding
Brian Bulkowski : what startups can learn from real-time bidding
 
NoSQL in Real-time Architectures
NoSQL in Real-time ArchitecturesNoSQL in Real-time Architectures
NoSQL in Real-time Architectures
 
ACID & CAP: Clearing CAP Confusion and Why C In CAP ≠ C in ACID
ACID & CAP:  Clearing CAP Confusion and Why C In CAP ≠ C in ACIDACID & CAP:  Clearing CAP Confusion and Why C In CAP ≠ C in ACID
ACID & CAP: Clearing CAP Confusion and Why C In CAP ≠ C in ACID
 
Aerospike Architecture
Aerospike ArchitectureAerospike Architecture
Aerospike Architecture
 
Creating an all-purpose REST API for Cloud services using OSGi and Sling - C ...
Creating an all-purpose REST API for Cloud services using OSGi and Sling - C ...Creating an all-purpose REST API for Cloud services using OSGi and Sling - C ...
Creating an all-purpose REST API for Cloud services using OSGi and Sling - C ...
 
Aerospike Architecture
Aerospike ArchitectureAerospike Architecture
Aerospike Architecture
 
Running a High Performance NoSQL Database on Amazon EC2 for Just $1.68/Hour
Running a High Performance NoSQL Database on Amazon EC2 for Just $1.68/HourRunning a High Performance NoSQL Database on Amazon EC2 for Just $1.68/Hour
Running a High Performance NoSQL Database on Amazon EC2 for Just $1.68/Hour
 
Node.Js: Basics Concepts and Introduction
Node.Js: Basics Concepts and Introduction Node.Js: Basics Concepts and Introduction
Node.Js: Basics Concepts and Introduction
 
Flash Economics and Lessons learned from operating low latency platforms at h...
Flash Economics and Lessons learned from operating low latency platforms at h...Flash Economics and Lessons learned from operating low latency platforms at h...
Flash Economics and Lessons learned from operating low latency platforms at h...
 
SD Times - Docker v2
SD Times - Docker v2SD Times - Docker v2
SD Times - Docker v2
 
Instantaneous Replication of Build Artifacts with NetApp
Instantaneous Replication of Build Artifacts with NetAppInstantaneous Replication of Build Artifacts with NetApp
Instantaneous Replication of Build Artifacts with NetApp
 
Copr HD OpenStack Day India
Copr HD OpenStack Day IndiaCopr HD OpenStack Day India
Copr HD OpenStack Day India
 
Using Databases and Containers From Development to Deployment
Using Databases and Containers  From Development to DeploymentUsing Databases and Containers  From Development to Deployment
Using Databases and Containers From Development to Deployment
 
OpenStack and NetApp - Chen Reuven - OpenStack Day Israel 2017
OpenStack and NetApp - Chen Reuven - OpenStack Day Israel 2017OpenStack and NetApp - Chen Reuven - OpenStack Day Israel 2017
OpenStack and NetApp - Chen Reuven - OpenStack Day Israel 2017
 
Generating NoSQL from SQL
Generating NoSQL from SQLGenerating NoSQL from SQL
Generating NoSQL from SQL
 
Monitor OpenStack Environments from the bottom up and front to back
Monitor OpenStack Environments from the bottom up and front to backMonitor OpenStack Environments from the bottom up and front to back
Monitor OpenStack Environments from the bottom up and front to back
 
Top10 list planningpostgresdeployment.2014
Top10 list planningpostgresdeployment.2014Top10 list planningpostgresdeployment.2014
Top10 list planningpostgresdeployment.2014
 
The Three Stages of Cloud Adoption - RightScale Compute 2013
The Three Stages of Cloud Adoption - RightScale Compute 2013The Three Stages of Cloud Adoption - RightScale Compute 2013
The Three Stages of Cloud Adoption - RightScale Compute 2013
 
CODE BLUE 2014 : Persisted: The active use and exploitation of Microsoft's Ap...
CODE BLUE 2014 : Persisted: The active use and exploitation of Microsoft's Ap...CODE BLUE 2014 : Persisted: The active use and exploitation of Microsoft's Ap...
CODE BLUE 2014 : Persisted: The active use and exploitation of Microsoft's Ap...
 
MongoDB Europe 2016 - Deploying MongoDB on NetApp storage
MongoDB Europe 2016 - Deploying MongoDB on NetApp storageMongoDB Europe 2016 - Deploying MongoDB on NetApp storage
MongoDB Europe 2016 - Deploying MongoDB on NetApp storage
 

Recently uploaded

Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
Joaquim Jorge
 

Recently uploaded (20)

Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CV
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 
What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 

Recommendation engine using Aerospike and/OR MongoDB

  • 1. IN-MEMORY + NOSQL + ACID RESTFUL RECOMMENDATION ENGINE USING SPRING, AEROSPIKE AND MONGODB PETER MILNE DIRECTOR OF APPLICATION ENGINEERING QCON SAN FRANCISCO NOVEMBER 2014 Aerospike aer . o . spike [air-oh- spahyk] noun, 1. tip of a rocket that enhances speed and stability © 2014 Aerospike. All rights reserved. Confidential 1
  • 2. Where is the wisdom we have lost in knowledge? Where is the knowledge we have lost in information? -T.S.Elliot 1934 © 2014 Aerospike. All rights reserved. Confidential 2 Sagely Advice Don’t let: ■ Information oversaturate your Knowledge ■Knowledge distract you from Wisdom
  • 3. Recommendation Engines Recommendation engines are used in applications to personalize user experience. © 2014 Aerospike. All rights reserved. Confidential 3
  • 4. Movie recommendation Engine © 2014 Aerospike. All rights reserved. Confidential 4 ■ Similar to ■ Hulu, Netflix, YouTube, Daily Motion, etc ■ Context free ■ RESTful service ■Spring Boot ■ ~100,000 Movies ■ ~800,000 recommendations ■ Data store ■Aerospike ■MongoDB
  • 5. © 2014 Aerospike. All rights reserved. Confidential 5 RESTful web services ■ HTTP based ■GET ■POST ■PUT ■DELETE ■ Payload ■XML ■JSON Commonly called “APIs” ■ “APIs are like … Everybody has one”
  • 6. © 2014 Aerospike. All rights reserved. Confidential 6 Spring Boot “Spring Boot makes it easy to create stand-alone, production-grade Spring based Applications that can you can just run.” -- spring.io
  • 7. © 2014 Aerospike. All rights reserved. Confidential 7 MongoDB ■ NoSQL ■ Schema less ■ Data in JSON documents ■BSON on server ■ Seductive programmatic interface ■Fashionable JSON ■ In-memory a.k.a RAM ■Fast
  • 8. © 2014 Aerospike. All rights reserved. Confidential 8 Aerospike 1) No Hotspots – DHT simplifies data partitioning 2) Smart Client – 1 hop to data, no load balancers 3) Shared Nothing Architecture, every node is identical 4) Single row ACID – synch replication in cluster 5) Smart Cluster, Zero Touch – auto-failover, rebalancing, rack aware, rolling upgrades 6) Flash Optimized
  • 9. © 2014 Aerospike. All rights reserved. Confidential 9 Cosine Similarity “Cosine similarity is a measure of similarity between two vectors of an inner product space that measures the cosine of the angle between them. …. Cosine similarity is particularly used in positive space, where the outcome is neatly bounded in [0,1].” - Wikipedia
  • 10. The recommendation algorithm 1. Jane Doe accesses the application 2. Retrieve Jane’s User Profile 3. Retrieve the Movie record for each movie that Jane has watched. For each movie: ■ Retrieve each of the watched user profiles ■ See if this profile is similar to Jane’s by giving it a score 4. Using the user profile with the highest similarity score, recommend the movies in this user profile that Jane has not seen. Ac  B = B A A B Relative complement This is a very elementary technique and it is useful only as an illustration, and it does have several flaws © 2014 Aerospike. All rights reserved. Confidential 10
  • 11. © 2014 Aerospike. All rights reserved. Confidential 11 Data model
  • 12. © 2014 Aerospike. All rights reserved. Confidential 12 One to Many
  • 13. © 2014 Aerospike. All rights reserved. Confidential 13 The code RESTful List of Movies watched Make a vector
  • 14. Finding a similar customer Cosine similarity © 2014 Aerospike. All rights reserved. Confidential 14 Watched list For each movie For each customer
  • 15. © 2014 Aerospike. All rights reserved. Confidential 15 Conclusions ■ MongoDB ■Technically seductive programmer interface ■Uses JSON ( the current fashion ) ■Painful to scale – requires lots of servers ■Needed a big cluster - Ran out of RAM ■ Aerospike ■Tiny cluster required ■Easy to scale ■Faster than Mongo ■Large Data Type are different to JSON – but good
  • 16. © 2014 Aerospike. All rights reserved. Confidential 16 Books
  • 17. Resources ➤ GitHub – Recommendation engine example  https://github.com/aerospike/recommendation-engine-example.git © 2014 Aerospike. All rights reserved. Confidential 17 ➤ Spring Boot  http://projects.spring.io/spring-boot/ ➤ Aerospike  http://www.aerospike.com/ ➤ MongoDB  http://www.mongodb.org/
  • 18. Questions? peter@aerospike.com helipilot50 @helipilot50 © 2014 Aerospike. All rights reserved. Confidential 18

Editor's Notes

  1. For example, e-commerce applications recommend products to a customer that other customers with similar behavior have viewed, enjoyed, or purchased. News applications would use a real-time recommendation engine to provide readers with fresh content, as stories come and go quickly. These application additions improve the user experience, increase sales, and help retain loyal customers.
  2. The Big Picture Aerospike is a distributed, scalable NoSQL database. It is architected with three key objectives: To create a flexible, scalable platform that would meet the needs of today’s web-scale applications To provide the robustness and reliability (ie, ACID) expected from traditional databases. To provide operational efficiency (minimal manual involvement) First published in the Proceedings of VLDB (Very Large Databases) in 2011, the Aerospike architecture consists of 3 layers: The cluster-aware Client Layer includes open source client libraries that implement Aerospike APIs, track nodes and know where data reside in the cluster. The self-managing Clustering and Data Distribution Layer oversees cluster communications and automates fail-over, replication, cross data center synchronization and intelligent re-balancing and data migration. The flash-optimized Data Storage Layer reliably stores data in RAM and Flash. Aerospike scales up and out on commodity servers. 1) Each multi-threaded, multi-core, multi-cpu server is fast 2) Each server scales up to manage up to 16TB of data, with parallel access to up to 20 SSDs 3) Aerospike scales out linearly with each identical server added, to parallel process 100TB+ across the cluster (AppNexus was in production with a 50 node cluster that has now scaled up by increasing SSD capacity per node and reducing node count to 12, in preparation for scaling back up on node counts and SSD capacity ) - see http://www.aerospike.com/wp-content/uploads/2013/12/Aerospike-AppNexus-SSD-Case-Study_Final.pdf SYNC replication within cluster for immediate consistency, #replicas from 1 to #nodes in cluster, typical deployment has 2 or 3 copies of data; Can support ASYNC within cluster. ASYNC replication across clusters using Cross Data Center Replication (XDR). Writes automatically replicated across one or more locations. Fine grained control - individual sets or namespaces can be replicated across clusters in master/master/slave complex ring and star topologies.
  3. Flaws: Imagine that Jane has watched Harry Potter. It would be foolish to calculate similarity using the customer profiles who viewed this movie, because a very large number of people watched Harry Potter. If we generalize this idea, it would be that movies with the number of views over a certain threshold should be excluded. Cosine similarity assumes each element in the vector has the same weight. The elements in our vectors are the movie IDs, but we also have the rating of the movie also. A better similarity algorithm would include both the movie ID and its rating.