Recommendation engine using Aerospike and/OR MongoDB

•Download as PPTX, PDF•

3 likes•1,762 views

Recommendations are used through out the online world to recommend to a user other products or service they may be interested in. For example, an e commerce site may recommend other products for sale, or an online entertainment site, like Hulu or NetFlicks, may offer entertainment recommendations. This presentation discusses a very elementary recommendation engine implemented in Aerospike and MongoDB

Technology

IN-MEMORY + NOSQL + ACID
RESTFUL RECOMMENDATION ENGINE
USING SPRING, AEROSPIKE AND MONGODB
PETER MILNE
DIRECTOR OF APPLICATION ENGINEERING
QCON SAN FRANCISCO
NOVEMBER 2014
Aerospike aer . o . spike [air-oh- spahyk]
noun, 1. tip of a rocket that enhances speed and stability
© 2014 Aerospike. All rights reserved. Confidential 1

Where is the wisdom we have lost in
knowledge? Where is the knowledge we
have lost in information?
-T.S.Elliot 1934
© 2014 Aerospike. All rights reserved. Confidential 2
Sagely Advice
Don’t let:
■ Information oversaturate your Knowledge
■Knowledge distract you from Wisdom

Recommendation Engines
Recommendation engines are used in
applications to personalize user
experience.
© 2014 Aerospike. All rights reserved. Confidential 3

Movie recommendation Engine
© 2014 Aerospike. All rights reserved. Confidential 4
■ Similar to
■ Hulu, Netflix, YouTube, Daily Motion, etc
■ Context free
■ RESTful service
■Spring Boot
■ ~100,000 Movies
■ ~800,000 recommendations
■ Data store
■Aerospike
■MongoDB

© 2014 Aerospike. All rights reserved. Confidential 5
RESTful web services
■ HTTP based
■GET
■POST
■PUT
■DELETE
■ Payload
■XML
■JSON
Commonly called “APIs”
■ “APIs are like … Everybody has
one”

© 2014 Aerospike. All rights reserved. Confidential 6
Spring Boot
“Spring Boot makes it easy to create stand-alone,
production-grade Spring based
Applications that can you can just run.”
-- spring.io

© 2014 Aerospike. All rights reserved. Confidential 7
MongoDB
■ NoSQL
■ Schema less
■ Data in JSON documents
■BSON on server
■ Seductive programmatic interface
■Fashionable JSON
■ In-memory a.k.a RAM
■Fast

© 2014 Aerospike. All rights reserved. Confidential 8
Aerospike
1) No Hotspots
– DHT simplifies data
partitioning
2) Smart Client – 1 hop to
data, no load balancers
3) Shared Nothing
Architecture,
every node is identical 4) Single row ACID
– synch replication in
cluster
5) Smart Cluster, Zero Touch
– auto-failover,
rebalancing, rack aware,
rolling upgrades
6) Flash Optimized

© 2014 Aerospike. All rights reserved. Confidential 9
Cosine Similarity
“Cosine similarity is a measure of
similarity between two vectors of
an inner product space that measures
the cosine of the angle between
them. …. Cosine similarity is
particularly used in positive space,
where the outcome is neatly
bounded in [0,1].” - Wikipedia

© 2014 Aerospike. All rights reserved. Confidential 11
Data model

© 2014 Aerospike. All rights reserved. Confidential 12
One to Many

© 2014 Aerospike. All rights reserved. Confidential 13
The code
RESTful
List of Movies
watched
Make a vector

Finding a similar customer
Cosine similarity
© 2014 Aerospike. All rights reserved. Confidential 14
Watched list
For each movie
For each
customer

© 2014 Aerospike. All rights reserved. Confidential 15
Conclusions
■ MongoDB
■Technically seductive programmer interface
■Uses JSON ( the current fashion )
■Painful to scale – requires lots of servers
■Needed a big cluster - Ran out of RAM
■ Aerospike
■Tiny cluster required
■Easy to scale
■Faster than Mongo
■Large Data Type are different to JSON – but good

© 2014 Aerospike. All rights reserved. Confidential 16
Books

Resources
➤ GitHub – Recommendation engine example
 https://github.com/aerospike/recommendation-engine-example.git
© 2014 Aerospike. All rights reserved. Confidential 17
➤ Spring Boot
 http://projects.spring.io/spring-boot/
➤ Aerospike
 http://www.aerospike.com/
➤ MongoDB
 http://www.mongodb.org/

Questions?
peter@aerospike.com
helipilot50
@helipilot50
© 2014 Aerospike. All rights reserved. Confidential 18

Viewers also liked

Building an Online-Recommendation Engine with MongoDB

MongoDB

Personalized recommendations drive business, helping people find the products they want, the news they need, and the music they didn't know they would love. Despite the obvious advantages, many companies either don't have recommendations or don't leverage their data to make good ones. Too many recommendation engines are black-box algorithms that are hard to change or don't scale well. Using the same recommendation techniques as used at StubHub, Viacom, and AP, this technical webinar will show you how to load your data from MongoDB into Hadoop, generate recommendations, and then put those recommendations into MongoDB, ready to serve end-users. This webinar will prepare you to build a custom recommender for your company that is highly scalable, easy to understand, and built on open-source technology. K Young: About the speaker K Young is the CEO of Mortar Data. Mortar serves data scientists and engineers with a service that makes creating and operating high-scale data pipelines easy. Mortar contributes to several open source projects including Pig, Luigi, and the Mongo-Hadoop connector. Prior to founding Mortar Data, K built software that reaches one in ten public school students in the U.S. He holds a Computer Science degree from Rice University.

Partner Webinar: Recommendation Engines with MongoDB and Hadoop

MongoDB

As web developers, we use a huge range of frameworks, hosting providers, and other tools. Installing, configuring, and integrating with all of these tools in our local development environments is an increasingly large challenge. Older local development solutions using native installations of basic services don't achieve adequate production parity and leave team members using different versions of key tools. Newer virtualized solutions like Vagrant can solve some problems, but can be frustratingly slow to use. Enter Kalabox. Kalabox provides an easy-to-use architecture to orchestrate Docker containers on Mac OS, Windows, and Debian. Learn how you can use Kalabox to create a more effective local development stack for your entire team and start eliminating the barrier between your local and production environments.

Containers Aren't Just for the Cloud

Alec Reynolds

Rapid Application Design in Financial Services

Aerospike

In this talk we review what Docker is and why it’s important to Developers, Admins and DevOps when they are using a NoSQL Database such as Aerospike, the high performance NoSQL Database. Persistence is a critical element for a successful multi-Container strategy. We also cover the following topics: Using Docker to Orchestrate a multi container application (Flask + Aerospike) Injecting HAProxy and other production requirements as we deploy to production Scaling the Web and Aerospike clusters to grow to meet demand This presentation led by Alvin Richards, VP of Product at Aerospike includes an interactive demo showcasing the core Docker components (Machine, Engine, Swarm and Compose) along with Aerospike’s integration. We hope you will see how much simpler Docker can make building and deploying multi-node Aerospike based applications.

01282016 Aerospike-Docker webinar

Aerospike, Inc.

Configuring Aerospike - Part 1

Aerospike, Inc.

Single View of the Customer

MongoDB

Big Data Analytics 3: Machine Learning to Engage the Customer, with Apache Sp...

MongoDB

Whats the buzz about? When it comes to NoSQL, what do some of the most experienced developers know about NoSQL that makes them select Aerospike over any other NoSQL database? Find the full webinar with audio here - http://www.aerospike.com/webinars This presentaion will review how real-time big data driven applications are changing consumer expectations and enterprise requirements for operational databases that enable powerful and personalized customer experiences. We will describe common use cases, typical customer deployments and present an overview of Aerospike's hybrid in-memory (DRAM + Flash) and scale-out architecture.

Introduction to Aerospike

Aerospike, Inc.

Build a Recommendation Engine using Amazon Machine Learning in Real-time

Amazon Web Services

The Right (and Wrong) Use Cases for MongoDB

MongoDB

Technical Seminar PPT

Kshitiz_Vj

Common MongoDB Use Cases

MongoDB

Building a Recommendation Engine - An example of a product recommendation engine

NYC Predictive Analytics

MongoDB Schema Design: Four Real-World Examples

Mike Friedman

Viewers also liked (15)

Building an Online-Recommendation Engine with MongoDB

Partner Webinar: Recommendation Engines with MongoDB and Hadoop

Containers Aren't Just for the Cloud

Rapid Application Design in Financial Services

01282016 Aerospike-Docker webinar

Configuring Aerospike - Part 1

Single View of the Customer

Big Data Analytics 3: Machine Learning to Engage the Customer, with Apache Sp...

Introduction to Aerospike

Build a Recommendation Engine using Amazon Machine Learning in Real-time

The Right (and Wrong) Use Cases for MongoDB

Technical Seminar PPT

Common MongoDB Use Cases

Building a Recommendation Engine - An example of a product recommendation engine

MongoDB Schema Design: Four Real-World Examples

Similar to Recommendation engine using Aerospike and/OR MongoDB

Brian Bulkowski : what startups can learn from real-time bidding

Aerospike

NoSQL in Real-time Architectures

Ronen Botzer

Aerospike founder & VP of Engineering & Operations Srini Srinivasan, and Engineering Lead Sunil Sayyaparaju, will review the principles of the CAP Theorem and how they apply to the Aerospike database. They will give a brief technical overview of ACID support in Aerospike and describe how Aerospike’s continuous availability and practical approach to avoiding partitions provides the highest levels of consistency in an AP system. They will also show how to optimize Aerospike and describe how this is achieved in numerous real world scenarios.

ACID & CAP: Clearing CAP Confusion and Why C In CAP ≠ C in ACID

Aerospike, Inc.

Aerospike Architecture

Peter Milne

OSGi Community Event 2014 Abstract: Let's say you need to provide an internet service to your users. Chances are that your service should be available via REST. Let's say your service should both provide data to users as well as accept data posted by users, and possibly some logic. Now let's assume your service turns out to become incredibly popular, with lots and lots of users. Sounds like you need Sling and OSGi in the cloud. In this talk Carsten and David will go through the OSGi and Sling architecture to achieve this. The talk outlines how the OSGi Cloud Ecosystems RFC is used in combination with Apache jclouds to achieve vendor independence. It also discusses how automatic scaling depending on measured load is achieved to ensure responsiveness. The resulting system is a dynamic cloud application handling any REST API, which can scale up and down depending on the need. Speaker Bios: David Bosschaert David Bosschaert works for Adobe Research and Development. He spends the much of his time on technology relating to OSGi in Apache and other open source projects. He is also co-chair of the OSGi Enterprise Expert Group and an active participant in the OSGi Cloud efforts. Before joining Adobe, David worked for Red Hat/JBoss and IONA Technologies in Dublin, Ireland. Carsten Ziegeler Carsten Ziegeler is senior developer at Adobe Research Switzerland and spends most of his time on architectural and infrastructure topics. Working for over 25 years in open source projects, Carsten is a member of the Apache Software Foundation and heavily participates in several Apache communities including Sling, Felix and ACE. He is a frequent speaker on technology and open source conferences and participates in the OSGi Core Platform and Enterprise expert groups.

Creating an all-purpose REST API for Cloud services using OSGi and Sling - C ...

mfrancis

Aerospike Architecture

Peter Milne

Running a High Performance NoSQL Database on Amazon EC2 for Just $1.68/Hour

Aerospike, Inc.

Node.Js: Basics Concepts and Introduction

Kanika Gera

Flash Economics and Lessons learned from operating low latency platforms at h...

Aerospike, Inc.

SD Times - Docker v2

Alvin Richards

Instantaneous Replication of Build Artifacts with NetApp

NetApp

Copr HD OpenStack Day India

openstackindia

Using Databases and Containers From Development to Deployment

Aerospike, Inc.

OpenStack and NetApp - Chen Reuven - OpenStack Day Israel 2017

Cloud Native Day Tel Aviv

Generating NoSQL from SQL

Peter Milne

Monitor OpenStack Environments from the bottom up and front to back

Icinga

This presentation reviews these steps, scenarios and more: • What is this database going to be used for – a reporting server or data warehouse, or as an operational database supporting an application? • Which resources should I spend the budget on to ensure optimal database performance – bigger servers, more CPUs/cores, disks, or more memory? • What are my backup requirements? If I ever need to restore, how far back do I need to go and what will that mean to the business? • How will I handle any hot fixes, such as security patches? What downtime can be afforded and what processes need to be in place to apply critical or maintenance updates? • What are my replication and failover requirements and what should I do for my high availability configuration? To listen to the recording visit www.enterprisedb.com - click on the Resources tab - and review the list of On-Demand Webcasts. If you have further questions, email sales@enterprisedb.com.

Top10 list planningpostgresdeployment.2014

EDB

Speaker: James Staten - VP and Principal Analyst, Forrester Research As a RightScale user you are clearly a leading adopter of cloud computing, but have you matured your use of the cloud to the point that you are fully exploiting the advantages it provides? Most cloud users aren’t. In this session, Forrester Research VP and Principal Analyst James Staten will help you understand how to move from a cloud user to an optimizer to a profit maker as you progress your understanding of cloud economics and evolve your application design and deployment practices.

The Three Stages of Cloud Adoption - RightScale Compute 2013

RightScale

Microsoft has often used Fix It patches, which are a subset of Application Compatibility Fixes, as a way to stop newly identified active exploitation methods against their products. At Derbycon 2013 Mark Baggett discussed ways that attackers can use them for creating rootkits. Then in March of 2014 I presented an analysis of the previously undocumented in-memory patch and showed how attackers could use these to create patches and maintain persistence on a system. This talk will provide an overview and summary of the previous work and then show how it’s currently being used in the wild. I’ll first show how third parties are using the application toolkit for valid reasons. I will then show two instances, active and ongoing in the wild, of malware using the methods we’ve described.

CODE BLUE 2014 : Persisted: The active use and exploitation of Microsoft's Ap...

CODE BLUE

Customer and business requirements are shifting constantly. Today’s powerful programming languages can keep up—but what about your database? NetApp® MongoDB solutions offer a flexible, scalable answer. Learn how NetApp storage solutions will accelerate your MongoDB Performance, reduce operational Costs and provide the highest levels of Availability and Security. These solutions provide advanced fault-recovery features and easy, in-service growth capabilities to accommodate your unpredictable, ever-changing business demands. NetApp storage is designed to help you build a high-performance, cost-efficient, and highly available analytics solution. So you can focus on adding real business value.

MongoDB Europe 2016 - Deploying MongoDB on NetApp storage

MongoDB

Similar to Recommendation engine using Aerospike and/OR MongoDB (20)

Brian Bulkowski : what startups can learn from real-time bidding

NoSQL in Real-time Architectures

ACID & CAP: Clearing CAP Confusion and Why C In CAP ≠ C in ACID

Aerospike Architecture

Creating an all-purpose REST API for Cloud services using OSGi and Sling - C ...

Aerospike Architecture

Running a High Performance NoSQL Database on Amazon EC2 for Just $1.68/Hour

Node.Js: Basics Concepts and Introduction

Flash Economics and Lessons learned from operating low latency platforms at h...

SD Times - Docker v2

Instantaneous Replication of Build Artifacts with NetApp

Copr HD OpenStack Day India

Using Databases and Containers From Development to Deployment

OpenStack and NetApp - Chen Reuven - OpenStack Day Israel 2017

Generating NoSQL from SQL

Monitor OpenStack Environments from the bottom up and front to back

Top10 list planningpostgresdeployment.2014

The Three Stages of Cloud Adoption - RightScale Compute 2013

CODE BLUE 2014 : Persisted: The active use and exploitation of Microsoft's Ap...

MongoDB Europe 2016 - Deploying MongoDB on NetApp storage

Recently uploaded

Tata AIG General Insurance Company - Insurer Innovation Award 2024

The Digital Insurer

MySQL Webinar, presented on the 25th of April, 2024. Summary: MySQL solutions enable the deployment of diverse Database Architectures tailored to specific needs, including High Availability, Disaster Recovery, and Read Scale-Out. With MySQL Shell's AdminAPI, administrators can seamlessly set up, manage, and monitor these solutions, ensuring efficiency and ease of use in their administration. MySQL Router, on the other hand, provides transparent routing from the application traffic to the backend servers in the architectures, requiring minimal configuration. Completely built in-house and supported by Oracle, these solutions have been adopted by enterprises of all sizes for their business-critical applications. In this presentation, we'll delve into various database architecture solutions to help you choose the right one based on your business requirements. Focusing on technical details and the latest features to maximize the potential of these solutions.

Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...

Miguel Araújo

Real Time Object Detection Using Open CV

Khem

Abhishek Deb(1), Mr Abdul Kalam(2) M. Des (UX) , School of Design, DIT University , Dehradun. This paper explores the future potential of AI-enabled smartphone processors, aiming to investigate the advancements, capabilities, and implications of integrating artificial intelligence (AI) into smartphone technology. The research study goals consist of evaluating the development of AI in mobile phone processors, analyzing the existing state as well as abilities of AI-enabled cpus determining future patterns as well as chances together with reviewing obstacles as well as factors to consider for more growth.

Exploring the Future Potential of AI-Enabled Smartphone Processors

debabhi2

Three things you will take away from the session: • How to run an effective tenant-to-tenant migration • Best practices for before, during, and after migration • Tips for using migration as a springboard to prepare for Copilot in Microsoft 365 Main ideas: Migration Overview: The presentation covers the current reality of cross-tenant migrations, the triggers, phases, best practices, and benefits of a successful tenant migration Considerations: When considering a migration, it is important to consider the migration scope, performance, customization, flexibility, user-friendly interface, automation, monitoring, support, training, scalability, data integrity, data security, cost, and licensing structure Next Wave: The next wave of change includes the launch of Copilot, which requires businesses to be prepared for upcoming changes related to Copilot and the cloud, and to consolidate data and tighten governance ShareGate: ShareGate can help with pre-migration analysis, configurable migration tool, and automated, end-user driven collaborative governance

Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff

sammart93

What are drone anti-jamming systems? The drone anti-jamming systems and anti-spoof technology protect against interference, jamming, and spoofing of the UAVs. To protect their security, countries are beginning to research drone anti-jamming systems, also known as drone strike weapons. The anti-jam and anti-spoof technology protects against interference, jamming and spoofing. A drone strike weapon is a drone attack weapon that can attack and destroy enemy drones. So what is so unique about this amazing system?

What Are The Drone Anti-jamming Systems Technology?

Antenna Manufacturer Coco

The presentation explores the development and application of artificial intelligence (AI) from its inception to its current status in the modern world. The term "artificial intelligence" was first coined by John McCarthy in 1956 to describe efforts to develop computer programs capable of performing tasks that typically require human intelligence. This concept was first introduced at a conference held at Dartmouth College, where programs demonstrated capabilities such as playing chess, proving theorems, and interpreting texts. In the early stages, Alan Turing contributed to the field by defining intelligence as the ability of a being to respond to certain questions intelligently, proposing what is now known as the Turing Test to evaluate the presence of intelligent behavior in machines. As the decades progressed, AI evolved significantly. The 1980s focused on machine learning, teaching computers to learn from data, leading to the development of models that could improve their performance based on their experiences. The 1990s and 2000s saw further advances in algorithms and computational power, which allowed for more sophisticated data analysis techniques, including data mining. By the 2010s, the proliferation of big data and the refinement of deep learning techniques enabled AI to become mainstream. Notable milestones included the success of Google's AlphaGo and advancements in autonomous vehicles by companies like Tesla and Waymo. A major theme of the presentation is the application of generative AI, which has been used for tasks such as natural language text generation, translation, and question answering. Generative AI uses large datasets to train models that can then produce new, coherent pieces of text or other media. The presentation also discusses the ethical implications and the need for regulation in AI, highlighting issues such as privacy, bias, and the potential for misuse. These concerns have prompted calls for comprehensive regulations to ensure the safe and equitable use of AI technologies. Artificial intelligence has also played a significant role in healthcare, particularly highlighted during the COVID-19 pandemic, where it was used in drug discovery, vaccine development, and analyzing the spread of the virus. The capabilities of AI in healthcare are vast, ranging from medical diagnostics to personalized medicine, demonstrating the technology's potential to revolutionize fields beyond just technical or consumer applications. In conclusion, AI continues to be a rapidly evolving field with significant implications for various aspects of society. The development from theoretical concepts to real-world applications illustrates both the potential benefits and the challenges that come with integrating advanced technologies into everyday life. The ongoing discussion about AI ethics and regulation underscores the importance of managing these technologies responsibly to maximize their their benefits while minimizing potential harms.

Artificial Intelligence: Facts and Myths

Joaquim Jorge

Enterprise Knowledge’s Urmi Majumder, Principal Data Architecture Consultant, and Fernando Aguilar Islas, Senior Data Science Consultant, presented "Driving Behavioral Change for Information Management through Data-Driven Green Strategy" on March 27, 2024 at Enterprise Data World (EDW) in Orlando, Florida. In this presentation, Urmi and Fernando discussed a case study describing how the information management division in a large supply chain organization drove user behavior change through awareness of the carbon footprint of their duplicated and near-duplicated content, identified via advanced data analytics. Check out their presentation to gain valuable perspectives on utilizing data-driven strategies to influence positive behavioral shifts and support sustainability initiatives within your organization. In this session, participants gained answers to the following questions: - What is a Green Information Management (IM) Strategy, and why should you have one? - How can Artificial Intelligence (AI) and Machine Learning (ML) support your Green IM Strategy through content deduplication? - How can an organization use insights into their data to influence employee behavior for IM? - How can you reap additional benefits from content reduction that go beyond Green IM?

Driving Behavioral Change for Information Management through Data-Driven Gree...

Enterprise Knowledge

The 7 Things I Know About Cyber Security After 25 Years | April 2024

Rafal Los

Boost Fertility New Invention Ups Success Rates.pdf

sudhanshuwaghmare1

How to Troubleshoot Apps for the Modern Connected Worker

ThousandEyes

Automating Google Workspace (GWS) & more with Apps Script

wesley chun

ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke

Product Anonymous

A Domino Admins Adventures (Engage 2024)

Gabriella Davis

This presentations targets students or working professionals. You may know Google for search, YouTube, Android, Chrome, and Gmail, but did you know Google has many developer tools, platforms & APIs? This comprehensive yet still high-level overview outlines the most impactful tools for where to run your code, store & analyze your data. It will also inspire you as to what's possible. This talk is 50 minutes in length.

Powerful Google developer tools for immediate impact! (2023-24 C)

wesley chun

With more memory available, system performance of three Dell devices increased, which can translate to a better user experience Conclusion When your system has plenty of RAM to meet your needs, you can efficiently access the applications and data you need to finish projects and to-do lists without sacrificing time and focus. Our test results show that with more memory available, three Dell PCs delivered better performance and took less time to complete the Procyon Office Productivity benchmark. These advantages translate to users being able to complete workflows more quickly and multitask more easily. Whether you need the mobility of the Latitude 5440, the creative capabilities of the Precision 3470, or the high performance of the OptiPlex Tower Plus 7010, configuring your system with more RAM can help keep processes running smoothly, enabling you to do more without compromising performance.

Boost PC performance: How more available memory can improve productivity

Principled Technologies

Data Cloud, More than a CDP by Matt Robison

Anna Loughnan Colquhoun

Discord is a free app offering voice, video, and text chat functionalities, primarily catering to the gaming community. It serves as a hub for users to create and join servers tailored to their interests. Discord’s ecosystem comprises servers, each functioning as a distinct online community with its own channels dedicated to specific topics or activities. Users can engage in text-based discussions, voice calls, or video chats within these channels. Understanding Discord Servers Discord servers are virtual spaces where users congregate to interact, share content, and build communities. Servers may revolve around gaming, hobbies, interests, or fandoms, providing a platform for like-minded individuals to connect. Communication Features Discord offers a range of communication tools, including text channels for messaging, voice channels for real-time audio conversations, and video channels for face-to-face interactions. These features facilitate seamless communication and collaboration. What Does NSFW Mean? The acronym NSFW stands for “Not Safe For Work,” indicating content that may be inappropriate for professional or public settings. NSFW Content NSFW content encompasses material that is sexually explicit, violent, or otherwise graphic in nature. It often includes nudity, profanity, or depictions of sensitive topics.

Understanding Discord NSFW Servers A Guide for Responsible Users.pdf

UK Journal

What is a good lead in your organisation? Which leads are priority? What happens to leads? When sales and marketing give different answers to these questions, or perhaps aren't sure of the answers at all, frustrations build and opportunities are left on the table. Join us for an illuminating session with Cian McLoughlin, HubSpot Principal Customer Success Manager, as we look at that crucial piece of the customer journey in which leads are transferred from marketing to sales.

04-2024-HHUG-Sales-and-Marketing-Alignment.pptx

HampshireHUG

Handwritten Text Recognition for manuscripts and early printed texts

Maria Levchenko

Recently uploaded (20)

Tata AIG General Insurance Company - Insurer Innovation Award 2024

Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...

Real Time Object Detection Using Open CV

Exploring the Future Potential of AI-Enabled Smartphone Processors

Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff

What Are The Drone Anti-jamming Systems Technology?

Artificial Intelligence: Facts and Myths

Driving Behavioral Change for Information Management through Data-Driven Gree...

The 7 Things I Know About Cyber Security After 25 Years | April 2024

Boost Fertility New Invention Ups Success Rates.pdf

How to Troubleshoot Apps for the Modern Connected Worker

Automating Google Workspace (GWS) & more with Apps Script

ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke

A Domino Admins Adventures (Engage 2024)

Powerful Google developer tools for immediate impact! (2023-24 C)

Boost PC performance: How more available memory can improve productivity

Data Cloud, More than a CDP by Matt Robison

Understanding Discord NSFW Servers A Guide for Responsible Users.pdf

04-2024-HHUG-Sales-and-Marketing-Alignment.pptx

Handwritten Text Recognition for manuscripts and early printed texts

Recommendation engine using Aerospike and/OR MongoDB

1. IN-MEMORY + NOSQL + ACID RESTFUL RECOMMENDATION ENGINE USING SPRING, AEROSPIKE AND MONGODB PETER MILNE DIRECTOR OF APPLICATION ENGINEERING QCON SAN FRANCISCO NOVEMBER 2014 Aerospike aer . o . spike [air-oh- spahyk] noun, 1. tip of a rocket that enhances speed and stability © 2014 Aerospike. All rights reserved. Confidential 1

2. Where is the wisdom we have lost in knowledge? Where is the knowledge we have lost in information? -T.S.Elliot 1934 © 2014 Aerospike. All rights reserved. Confidential 2 Sagely Advice Don’t let: ■ Information oversaturate your Knowledge ■Knowledge distract you from Wisdom

4. Movie recommendation Engine © 2014 Aerospike. All rights reserved. Confidential 4 ■ Similar to ■ Hulu, Netflix, YouTube, Daily Motion, etc ■ Context free ■ RESTful service ■Spring Boot ■ ~100,000 Movies ■ ~800,000 recommendations ■ Data store ■Aerospike ■MongoDB

6. © 2014 Aerospike. All rights reserved. Confidential 6 Spring Boot “Spring Boot makes it easy to create stand-alone, production-grade Spring based Applications that can you can just run.” -- spring.io

7. © 2014 Aerospike. All rights reserved. Confidential 7 MongoDB ■ NoSQL ■ Schema less ■ Data in JSON documents ■BSON on server ■ Seductive programmatic interface ■Fashionable JSON ■ In-memory a.k.a RAM ■Fast

8. © 2014 Aerospike. All rights reserved. Confidential 8 Aerospike 1) No Hotspots – DHT simplifies data partitioning 2) Smart Client – 1 hop to data, no load balancers 3) Shared Nothing Architecture, every node is identical 4) Single row ACID – synch replication in cluster 5) Smart Cluster, Zero Touch – auto-failover, rebalancing, rack aware, rolling upgrades 6) Flash Optimized

9. © 2014 Aerospike. All rights reserved. Confidential 9 Cosine Similarity “Cosine similarity is a measure of similarity between two vectors of an inner product space that measures the cosine of the angle between them. …. Cosine similarity is particularly used in positive space, where the outcome is neatly bounded in [0,1].” - Wikipedia

10. The recommendation algorithm 1. Jane Doe accesses the application 2. Retrieve Jane’s User Profile 3. Retrieve the Movie record for each movie that Jane has watched. For each movie: ■ Retrieve each of the watched user profiles ■ See if this profile is similar to Jane’s by giving it a score 4. Using the user profile with the highest similarity score, recommend the movies in this user profile that Jane has not seen. Ac  B = B A A B Relative complement This is a very elementary technique and it is useful only as an illustration, and it does have several flaws © 2014 Aerospike. All rights reserved. Confidential 10

15. © 2014 Aerospike. All rights reserved. Confidential 15 Conclusions ■ MongoDB ■Technically seductive programmer interface ■Uses JSON ( the current fashion ) ■Painful to scale – requires lots of servers ■Needed a big cluster - Ran out of RAM ■ Aerospike ■Tiny cluster required ■Easy to scale ■Faster than Mongo ■Large Data Type are different to JSON – but good

17. Resources ➤ GitHub – Recommendation engine example  https://github.com/aerospike/recommendation-engine-example.git © 2014 Aerospike. All rights reserved. Confidential 17 ➤ Spring Boot  http://projects.spring.io/spring-boot/ ➤ Aerospike  http://www.aerospike.com/ ➤ MongoDB  http://www.mongodb.org/

Editor's Notes

For example, e-commerce applications recommend products to a customer that other customers with similar behavior have viewed, enjoyed, or purchased. News applications would use a real-time recommendation engine to provide readers with fresh content, as stories come and go quickly. These application additions improve the user experience, increase sales, and help retain loyal customers.
The Big Picture Aerospike is a distributed, scalable NoSQL database. It is architected with three key objectives: To create a flexible, scalable platform that would meet the needs of today’s web-scale applications To provide the robustness and reliability (ie, ACID) expected from traditional databases. To provide operational efficiency (minimal manual involvement) First published in the Proceedings of VLDB (Very Large Databases) in 2011, the Aerospike architecture consists of 3 layers: The cluster-aware Client Layer includes open source client libraries that implement Aerospike APIs, track nodes and know where data reside in the cluster. The self-managing Clustering and Data Distribution Layer oversees cluster communications and automates fail-over, replication, cross data center synchronization and intelligent re-balancing and data migration. The flash-optimized Data Storage Layer reliably stores data in RAM and Flash. Aerospike scales up and out on commodity servers. 1) Each multi-threaded, multi-core, multi-cpu server is fast 2) Each server scales up to manage up to 16TB of data, with parallel access to up to 20 SSDs 3) Aerospike scales out linearly with each identical server added, to parallel process 100TB+ across the cluster(AppNexus was in production with a 50 node cluster that has now scaled up by increasing SSD capacity per node and reducing node count to 12, in preparation for scaling back up on node counts and SSD capacity )- see http://www.aerospike.com/wp-content/uploads/2013/12/Aerospike-AppNexus-SSD-Case-Study_Final.pdf SYNC replication within cluster for immediate consistency, #replicas from 1 to #nodes in cluster, typical deployment has 2 or 3 copies of data; Can support ASYNC within cluster. ASYNC replication across clusters using Cross Data Center Replication (XDR). Writes automatically replicated across one or more locations. Fine grained control - individual sets or namespaces can be replicated across clusters in master/master/slave complex ring and star topologies.
Flaws: Imagine that Jane has watched Harry Potter. It would be foolish to calculate similarity using the customer profiles who viewed this movie, because a very large number of people watched Harry Potter. If we generalize this idea, it would be that movies with the number of views over a certain threshold should be excluded. Cosine similarity assumes each element in the vector has the same weight. The elements in our vectors are the movie IDs, but we also have the rating of the movie also. A better similarity algorithm would include both the movie ID and its rating.

Recommendation engine using Aerospike and/OR MongoDB

Recommended

Recommended

More Related Content

Viewers also liked

Viewers also liked (15)

Similar to Recommendation engine using Aerospike and/OR MongoDB

Similar to Recommendation engine using Aerospike and/OR MongoDB (20)

Recently uploaded

Recently uploaded (20)

Recommendation engine using Aerospike and/OR MongoDB

Editor's Notes