Elevate Your Enterprise
Architecture with an In-Memory
Computing Strategy
Dylan Tong
Principal Solutions Architect
dylan.tong@mongodb.com
In-Memory Computing
How can we process data as fast as possible
by leveraging in-memory speed at its best?
What are the possibilities if we could?
High-frequency trading (HFT) is a program trading platform that uses
powerful computers to transact a large number of orders at very fast
speeds. It uses complex algorithms to analyze multiple markets and
execute orders based on market conditions.
Typically, the traders with the fastest execution speeds are more
profitable than traders with slower execution speeds.
Source: Investopedia
Speed Matters…
Speed Matters…
Amazon found that it increased revenue by 1% for every 100ms of
improvement [source: Amazon]
A 1-second delay in page load time equals 11% fewer page views,
a 16% decrease in customer satisfaction, and 7% loss in
conversions. [Source: Aberdeen Group]
A study found that 27% of the participants who did mobile shopping
were dissatisfied due to the experience being too slow. [Source:
Forrester Consulting]
How Fast?
Operation     Latency    Normalized to 1 s
RAM access    100s ns    ~6 min
SSD access    100s µs    ~6 days
HDD access    10s ms     ~12 months
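
The normalized column can be checked with a little arithmetic. The sketch below assumes (the slide does not say this) that the scale stretches roughly 0.3 ns, about one CPU cycle at 3 GHz, to 1 second. The three sample latencies are illustrative picks from inside the slide's ranges, not measured values.

```javascript
// Assumption (not on the slide): 0.3 ns of real time is stretched to 1 s.
const BASELINE_SECONDS = 0.3e-9;

// Scale a real latency (in seconds) to the "human" time scale.
function normalize(latencySeconds, baseline = BASELINE_SECONDS) {
  return latencySeconds / baseline;
}

// Illustrative latencies chosen inside the slide's ranges (hypothetical picks).
const samples = [
  { name: "RAM access", seconds: 100e-9 }, // 100 ns
  { name: "SSD access", seconds: 150e-6 }, // 150 µs
  { name: "HDD access", seconds: 10e-3 },  // 10 ms
];

for (const { name, seconds } of samples) {
  const scaled = normalize(seconds);
  const minutes = (scaled / 60).toFixed(1);
  const days = (scaled / 86400).toFixed(1);
  console.log(`${name}: ${minutes} min, or ${days} days`);
}
```

With these inputs the scaled values land close to the slide's ~6 min, ~6 days, and ~12 months, which suggests the assumed baseline is in the right neighborhood.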
Why Now?
Average RAM price ($/GB)*
Year    $/GB
2015    $4.37
2013    $5.50
2010    $12.37
2005    $189
2000    $1,107
1995    $30,875
1990    $103,880
1985    $859,375
1980    $6,328,125
[Chart: average RAM price in $/GB, 2005 to 2015]
Last 10 Years… “Generally affordable”
*http://www.statisticbrain.com/average-historic-price-of-ram/
Why Now?
[Chart: average RAM price in $/GB, 2010 to 2015 (2010: $12.37, 2013: $5.50, 2015: $4.37)]
Last 5 Years… “An Option at Scale”
*http://www.statisticbrain.com/average-historic-price-of-ram/
"This will process these data using algorithms for machine
learning and artificial intelligence before sending the data
back to the car.
The zFAS board will in this way continuously extend its
capabilities to master even complex situations increasingly
better," Audi stated. "The piloted cars from Audi thus learn
more every day and with each new situation they
experience."
Source: T3.com
The possibilities…
Challenges: Scale
Challenges: Cost Viability
= $34,777/yr.  ~$1.74M/yr. for infrastructure to support 100TB
Challenges: Cost Viability
Storage Type    Avg. Cost ($/GB)    Cost at 100TB ($)
RAM             5.00                500K
SSD             0.47-1.00           47K to 100K
HDD             0.03                3K
http://www.statisticbrain.com/average-cost-of-hard-drive-storage/
http://www.myce.com/news/ssd-price-per-gb-drops-below-0-50-how-low-can-they-go-70703/
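
The 100TB column follows directly from the per-GB prices. The sketch below reproduces it, assuming decimal units (1 TB = 1000 GB). It also works out how many $34,777/yr. units the ~$1.74M/yr. figure on the previous slide implies; the source does not name that unit, so the code only reports the ratio.

```javascript
// Cost of holding a dataset on one storage tier, using decimal units (1 TB = 1000 GB).
function costAtCapacity(terabytes, dollarsPerGb) {
  return Math.round(terabytes * 1000 * dollarsPerGb);
}

// Average $/GB figures copied from the table above.
const tiers = {
  ram: 5.0,
  ssdLow: 0.47,
  ssdHigh: 1.0,
  hdd: 0.03,
};

// Price each tier at 100 TB.
const costs = Object.fromEntries(
  Object.entries(tiers).map(([tier, perGb]) => [tier, costAtCapacity(100, perGb)])
);
console.log(costs);

// Ratio between the two yearly figures on the previous slide.
// What one $34,777/yr. unit is (instance, server, ...) is not stated in the source.
const impliedUnits = Math.round(1.74e6 / 34777);
console.log(impliedUnits);
```

At these prices RAM costs roughly 5 to 10 times as much as SSD and over 150 times as much as HDD for the same capacity, which is the cost-viability gap the slide points at.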
Challenges: Durability
Volatile Memory
• What happens when things fail,
and what data may be lost?
• How does the system synchronize
with your durable storage? Does it
do this well, and is it simple to
implement?
Challenges: Design Still Matters
on RAM
Scenario: eCommerce Modernization Initiative

Business Problem: Customer experience is suffering during high-traffic events.
Technology Limitations:
• Too expensive to scale the system to support spike events.
• Scaling the system is hard, and engineering teams can’t react fast enough in the event of unexpected growth.
• Some caching solution is implemented, but it mostly only helps with read performance; synchronizing writes has been a development nightmare.

Business Problem: Lack of mobile customers in Europe and Asia has been attributed to latency issues.
Technology Limitation:
• Difficult to extend the data architecture globally, so the effort is put on hold.
Scenario: eCommerce Modernization Initiative

Business Problem: Below-industry conversion rate performance has been attributed partly to poor personalization.
Technology Limitations:
• Customer info is siloed across the enterprise, and it’s too complicated to bring this data together so effective models can be built to drive personalization.
• A “Big Data” project to bring data together to drive machine learning and cognitive capabilities in the platform failed: data scientists reported that the platform was too slow to develop on, and performance was impractical.

Business Problem: Business analysts have siloed views of the eCommerce channel, and information isn’t getting to them fast enough.
Technology Limitations:
• Related to the limitations above.
• Integrating data into the data warehouse is slow and hard to maintain.
Scenario: eCommerce Modernization Initiative
[Diagram: Platform API and Platform Services on top of the eCommerce datastores (NoSQL, RDBMS) holding Orders, Product Catalog, Customer Data (Profile, Sessions, Carts, Personalization), and Inventory. Dependent external data sources and integrations: CRM, ERP, PIM, data warehouse, BI tools, …]
Silo Data-sources Problem
[Diagram: Customer Data (Profile, Sessions, Carts, Personalization) and Product Catalog shown against separate sources: NoSQL, RDBMS, CRM, ERP, PIM, partner sources (supplier databases…etc.), and legacy mainframe. Label: SLOW AND POOR SCALABILITY]
Operational Single View
[Diagram: an Operational Single View layer placed over the sources (NoSQL, RDBMS, CRM, ERP, PIM, partner sources (supplier databases…etc.), legacy mainframe), serving Customer Data (Profile, Sessions, Carts, Personalization) and Product Catalog]
MongoDB
Enterprise Data Hub
Operational Single View
Reference: Metlife Wall Presentation
{
  product_name: 'Acme Paint',
  color: ['Red', 'Green'],
  size_oz: [8, 32],
  finish: ['satin', 'eggshell']
}
{
  product_name: 'T-shirt',
  size: ['S', 'M', 'L', 'XL'],
  color: ['Heather Gray', /* … */],
  material: '100% cotton',
  wash: 'cold',
  dry: 'tumble dry low'
}
{
  product_name: 'Mountain Bike',
  brake_style: 'mechanical disc',
  color: 'grey',
  frame_material: 'aluminum',
  no_speeds: 21,
  package_height: '7.5x32.9x55',
  weight_lbs: 44.05,
  suspension_type: 'dual',
  wheel_size_in: 26
}
Documents in the same product catalog collection in MongoDB
Dynamic Schema
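
To make the dynamic schema point concrete, here is a minimal sketch in plain JavaScript, with no MongoDB driver involved. It keeps trimmed versions of the three documents in one array and filters them by a field that only some of them have. The helper `findByField` is hypothetical; it only imitates the way an equality filter treats scalar and array fields.

```javascript
// Plain JavaScript stand-in for a collection: documents with different fields side by side.
const catalog = [
  { product_name: "Acme Paint", color: ["Red", "Green"], size_oz: [8, 32] },
  { product_name: "T-shirt", size: ["S", "M", "L", "XL"], material: "100% cotton" },
  { product_name: "Mountain Bike", color: "grey", no_speeds: 21, wheel_size_in: 26 },
];

// Match a field against a value. An array field matches if any element equals the value,
// which mirrors how an equality filter behaves on array fields in MongoDB.
function findByField(docs, field, value) {
  return docs.filter((doc) => {
    if (!(field in doc)) return false; // documents lacking the field are skipped
    const actual = doc[field];
    return Array.isArray(actual) ? actual.includes(value) : actual === value;
  });
}

console.log(findByField(catalog, "color", "grey").map((d) => d.product_name));
console.log(findByField(catalog, "size", "M").map((d) => d.product_name));
```

No schema migration is needed when a new product type brings new attributes; documents that lack a queried field simply do not match.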
Flexible Data Model: facilitates
agile development and continuous
delivery methodologies
Scalability: scale-out dynamically
as demand grows
Still Agile, Scalable and Simple
High Performance:
• More predictable, and lower
latency on less in-memory
infrastructure.
In-Memory Storage Engine
Infrastructure Optimization:
• Assign a data subset on the
In-Memory SE via Zone
Sharding.
• Optimize on cost vs.
performance without silos.
Rich Query Capability:
• Full MongoDB Query and
Indexing Support.
IN-MEMORY SE NODES WIREDTIGER NODES
Session Data Geographically Localized, and with In-memory Engine Latency
Local Read/Write with Strong Consistency
[Diagram: an "Update" operation against four shards, grouped by region]
WEST: SHARD 1 (TAG: WEST, IN_MEM), SHARD 2 (TAG: WEST, WT)
EAST: SHARD 3 (TAG: EAST, IN_MEM), SHARD 4 (TAG: EAST, WT)
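
The tagging scheme on this slide can be read as a two-part lookup: region first, storage engine second. The sketch below is a simplified simulation of that idea in plain JavaScript. In a real cluster, placement is driven by zone ranges defined on the shard key, not by a function like this. The `pickShard` helper and its rule that only session data goes to the in-memory engine are assumptions for illustration.

```javascript
// Shard tags as drawn on the slide.
const shards = [
  { name: "SHARD 1", tags: ["WEST", "IN_MEM"] },
  { name: "SHARD 2", tags: ["WEST", "WT"] },
  { name: "SHARD 3", tags: ["EAST", "IN_MEM"] },
  { name: "SHARD 4", tags: ["EAST", "WT"] },
];

// Hypothetical placement policy: session data goes to the in-memory engine,
// everything else to WiredTiger, always within the user's region.
function pickShard(region, dataKind) {
  const engine = dataKind === "session" ? "IN_MEM" : "WT";
  const match = shards.find(
    (s) => s.tags.includes(region) && s.tags.includes(engine)
  );
  if (!match) throw new Error(`no shard tagged ${region}/${engine}`);
  return match.name;
}

console.log(pickShard("WEST", "session"));
console.log(pickShard("EAST", "orders"));
```

The point of the design is that latency-sensitive data gets RAM-only nodes near the user, while the rest of the dataset stays on cheaper disk-backed nodes in the same cluster.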
Durability and Fault-Tolerance:
• Mixed ReplicaSets allow data to
be replicated from In-Memory SE
to WT SE.
• Full High Availability: automatic
fail-over, cross geography.
In-Memory Storage Engine
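
One possible shape for such a mixed replica set is sketched below as a configuration document. This is an illustration under assumptions, not the presenter's setup: the set name and host names are hypothetical, and the storage engine of each member is chosen when its mongod process is started, so it appears here only in comments. The WiredTiger member is marked hidden with priority 0 so that it keeps a durable copy without ever being elected primary.

```javascript
// Sketch of a replica set configuration document mixing storage engines.
// Host names and the set name are hypothetical.
const replicaSetConfig = {
  _id: "sessions-rs",
  members: [
    { _id: 0, host: "inmem-a.example.net:27017", priority: 1 },            // in-memory engine
    { _id: 1, host: "inmem-b.example.net:27017", priority: 1 },            // in-memory engine
    { _id: 2, host: "wt-a.example.net:27017", priority: 0, hidden: true }, // WiredTiger, durable copy
  ],
};

// Members that are allowed to become primary.
function electableMembers(config) {
  return config.members.filter((m) => m.priority > 0);
}

// Members that only receive replicated writes.
function passiveMembers(config) {
  return config.members.filter((m) => m.priority === 0);
}

console.log(electableMembers(replicaSetConfig).map((m) => m.host));
console.log(passiveMembers(replicaSetConfig).map((m) => m.host));
```

If every in-memory member fails at once, the disk-backed member still holds the replicated data, which addresses the durability question raised earlier in the deck.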
[Diagram: Operational Unified View over the platform databases (NoSQL, RDBMS) and the dependent external data sources and integrations: CRM, ERP, PIM, partner sources (supplier databases…etc.), and legacy mainframe]
Advanced Personalization
1. TRAIN/RE-TRAIN ML MODELS
2. APPLY MODELS TO REAL-TIME STREAM OF INTERACTIONS
3. DRIVE TARGETED CONTENT, RECOMMENDATIONS…ETC.
Why Spark?
Speed. By exploiting in-memory optimizations, Spark
has shown up to 100x higher performance than
MapReduce running on Hadoop.
Simplicity. Easy-to-use APIs for operating on large
datasets. This includes a collection of sophisticated
operators for transforming and manipulating
semi-structured data.
Unified Framework. Packaged with higher-level libraries,
including support for SQL queries, machine learning,
stream and graph processing. These standard libraries
increase developer productivity and can be combined to
create complex workflows.
Operational Single View
+Spark Connector
• Native Scala connector,
certified by Databricks
• Exposes all Spark APIs &
libraries
• Efficient data filtering
with predicate
pushdown, secondary
indexes, & in-database
aggregations
• Locality awareness to
reduce data movement
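
Predicate pushdown is easiest to appreciate by counting what crosses the wire. The sketch below is a plain JavaScript simulation, not the Spark connector's API: the dataset, the `region` field, and both helper functions are hypothetical. It compares filtering after transfer with filtering at the source.

```javascript
// Hypothetical source collection of 1,000 orders, 10% of them from region "EU".
const sourceOrders = Array.from({ length: 1000 }, (_, i) => ({
  id: i,
  region: i % 10 === 0 ? "EU" : "US",
  price: 10 + (i % 7),
}));

// Without pushdown: every document crosses the wire, then the client filters.
function filterAtClient(source, predicate) {
  const transferred = source.slice();
  return { transferred: transferred.length, result: transferred.filter(predicate) };
}

// With pushdown: the predicate runs at the source, so only matches cross the wire.
function filterAtSource(source, predicate) {
  const transferred = source.filter(predicate);
  return { transferred: transferred.length, result: transferred };
}

const isEu = (order) => order.region === "EU";
const clientSide = filterAtClient(sourceOrders, isEu);
const pushedDown = filterAtSource(sourceOrders, isEu);
console.log(clientSide.transferred, pushedDown.transferred);
```

Both paths produce the same result set; the difference is how much data has to move to get there, which is what the connector's filtering and locality features aim to reduce.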
Locality Awareness
[Diagram: DRIVER PROGRAM (SPARK CONTEXT), CLUSTER MANAGER, and five Tasks]
Operational Single View
+Spark Connector
Blend client data from multiple
internal and external sources to
drive real time campaign
optimization
MongoDB+Spark at China Eastern
• 180m fare calculations & 1.6 billion searches per day.
• Oracle database peaked at 200 searches per second.
• Radically re-architected their fare engine to meet the required 100x growth in search traffic.
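
The "100x" figure can be sanity-checked against the other two numbers. The sketch below reads 200 searches per second as the old ceiling and 1.6 billion searches per day as the new load, which is an interpretation of the slide. It then converts the daily total to an average per-second rate.

```javascript
const SECONDS_PER_DAY = 24 * 60 * 60;

// Figures from the slide above.
const searchesPerDay = 1.6e9;
const oraclePeakPerSecond = 200;

// Average rate implied by the daily total (real traffic would peak higher than this).
const averagePerSecond = searchesPerDay / SECONDS_PER_DAY;

// How many times the old peak the new average represents.
const growthFactor = averagePerSecond / oraclePeakPerSecond;

console.log(Math.round(averagePerSecond));
console.log(growthFactor.toFixed(1));
```

The average alone works out to a bit over 90 times the old peak, so a required growth of about 100x is consistent with the stated volumes.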
ETL
(Yesterday’s) Data at the Speed of Thought?
BI Connector
BI Connector
db.orders.aggregate([
  {
    $group: {
      _id: null,
      total: { $sum: "$price" }
    }
  }
])

SELECT SUM(price) AS total
FROM orders
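
Both statements describe the same computation: collapse every order into one group and add up the prices. The sketch below spells that out in plain JavaScript over a few hypothetical rows. The `sumPrices` function is an illustration of the semantics, not how the BI Connector translates queries.

```javascript
// Hypothetical rows standing in for the orders collection / table.
const orderRows = [
  { _id: 1, price: 25 },
  { _id: 2, price: 40 },
  { _id: 3, price: 35 },
];

// Equivalent of { $group: { _id: null, total: { $sum: "$price" } } }:
// a single group (key null) whose total accumulates every price.
// Like $group, it yields no output document when there is no input.
function sumPrices(rows) {
  if (rows.length === 0) return [];
  const total = rows.reduce((acc, row) => acc + row.price, 0);
  return [{ _id: null, total }];
}

console.log(sumPrices(orderRows));
```

An analyst writing the SQL form and a developer writing the aggregation form get the same total from the same data.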
Resources for You
Spark Connector
• Download: Spark Packages
GitHub
• Documentation
• Whitepaper:
Turning Analytics into Real-Time
Action
• Education: M233: Getting
Started with Spark and
MongoDB
In-Memory Storage Engine
• Download: Enterprise Server
• Documentation
BI Connector
• Download: BI Connector
• Documentation
Q&A

MongoDB and In-Memory Computing

  • 1. ElevateYour Enterprise Architecture with an In-Memory Computing Strategy Dylan Tong Principal Solutions Architect dylan.tong@mongodb.com
  • 2. In-Memory Computing How can we process data as fast as possible by leveraging in-memory speed at it’s best? What are the possibilities if we could?
  • 3. High-frequency trading (HFT) is a program trading platform that uses powerful computers to transact a large number of orders at very fast speeds. It uses complex algorithms to analyze multiple markets and execute orders based on market conditions. Typically, the traders with the fastest execution speeds are more profitable than traders with slower execution speeds. Source: Investopedia Speed Matters…
  • 4. Speed Matters… Amazon found that it increased revenue by 1% for every 100ms of improvement [source: Amazon] A 1-second delay in page load time equals 11% fewer page views, a 16% decrease in customer satisfaction, and 7% loss in conversions. [Source: Aberdeen Group] A study found that 27% of the participants who did mobile shopping were dissatisfied due to the experience being too slow. [Source: Forrester Consulting]
  • 5. How Fast? Latency Unit RAM access 100s ns SSD access 100s µs HDD access 10s ms Normalized to 1 s ~6 min ~6 days ~12 months
  • 6. Why Now? *Average $/GB 2015 $4.37 2013 $5.5 2010 $12.37 2005 $189 2000 $1,107 1995 $30,875 1990 $103,880 1985 $859,375 1980 $6,328,125 $0 $20 $40 $60 $80 $100 $120 $140 $160 $180 $200 2005 2010 2013 2015 Last 10 Years… “Generally affordable” *http://www.statisticbrain.com/average-historic-price-of-ram/
  • 7. Why Now? $0.00 $2.00 $4.00 $6.00 $8.00 $10.00 $12.00 $14.00 2010 2013 2015 “An Option at Scale” *Average $/GB 2015 $4.37 2013 $5.5 2010 $12.37 2005 $189 2000 $1,107 1995 $30,875 1990 $103,880 1985 $859,375 1980 $6,328,125 Last 5 Years… *http://www.statisticbrain.com/average-historic-price-of-ram/
  • 8. "This will process these data using algorithms for machine learning and artificial intelligence before sending the data back to the car. The zFAS board will in this way continuously extend its capabilities to master even complex situations increasingly better," Audi stated. "The piloted cars from Audi thus learn more every day and with each new situation they experience.” Source: T3.com The possibilities…
  • 10. Challenges: Cost Viability = $34,777/yr.  ~$1.74M/yr. for infrastructure to support 100TB
  • 11. Challenges: Cost Viability Storage Type Avg. Cost ($/GB) Cost at 100TB ($) RAM 5.00 500K SSD 0.47-1.00 47K to 100K HDD 0.03 3K http://www.statisticbrain.com/average-cost-of-hard-drive-storage/ http://www.myce.com/news/ssd-price-per-gb-drops-below-0-50-how-low-can-they-go-70703/
  • 12. Challenges: Durability Volatile Memory • What happens when things fail, and what data maybe loss? • How does the system synchronize with your durable storage? Does it do this well, and is it simple to implement?
  • 15. Scenario : ECommerce Modernization Initiative Business Problems Technology Limitation Customer experience is suffering during high traffic events. Too expensive to scale system to support spike events. Scaling system is hard, and engineering teams can’t react fast enough in the event of unexpected growth Some caching solution implemented, but it mostly only helps with read performance; synchronizing writes has been a development nightmare. Lack of mobile customers in Europe and Asia has been attributed to latency issues. Difficult to extend data architecture globally, so effort is put on hold
  • 16. Scenario : ECommerce Modernization Initiative Business Problems Technology Limitation Below industry conversation rate performance has been attributed partly to poor personalization Customer info is siloed across across the Enterprise, and it’s too complicated to bring this data together so effective models can be built to drive personalization “Big Data” project to bring data together to drive machine learning and cognitive capabilities in platform failed as data scientists report platform was too slow to develop on, and performance was impractical. Business analysts have siloed views of the eCommerce channel, and information isn’t getting to them fast enough Related to limitations above Integrating data into data warehouse is slow and hard to maintain
  • 17. Orders Product Catalog Customer Data: Profile, Sessions, Carts, Personalization Inventory NoSQLRDBMS Platform Services eCommerce Datastores Dependent External Data Sources and Integrations CRM ERP PIM Data warehouse BI Tools … Platform API Scenario : ECommerce Modernization Initiative
  • 18. Customer Data: Profile, Sessions, Carts, Personalization NoSQLRDBMS CRM ERP PIM Partner Sources: Supplier databases…etc. Legacy: Mainframe Product Catalog Silo Data-sources Problem SLOW AND POOR SCALABILITY
  • 19. NoSQLRDBMS CRM ERP PIM Partner Sources: Supplier databases…etc. Legacy: Mainframe Operational Single View Operational Single View Customer Data: Profile, Sessions, Carts, Personalization Product Catalog
  • 20. Operational Single View MongoDB Enterprise Data Hub Operational Single View
  • 21. Reference: Metlife Wall Presentation
  • 22. { product_name: ‘Acme Paint’, color: [‘Red’, ‘Green’], size_oz: [8, 32], finish: [‘satin’, ‘eggshell’] } { product_name: ‘T-shirt’, size: [‘S’, ‘M’, ‘L’, ‘XL’], color: [‘Heather Gray’ … ], material: ‘100% cotton’, wash: ‘cold’, dry: ‘tumble dry low’ } { product_name: ‘Mountain Bike’, brake_style: ‘mechanical disc’, color: ‘grey’, frame_material: ‘aluminum’, no_speeds: 21, package_height: ‘7.5x32.9x55’, weight_lbs: 44.05, suspension_type: ‘dual’, wheel_size_in: 26 } Documents in the same product catalog collection in MongoDB Dynamic Schema
  • 23. Flexible Data Model: facilitates agile development and continuous delivery methodologies Scalability: scale-out dynamically as demand grows Still Agile, Scalable and Simple
  • 24. High Performance: • More predictable, and lower latency on less in-memory infrastructure. In-Memory Storage Engine Infrastructure Optimization: • Assign a data subset on the In-Memory SE via Zone Sharding. • Optimize on cost vs. performance without silos. .Rich Query Capability: • Full MongoDB Query and Indexing Support. IN-MEMORY SE NODES WIREDTIGER NODES
  • 25. WEST EAST Update SHARD 4 TAG: EAST, WT Local Read/Write with Strong Consistency Session Data Geographically Localized, and with In-memory Engine Latency SHARD 2 TAG: WEST, WT SHARD 3 TAG: EAST, IN_MEM SHARD 1 TAG: WEST, IN_MEM
  • 26. Durability and Fault-Tolerance: • Mixed ReplicaSets allow data to be replicated from In-Memory SE to WT SE. • Full High Availability: automatic fail-over, cross geography. In-Memory Storage Engine
  • 27. NoSQLRDBMS Platform Databases Dependent External Data Sources and Integrations CRM ERP PIM Partner Sources: Supplier databases…etc. Legacy: Mainframe Operational Unified View Advance Personalization 1. TRAIN/RE-TRAIN ML MODELS 2. APPLY MODELS TO REAL-TIME STREAM OF INTERACTIONS 3. DRIVE TARGETED CONTENT, RECOMMENDATIONS…ETC.
  • 28. Why ? Speed. By exploiting in-memory optimizations, Spark has shown up to 100x higher performance than MapReduce running on Hadoop. Simplicity. Easy-to-use APIs for operating on large datasets. This includes a collection of sophisticated operators for transforming and manipulating semi-structured data. Unified Framework. Packaged with higher-level libraries, including support for SQL queries, machine learning, stream and graph processing. These standard libraries increase developer productivity and can be combined to create complex workflows.
  • 29. Operational Single View +Spark Connector • Native Scala connector, certified by Databricks • Exposes all Spark APIs & libraries • Efficient data filtering with predicate pushdown, secondary indexes, & in-database aggregations • Locality awareness to reduce data movement
  • 31. Operational Single View +Spark Connector Blend client data from multiple internal and external sources to drive real time campaign optimization
  • 32. MongoDB+Spark at China Eastern 180m fare calculations & 1.6 billion searches per day Oracle database peaked at 200 searches per second. Radically re-architect their fare engine to meet the required 100x growth in search traffic.
  • 33. ETL (Yesterday’s) Data at the Speed of Thought?
  • 34. BI Connector db.orders.aggregate( [ { $group: { _id: null, total: { $sum: "$price" } } } ] ) SELECT SUM(price) AS total FROM orders
  • 35. Resources for You Spark Connector • Download: Spark Packages GitHub • Documentation • Whitepaper: Turning Analytics into Real-Time Action • Education: M233: Getting Started with Spark and MongoDB In-Memory Storage Engine • Download: Enterprise Server • Documentation BI Connector • Download: BI Connector • Documentation
  • 37. Dylan Tong Principal Solutions Architect dylan.tong@mongodb.com Q&A
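
Slide 34 pairs a MongoDB aggregation pipeline with the SQL statement it corresponds to. The sketch below illustrates why the two are equivalent; it is not the BI Connector itself. The sample orders, the `item` field, and the helper names `group_sum` and `sql_sum` are assumptions made for illustration, an in-memory SQLite table stands in for a SQL client, and a plain Python function mimics the `$group` / `$sum` stage rather than calling a MongoDB server.

```python
import sqlite3

# Hypothetical sample orders; the slide shows only the two queries, not the data.
orders = [
    {"item": "Acme Paint", "price": 12.5},
    {"item": "T-shirt", "price": 20.0},
    {"item": "Mountain Bike", "price": 450.0},
]


def group_sum(docs, field):
    """Mimic { $group: { _id: null, total: { $sum: "$<field>" } } } in plain Python.

    Simplification: every document is assumed to carry a numeric value for `field`.
    """
    return [{"_id": None, "total": sum(doc[field] for doc in docs)}]


def sql_sum(docs):
    """Run the slide's SQL statement against an in-memory SQLite table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (item TEXT, price REAL)")
    conn.executemany("INSERT INTO orders VALUES (:item, :price)", docs)
    (total,) = conn.execute("SELECT SUM(price) AS total FROM orders").fetchone()
    conn.close()
    return total


pipeline_total = group_sum(orders, "price")[0]["total"]
sql_total = sql_sum(orders)
print(pipeline_total, sql_total)
```

Grouping on `_id: null` collapses the whole collection into a single group, which is why it lines up with a SQL aggregate that has no `GROUP BY` clause.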

Editor's Notes

  1. Put simply, there are two big questions that I think define and drive in-memory computing: How can we process data as fast as possible by leveraging in-memory speed at its best? Secondly, what are the possibilities if we could?
  2. Why do we care about speed? It matters in a lot of cases… In the financial world, it matters in areas like high-frequency trading, which is estimated to account for 50-70% of trades in the past 5 years. HFT platforms transact a large number of orders at very fast speeds, and often use complex algorithms to analyze multiple markets and market conditions. Typically, the traders with the fastest execution speeds are more profitable than traders with slower execution speeds.
  3. Research by enterprises and analysts correlating performance, online experience and revenue is well documented. I list a few here from some analysts and Amazon, but there are other public studies from Google and Walmart demonstrating the same. A well-known study by Aberdeen Group found that a 1-second delay in page load time equals 11% fewer page views, a 16% decrease in customer satisfaction, and a 7% loss in conversions. Translated to dollars, if your business earns just $100,000 a day, this equates to about $2.5M in potential sales lost annually (7% of $100,000 a day over 365 days). Faster is better. Slow online experiences translate to lost opportunities, and we as users and consumers can relate.
  4. So, how fast is in-memory? Here are the rough units that best measure data access times across different storage mediums. (Click) If we normalize to 1 s, it is clear that the difference in speed is drastic between RAM and even fast SSD storage.
  5. Some may already be nodding their heads… RAM isn’t new technology, and we’re aware that the price of RAM has dropped drastically over the past decade. By 2010, the sharp decline in average cost had made RAM “generally affordable” for mainstream use; however, it is far from cheap, especially when we consider the data volumes that we work with today.
  6. However, prices continue to fall, and an average price of $4.37 per GB in 2015 makes RAM an option even at scale for greenfield projects that need the speed.
  7. IoT is certainly not a space short of innovation and possibilities, and the ability to scale in-memory performance only makes the possibilities more exciting. I came across an article in which Audi discusses its plans for its connected self-driving car: data collected from various sensors on the car will be sent to the cloud, where ML will process it and send results back to the car so that it can learn and better adapt to complex situations. “…machine learning it will mean adverse weather conditions, such as snow, which can affect sensors will be less of a problem as cars will have a thorough understanding of the piece of tarmac it is traversing” Consider the future: the scale of every vehicle on the road, and the amount of data collected that needs to be processed. In-memory computing solutions will be needed to process big data fast, especially in the world of smart cars, where information will drive important decisions in real time.
  8. Despite the significant increase in the amount of RAM you could put on a single server in the past couple of years, there are still limits, and the data volumes that we work with today continue to grow due to the types of applications we build and the types of data sources we analyze and mine. For many organizations, the bulk of workloads are being moved to or are already in the cloud, and the ability to scale on cloud infrastructure is critical. The ability to scale out to fit large data sets in RAM across servers is critical: if not for data volume, then for the compute needed to support large-scale services in the cloud.
  9. We previously discussed how cost has lowered dramatically, and while RAM is an option at scale, it can still be cost prohibitive for certain projects. Consider AWS’s X1 instance. It impressively provides nearly 2TB of RAM, but at a hefty price. At a scale of 100TBs, $1.74M just for infrastructure isn’t an option for certain projects. The question is, does the problem really require all your data to be in RAM?
  10. While memory is orders of magnitude faster than other storage mediums, the difference in relative cost is also significant. With that said, in-memory solutions shouldn’t be designed around needing your Enterprise data architecture or even your application to run entirely in-memory. The value of the data and the problem you’re solving should dictate the right medium, and an in-memory solution should seamlessly integrate into an Enterprise Data Architecture that supports all storage mediums.
  11. Generally, when we talk about memory we refer to what is readily available: volatile memory. If your server goes down, then the data stored in that server’s RAM is lost unless it has also been put on durable storage like disk. Trading off data loss for speed isn’t acceptable in most use cases. A good in-memory solution needs to provide fault tolerance, and it needs to synchronize with durable storage. Just as importantly, it needs to do so simply and reliably (which often isn’t the case for some solutions, like external distributed caches).
  12. As fast as RAM is, it doesn’t remedy bad design. More importantly, an in-memory computing technology shouldn’t introduce new bottlenecks into the architecture, or keep your data architecture from addressing the biggest performance bottlenecks in your system. For instance: Does your in-memory computing solution require you to move large volumes of data around? If so, is that creating bottlenecks in other ways? How does your solution bring data into RAM? Is there an efficient caching algorithm, and is relevant data selected and filtered efficiently? How is your data being processed in RAM? Is there an efficient algorithm? Is it introducing inefficiencies and new performance bottlenecks by shuffling data unnecessarily across a distributed system?
  13. So now that we understand the challenges and core requirements around introducing in-memory technologies into your Enterprise Data Architecture, let’s understand how MongoDB fits into the big picture and what it can offer in this area.
  14. Let’s home in on the product catalog and customer session management parts of the system, as the problem is clearest there. The customer session management component is key to driving customer experience features like personalization, and effective personalization needs to be based on a full picture of the customer. Realistically, in an Enterprise, customer touch points and information are siloed across many systems, and rarely is there one place where an operational system can get everything it needs to know about the customer. Likewise, with the Product Catalog, information about products will be siloed. Perhaps some info is stored within the ecommerce platform, but it likely has to be synchronized with external systems like PIMs and supplier systems. Additionally, a modern platform should also be able to keep availability up to date as part of the product search, so problems aren’t caused downstream around order fulfillment. Finally, the business analysts will also need to analyze the same data sources. Consolidating these systems isn’t realistic. Integration is necessary, and ideally it shouldn’t involve heavy redundancy; for instance, across operational and BI environments. Federated data access across these systems isn’t an option on many fronts due to performance and scale. Sufficient integration of data into the DW via traditional ETL is a huge effort and likely too slow to make happen.
  15. This component would be well served by MongoDB, and in fact, is one of the most common use cases for MongoDB.
  16. This component would be well served by MongoDB, and in fact, is one of the most common use cases for MongoDB.