MongoDB and In-Memory Computing


Learn about recent advances in MongoDB in the area of In-Memory Computing (Apache Spark Integration, In-memory Storage Engine), and how these advances can enable you to build a new breed of applications, and enhance your Enterprise Data Architecture.

Published in: Technology

  1. 1. Elevate Your Enterprise Architecture with an In-Memory Computing Strategy. Dylan Tong, Principal Solutions Architect
  2. 2. In-Memory Computing: How can we process data as fast as possible by leveraging in-memory speed at its best? What are the possibilities if we could?
  3. 3. Speed Matters… High-frequency trading (HFT) is a program trading platform that uses powerful computers to transact a large number of orders at very fast speeds. It uses complex algorithms to analyze multiple markets and execute orders based on market conditions. Typically, the traders with the fastest execution speeds are more profitable than traders with slower execution speeds. [Source: Investopedia]
  4. 4. Speed Matters… Amazon found that it increased revenue by 1% for every 100ms of improvement. [Source: Amazon] A 1-second delay in page load time equals 11% fewer page views, a 16% decrease in customer satisfaction, and a 7% loss in conversions. [Source: Aberdeen Group] A study found that 27% of the participants who did mobile shopping were dissatisfied due to the experience being too slow. [Source: Forrester Consulting]
  5. 5. How Fast?
      Latency       Unit      Normalized to 1 s
      RAM access    100s ns   ~6 min
      SSD access    100s µs   ~6 days
      HDD access    10s ms    ~12 months
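The normalized figures on the "How Fast?" slide can be approximated with simple scaling. The sketch below is an illustration only: the slide does not say which base unit is mapped to 1 s, so it assumes a CPU cycle of roughly 0.3 ns, and it picks one representative latency per tier from within the slide's ranges (120 ns, 150 µs, 9 ms). Those values and the helper names `normalizedSeconds` and `humanize` are assumptions, not part of the original deck.

```javascript
// Assumption: one "normalized second" corresponds to a ~0.3 ns CPU cycle.
const BASE_NS = 0.3;

// Representative latencies in nanoseconds (assumed points within the slide's ranges).
const latenciesNs = {
  "RAM access": 120,      // "100s ns"
  "SSD access": 150000,   // "100s µs"
  "HDD access": 9000000,  // "10s ms" (roughly 10 ms)
};

// Scale a real latency so that baseNs maps to exactly 1 second.
function normalizedSeconds(latencyNs, baseNs = BASE_NS) {
  return latencyNs / baseNs;
}

// Render a duration in the largest unit that fits (a month is taken as 30 days).
function humanize(seconds) {
  const units = [["months", 2592000], ["days", 86400], ["min", 60]];
  for (const [name, size] of units) {
    if (seconds >= size) return `~${(seconds / size).toFixed(1)} ${name}`;
  }
  return `~${seconds.toFixed(1)} s`;
}

for (const [tier, ns] of Object.entries(latenciesNs)) {
  console.log(`${tier}: ${humanize(normalizedSeconds(ns))}`);
}
```

Under these assumptions the results land close to the slide's rounded figures; different representative latencies within the same ranges shift them noticeably.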
  6. 6. Why Now? Last 10 Years… “Generally affordable”. Average $/GB by year: 2015 $4.37; 2013 $5.5; 2010 $12.37; 2005 $189; 2000 $1,107; 1995 $30,875; 1990 $103,880; 1985 $859,375; 1980 $6,328,125. [Chart: average $/GB for 2005, 2010, 2013, 2015]
  7. 7. Why Now? Last 5 Years… “An Option at Scale”. Average $/GB by year: 2015 $4.37; 2013 $5.5; 2010 $12.37. [Chart: average $/GB for 2010, 2013, 2015]
  8. 8. The possibilities… "This will process these data using algorithms for machine learning and artificial intelligence before sending the data back to the car. The zFAS board will in this way continuously extend its capabilities to master even complex situations increasingly better," Audi stated. "The piloted cars from Audi thus learn more every day and with each new situation they experience."
  9. 9. Challenges: Scale
  10. 10. Challenges: Cost Viability. [Image not recovered] = $34,777/yr.; ~$1.74M/yr. for infrastructure to support 100TB
  11. 11. Challenges: Cost Viability
      Storage Type   Avg. Cost ($/GB)   Cost at 100TB ($)
      RAM            5.00               500K
      SSD            0.47-1.00          47K to 100K
      HDD            0.03               3K
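The "Cost at 100TB" column follows directly from the per-GB prices. A minimal sketch of that arithmetic, assuming decimal units (1 TB = 1,000 GB), which is what the slide's round figures imply; the function name `costRange` is illustrative:

```javascript
// Per-GB prices from the slide's table (USD); each entry is [low, high].
const pricePerGb = {
  RAM: [5.0, 5.0],
  SSD: [0.47, 1.0],
  HDD: [0.03, 0.03],
};

// Assumption: decimal units, so 1 TB = 1,000 GB.
const GB_PER_TB = 1000;

// Return the [low, high] cost in whole dollars for a given capacity.
function costRange(tier, terabytes) {
  const [low, high] = pricePerGb[tier];
  const gb = terabytes * GB_PER_TB;
  return [Math.round(low * gb), Math.round(high * gb)];
}

for (const tier of Object.keys(pricePerGb)) {
  const [low, high] = costRange(tier, 100);
  const label = low === high ? `$${low}` : `$${low} to $${high}`;
  console.log(`${tier} at 100TB: ${label}`);
}
```

This covers raw media cost only; the per-year infrastructure figure on the previous slide includes more than storage media.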
  12. 12. Challenges: Durability. Volatile Memory: • What happens when things fail, and what data may be lost? • How does the system synchronize with your durable storage? Does it do this well, and is it simple to implement?
  13. 13. Challenges: Design Still Matters
  14. 14. on RAM
  15. 15. Scenario: ECommerce Modernization Initiative
      Business problem: Customer experience is suffering during high-traffic events.
      Technology limitations: Too expensive to scale the system to support spike events. Scaling the system is hard, and engineering teams can’t react fast enough in the event of unexpected growth. Some caching solution is implemented, but it mostly helps only with read performance; synchronizing writes has been a development nightmare.
      Business problem: Lack of mobile customers in Europe and Asia has been attributed to latency issues.
      Technology limitation: Difficult to extend the data architecture globally, so the effort is on hold.
  16. 16. Scenario: ECommerce Modernization Initiative
      Business problem: Below-industry conversion rate performance has been attributed partly to poor personalization.
      Technology limitations: Customer info is siloed across the Enterprise, and it’s too complicated to bring this data together so effective models can be built to drive personalization. A “Big Data” project to bring data together to drive machine learning and cognitive capabilities in the platform failed, as data scientists reported that the platform was too slow to develop on and performance was impractical.
      Business problem: Business analysts have siloed views of the eCommerce channel, and information isn’t getting to them fast enough.
      Technology limitations: Related to the limitations above. Integrating data into the data warehouse is slow and hard to maintain.
  17. 17. Scenario: ECommerce Modernization Initiative. [Architecture diagram] Platform API; Platform Services; eCommerce Datastores (NoSQL, RDBMS): Orders, Product Catalog, Customer Data (Profile, Sessions, Carts, Personalization), Inventory; Dependent External Data Sources and Integrations: CRM, ERP, PIM, Data warehouse, BI Tools, …
  18. 18. Silo Data-sources Problem: slow and poor scalability. [Diagram] Customer Data (Profile, Sessions, Carts, Personalization); Product Catalog; NoSQL, RDBMS; CRM, ERP, PIM; Partner Sources: supplier databases, etc.; Legacy: Mainframe
  19. 19. Operational Single View. [Diagram] Customer Data (Profile, Sessions, Carts, Personalization); Product Catalog; NoSQL, RDBMS; CRM, ERP, PIM; Partner Sources: supplier databases, etc.; Legacy: Mainframe
  20. 20. Operational Single View. [Diagram] MongoDB; Enterprise Data Hub
  21. 21. Reference: Metlife Wall Presentation
  22. 22. Dynamic Schema: documents in the same product catalog collection in MongoDB
      { product_name: 'Acme Paint', color: ['Red', 'Green'], size_oz: [8, 32], finish: ['satin', 'eggshell'] }
      { product_name: 'T-shirt', size: ['S', 'M', 'L', 'XL'], color: ['Heather Gray' … ], material: '100% cotton', wash: 'cold', dry: 'tumble dry low' }
      { product_name: 'Mountain Bike', brake_style: 'mechanical disc', color: 'grey', frame_material: 'aluminum', no_speeds: 21, package_height: '7.5x32.9x55', weight_lbs: 44.05, suspension_type: 'dual', wheel_size_in: 26 }
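To illustrate what a dynamic schema means for queries, the sketch below uses a plain in-memory array as a stand-in for the catalog collection; it does not use the MongoDB driver. The documents are abbreviated from the slide, and `matches` and `find` are hypothetical helpers that mimic one MongoDB behavior: an equality match on a field succeeds whether the field holds the value directly or holds an array containing it.

```javascript
// Abbreviated versions of the slide's documents; each has a different shape.
const catalog = [
  { product_name: "Acme Paint", color: ["Red", "Green"], size_oz: [8, 32], finish: ["satin", "eggshell"] },
  { product_name: "T-shirt", size: ["S", "M", "L", "XL"], color: ["Heather Gray"], material: "100% cotton" },
  { product_name: "Mountain Bike", color: "grey", frame_material: "aluminum", no_speeds: 21, wheel_size_in: 26 },
];

// Equality match that treats scalar fields and array fields uniformly.
function matches(doc, field, value) {
  if (!(field in doc)) return false; // documents lacking the field simply do not match
  const actual = doc[field];
  return Array.isArray(actual) ? actual.includes(value) : actual === value;
}

// Hypothetical stand-in for a single-field equality query.
function find(docs, field, value) {
  return docs.filter((doc) => matches(doc, field, value));
}

const names = (docs) => docs.map((d) => d.product_name);

console.log(names(find(catalog, "color", "Red")));   // array-valued color
console.log(names(find(catalog, "color", "grey")));  // scalar-valued color
console.log(names(find(catalog, "no_speeds", 21)));  // field only bikes have
```

The same query works across all three document shapes without a schema migration, which is the point the slide makes.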
  23. 23. Still Agile, Scalable and Simple. Flexible Data Model: facilitates agile development and continuous delivery methodologies. Scalability: scale out dynamically as demand grows.
  24. 24. In-Memory Storage Engine. High Performance: • More predictable, and lower latency on less in-memory infrastructure. Infrastructure Optimization: • Assign a data subset on the In-Memory SE via Zone Sharding. • Optimize on cost vs. performance without silos. Rich Query Capability: • Full MongoDB Query and Indexing Support. [Diagram: In-Memory SE nodes, WiredTiger nodes]
  25. 25. Session Data Geographically Localized, and with In-memory Engine Latency. Local Read/Write with Strong Consistency. [Diagram: an update across regions WEST and EAST] SHARD 1, TAG: WEST, IN_MEM; SHARD 2, TAG: WEST, WT; SHARD 3, TAG: EAST, IN_MEM; SHARD 4, TAG: EAST, WT
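The shard tags on this slide combine a region with a storage engine. The sketch below shows only the routing idea, as a direct lookup in plain JavaScript; it is not MongoDB's sharding API, where zones are tied to shard key ranges instead. The policy in `engineFor` (sessions go to the in-memory engine, everything else to WiredTiger) and the function names are assumptions made for illustration.

```javascript
// Shards and tags as labeled on the slide.
const shards = [
  { name: "SHARD 1", tags: ["WEST", "IN_MEM"] },
  { name: "SHARD 2", tags: ["WEST", "WT"] },
  { name: "SHARD 3", tags: ["EAST", "IN_MEM"] },
  { name: "SHARD 4", tags: ["EAST", "WT"] },
];

// Hypothetical policy: session data wants in-memory latency, the rest wants durability.
function engineFor(collection) {
  return collection === "sessions" ? "IN_MEM" : "WT";
}

// Pick the shard whose tags match both the client's region and the chosen engine.
function routeToShard(region, collection) {
  const engine = engineFor(collection);
  const shard = shards.find((s) => s.tags.includes(region) && s.tags.includes(engine));
  if (!shard) throw new Error(`no shard tagged ${region}, ${engine}`);
  return shard.name;
}

console.log(routeToShard("WEST", "sessions"));
console.log(routeToShard("EAST", "orders"));
```

Keeping both engines inside one cluster is what lets a data subset get in-memory latency without creating a separate silo.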
  26. 26. In-Memory Storage Engine. Durability and Fault-Tolerance: • Mixed ReplicaSets allow data to be replicated from In-Memory SE to WT SE. • Full High Availability: automatic fail-over, cross geography.
  27. 27. Advanced Personalization: 1. Train/re-train ML models. 2. Apply models to real-time stream of interactions. 3. Drive targeted content, recommendations, etc. [Diagram] Platform Databases (NoSQL, RDBMS); Dependent External Data Sources and Integrations: CRM, ERP, PIM; Partner Sources: supplier databases, etc.; Legacy: Mainframe; Operational Unified View
  28. 28. Why Spark? Speed. By exploiting in-memory optimizations, Spark has shown up to 100x higher performance than MapReduce running on Hadoop. Simplicity. Easy-to-use APIs for operating on large datasets. This includes a collection of sophisticated operators for transforming and manipulating semi-structured data. Unified Framework. Packaged with higher-level libraries, including support for SQL queries, machine learning, stream and graph processing. These standard libraries increase developer productivity and can be combined to create complex workflows.
  29. 29. Operational Single View + Spark Connector: • Native Scala connector, certified by Databricks • Exposes all Spark APIs & libraries • Efficient data filtering with predicate pushdown, secondary indexes, & in-database aggregations • Locality awareness to reduce data movement
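Predicate pushdown, named in the bullets above, means evaluating a filter where the data lives so that only matching documents are shipped to the processing engine. The sketch below contrasts the two approaches with in-memory arrays; it does not call Spark or the connector, and the sample data and the names `scanThenFilter` and `filterAtSource` are hypothetical.

```javascript
// Hypothetical source data living in the database.
const interactions = [
  { user: "a", region: "EAST", amount: 30 },
  { user: "b", region: "WEST", amount: 50 },
  { user: "c", region: "EAST", amount: 10 },
  { user: "d", region: "EAST", amount: 45 },
];

const predicate = (doc) => doc.region === "EAST" && doc.amount > 20;

// Without pushdown: every document is transferred, then filtered by the engine.
function scanThenFilter(source, pred) {
  const transferred = [...source];
  return { transferred: transferred.length, rows: transferred.filter(pred) };
}

// With pushdown: the predicate runs at the source and only matches are transferred.
function filterAtSource(source, pred) {
  const rows = source.filter(pred);
  return { transferred: rows.length, rows };
}

const naive = scanThenFilter(interactions, predicate);
const pushed = filterAtSource(interactions, predicate);
console.log(`without pushdown: ${naive.transferred} documents moved`);
console.log(`with pushdown: ${pushed.transferred} documents moved`);
```

Both paths return the same rows; only the amount of data moved differs, and in a real deployment a secondary index on the filtered fields can reduce the work at the source as well.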
  30. 30. Locality Awareness. [Diagram: driver program (Spark context), cluster manager, tasks]
  31. 31. Operational Single View + Spark Connector: Blend client data from multiple internal and external sources to drive real-time campaign optimization.
  32. 32. MongoDB + Spark at China Eastern: 180m fare calculations & 1.6 billion searches per day. Oracle database peaked at 200 searches per second. Radically re-architected their fare engine to meet the required 100x growth in search traffic.
  33. 33. ETL (Yesterday’s) Data at the Speed of Thought?
  34. 34. BI Connector
      MongoDB aggregation: db.orders.aggregate( [ { $group: { _id: null, total: { $sum: "$price" } } } ] )
      Equivalent SQL: SELECT SUM(price) AS total FROM orders
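Both statements on this slide compute a single grand total of `price` across all orders. As a rough illustration of what that `$group` stage produces, here is a plain JavaScript equivalent over an in-memory array; the sample documents and the function name `sumPrice` are hypothetical, and nothing here calls MongoDB or the BI Connector.

```javascript
// Hypothetical sample documents standing in for the orders collection.
const orders = [
  { _id: 1, price: 25 },
  { _id: 2, price: 10.5 },
  { _id: 3, price: 4.5 },
];

// Mimics { $group: { _id: null, total: { $sum: "$price" } } }.
function sumPrice(docs) {
  // With no input documents the $group stage emits no output document.
  if (docs.length === 0) return [];
  // $sum skips non-numeric values, so they contribute 0 here.
  const total = docs.reduce(
    (acc, doc) => acc + (typeof doc.price === "number" ? doc.price : 0),
    0
  );
  // _id: null places every document in one group.
  return [{ _id: null, total }];
}

console.log(sumPrice(orders));
```

Grouping on `_id: null` is the aggregation counterpart of a SQL aggregate with no GROUP BY clause.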
  35. 35. Resources for You
      Spark Connector: • Download: Spark Packages, GitHub • Documentation • Whitepaper: Turning Analytics into Real-Time Action • Education: M233: Getting Started with Spark and MongoDB
      In-Memory Storage Engine: • Download: Enterprise Server • Documentation
      BI Connector: • Download: BI Connector • Documentation
  36. 36. Q&A. Dylan Tong, Principal Solutions Architect