The document discusses MongoDB and data treatment. It covers how MongoDB can help with data integrity, confidentiality, correctness and reliability. It also discusses how MongoDB supports dynamic schemas, replication for high availability, security features and can be used as part of a modern enterprise technology stack including integration with Hadoop. MongoDB can be deployed on Azure as a fully managed service.
7. THE LARGEST ECOSYSTEM
9,000,000+
MongoDB Downloads
250,000+
Online Education Registrants
35,000+
MongoDB User Group Members
40,000+
MongoDB Management Service (MMS) Users
700+
Technology and Services Partners
1,000+
Customers Across All Industries
8. MONGODB BUSINESS VALUE
Enabling New Apps
Better Customer Experience
Faster Time to Market
Lower TCO
9. FORTUNE 500 & GLOBAL 500
10 of the Top Financial Services Institutions
10 of the Top Electronics Companies
10 of the Top Media and Entertainment Companies
10 of the Top Retailers
10 of the Top Telcos
8 of the Top Technology Companies
6 of the Top Healthcare Companies
12. Data Treatment
• Data Treatment in IT is important for several reasons
– Integrity
– Confidentiality
– Correctness & Reliability
– Value
On Big Data and flexible databases it's even more important!
14. Integrity
• Several Factors can influence data integrity
– Application Data Corruption
– Migrations
– "Fat Finger" events
• Different Strategies for dealing with those
– Backups
– Delayed Replicas
– Database Decoupling Architecture
– User Roles and Grants
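As a rough illustration of the "User Roles and Grants" strategy, the sketch below writes two user definitions in the shape accepted by db.createUser(), using the built-in read and readWrite roles. The user names, the database name, the password placeholder, and the grantsWrite helper are assumptions made for this example; they do not come from the slides.

```javascript
// Hypothetical user definitions in the shape accepted by db.createUser().
// User names, the database name, and the password placeholder are assumptions.
const userDefinitions = [
  {
    user: "appWriter",
    pwd: "<set-at-deploy-time>",
    roles: [{ role: "readWrite", db: "policies" }],
  },
  {
    user: "reportReader",
    pwd: "<set-at-deploy-time>",
    roles: [{ role: "read", db: "policies" }],
  },
];

// Built-in roles considered by this sketch that permit writes to a database.
const WRITE_ROLES = new Set(["readWrite", "dbOwner"]);

// Local helper (not a MongoDB API): does this definition grant write access to dbName?
function grantsWrite(definition, dbName) {
  return definition.roles.some(
    (r) => r.db === dbName && WRITE_ROLES.has(r.role)
  );
}

for (const definition of userDefinitions) {
  console.log(definition.user, "can write:", grantsWrite(definition, "policies"));
}
```

Giving reporting users the read role only is one way to limit the damage a "fat finger" event can do; in a MongoDB shell each definition would be passed to db.createUser() on the target database.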
15. Replica Sets
Replica Set – two or more copies
Self-healing shard
Addresses availability considerations:
High Availability
Disaster Recovery
Maintenance
Deployment Flexibility
Data locality to users
Workload isolation: operational & analytics
Delayed Replicas
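A delayed replica can be sketched as a replica set configuration document of the kind passed to rs.initiate(). The set name, host names, and one-hour delay below are assumptions for illustration. The delay field is named secondaryDelaySecs in MongoDB 5.0 and later; earlier releases used slaveDelay.

```javascript
// Builds a replica set configuration document of the kind passed to rs.initiate().
// The set name, host names, and delay value are illustrative assumptions.
function buildReplicaSetConfig(setName, hosts, delayedHost, delaySecs) {
  // Regular data-bearing members.
  const members = hosts.map((host, i) => ({ _id: i, host }));

  // Delayed member: priority 0 so it can never become primary, hidden so
  // clients do not read stale data from it, and a replication delay.
  members.push({
    _id: hosts.length,
    host: delayedHost,
    priority: 0,
    hidden: true,
    secondaryDelaySecs: delaySecs, // named slaveDelay before MongoDB 5.0
  });

  return { _id: setName, members };
}

const config = buildReplicaSetConfig(
  "rs0",
  ["db1.example.net:27017", "db2.example.net:27017"],
  "db3.example.net:27017",
  3600
);

console.log(JSON.stringify(config, null, 2));
```

Because the delayed member applies operations an hour late, an accidental delete or a bad migration can still be recovered from it within that window.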
20. Confidentiality
• This is probably one of the biggest issues for some Big Data technologies
"For me, the nearly non-existent response to the security issue is shocking. Can it be that people believe Hadoop is secure? Because it certainly is not. At every layer of the stack, vulnerabilities exist, and at the level of the data itself there [are] numerous concerns."
Merv Adrian – Gartner Research VP
http://blogs.gartner.com/merv-adrian/2014/01/21/security-for-hadoop-dont-look-now/
28. Correctness & Reliability
• Different systems use different approaches
– Protocol buffers
– Columnar Databases have Family Data Types
– Thrift
• MongoDB uses BSON for Everything!
– JSON ?
• Not really – binary JSON
• http://bsonspec.org/
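To make "binary JSON" concrete, here is a minimal sketch of the BSON layout described at bsonspec.org: a little-endian int32 total size, a sequence of typed elements, and a trailing zero byte. It handles only strings (type 0x02) and 32-bit integers (type 0x10) and is an illustration, not a replacement for a real driver's BSON library. The function names are made up for this example.

```javascript
// Encode a name or value as a null-terminated UTF-8 string.
function cstring(s) {
  return Buffer.concat([Buffer.from(s, "utf8"), Buffer.from([0x00])]);
}

// Encode one element: type byte, field name, then the typed value.
function encodeElement(name, value) {
  if (typeof value === "string") {
    const bytes = cstring(value);
    const len = Buffer.alloc(4);
    len.writeInt32LE(bytes.length, 0); // length includes the trailing null
    return Buffer.concat([Buffer.from([0x02]), cstring(name), len, bytes]);
  }
  if (Number.isInteger(value) && value >= -2147483648 && value <= 2147483647) {
    const num = Buffer.alloc(4);
    num.writeInt32LE(value, 0);
    return Buffer.concat([Buffer.from([0x10]), cstring(name), num]);
  }
  throw new TypeError("unsupported type in this sketch: " + typeof value);
}

// Encode a flat document: int32 total size, elements, 0x00 terminator.
function encodeDocument(doc) {
  const body = Buffer.concat(
    Object.entries(doc).map(([name, value]) => encodeElement(name, value))
  );
  const size = Buffer.alloc(4);
  size.writeInt32LE(4 + body.length + 1, 0);
  return Buffer.concat([size, body, Buffer.from([0x00])]);
}

const encoded = encodeDocument({ hello: "world" });
console.log(encoded.toString("hex"));
// prints 160000000268656c6c6f0006000000776f726c640000
```

The type byte in front of every field is what makes field values typed on the wire, which plain JSON text does not provide.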
29. Documents are Rich Data Structures
{
  first_name: 'Paul',
  surname: 'Miller',
  cell: '+447557505611',
  city: 'London',
  location: [45.123, 47.232],
  profession: ['banking', 'finance', 'trader'],
  cars: [
    { model: 'Bentley',
      year: 1973,
      value: 100000, … },
    { model: 'Rolls Royce',
      year: 1965,
      value: 330000, … }
  ]
}
Fields
Typed field values
Fields can contain arrays
Fields can contain an array of sub-documents
37. MongoDB @ Azure
• MongoDB was designed to run everywhere
– Cloud
– Virtual
– Bare Metal
• On any Platform
– Mac OS X
– Linux
– Solaris
– Windows
38. MongoDB @ Azure
• Setting up MongoDB using Azure IaaS
– Build your instances
• Windows
• Linux
• …
• MongoDB as a fully managed service
– Available on Azure Marketplace
– Peace of mind
43. Data Treatment
• It's important that you ensure good governance of your data!
– Integrity and Consistency
– Confidentiality and Security
– Correctness and Reliability
– Value!
• MongoDB offers all of that out of the box
– No crazy setups
– Simple, Scalable, Sophisticated
– Well integrated!
45. MongoDB & Hadoop
Applications powered by MongoDB
• Low latency
• Rich fast querying
• Flexible indexing
• Ad hoc aggregations in database
• Known data relationships
• Great at looking at any subset of data
Analysis powered by Hadoop
• Longer jobs and queries
• Analytical processing
• Often highly partitionable
• Unknown data relationships
• Great at looking at all of data
MongoDB Connector for Hadoop
46. Analytics Landscape
Batch / Predictive / Ad Hoc (mins – hours)
Real-Time Dashboards / Scoring (<30 ms)
Planned Reporting (secs – mins)
Experimental
Legacy
47. MONGODB FEATURES
JSON Document Model with Dynamic Schemas
Auto-Sharding for Horizontal Scalability
Text Search
Aggregation Framework and MapReduce
Full, Flexible Index Support and Rich Queries
Built-In Replication for High Availability
Advanced Security
Large Media Storage with GridFS
48. MongoDB Use Cases
Single View
Internet of Things
Mobile
Real-Time Analytics
Catalog
Personalization
Content Management
49. Enterprise Architecture
Customer-side Applications
• MOBILE
• WEB and CMS
• IOT
• SAAS / High-scale Online Services
Business Operations Applications
• Business Management Software: CRM, ERP, ITSM
• Operational Tools: MONITORING, SYSLOG
• Core Business Specific Systems
Operational Data Hub
Analytics
• REAL-TIME
• DATA WAREHOUSE
• COMPUTE CLUSTER
• BUSINESS INTELLIGENCE
50. Common Applications
Customer-side Applications
MOBILE WEB and CMS
Development Productivity
• End-to-End JSON = Web Development Productivity
• Asset Catalog Management: managing frequently evolving schemas
• Native search functionality
Geo-Aware Topology
• Multi-Active Data Center Support
Web-scale
• Scale out economically without the limits of vertical scaling
51. Strategic Initiatives
Customer-side Applications
IOT
SAAS/ High-scale Online Services
SaaS: Moving to the Cloud
• Need for High Availability (e.g. 99.999% uptime SLA)
• No-downtime schema migrations
• Native cross-data-center replication and fault tolerance
• Scale for multi-tenancy
• Operability at Scale: Tools to manage hundreds of nodes
IoT: Connecting the World
• Data model that can support a variety of data types
• Managing a volatile schema for ever-growing and changing data sources
• Scale big for high throughput and data volumes
• Geospatial support
52. Operational Data Hub
Operational Data Hub
An Example: The MetLife Wall
360° View
70 Systems
3 Months
53. Relational Model Challenges
70 Different Policy Schemas
How can we translate this into a Customer View?
ETL 70 applications into a Dimensional Model? Integrating a few is hard…
54. MetLife Wall
Strategy: All documents can have variable schemas
db.policies.find({
  first: "Dylan",
  last: "Young",
  type: { $in: ["Healthcare", "PPO", "HMO", "Auto"] }
})
Collection of Policies
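The idea of one collection holding policies with different shapes can be illustrated in plain JavaScript, without a database. The sample policies, their extra fields, and the findPolicies helper below are invented for this sketch. The helper only mimics what the equality and $in conditions select for scalar fields; it is not MongoDB's query engine.

```javascript
// Made-up policy documents: same core fields, different extra fields per type.
const policies = [
  { first: "Dylan", last: "Young", type: "Healthcare", deductible: 500 },
  { first: "Dylan", last: "Young", type: "Auto", vehicle: { make: "Ford", year: 2012 } },
  { first: "Dylan", last: "Young", type: "Life", beneficiary: "Jane Young" },
  { first: "Ana", last: "Silva", type: "PPO", network: "regional" },
];

// Local helper (not a MongoDB API): keep documents whose names match exactly
// and whose scalar "type" field is one of the listed values.
function findPolicies(docs, first, last, types) {
  return docs.filter(
    (d) => d.first === first && d.last === last && types.includes(d.type)
  );
}

const matches = findPolicies(policies, "Dylan", "Young", [
  "Healthcare",
  "PPO",
  "HMO",
  "Auto",
]);

console.log(matches.map((d) => d.type));
// prints [ 'Healthcare', 'Auto' ]
```

Each matching document keeps its own type-specific fields (deductible, vehicle), so a customer view can be assembled without first forcing every source system into one fixed schema.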
55. Data Hub: Master Data Distribution
[Diagram: one Primary replicating in real time to seven Secondaries]
56. For More Information
Resource               Location
Case Studies           mongodb.com/customers
Presentations          mongodb.com/presentations
Free Online Training   education.mongodb.com
Webinars and Events    mongodb.com/events
Documentation          docs.mongodb.org
MongoDB Downloads      mongodb.com/download
Additional Info        info@mongodb.com
Editor's Notes
High Availability – Ensure application availability during many types of failures
Disaster Recovery – Address the RTO and RPO goals for business continuity
Maintenance – Perform upgrades and other maintenance operations with no application downtime
Secondaries can be used for a variety of applications – failover, hot backup, rolling upgrades, data locality and privacy and workload isolation
What kinds of tasks?
Provisioning. Any topology, at scale, with the click of a button.
Upgrades. In minutes, with no downtime.
Scale. Add capacity without taking your application offline.
Continuous Backup. Customize to meet your recovery goals.
Point-in-time Recovery. Restore to any point in time, because disasters aren’t scheduled.
Performance Alerts. Monitor 100+ system metrics and get custom alerts before your system degrades.
MongoDB provides agility, scalability, and performance without sacrificing the functionality of relational databases, like full index support and rich queries.
Indexes: secondary, compound, text search, geospatial, and more.