We offer MongoDB-as-a-Service on any cloud of your choice. You can read more about our MongoDB-as-a-Service offering in the white paper on our website: http://www.cumulogic.com/resources/mongodb_wp/
The goal of this boot camp is to give you hands-on experience with MongoDB database-as-a-service: how to load data into it, and how a sample application analyzes that data. We will use a small sample Twitter application for our hands-on lab, which will help you write your own MongoDB application. We will also briefly discuss a few performance-related metrics so you can analyze and tune the performance of your databases. Along the way, you will see how easily you can launch a fully managed MongoDB instance in the cloud.
About a decade ago, business applications were transactional in nature, and most of the challenges involved executing transactions (e.g. credit card processing) with low latency. As a result, enterprise data was largely “relational” and therefore “structured”. The nature of business applications has since changed: enterprises are trying to figure out how to use all the data in their enterprise systems, social media, machine logs, etc. to understand how that data affects their business and how they can gain a competitive advantage from the nuggets hidden in it.
Fast forward to today, and businesses are trying to solve a different problem. With such diverse data sources and data formats, we need newer technologies that scale, and that surface answers or find those nuggets much faster and at lower cost than traditional SQL databases or data warehouse systems. Hence, we see a slew of new database technologies being developed that promise to help solve these problems.
Depending on the nature of the data or the problem they solve, we can group these new database technologies into three major categories: (1) document-oriented databases such as MongoDB, which store and process data as documents, (2) key-value databases such as Riak and Redis, and (3) graph databases. Depending on the type of data, you could use one of these databases to solve your data analytics problems. Today, we focus on MongoDB.
When should you use a NoSQL database vs. a SQL database, and which NoSQL database should you choose?
As I mentioned before, the problems that NoSQL databases solve are related to the nature and amount of data we want to process in our next-generation applications. We need databases that can scale to petabytes of data at a fraction of the cost of a relational database, and that can quickly analyze that data and return results in real time. Hence, the speed of data access is critical.
NoSQL database systems can provide high-speed, low-latency access to large amounts of data. One key criterion when choosing a NoSQL database is the nature of your application and the main problem it faces: is it operational or analytical? For example, for batch-processing analytical apps you may be better off with Hadoop, while for operational concerns such as scalability and real-time processing you may want to choose MongoDB. So weigh these criteria when making your decision, run some experiments, and pick the database that best fits your application’s needs.
Let’s take a look at the key features of MongoDB at a very high level.
1. MongoDB is a document-oriented database server. It stores objects as BSON (pronounced “bison”), a binary-encoded form of JSON-like documents, and it supports dynamic schemas, which essentially means it is a schema-less database: there is no rigid SQL-style schema that the data must fit. This gives you the flexibility to store data of different shapes from different sources, such as social networks, machine logs or CRM systems.
2. MongoDB supports secondary indexes much like traditional SQL indexes, which means you can index any field to improve query performance. Indexes are most selective on fields with high cardinality. (FYI: high cardinality here means a field whose values vary the most across records. For example, if we are storing data about employees, the field that varies most is the phone number, not the city name or company name.)
3. Most of you may be familiar with the concept of database sharding. MongoDB is a horizontally scalable database and supports sharding, which means it partitions data into smaller chunks spread across several data nodes, so that no single node has to hold all of the data or serve all of the traffic. Hence MongoDB is widely used in the cloud: you can scale the database by adding shards and maintain low-latency data access even as your data grows.
4. MongoDB is designed for data durability and resilience, and supports replica sets whose members can be geographically distributed.
5. MongoDB supports Map-reduce operations and provides fast updates to the data.
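To make the indexing advice in item 2 concrete, here is a small pure-Python sketch (an illustration, not a MongoDB feature) that scores each field of some schema-less documents by how selective an index on it would be. The employee documents and the scoring rule (distinct values divided by the number of documents) are assumptions chosen for the example; note that the documents do not all share the same fields, as item 1 allows.

```python
from typing import Any

# Hypothetical employee documents. As in a schema-less collection,
# they do not all share the same set of fields.
employees: list[dict[str, Any]] = [
    {"phone": "555-0101", "city": "New York", "company": "Acme"},
    {"phone": "555-0102", "city": "New York", "company": "Acme"},
    {"phone": "555-0103", "city": "Boston", "company": "Acme", "twitter": "@lee"},
    {"phone": "555-0104", "city": "New York"},  # no company field at all
]


def selectivity(docs: list[dict[str, Any]], field: str) -> float:
    """Distinct values of `field` divided by the number of documents (1.0 = unique)."""
    if not docs:
        return 0.0
    distinct = {doc[field] for doc in docs if field in doc}
    return len(distinct) / len(docs)


def rank_index_candidates(docs: list[dict[str, Any]]) -> list[tuple[str, float]]:
    """Return every field seen in `docs` with its score, most selective first."""
    fields = {key for doc in docs for key in doc}
    scores = [(field, selectivity(docs, field)) for field in fields]
    # Sort by descending score, then by name so ties are deterministic.
    return sorted(scores, key=lambda item: (-item[1], item[0]))
```

Here `phone` comes out on top with a score of 1.0, matching the example in item 2, while `company` and `city` score lower. With pymongo, you would then create the index with something like `collection.create_index("phone")`.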
FAQs
Question: When should you use Hadoop vs. MongoDB for map-reduce?
Answer: Use Hadoop for batch jobs, where you run analytics over offline data; use MongoDB for real-time data analytics.
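The map-reduce model mentioned in item 5 and in this FAQ can be illustrated with a minimal in-memory sketch that counts tweets per user. This is plain Python modelling the map, group and reduce phases, not MongoDB’s own `mapReduce` command, and the tweet fields are hypothetical. Note also that MongoDB 5.0 and later deprecate map-reduce in favor of the aggregation pipeline.

```python
from collections import defaultdict
from typing import Any, Callable, Iterable

Doc = dict[str, Any]

# Hypothetical tweet documents; the field names are assumptions for this sketch.
tweets: list[Doc] = [
    {"user": "alice", "text": "MongoDB bootcamp today"},
    {"user": "bob", "text": "Loading tweets into MongoDB"},
    {"user": "alice", "text": "Sharding demo next"},
]


def map_reduce(
    docs: Iterable[Doc],
    mapper: Callable[[Doc], Iterable[tuple[Any, Any]]],
    reducer: Callable[[Any, list[Any]], Any],
) -> dict[Any, Any]:
    """Run map, group-by-key and reduce phases over an in-memory collection."""
    groups: dict[Any, list[Any]] = defaultdict(list)
    for doc in docs:
        for key, value in mapper(doc):  # map: emit zero or more (key, value) pairs
            groups[key].append(value)   # group: collect emitted values per key
    # reduce: collapse each key's values into a single result
    return {key: reducer(key, values) for key, values in groups.items()}


def tweets_per_user(doc: Doc) -> Iterable[tuple[str, int]]:
    """Mapper: emit (user, 1) for every tweet."""
    yield doc["user"], 1


def sum_counts(_key: str, values: list[int]) -> int:
    """Reducer: add up the emitted counts for one user."""
    return sum(values)


counts = map_reduce(tweets, tweets_per_user, sum_counts)  # {"alice": 2, "bob": 1}
```

Keeping the mapper and reducer as separate functions mirrors how map-reduce jobs are written, and makes it easy to swap in a different aggregation.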
Question: How does sharding work in MongoDB?
Answer: MongoDB sharding works by spreading data, and therefore reads and writes, across multiple data nodes (shards). mongos, the MongoDB query router process, directs each read or write to the shard or shards that hold the relevant data. (Refer to the sharding diagram on the slide.)
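As a rough illustration of what mongos does for range-based sharding, the sketch below routes documents to shards by looking up which chunk range their shard key falls into. It is a toy model: the shard key (`user`), the split points and the shard names are hypothetical, and a real mongos reads chunk metadata from the config servers rather than from constants. MongoDB also offers hashed sharding, which is not shown here.

```python
import bisect
from collections import defaultdict

# Hypothetical chunk layout for a collection sharded on the "user" field.
# Chunks cover [MinKey, "g"), ["g", "p") and ["p", MaxKey): each split point
# is the inclusive lower bound of the next chunk.
SPLIT_POINTS = ["g", "p"]
CHUNK_OWNERS = ["shard0", "shard1", "shard2"]


def route(shard_key_value: str) -> str:
    """Return the shard that owns the chunk containing this shard key value."""
    # bisect_right puts a value equal to a split point into the chunk it starts.
    chunk_index = bisect.bisect_right(SPLIT_POINTS, shard_key_value)
    return CHUNK_OWNERS[chunk_index]


def distribute(docs: list[dict[str, str]]) -> dict[str, list[str]]:
    """Group documents by target shard, the way writes would be spread out."""
    placement: dict[str, list[str]] = defaultdict(list)
    for doc in docs:
        placement[route(doc["user"])].append(doc["user"])
    return dict(placement)
```

For example, `distribute([{"user": "alice"}, {"user": "mike"}, {"user": "zoe"}, {"user": "bob"}])` sends `alice` and `bob` to `shard0`, `mike` to `shard1` and `zoe` to `shard2`. A query that includes the shard key can be sent to a single shard this way; one that does not would have to be sent to every shard.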
Since MongoDB scales very well horizontally, it is one of the most widely used databases in the cloud. Given the complexity of managing MongoDB to maintain availability, data durability and performance, you may want to use a platform that provides MongoDB-as-a-Service, where a single web service call provisions a dedicated MongoDB server that is fully sharded and replicated and scales automatically.
You will get a chance to use the MongoDB service on our platform shortly.
The specific MongoDB architecture you choose will affect performance, availability and data durability. MongoDB is flexible and supports high-availability and sharded architectures, so you can get the level of redundancy, performance and SLA you want for your service.
MongoDB supports two deployment architectures: replica sets and sharding. Replica sets provide high availability and data durability, while sharding provides scalability. You can deploy each shard as a replica set to get the best of both: reliability and scalability.
This is a replica set with three replica nodes in two datacenters or two regions of a public
cloud.
Replication in MongoDB is asynchronous, so the data on the secondary nodes may lag behind the primary node; reads from secondaries are therefore only “eventually consistent”. You may want to use this architecture for data redundancy rather than for scaling. In this architecture you still send reads and writes to the primary node, which means that even with multiple nodes your application won’t necessarily scale better. To keep this level of redundancy while improving scalability, you can use sharding, as shown in the next slide.
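The consistency behaviour described above can be modelled in a few lines. This toy replica set (plain Python, not MongoDB’s actual replication protocol) records each write in an oplog on the primary and lets the secondary apply those entries later, so a read from the secondary can return stale data until it catches up. The class and method names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Node:
    data: dict[str, str] = field(default_factory=dict)


@dataclass
class ToyReplicaSet:
    """Hypothetical model: one primary, one secondary, asynchronous replication."""

    primary: Node = field(default_factory=Node)
    secondary: Node = field(default_factory=Node)
    oplog: list[tuple[str, str]] = field(default_factory=list)
    applied: int = 0  # number of oplog entries the secondary has applied so far

    def write(self, key: str, value: str) -> None:
        # All writes go to the primary, which records them in its oplog.
        self.primary.data[key] = value
        self.oplog.append((key, value))

    def replicate(self, max_entries: Optional[int] = None) -> None:
        # The secondary replays oplog entries later, possibly only some of them.
        pending = self.oplog[self.applied:]
        if max_entries is not None:
            pending = pending[:max_entries]
        for key, value in pending:
            self.secondary.data[key] = value
        self.applied += len(pending)

    def read(self, key: str, from_secondary: bool = False) -> Optional[str]:
        node = self.secondary if from_secondary else self.primary
        return node.data.get(key)
```

After `rs.write("status", "v1")`, `rs.read("status")` returns `"v1"` while `rs.read("status", from_secondary=True)` still returns `None` until `rs.replicate()` runs. In a real deployment, reads only go to secondaries if you opt in with a read preference such as `secondaryPreferred`; by default they go to the primary.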
This is a three-shard deployment architecture that uses three replica sets; it can run in a single region or datacenter, or be distributed geographically.
With this architecture you get both benefits: data redundancy from the replica sets and high scalability from the shards. Because each shard is itself a replica set, data is redundant at the shard level. But keep in mind that sharding and replication add overhead, so choose what’s best for your database.
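The overhead mentioned above can be quantified with some back-of-the-envelope arithmetic. The sketch assumes data is spread evenly across shards and that every replica-set member holds a full copy of its shard’s data; it ignores indexes, the oplog, config servers and mongos processes, and the 300 GB figure is only an example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Footprint:
    data_nodes: int        # mongod processes that hold data
    gb_per_shard: float    # logical data each shard owns
    raw_storage_gb: float  # total disk used once every member stores its copy


def cluster_footprint(total_data_gb: float, shards: int, members_per_shard: int) -> Footprint:
    """Estimate node count and storage for `shards` replica sets of equal size."""
    if shards < 1 or members_per_shard < 1:
        raise ValueError("need at least one shard and one member per shard")
    gb_per_shard = total_data_gb / shards  # assumes perfectly balanced chunks
    return Footprint(
        data_nodes=shards * members_per_shard,
        gb_per_shard=gb_per_shard,
        raw_storage_gb=total_data_gb * members_per_shard,  # every member keeps a copy
    )


three_shards = cluster_footprint(300, shards=3, members_per_shard=3)
single_replica_set = cluster_footprint(300, shards=1, members_per_shard=3)
```

With 300 GB of data, three shards of three members each means 9 data nodes holding about 100 GB each, versus 300 GB per node in a single three-member replica set. Either way, replication triples the raw storage to 900 GB: sharding spreads the load, while replication pays for redundancy.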
Now let’s take a look at a sample application. We have a sample Twitter app for the hands-on exercise. We will use MongoDB-as-a-Service in the cloud, and use the sample app to analyze Twitter data.
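The bootcamp’s Twitter app is not reproduced here, but a typical analysis on tweet data is counting the most popular hashtags. The sketch below assumes a hypothetical document shape with a `hashtags` array and does the counting in plain Python. It keeps a roughly equivalent MongoDB aggregation pipeline alongside as data for comparison; the pipeline is not executed here, though with pymongo it could be passed to `collection.aggregate`.

```python
from collections import Counter
from typing import Any

# Roughly equivalent MongoDB aggregation pipeline, kept as data for comparison
# (not executed in this sketch).
TOP_HASHTAGS_PIPELINE: list[dict[str, Any]] = [
    {"$unwind": "$hashtags"},
    {"$group": {"_id": "$hashtags", "count": {"$sum": 1}}},
    {"$sort": {"count": -1}},
    {"$limit": 3},
]

# Hypothetical tweet documents; the field names are assumptions.
sample_tweets: list[dict[str, Any]] = [
    {"user": "alice", "hashtags": ["mongodb", "bigdata"]},
    {"user": "bob", "hashtags": ["mongodb"]},
    {"user": "carol", "hashtags": ["cloudexpo", "mongodb", "bigdata"]},
    {"user": "dave", "text": "no tags here"},
]


def top_hashtags(tweets: list[dict[str, Any]], limit: int = 3) -> list[tuple[str, int]]:
    """Count hashtag occurrences across tweets and return the `limit` most common."""
    # Like $unwind, tweets without a "hashtags" array contribute nothing.
    counts = Counter(tag for tweet in tweets for tag in tweet.get("hashtags", []))
    return counts.most_common(limit)
```

On the sample data, `top_hashtags(sample_tweets)` returns `[("mongodb", 3), ("bigdata", 2), ("cloudexpo", 1)]`. Running the pipeline inside MongoDB instead avoids pulling every tweet over the network to the client.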
Just like any database, MongoDB must be monitored and optimized for a given workload or application type.
These are the key metrics to watch in MongoDB: (1) CPU; (2) memory; (3) ops counters: the number of operations (inserts, queries, updates, deletes and so on) performed over a period of time, which shows how much load the database is handling; (4) background flushes: the number of times MongoDB writes (flushes) its in-memory data to disk. Keep an eye on this number, and tune it if you want to reduce the frequency of disk writes. We will look at other metrics during our hands-on lab.
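MongoDB’s `serverStatus` command reports `opcounters` as cumulative totals since the `mongod` process started, so an operations-per-second figure comes from the difference between two samples. The sketch below computes that rate. The sample snapshot values are made up; in practice you might fetch each snapshot with pymongo’s `client.admin.command("serverStatus")["opcounters"]`.

```python
# Counter names reported under serverStatus()["opcounters"].
OPCOUNTER_FIELDS = ("insert", "query", "update", "delete", "getmore", "command")


def ops_per_second(
    before: dict[str, int], after: dict[str, int], elapsed_seconds: float
) -> dict[str, float]:
    """Per-type operation rate between two opcounters snapshots.

    Assumes the server was not restarted between the samples (the counters
    reset to zero on restart, which would make the deltas negative).
    """
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    return {
        name: (after.get(name, 0) - before.get(name, 0)) / elapsed_seconds
        for name in OPCOUNTER_FIELDS
    }


# Made-up snapshots taken 60 seconds apart.
SAMPLE_BEFORE = {"insert": 1000, "query": 5000, "update": 200, "delete": 10, "getmore": 0, "command": 3000}
SAMPLE_AFTER = {"insert": 1600, "query": 5600, "update": 260, "delete": 10, "getmore": 30, "command": 3300}

rates = ops_per_second(SAMPLE_BEFORE, SAMPLE_AFTER, 60)
```

Tracking these rates over time, rather than the raw cumulative totals, is what makes load spikes visible.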
Hands on Big Data Analysis with MongoDB - Cloud Expo Bootcamp NYC
  • 14. Just like any database, the performance of MongoDB database must be monitored and optimized for a given workload or application type. These are key metrics you want to look for in MongoDb: (1) CPU (2) memory (3) Ops counters – this is the total number of operations over a period of time. This number shows you number of active and pending operations (4) background flush – this is the number of disk writes when MongoDb flushes all in-memory data to the disk. You want to keep an eye on this number and tweak if you wish to reduce the number of times or frequency of disk writes. There are other metrics which we will see during our hands-on lab.