SlideShare a Scribd company logo
1 of 16
MongoDB’s New Aggregation Features Chris Westin © Copyright 2010 10gen Inc.
What problem are we solving? Map/Reduce can be used for aggregation… Currently being used for totaling, averaging, etc Map/Reduce is a big hammer Simpler tasks should be easier Shouldn’t need to write JavaScript Avoid the overhead of JavaScript engine We’re seeing requests for help in handling complex documents Select only subdocuments or arrays
How will we solve the problem? Our new aggregation framework Declarative framework No JavaScript required Describe a chain of operations to apply Expression evaluation Return computed values Framework:  we can add new operations easily C++ implementation Higher performance than JavaScript
Aggregation - Pipelines Aggregation requests specify a pipeline A pipeline is a series of operations Conceptually, the members of a collection are passed through a pipeline to produce a result Similar to a command-line pipe
Pipeline Operations $match Uses a query predicate (like .find({…})) as a filter $project Uses a sample document to determine the shape of the result (similar to .find()’s optional argument) This can include computed values $group Aggregates items into buckets defined by a key
Computed Expressions Available in $project operations Prefix expression language Add two fields:  $add:[“$field1”, “$field2”] Provide a value for a missing field: $ifnull:[“$field1”, “$field2”] Nesting:  $add:[“$field1”, $ifnull:[“$field2”, “$field3”]] Other functions…. And we can easily add more as required
Projections $project can reshape results $unwind expression doles out array values one at a time Pull fields from nested documents to the top Push fields from the top down into new virtual documents
Grouping $group aggregation expressions Total of column values:  $sum Average of column values: $avg Collect column values in an array:  $push
Demo (See script at https://gist.github.com/993733)
Usage Tips Use $match in a pipeline as early as possible The query optimizer can then be used to choose an index and avoid scanning the entire collection
Driver Support Initial version is a command For any language, build a JSON database object, and execute the command { aggregate : <collection>, pipeline : {…} } Beware of command result size limit
When is this being released? In final development now Expect to see this in the near future
Sharding support Initial release will support sharding Mongos analyzes pipeline, and forwards operations up to $group to shards; combines shard server results and continues
Pipeline Operations – Future Plans $sort Sorts the document stream according to a key $out Saves the document stream to a collection Similar to M/R $out, but with sharded output
Expressions – Future Plans Date field extraction Get year, month, day, hour, etc, from Date Date arithmetic
MongoDB Aggregation MongoSF May 2011

More Related Content

What's hot

Data Processing with Cascading Java API on Apache Hadoop
Data Processing with Cascading Java API on Apache HadoopData Processing with Cascading Java API on Apache Hadoop
Data Processing with Cascading Java API on Apache HadoopHikmat Dhamee
 
Machine Learning in a Twitter ETL using ELK
Machine Learning in a Twitter ETL using ELK Machine Learning in a Twitter ETL using ELK
Machine Learning in a Twitter ETL using ELK hypto
 
Updating materialized views and caches using kafka
Updating materialized views and caches using kafkaUpdating materialized views and caches using kafka
Updating materialized views and caches using kafkaZach Cox
 
RethinkDB - the open-source database for the realtime web
RethinkDB - the open-source database for the realtime webRethinkDB - the open-source database for the realtime web
RethinkDB - the open-source database for the realtime webAlex Ivanov
 
Building data flows with Celery and SQLAlchemy
Building data flows with Celery and SQLAlchemyBuilding data flows with Celery and SQLAlchemy
Building data flows with Celery and SQLAlchemyRoger Barnes
 
EG Reports - Delicious Data
EG Reports - Delicious DataEG Reports - Delicious Data
EG Reports - Delicious DataBenjamin Shum
 
Replicating application data into materialized views
Replicating application data into materialized viewsReplicating application data into materialized views
Replicating application data into materialized viewsZach Cox
 
Sphinx - High performance full-text search for MySQL
Sphinx - High performance full-text search for MySQLSphinx - High performance full-text search for MySQL
Sphinx - High performance full-text search for MySQLNguyen Van Vuong
 
Logstash + Elasticsearch + Kibana Presentation on Startit Tech Meetup
Logstash + Elasticsearch + Kibana Presentation on Startit Tech MeetupLogstash + Elasticsearch + Kibana Presentation on Startit Tech Meetup
Logstash + Elasticsearch + Kibana Presentation on Startit Tech MeetupStartit
 
Customer Intelligence: Using the ELK Stack to Analyze ForgeRock OpenAM Audit ...
Customer Intelligence: Using the ELK Stack to Analyze ForgeRock OpenAM Audit ...Customer Intelligence: Using the ELK Stack to Analyze ForgeRock OpenAM Audit ...
Customer Intelligence: Using the ELK Stack to Analyze ForgeRock OpenAM Audit ...ForgeRock
 
Get docs from sp doc library
Get docs from sp doc libraryGet docs from sp doc library
Get docs from sp doc librarySudip Sengupta
 
Mongo db admin_20110316
Mongo db admin_20110316Mongo db admin_20110316
Mongo db admin_20110316radiocats
 
A Beginner's Guide to Building Data Pipelines with Luigi
A Beginner's Guide to Building Data Pipelines with LuigiA Beginner's Guide to Building Data Pipelines with Luigi
A Beginner's Guide to Building Data Pipelines with LuigiGrowth Intelligence
 
Efficient Data Storage for Analytics with Apache Parquet 2.0
Efficient Data Storage for Analytics with Apache Parquet 2.0Efficient Data Storage for Analytics with Apache Parquet 2.0
Efficient Data Storage for Analytics with Apache Parquet 2.0Cloudera, Inc.
 
MongoDB San Francisco DrupalCon 2010
MongoDB San Francisco DrupalCon 2010MongoDB San Francisco DrupalCon 2010
MongoDB San Francisco DrupalCon 2010Karoly Negyesi
 
Hadoop meetup : HUGFR Construire le cluster le plus rapide pour l'analyse des...
Hadoop meetup : HUGFR Construire le cluster le plus rapide pour l'analyse des...Hadoop meetup : HUGFR Construire le cluster le plus rapide pour l'analyse des...
Hadoop meetup : HUGFR Construire le cluster le plus rapide pour l'analyse des...Modern Data Stack France
 

What's hot (20)

Data Processing with Cascading Java API on Apache Hadoop
Data Processing with Cascading Java API on Apache HadoopData Processing with Cascading Java API on Apache Hadoop
Data Processing with Cascading Java API on Apache Hadoop
 
Introduction to ELK
Introduction to ELKIntroduction to ELK
Introduction to ELK
 
Machine Learning in a Twitter ETL using ELK
Machine Learning in a Twitter ETL using ELK Machine Learning in a Twitter ETL using ELK
Machine Learning in a Twitter ETL using ELK
 
Updating materialized views and caches using kafka
Updating materialized views and caches using kafkaUpdating materialized views and caches using kafka
Updating materialized views and caches using kafka
 
RethinkDB - the open-source database for the realtime web
RethinkDB - the open-source database for the realtime webRethinkDB - the open-source database for the realtime web
RethinkDB - the open-source database for the realtime web
 
Building data flows with Celery and SQLAlchemy
Building data flows with Celery and SQLAlchemyBuilding data flows with Celery and SQLAlchemy
Building data flows with Celery and SQLAlchemy
 
EG Reports - Delicious Data
EG Reports - Delicious DataEG Reports - Delicious Data
EG Reports - Delicious Data
 
Apache Spark - Aram Mkrtchyan
Apache Spark - Aram MkrtchyanApache Spark - Aram Mkrtchyan
Apache Spark - Aram Mkrtchyan
 
MongoDB
MongoDBMongoDB
MongoDB
 
Replicating application data into materialized views
Replicating application data into materialized viewsReplicating application data into materialized views
Replicating application data into materialized views
 
9.4json
9.4json9.4json
9.4json
 
Sphinx - High performance full-text search for MySQL
Sphinx - High performance full-text search for MySQLSphinx - High performance full-text search for MySQL
Sphinx - High performance full-text search for MySQL
 
Logstash + Elasticsearch + Kibana Presentation on Startit Tech Meetup
Logstash + Elasticsearch + Kibana Presentation on Startit Tech MeetupLogstash + Elasticsearch + Kibana Presentation on Startit Tech Meetup
Logstash + Elasticsearch + Kibana Presentation on Startit Tech Meetup
 
Customer Intelligence: Using the ELK Stack to Analyze ForgeRock OpenAM Audit ...
Customer Intelligence: Using the ELK Stack to Analyze ForgeRock OpenAM Audit ...Customer Intelligence: Using the ELK Stack to Analyze ForgeRock OpenAM Audit ...
Customer Intelligence: Using the ELK Stack to Analyze ForgeRock OpenAM Audit ...
 
Get docs from sp doc library
Get docs from sp doc libraryGet docs from sp doc library
Get docs from sp doc library
 
Mongo db admin_20110316
Mongo db admin_20110316Mongo db admin_20110316
Mongo db admin_20110316
 
A Beginner's Guide to Building Data Pipelines with Luigi
A Beginner's Guide to Building Data Pipelines with LuigiA Beginner's Guide to Building Data Pipelines with Luigi
A Beginner's Guide to Building Data Pipelines with Luigi
 
Efficient Data Storage for Analytics with Apache Parquet 2.0
Efficient Data Storage for Analytics with Apache Parquet 2.0Efficient Data Storage for Analytics with Apache Parquet 2.0
Efficient Data Storage for Analytics with Apache Parquet 2.0
 
MongoDB San Francisco DrupalCon 2010
MongoDB San Francisco DrupalCon 2010MongoDB San Francisco DrupalCon 2010
MongoDB San Francisco DrupalCon 2010
 
Hadoop meetup : HUGFR Construire le cluster le plus rapide pour l'analyse des...
Hadoop meetup : HUGFR Construire le cluster le plus rapide pour l'analyse des...Hadoop meetup : HUGFR Construire le cluster le plus rapide pour l'analyse des...
Hadoop meetup : HUGFR Construire le cluster le plus rapide pour l'analyse des...
 

Viewers also liked

Практическое применение MongoDB Aggregation Framework
Практическое применение MongoDB Aggregation FrameworkПрактическое применение MongoDB Aggregation Framework
Практическое применение MongoDB Aggregation FrameworkДенис Кравченко
 
MongoDB's New Aggregation framework
MongoDB's New Aggregation frameworkMongoDB's New Aggregation framework
MongoDB's New Aggregation frameworkChris Westin
 
MongoDB Analytics: Learn Aggregation by Example - Exploratory Analytics and V...
MongoDB Analytics: Learn Aggregation by Example - Exploratory Analytics and V...MongoDB Analytics: Learn Aggregation by Example - Exploratory Analytics and V...
MongoDB Analytics: Learn Aggregation by Example - Exploratory Analytics and V...MongoDB
 
Sharding
ShardingSharding
ShardingMongoDB
 
How to leverage MongoDB for Big Data Analysis and Operations with MongoDB's A...
How to leverage MongoDB for Big Data Analysis and Operations with MongoDB's A...How to leverage MongoDB for Big Data Analysis and Operations with MongoDB's A...
How to leverage MongoDB for Big Data Analysis and Operations with MongoDB's A...Gianfranco Palumbo
 
Agg framework selectgroup feb2015 v2
Agg framework selectgroup feb2015 v2Agg framework selectgroup feb2015 v2
Agg framework selectgroup feb2015 v2MongoDB
 
Optimizing MongoDB: Lessons Learned at Localytics
Optimizing MongoDB: Lessons Learned at LocalyticsOptimizing MongoDB: Lessons Learned at Localytics
Optimizing MongoDB: Lessons Learned at Localyticsandrew311
 
The Aggregation Framework
The Aggregation FrameworkThe Aggregation Framework
The Aggregation FrameworkMongoDB
 
MongoDB for Time Series Data Part 2: Analyzing Time Series Data Using the Agg...
MongoDB for Time Series Data Part 2: Analyzing Time Series Data Using the Agg...MongoDB for Time Series Data Part 2: Analyzing Time Series Data Using the Agg...
MongoDB for Time Series Data Part 2: Analyzing Time Series Data Using the Agg...MongoDB
 

Viewers also liked (11)

Практическое применение MongoDB Aggregation Framework
Практическое применение MongoDB Aggregation FrameworkПрактическое применение MongoDB Aggregation Framework
Практическое применение MongoDB Aggregation Framework
 
MongoDB's New Aggregation framework
MongoDB's New Aggregation frameworkMongoDB's New Aggregation framework
MongoDB's New Aggregation framework
 
MongoDB Analytics: Learn Aggregation by Example - Exploratory Analytics and V...
MongoDB Analytics: Learn Aggregation by Example - Exploratory Analytics and V...MongoDB Analytics: Learn Aggregation by Example - Exploratory Analytics and V...
MongoDB Analytics: Learn Aggregation by Example - Exploratory Analytics and V...
 
Web Design Trends 2011
Web Design Trends 2011Web Design Trends 2011
Web Design Trends 2011
 
Sharding
ShardingSharding
Sharding
 
How to leverage MongoDB for Big Data Analysis and Operations with MongoDB's A...
How to leverage MongoDB for Big Data Analysis and Operations with MongoDB's A...How to leverage MongoDB for Big Data Analysis and Operations with MongoDB's A...
How to leverage MongoDB for Big Data Analysis and Operations with MongoDB's A...
 
Agg framework selectgroup feb2015 v2
Agg framework selectgroup feb2015 v2Agg framework selectgroup feb2015 v2
Agg framework selectgroup feb2015 v2
 
Optimizing MongoDB: Lessons Learned at Localytics
Optimizing MongoDB: Lessons Learned at LocalyticsOptimizing MongoDB: Lessons Learned at Localytics
Optimizing MongoDB: Lessons Learned at Localytics
 
The Aggregation Framework
The Aggregation FrameworkThe Aggregation Framework
The Aggregation Framework
 
MongoDB for Time Series Data Part 2: Analyzing Time Series Data Using the Agg...
MongoDB for Time Series Data Part 2: Analyzing Time Series Data Using the Agg...MongoDB for Time Series Data Part 2: Analyzing Time Series Data Using the Agg...
MongoDB for Time Series Data Part 2: Analyzing Time Series Data Using the Agg...
 
Grid FS
Grid FSGrid FS
Grid FS
 

Similar to MongoDB Aggregation MongoSF May 2011

Introduction to MongoDB
Introduction to MongoDBIntroduction to MongoDB
Introduction to MongoDBRaghunath A
 
Experiment no 05
Experiment no 05Experiment no 05
Experiment no 05Ankit Dubey
 
MongoDB Aggregation Framework
MongoDB Aggregation FrameworkMongoDB Aggregation Framework
MongoDB Aggregation FrameworkCaserta
 
Hands on Mahout!
Hands on Mahout!Hands on Mahout!
Hands on Mahout!OSCON Byrum
 
Cost-based Query Optimization in Apache Phoenix using Apache Calcite
Cost-based Query Optimization in Apache Phoenix using Apache CalciteCost-based Query Optimization in Apache Phoenix using Apache Calcite
Cost-based Query Optimization in Apache Phoenix using Apache CalciteJulian Hyde
 
phoenix-on-calcite-hadoop-summit-2016
phoenix-on-calcite-hadoop-summit-2016phoenix-on-calcite-hadoop-summit-2016
phoenix-on-calcite-hadoop-summit-2016Maryann Xue
 
Sedna XML Database: Query Parser & Optimizing Rewriter
Sedna XML Database: Query Parser & Optimizing RewriterSedna XML Database: Query Parser & Optimizing Rewriter
Sedna XML Database: Query Parser & Optimizing RewriterIvan Shcheklein
 
Software development - the java perspective
Software development - the java perspectiveSoftware development - the java perspective
Software development - the java perspectiveAlin Pandichi
 
MongoDB .local London 2019: Fast Machine Learning Development with MongoDB
MongoDB .local London 2019: Fast Machine Learning Development with MongoDBMongoDB .local London 2019: Fast Machine Learning Development with MongoDB
MongoDB .local London 2019: Fast Machine Learning Development with MongoDBLisa Roth, PMP
 
MongoDB .local London 2019: Fast Machine Learning Development with MongoDB
MongoDB .local London 2019: Fast Machine Learning Development with MongoDBMongoDB .local London 2019: Fast Machine Learning Development with MongoDB
MongoDB .local London 2019: Fast Machine Learning Development with MongoDBMongoDB
 
Query optimization to improve performance of the code execution
Query optimization to improve performance of the code executionQuery optimization to improve performance of the code execution
Query optimization to improve performance of the code executionAlexander Decker
 
11.query optimization to improve performance of the code execution
11.query optimization to improve performance of the code execution11.query optimization to improve performance of the code execution
11.query optimization to improve performance of the code executionAlexander Decker
 
MongoDB - Aggregation Pipeline
MongoDB - Aggregation PipelineMongoDB - Aggregation Pipeline
MongoDB - Aggregation PipelineJason Terpko
 
Practical catalyst
Practical catalystPractical catalyst
Practical catalystdwm042
 
Malo Denielou - No shard left behind: Dynamic work rebalancing in Apache Beam
Malo Denielou - No shard left behind: Dynamic work rebalancing in Apache BeamMalo Denielou - No shard left behind: Dynamic work rebalancing in Apache Beam
Malo Denielou - No shard left behind: Dynamic work rebalancing in Apache BeamFlink Forward
 

Similar to MongoDB Aggregation MongoSF May 2011 (20)

Introduction to MongoDB
Introduction to MongoDBIntroduction to MongoDB
Introduction to MongoDB
 
Experiment no 05
Experiment no 05Experiment no 05
Experiment no 05
 
MongoDB Aggregation Framework
MongoDB Aggregation FrameworkMongoDB Aggregation Framework
MongoDB Aggregation Framework
 
Hands on Mahout!
Hands on Mahout!Hands on Mahout!
Hands on Mahout!
 
Cost-based Query Optimization in Apache Phoenix using Apache Calcite
Cost-based Query Optimization in Apache Phoenix using Apache CalciteCost-based Query Optimization in Apache Phoenix using Apache Calcite
Cost-based Query Optimization in Apache Phoenix using Apache Calcite
 
Cost-Based query optimization
Cost-Based query optimizationCost-Based query optimization
Cost-Based query optimization
 
Cost-based Query Optimization
Cost-based Query Optimization Cost-based Query Optimization
Cost-based Query Optimization
 
phoenix-on-calcite-hadoop-summit-2016
phoenix-on-calcite-hadoop-summit-2016phoenix-on-calcite-hadoop-summit-2016
phoenix-on-calcite-hadoop-summit-2016
 
9-Query Processing-05-06-2023.PPT
9-Query Processing-05-06-2023.PPT9-Query Processing-05-06-2023.PPT
9-Query Processing-05-06-2023.PPT
 
Sedna XML Database: Query Parser & Optimizing Rewriter
Sedna XML Database: Query Parser & Optimizing RewriterSedna XML Database: Query Parser & Optimizing Rewriter
Sedna XML Database: Query Parser & Optimizing Rewriter
 
cheat-sheets.pdf
cheat-sheets.pdfcheat-sheets.pdf
cheat-sheets.pdf
 
Software development - the java perspective
Software development - the java perspectiveSoftware development - the java perspective
Software development - the java perspective
 
Einführung in MongoDB
Einführung in MongoDBEinführung in MongoDB
Einführung in MongoDB
 
MongoDB .local London 2019: Fast Machine Learning Development with MongoDB
MongoDB .local London 2019: Fast Machine Learning Development with MongoDBMongoDB .local London 2019: Fast Machine Learning Development with MongoDB
MongoDB .local London 2019: Fast Machine Learning Development with MongoDB
 
MongoDB .local London 2019: Fast Machine Learning Development with MongoDB
MongoDB .local London 2019: Fast Machine Learning Development with MongoDBMongoDB .local London 2019: Fast Machine Learning Development with MongoDB
MongoDB .local London 2019: Fast Machine Learning Development with MongoDB
 
Query optimization to improve performance of the code execution
Query optimization to improve performance of the code executionQuery optimization to improve performance of the code execution
Query optimization to improve performance of the code execution
 
11.query optimization to improve performance of the code execution
11.query optimization to improve performance of the code execution11.query optimization to improve performance of the code execution
11.query optimization to improve performance of the code execution
 
MongoDB - Aggregation Pipeline
MongoDB - Aggregation PipelineMongoDB - Aggregation Pipeline
MongoDB - Aggregation Pipeline
 
Practical catalyst
Practical catalystPractical catalyst
Practical catalyst
 
Malo Denielou - No shard left behind: Dynamic work rebalancing in Apache Beam
Malo Denielou - No shard left behind: Dynamic work rebalancing in Apache BeamMalo Denielou - No shard left behind: Dynamic work rebalancing in Apache Beam
Malo Denielou - No shard left behind: Dynamic work rebalancing in Apache Beam
 

More from Chris Westin

Data torrent meetup-productioneng
Data torrent meetup-productionengData torrent meetup-productioneng
Data torrent meetup-productionengChris Westin
 
Ambari hadoop-ops-meetup-2013-09-19.final
Ambari hadoop-ops-meetup-2013-09-19.finalAmbari hadoop-ops-meetup-2013-09-19.final
Ambari hadoop-ops-meetup-2013-09-19.finalChris Westin
 
Cluster management and automation with cloudera manager
Cluster management and automation with cloudera managerCluster management and automation with cloudera manager
Cluster management and automation with cloudera managerChris Westin
 
Building low latency java applications with ehcache
Building low latency java applications with ehcacheBuilding low latency java applications with ehcache
Building low latency java applications with ehcacheChris Westin
 
SDN/OpenFlow #lspe
SDN/OpenFlow #lspeSDN/OpenFlow #lspe
SDN/OpenFlow #lspeChris Westin
 
cfengine3 at #lspe
cfengine3 at #lspecfengine3 at #lspe
cfengine3 at #lspeChris Westin
 
Nimbula lspe-2012-04-19
Nimbula lspe-2012-04-19Nimbula lspe-2012-04-19
Nimbula lspe-2012-04-19Chris Westin
 
mongodb-brief-intro-february-2012
mongodb-brief-intro-february-2012mongodb-brief-intro-february-2012
mongodb-brief-intro-february-2012Chris Westin
 
Stingray - Riverbed Technology
Stingray - Riverbed TechnologyStingray - Riverbed Technology
Stingray - Riverbed TechnologyChris Westin
 
Replication and replica sets
Replication and replica setsReplication and replica sets
Replication and replica setsChris Westin
 
Architecting a Scale Out Cloud Storage Solution
Architecting a Scale Out Cloud Storage SolutionArchitecting a Scale Out Cloud Storage Solution
Architecting a Scale Out Cloud Storage SolutionChris Westin
 
MongoDB: An Introduction - July 2011
MongoDB:  An Introduction - July 2011MongoDB:  An Introduction - July 2011
MongoDB: An Introduction - July 2011Chris Westin
 
Practical Replication June-2011
Practical Replication June-2011Practical Replication June-2011
Practical Replication June-2011Chris Westin
 
MongoDB: An Introduction - june-2011
MongoDB:  An Introduction - june-2011MongoDB:  An Introduction - june-2011
MongoDB: An Introduction - june-2011Chris Westin
 
Ganglia Overview-v2
Ganglia Overview-v2Ganglia Overview-v2
Ganglia Overview-v2Chris Westin
 
Mysql Proxy Presentation Yahoo
Mysql Proxy Presentation YahooMysql Proxy Presentation Yahoo
Mysql Proxy Presentation YahooChris Westin
 
Mysql proxy presentation_yahoo
Mysql proxy presentation_yahooMysql proxy presentation_yahoo
Mysql proxy presentation_yahooChris Westin
 

More from Chris Westin (20)

Data torrent meetup-productioneng
Data torrent meetup-productionengData torrent meetup-productioneng
Data torrent meetup-productioneng
 
Gripshort
GripshortGripshort
Gripshort
 
Ambari hadoop-ops-meetup-2013-09-19.final
Ambari hadoop-ops-meetup-2013-09-19.finalAmbari hadoop-ops-meetup-2013-09-19.final
Ambari hadoop-ops-meetup-2013-09-19.final
 
Cluster management and automation with cloudera manager
Cluster management and automation with cloudera managerCluster management and automation with cloudera manager
Cluster management and automation with cloudera manager
 
Building low latency java applications with ehcache
Building low latency java applications with ehcacheBuilding low latency java applications with ehcache
Building low latency java applications with ehcache
 
SDN/OpenFlow #lspe
SDN/OpenFlow #lspeSDN/OpenFlow #lspe
SDN/OpenFlow #lspe
 
cfengine3 at #lspe
cfengine3 at #lspecfengine3 at #lspe
cfengine3 at #lspe
 
Nimbula lspe-2012-04-19
Nimbula lspe-2012-04-19Nimbula lspe-2012-04-19
Nimbula lspe-2012-04-19
 
mongodb-brief-intro-february-2012
mongodb-brief-intro-february-2012mongodb-brief-intro-february-2012
mongodb-brief-intro-february-2012
 
Stingray - Riverbed Technology
Stingray - Riverbed TechnologyStingray - Riverbed Technology
Stingray - Riverbed Technology
 
Replication and replica sets
Replication and replica setsReplication and replica sets
Replication and replica sets
 
Architecting a Scale Out Cloud Storage Solution
Architecting a Scale Out Cloud Storage SolutionArchitecting a Scale Out Cloud Storage Solution
Architecting a Scale Out Cloud Storage Solution
 
FlashCache
FlashCacheFlashCache
FlashCache
 
Large Scale Cacti
Large Scale CactiLarge Scale Cacti
Large Scale Cacti
 
MongoDB: An Introduction - July 2011
MongoDB:  An Introduction - July 2011MongoDB:  An Introduction - July 2011
MongoDB: An Introduction - July 2011
 
Practical Replication June-2011
Practical Replication June-2011Practical Replication June-2011
Practical Replication June-2011
 
MongoDB: An Introduction - june-2011
MongoDB:  An Introduction - june-2011MongoDB:  An Introduction - june-2011
MongoDB: An Introduction - june-2011
 
Ganglia Overview-v2
Ganglia Overview-v2Ganglia Overview-v2
Ganglia Overview-v2
 
Mysql Proxy Presentation Yahoo
Mysql Proxy Presentation YahooMysql Proxy Presentation Yahoo
Mysql Proxy Presentation Yahoo
 
Mysql proxy presentation_yahoo
Mysql proxy presentation_yahooMysql proxy presentation_yahoo
Mysql proxy presentation_yahoo
 

Recently uploaded

Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxKatpro Technologies
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CVKhem
 
Advantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessAdvantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessPixlogix Infotech
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024The Digital Insurer
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonAnna Loughnan Colquhoun
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘RTylerCroy
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024Results
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfsudhanshuwaghmare1
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processorsdebabhi2
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityPrincipled Technologies
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationMichael W. Hawkins
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsMaria Levchenko
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slidevu2urc
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Servicegiselly40
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slidespraypatel2
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...apidays
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdfhans926745
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Igalia
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsEnterprise Knowledge
 

Recently uploaded (20)

Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CV
 
Advantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessAdvantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your Business
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Service
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slides
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 

MongoDB Aggregation MongoSF May 2011

  • 1. MongoDB’s New Aggregation Features Chris Westin © Copyright 2010 10gen Inc.
  • 2. What problem are we solving? Map/Reduce can be used for aggregation… Currently being used for totaling, averaging, etc Map/Reduce is a big hammer Simpler tasks should be easier Shouldn’t need to write JavaScript Avoid the overhead of JavaScript engine We’re seeing requests for help in handling complex documents Select only subdocuments or arrays
  • 3. How will we solve the problem? Our new aggregation framework Declarative framework No JavaScript required Describe a chain of operations to apply Expression evaluation Return computed values Framework: we can add new operations easily C++ implementation Higher performance than JavaScript
  • 4. Aggregation - Pipelines Aggregation requests specify a pipeline A pipeline is a series of operations Conceptually, the members of a collection are passed through a pipeline to produce a result Similar to a command-line pipe
  • 5. Pipeline Operations $match Uses a query predicate (like .find({…})) as a filter $project Uses a sample document to determine the shape of the result (similar to .find()’s optional argument) This can include computed values $group Aggregates items into buckets defined by a key
  • 6. Computed Expressions Available in $project operations Prefix expression language Add two fields: $add:[“$field1”, “$field2”] Provide a value for a missing field: $ifnull:[“$field1”, “$field2”] Nesting: $add:[“$field1”, $ifnull:[“$field2”, “$field3”]] Other functions…. And we can easily add more as required
  • 7. Projections $project can reshape results $unwind expression doles out array values one at a time Pull fields from nested documents to the top Push fields from the top down into new virtual documents
  • 8. Grouping $group aggregation expressions Total of column values: $sum Average of column values: $avg Collect column values in an array: $push
  • 9. Demo (See script at https://gist.github.com/993733)
  • 10. Usage Tips Use $match in a pipeline as early as possible The query optimizer can then be used to choose an index and avoid scanning the entire collection
  • 11. Driver Support Initial version is a command For any language, build a JSON database object, and execute the command { aggregate : <collection>, pipeline : {…} } Beware of command result size limit
  • 12. When is this being released? In final development now Expect to see this in the near future
  • 13. Sharding support Initial release will support sharding Mongos analyzes pipeline, and forwards operations up to $group to shards; combines shard server results and continues
  • 14. Pipeline Operations – Future Plans $sort Sorts the document stream according to a key $out Saves the document stream to a collection Similar to M/R $out, but with sharded output
  • 15. Expressions – Future Plans Date field extraction Get year, month, day, hour, etc, from Date Date arithmetic