A Beginner's Guide to MongoDB Aggregate

If you are embarking on the journey of learning MongoDB, you will soon find out about one of its most potent features – Aggregation. The Aggregation operations in MongoDB group values from numerous documents and perform a variety of computations on this grouped data to generate a single result. They provide a much more advanced mechanism for data retrieval, analysis, and manipulation compared to simple find queries. In this blog, we will provide an easy-to-understand introduction to MongoDB Aggregate for beginners. Read more to find out! 

Table of Contents 

1) Getting started with MongoDB Aggregate 

2) Detailed walkthrough on MongoDB Aggregation 

3) Examples of MongoDB Aggregation 

4) Tips for using MongoDB Aggregate 

5) Conclusion 

Getting started with MongoDB Aggregate  

The Aggregate functions in MongoDB work as a Pipeline: documents flow through a sequence of stages, each stage transforms them, and the final stage produces the refined output. This Pipeline can include as many stages as the task requires. Common stages include filtering data, sorting it, grouping it, and reshaping the documents.

If you're entirely new to MongoDB, the first step would be setting it up on your system. Visit the MongoDB website and download the MongoDB Community Server, which is free to use. Once downloaded, follow the installation instructions as per your Operating System (Windows, macOS, or Linux). If you're planning to integrate MongoDB with Java applications, you may also explore MongoDB Java for seamless connectivity and database operations.  

Next, familiarise yourself with some of the fundamental MongoDB commands. These include:

1) Creating a new database: (use DATABASE_NAME) 

2) Creating a new Collection: (db.createCollection('COLLECTION_NAME')) 

3) Inserting a document into a collection: (db.COLLECTION_NAME.insertOne(DOCUMENT)), or insertMany([DOCUMENTS]) for several at once

4) Finding documents in a collection: (db.COLLECTION_NAME.find()). 


Detailed walkthrough on MongoDB Aggregation  

The MongoDB Aggregation Framework is a critical feature that you can use to perform data analysis tasks. In this section, we'll take a detailed look at the Aggregation Pipeline, methods, stages, Operators and more. 

Aggregate method  

The aggregate() method in MongoDB is the primary tool to execute Aggregation operations. This method takes two parameters: 

1) pipeline: An array of stages, or data transformation steps, to be performed on the data. The stages are processed in the order in which they appear in the array. 

2) options: An optional document that provides additional settings for the Aggregation operation. 

The general syntax is as follows: 

db.collection.aggregate(pipeline, options) 

For instance, suppose you have a collection named 'orders', and you want to calculate the total price for all the orders. You might use the aggregate() method as follows: 

db.orders.aggregate([ { $group : { _id : null, total : { $sum : "$price" }}}]) 
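To make the result concrete, here is a minimal sketch in plain JavaScript of what that $group stage computes. It runs on an in-memory array rather than a MongoDB collection, so it works in Node without a database. The sample documents and the totalPrice helper are assumptions made for illustration only.

```javascript
// Hypothetical sample data standing in for the 'orders' collection.
const orders = [
  { _id: 1, price: 20 },
  { _id: 2, price: 35 },
  { _id: 3, price: 45 },
];

// Mirrors { $group: { _id: null, total: { $sum: "$price" } } }.
// A null _id puts every document into one group, so the output is a single document.
function totalPrice(docs) {
  if (docs.length === 0) return []; // no input documents means no groups
  const total = docs.reduce(
    // Like $sum, skip values that are not numbers.
    (sum, doc) => sum + (typeof doc.price === "number" ? doc.price : 0),
    0
  );
  return [{ _id: null, total }];
}

console.log(totalPrice(orders)); // one document: { _id: null, total: 100 }
```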

Pipeline stages  

Each stage in the Pipeline modifies the documents as they pass through. There are several types of stages, and these are the most commonly used: 

1) $match: This stage filters the data by given conditions, similar to the 'WHERE' clause in SQL. 

2) $group: This stage groups the data by a specified identifier, similar to the 'GROUP BY' clause in SQL. 

3) $sort: This stage sorts the data. 

4) $unwind: This stage deconstructs an array field, outputting one document for each array element. 

Here is a brief example showing how these stages can be used in a MongoDB Pipeline: 
 

db.orders.aggregate([ 
   { $match: { status: "Approved" } }, 
   { $group: { _id: "$cust_id", total: { $sum: "$amount" } } }, 
   { $sort: { total: -1 } } 
]) 


This Pipeline does the following: 

1) The $match stage filters the documents to only pass those with the status of "Approved" to the next stage. 

2) The $group stage groups the documents by cust_id and calculates the total amount for each group. 

3) The $sort stage sorts the grouped documents by the total field in descending order. 
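The Pipeline above does not use $unwind, so here is a rough sketch of its default behaviour in plain JavaScript. It works on an in-memory array, not a MongoDB collection; the items field, the sample documents, and the unwind helper are hypothetical names chosen for this illustration.

```javascript
// Hypothetical order documents with an array field named 'items'.
const orders = [
  { _id: 1, cust_id: "A123", items: ["pen", "ink"] },
  { _id: 2, cust_id: "B234", items: ["paper"] },
  { _id: 3, cust_id: "C345", items: [] },
];

// Mirrors { $unwind: "$items" } with default options:
// one output document per array element, with the array replaced by that element.
function unwind(docs, field) {
  return docs.flatMap((doc) => {
    const value = doc[field];
    if (value === undefined || value === null) return []; // missing or null: document is dropped
    if (!Array.isArray(value)) return [doc]; // a non-array value is treated as a one-element array
    return value.map((element) => ({ ...doc, [field]: element })); // empty array yields nothing
  });
}

const unwound = unwind(orders, "items");
console.log(unwound.length); // 3
console.log(unwound[0]); // { _id: 1, cust_id: 'A123', items: 'pen' }
```

A typical next step would be a $group on the unwound field, for example to count how often each item was ordered.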

Operators 

Operators are used within the MongoDB Pipeline stages to perform operations on the data. Some of the commonly used Aggregation Operators are:
 


1) $sum: Calculates the sum of the numeric values. 

2) $avg: Calculates the average of numeric values. 

3) $min: Returns the smallest of the input values. 

4) $max: Returns the largest of the input values. 

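5) $count: Returns the number of documents in a group. It is available as an accumulator from MongoDB 5.0; on earlier versions, { $sum: 1 } gives the same result. To count the items in an array, use $size instead. 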

Here's an example that uses some of these Operators: 
 

db.orders.aggregate([ 
   { $group:  
       { _id: "$cust_id",  
         totalAmount: { $sum: "$amount" }, 
         count: { $sum: 1 }, 
         avgAmount: { $avg: "$amount" } 
       } 
   }, 
   { $sort: { totalAmount: -1 } } 
]) 


This Pipeline groups the documents by cust_id, calculates the total and average amount for each group, counts the number of documents in each group, and then sorts the groups by the total amount in descending order. 
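The example above covers $sum and $avg but not $min and $max. As a hedged illustration, the following plain JavaScript sketch shows what those two accumulators would compute per group. It operates on an in-memory array rather than MongoDB, and the sample data and the minMaxByCustomer helper are assumptions.

```javascript
// Hypothetical sample data standing in for the 'orders' collection.
const orders = [
  { cust_id: "A123", amount: 500 },
  { cust_id: "B234", amount: 250 },
  { cust_id: "A123", amount: 150 },
];

// Mirrors { $group: { _id: "$cust_id",
//                     minAmount: { $min: "$amount" },
//                     maxAmount: { $max: "$amount" } } }.
function minMaxByCustomer(docs) {
  const groups = new Map();
  for (const { cust_id, amount } of docs) {
    const group = groups.get(cust_id);
    if (group === undefined) {
      // First document for this customer starts the group.
      groups.set(cust_id, { _id: cust_id, minAmount: amount, maxAmount: amount });
    } else {
      group.minAmount = Math.min(group.minAmount, amount);
      group.maxAmount = Math.max(group.maxAmount, amount);
    }
  }
  return [...groups.values()];
}

console.log(minMaxByCustomer(orders));
// A123: minAmount 150, maxAmount 500; B234: minAmount 250, maxAmount 250
```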

Understanding the Aggregation Pipeline in MongoDB can unlock powerful data processing capabilities. By combining stages and Operators, you can manipulate and analyse your data in complex and insightful ways. 

Using Expressions in MongoDB Aggregation  

Expressions are used to perform operations on document fields in MongoDB Aggregate functions. These operations could be simple ones, like addition or subtraction, or more complex, such as string manipulation or conditional expressions. MongoDB Expressions are categorised into several types, like:
 


1) Accumulators: These are Operators like $sum, $avg, $max, $min and $push used in $group stage. 

2) Arithmetic: These are mathematical Operators like $add, $subtract, $multiply and $divide. 

3) Comparison: These include $eq, $ne, $gt, $gte, $lt, and $lte, among others. 

4) Conditional: These include $cond, $ifNull, $switch, and more. 

5) Date: These include $dayOfMonth, $dayOfWeek, $year, $month, and more for manipulating date objects. 

6) String: These include $concat, $substr, $toLower, $toUpper, and more for manipulating string values. 

Expressions are written as documents, inside {}, within Pipeline stages. For instance, consider the following example where we have a users collection, and we want to find the total number of users over 18.
 

db.users.aggregate([ 
    { $match: { age: { $gt: 18 } } }, 
    { $group: { _id: null, count: { $sum: 1 } } } 
]) 


Here, $gt is a comparison operator that matches documents where the age field is greater than 18, and { $sum: 1 } adds 1 for every matching document to produce the count. 
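To show how the arithmetic, conditional, and string categories combine, here is a small sketch. The Pipeline stage appears in the comment, and the plain JavaScript below it mimics the result on an in-memory array. The sample users, the label and ageNextYear field names, and the describeUsers helper are all assumptions for illustration.

```javascript
// Hypothetical sample data standing in for the 'users' collection.
const users = [
  { name: "asha", age: 31 },
  { name: "ben", age: 16 },
];

// Mirrors this stage:
// { $project: {
//     _id: 0,
//     label: { $concat: [ { $toUpper: "$name" }, " (",
//                         { $cond: [ { $gte: [ "$age", 18 ] }, "adult", "minor" ] }, ")" ] },
//     ageNextYear: { $add: [ "$age", 1 ] }
// } }
function describeUsers(docs) {
  return docs.map((doc) => ({
    // String ($toUpper, $concat) and conditional ($cond) expressions.
    label: `${doc.name.toUpperCase()} (${doc.age >= 18 ? "adult" : "minor"})`,
    // Arithmetic expression ($add).
    ageNextYear: doc.age + 1,
  }));
}

console.log(describeUsers(users));
// labels: "ASHA (adult)" and "BEN (minor)"; ageNextYear: 32 and 17
```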

Understanding $match, $group and $project with more depth 

Let's delve deeper into some of the most commonly used MongoDB Pipeline stages: $match, $group, and $project. 

$match: 

The $match stage in MongoDB is very similar to the WHERE clause in SQL. It filters the documents so only those that match certain conditions will pass to the next stage. 

For instance, in the following query, $match keeps only the documents that have an age greater than 18 and filters out the rest. 

db.users.aggregate([ { $match: { age: { $gt: 18 } } }]) 

$group: 

The $group stage groups input documents by a specified identifier expression and applies the accumulator expressions to each group. 

For example, in the following query, $group is used to group the documents by country field, and for each group, it calculates the total amount. 
 

db.orders.aggregate([ 
    { $group : { _id : "$country", totalAmount: { $sum: "$amount" } } } 
]) 


$project: 

The $project stage passes along the documents with only the specified fields to the following stage in the Pipeline. The specified fields can be existing fields from the input documents or newly computed fields. 

In the following example, $project is used to specify that we only want the name and country fields in the output documents. The _id field is also included by default unless you exclude it with _id: 0. 
 

db.users.aggregate([  
    { $project : { name : 1, country : 1 } }  
])
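The inclusion behaviour of $project, including the default handling of _id, can be sketched in plain JavaScript as follows. This is an approximation on an in-memory array that covers only simple inclusion specs; the sample users and the project helper are hypothetical.

```javascript
// Hypothetical sample data standing in for the 'users' collection.
const users = [
  { _id: 1, name: "Asha", country: "IN", age: 31 },
  { _id: 2, name: "Ben", country: "UK", age: 16 },
];

// Mirrors an inclusion-style $project: fields marked 1 are kept,
// and _id is kept too unless the spec sets _id: 0.
function project(docs, spec) {
  return docs.map((doc) => {
    const out = {};
    if (spec._id !== 0 && "_id" in doc) out._id = doc._id;
    for (const [field, flag] of Object.entries(spec)) {
      if (field !== "_id" && flag === 1 && field in doc) out[field] = doc[field];
    }
    return out;
  });
}

console.log(project(users, { name: 1, country: 1 })); // each document keeps _id, name, country
console.log(project(users, { _id: 0, name: 1 })); // each document keeps only name
```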

 

Utilising Indexes with Aggregation 

Indexes can be used with Aggregation to improve query performance. The $match and $sort stages can take advantage of an Index when they occur at the very beginning of the MongoDB Pipeline. 

Consider the following example where an Index on age is used with a $match stage: 

 

db.users.createIndex({ age: 1 }); 
db.users.aggregate([ { $match: { age: { $gt: 18 } } }])


This query will be faster if the users collection has a significant number of documents, as the $match stage can use the Index on the age field to filter documents more quickly. 

Using Aggregation variables 

You can use system variables, user-defined variables and expression variables in the Pipeline stages of MongoDB Aggregation. The $$ syntax is used to reference variables. 

For instance, $$ROOT is a system variable that references the root document. Consider the following example:
 

db.sales.aggregate([ 
   { $project: { item: 1, totalCost: { $multiply: [ "$price", "$quantity" ] } } } 
]) 


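In this example, "$price" and "$quantity" are field paths. A field path such as "$price" is shorthand for "$$CURRENT.price", and because $$CURRENT refers to the root document by default, it is equivalent to "$$ROOT.price". The same holds for "$quantity". 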
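$$ROOT is most useful when a stage needs the whole input document rather than a single field, for example to collect complete documents per group with $push. The sketch below mimics that in plain JavaScript on an in-memory array. The sample sales documents and the groupWithRoot helper are assumptions made for this illustration.

```javascript
// Hypothetical sample data standing in for the 'sales' collection.
const sales = [
  { item: "pen", price: 2, quantity: 10 },
  { item: "ink", price: 5, quantity: 3 },
  { item: "pen", price: 2, quantity: 4 },
];

// Mirrors { $group: { _id: "$item", docs: { $push: "$$ROOT" } } }:
// each group collects the complete input documents that belong to it.
function groupWithRoot(docs) {
  const groups = new Map();
  for (const doc of docs) {
    if (!groups.has(doc.item)) groups.set(doc.item, { _id: doc.item, docs: [] });
    groups.get(doc.item).docs.push(doc); // the whole document, as $$ROOT would supply
  }
  return [...groups.values()];
}

const grouped = groupWithRoot(sales);
console.log(grouped[0]._id, grouped[0].docs.length); // pen 2
```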


Examples of MongoDB Aggregation  

Let's take a look at some practical examples of MongoDB Aggregation to better understand how it works in real-world scenarios. For these examples, we'll use a hypothetical 'orders' Collection with documents that look something like this: 

 

{ "_id" : 1, "cust_id" : "A123", "amount" : 500, "status" : "Completed" } 
{ "_id" : 2, "cust_id" : "B234", "amount" : 250, "status" : "Pending" } 
{ "_id" : 3, "cust_id" : "A123", "amount" : 150, "status" : "Completed" } 
{ "_id" : 4, "cust_id" : "B234", "amount" : 200, "status" : "Completed" } 
{ "_id" : 5, "cust_id" : "A123", "amount" : 300, "status" : "Pending" }


Group by customer ID 

Let's say you want to find the total amount for each customer. You can use the $group stage to group documents by the cust_id field and calculate the sum of the amount field: 
 

db.orders.aggregate([ 
   { $group: { _id: "$cust_id", totalAmount: { $sum: "$amount" } } } 
]) 

This Pipeline will return something like this: 

{ "_id" : "B234", "totalAmount" : 450 } 
{ "_id" : "A123", "totalAmount" : 950 } 

 

Filter by order status 

Now, suppose you want to find the total amount for each customer, but only for completed orders. You can add a $match stage to the Pipeline to filter documents: 
 

db.orders.aggregate([ 
   { $match: { status: "Completed" } }, 
   { $group: { _id: "$cust_id", totalAmount: { $sum: "$amount" } } } 
]) 

This Pipeline will return something like this: 

{ "_id" : "B234", "totalAmount" : 200 } 
{ "_id" : "A123", "totalAmount" : 650 }

 

Count orders for each customer 

If you want to count the number of orders for each customer, you can add a count field to the $group stage: 
 

db.orders.aggregate([ 
   { $group: { _id: "$cust_id", count: { $sum: 1 } } } 
]) 

This Pipeline will return something like this: 

{ "_id" : "B234", "count" : 2 } 
{ "_id" : "A123", "count" : 3 } 

 

Calculate average order amount 

To calculate the average order amount for each customer, use the $avg Operator:

 

db.orders.aggregate([ 
   { $group: { _id: "$cust_id", avgAmount: { $avg: "$amount" } } } 
]) 

This Pipeline will return something like this: 

{ "_id" : "B234", "avgAmount" : 225 } 
{ "_id" : "A123", "avgAmount" : 316.6666666666667 }
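The four results above can be cross-checked without a database. The following plain JavaScript sketch recomputes the totals, counts, and averages from the same five sample documents in one pass. The summarise helper is a hypothetical name, and the code only approximates what $group does.

```javascript
// The sample 'orders' documents used in this section.
const orders = [
  { _id: 1, cust_id: "A123", amount: 500, status: "Completed" },
  { _id: 2, cust_id: "B234", amount: 250, status: "Pending" },
  { _id: 3, cust_id: "A123", amount: 150, status: "Completed" },
  { _id: 4, cust_id: "B234", amount: 200, status: "Completed" },
  { _id: 5, cust_id: "A123", amount: 300, status: "Pending" },
];

// Mirrors { $group: { _id: "$cust_id",
//                     totalAmount: { $sum: "$amount" },
//                     count: { $sum: 1 },
//                     avgAmount: { $avg: "$amount" } } }.
function summarise(docs) {
  const groups = new Map();
  for (const { cust_id, amount } of docs) {
    const group = groups.get(cust_id) ?? { _id: cust_id, totalAmount: 0, count: 0 };
    group.totalAmount += amount;
    group.count += 1;
    groups.set(cust_id, group);
  }
  // The average is derived from the running sum and count.
  return [...groups.values()].map((g) => ({ ...g, avgAmount: g.totalAmount / g.count }));
}

// All orders, then completed orders only (the filter plays the role of $match).
console.log(summarise(orders));
console.log(summarise(orders.filter((order) => order.status === "Completed")));
```

Note that this sketch returns groups in first-seen order, whereas $group does not guarantee any output order unless a $sort stage follows it.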

 

Tips for using MongoDB Aggregate  

Understanding the MongoDB Aggregation Framework and its capabilities is one thing, but applying it efficiently and effectively in your projects requires some practical strategies. Here are some tips to help you use MongoDB Aggregation more effectively:



Use $match early 

The $match stage is used to filter documents in a MongoDB Collection, allowing only those that match the provided condition to pass through to the next stage in the Pipeline. By using $match early in your Aggregation Pipeline (preferably as the first stage), you can reduce the number of documents that pass through the MongoDB Pipeline, improving the efficiency of the Aggregation operation. 
 

db.orders.aggregate([ 
   { $match: { status: "Approved" } }, 
   { $group: { _id: "$cust_id", total: { $sum: "$amount" } } }, 
   { $sort: { total: -1 } } 
])


In this example, the $match stage is the first stage in the Pipeline, so only documents with a status of "Approved" are passed to the $group stage. 

Take advantage of $project 

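The $project stage can be used to include or exclude fields from documents. Use $project to eliminate unneeded fields, reducing the amount of data that passes through the Pipeline. Bear in mind that MongoDB's Pipeline optimiser already tries to work out which fields later stages need, so an early $project often helps readability more than speed; measure before relying on it for performance.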
 

db.orders.aggregate([ 
   { $match: { status: "Approved" } }, 
   { $project: { cust_id: 1, amount: 1 } }, 
   { $group: { _id: "$cust_id", total: { $sum: "$amount" } } }, 
   { $sort: { total: -1 } } 
])


In this example, the $project stage ensures that only the cust_id and amount fields are passed to the $group stage, reducing the amount of data that passes through the Pipeline. 

Combine stages when possible 

When possible, combine multiple MongoDB stages into a single stage to improve performance. For instance, if you have multiple $match stages, consider combining them into one. 
 

db.orders.aggregate([ 
   { $match: { $and: [ { status: "Approved" }, { amount: { $gt: 50 } } ] } }, 
   { $group: { _id: "$cust_id", total: { $sum: "$amount" } } }, 
   { $sort: { total: -1 } } 
])


In this example, we've combined two $match conditions into a single $match stage using the $and Operator. 
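MongoDB treats several fields in one query document as an implicit AND, so the same filter can also be written as { $match: { status: "Approved", amount: { $gt: 50 } } }. The plain JavaScript sketch below illustrates that the two forms select the same documents. The matches function is a tiny hypothetical matcher that supports only equality, $gt, and $and, and the sample data is assumed.

```javascript
// Hypothetical sample data standing in for the 'orders' collection.
const orders = [
  { _id: 1, status: "Approved", amount: 40 },
  { _id: 2, status: "Approved", amount: 75 },
  { _id: 3, status: "Pending", amount: 90 },
];

// Tiny matcher covering only what this example needs: equality, $gt and $and.
function matches(doc, query) {
  return Object.entries(query).every(([key, cond]) => {
    if (key === "$and") return cond.every((sub) => matches(doc, sub));
    if (cond !== null && typeof cond === "object" && "$gt" in cond) {
      return doc[key] > cond.$gt;
    }
    return doc[key] === cond;
  });
}

const explicitAnd = { $and: [{ status: "Approved" }, { amount: { $gt: 50 } }] };
const implicitAnd = { status: "Approved", amount: { $gt: 50 } };

const explicitIds = orders.filter((o) => matches(o, explicitAnd)).map((o) => o._id);
const implicitIds = orders.filter((o) => matches(o, implicitAnd)).map((o) => o._id);
console.log(explicitIds, implicitIds); // both select only the order with _id 2
```

The explicit $and form is mainly needed when the same field or operator has to appear more than once in a query.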

Leverage Indexes 

Indexes can significantly speed up the operation of $match and $sort stages when they are at the beginning of the MongoDB Pipeline. Use Indexes on the fields that you often use for these operations. 
 

db.orders.createIndex({ status: 1 }); 
db.orders.aggregate([ 
   { $match: { status: "Approved" } }, 
   { $group: { _id: "$cust_id", total: { $sum: "$amount" } } }, 
   { $sort: { total: -1 } } 
])


In this example, an Index on status will speed up the $match operation. 

Be mindful of the 16MB document limit 

MongoDB imposes a limit of 16MB on the size of a single document. This can be an issue if your $group or $bucket operations produce a single document that exceeds this limit. In such cases, consider if the operation can be broken down into smaller parts or if there's a way to limit the size of the result document. 
 


 

Conclusion 

The MongoDB Aggregation Framework is a powerful tool for data processing and analysis. With its Pipeline stages and Operators, you can perform complex computations on your data and retrieve valuable insights. While it might seem complex initially, with consistent practice, you'll find it an invaluable addition to your data manipulation toolkit. 
