
Aggregation Pipeline

Learning Objectives

  • Pipeline Stages
  • Optimization
  • Lookup and GraphLookup

The Aggregation Pipeline: Data Processing at Scale

The aggregation pipeline is MongoDB's framework for transforming and summarizing data on the server. Think of it as a factory line: documents pass through an ordered series of "stages", and the output of each stage is the input of the next.

Core Stages

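  1. $match: Filters documents. Place it as early as possible so later stages process fewer documents.
  2. $group: Groups documents by a key and computes an accumulator (such as $sum or $avg) for each group.
  3. $project: Reshapes documents by adding, removing, or computing fields.
  4. $sort: Orders documents by one or more fields (1 for ascending, -1 for descending).
  5. $unwind: Deconstructs an array field, emitting one output document per array element.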
javascript code
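db.orders.aggregate([
  // Keep only orders whose status is "A"
  { $match: { status: "A" } },
  // Sum the amount field per customer
  { $group: { _id: "$cust_id", total: { $sum: "$amount" } } },
  // Largest totals first
  { $sort: { total: -1 } }
])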
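
The example above does not use $unwind or $project. As a minimal sketch of how they fit together, assume (hypothetically) that each order also carries an items array whose elements have sku, qty, and price fields. None of those names come from the original example. The guard at the end lets the file load in plain Node as well as in mongosh.

```javascript
// Hypothetical order shape (an assumption for illustration):
// { cust_id, status, items: [{ sku, qty, price }] }
const itemRevenuePipeline = [
  // Filter first so later stages see fewer documents.
  { $match: { status: "A" } },
  // Emit one document per element of the items array.
  { $unwind: "$items" },
  // Keep only what is needed and compute a per-line total.
  {
    $project: {
      _id: 0,
      sku: "$items.sku",
      lineTotal: { $multiply: ["$items.qty", "$items.price"] }
    }
  },
  // Sum the line totals for each SKU.
  { $group: { _id: "$sku", revenue: { $sum: "$lineTotal" } } },
  // Highest revenue first.
  { $sort: { revenue: -1 } }
];

// Runs only where a mongosh-style `db` global exists.
if (typeof db !== "undefined") {
  printjson(db.orders.aggregate(itemRevenuePipeline).toArray());
}
```

Because $unwind multiplies the number of documents in flight, keeping $match ahead of it matters more here than in the shorter example.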

Performance: The Indexing Rule

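$match and $sort can use an index only when they run against the collection itself, which in practice means at or near the beginning of the pipeline. Stages such as $group, $project, and $unwind produce new intermediate documents, so any $match or $sort that follows them works on those in-memory results and cannot use a collection index; in that sense you "lose" the index for subsequent stages. The query optimizer can sometimes move a $match ahead of a $project for you, but it is safer to write the stages in that order yourself.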
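
One way to check the rule is to ask the server for its query plan. The sketch below reuses the orders pipeline from Core Stages and assumes a single-field index on status, which the original example does not mention. In the explain output, an IXSCAN stage indicates that the index was used and a COLLSCAN stage indicates a full collection scan.

```javascript
// Same pipeline shape as the Core Stages example.
const indexedPipeline = [
  // Runs against the collection, so it can use an index on status.
  { $match: { status: "A" } },
  { $group: { _id: "$cust_id", total: { $sum: "$amount" } } },
  // Sorts a value computed by $group, so no collection index applies.
  { $sort: { total: -1 } }
];

// Runs only where a mongosh-style `db` global exists.
if (typeof db !== "undefined") {
  // Assumed index; adjust to the fields your own $match uses.
  db.orders.createIndex({ status: 1 });
  const plan = db.orders.explain("executionStats").aggregate(indexedPipeline);
  printjson(plan);
}
```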

Advanced Transformations: $lookup and $graphLookup

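Use $lookup to perform a left outer join against another collection: each input document gains an array field holding the matching documents, and that array is empty when nothing matches. Use $graphLookup for recursive searches (e.g., finding all descendants in a tree structure).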
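
Both stages can be sketched as follows. The customers and categories collections, and the name and parent fields, are hypothetical and chosen only for illustration. The example assumes that orders.cust_id holds the _id of a customer and that each category stores its parent's _id in parent.

```javascript
// $lookup: attach the matching customer document(s) to each order.
const ordersWithCustomer = [
  { $match: { status: "A" } },
  {
    $lookup: {
      from: "customers",      // collection to join (assumed name)
      localField: "cust_id",  // field on the orders side
      foreignField: "_id",    // field on the customers side
      as: "customer"          // output array; empty if no match
    }
  }
];

// $graphLookup: collect every descendant of one category.
const categoryDescendants = [
  { $match: { name: "Electronics" } },
  {
    $graphLookup: {
      from: "categories",        // collection to traverse (assumed name)
      startWith: "$_id",         // begin from the matched category's _id
      connectFromField: "_id",   // value taken from each document found
      connectToField: "parent",  // matched against the children's parent field
      as: "descendants",
      maxDepth: 5                // optional cap on recursion depth
    }
  }
];

// Runs only where a mongosh-style `db` global exists.
if (typeof db !== "undefined") {
  printjson(db.orders.aggregate(ordersWithCustomer).toArray());
  printjson(db.categories.aggregate(categoryDescendants).toArray());
}
```

In the $graphLookup stage, each document found contributes its own _id, which is then matched against parent again. That repetition is what walks down the tree level by level.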

Pipeline Optimization

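  • Projection: Keep only the fields that later stages or the client actually need, so less data moves through the pipeline.
  • Filtering: Match as early and as selectively as possible.
  • Indexes: Index the fields used by a leading $match (and a leading $sort) so the pipeline can avoid a full collection scan.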
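
As a rough illustration of the three points together, the sketch below filters first, sorts on an indexed field, limits the result, and projects only two fields. The amount threshold, the limit of 10, and the compound index are assumptions made for this example, not values from the lesson.

```javascript
const topOrdersPipeline = [
  // Filtering: a selective match on fields covered by the index.
  { $match: { status: "A", amount: { $gte: 100 } } },
  // Still at the start of the pipeline, so the sort can use the index too.
  { $sort: { amount: -1 } },
  { $limit: 10 },
  // Projection: return only the fields the caller needs.
  { $project: { _id: 0, cust_id: 1, amount: 1 } }
];

// Runs only where a mongosh-style `db` global exists.
if (typeof db !== "undefined") {
  // Indexes: the equality field first, then the sort field (assumed index).
  db.orders.createIndex({ status: 1, amount: -1 });
  printjson(db.orders.aggregate(topOrdersPipeline).toArray());
}
```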
