Master MongoDB aggregation pipelines to build powerful reporting features with vendor-wise, offer-wise, and time-based analytics.
Why Aggregation Pipelines?
When building the reporting system for my finance dashboard, I needed to generate complex reports with multiple groupings, calculations, and filters. MongoDB's aggregation pipeline proved to be the perfect solution.
Understanding the Aggregation Pipeline
Think of aggregation as a series of stages where data flows through, being transformed at each step:
Common Pipeline Stages
- $match: Filter documents (like WHERE in SQL)
- $group: Group by field and calculate aggregates
- $project: Select and transform fields
- $sort: Sort results
- $limit: Limit number of results
- $lookup: Join with other collections
- $unwind: Deconstruct arrays
Real-World Example: Vendor-Wise Report
Requirement
Generate a report showing total spend, number of transactions, and average transaction value per vendor for a date range.
The Pipeline
const vendorReport = await Transaction.aggregate([
// Stage 1: Filter by date range
{
$match: {
date: {
$gte: startDate,
$lte: endDate
},
status: 'completed'
}
},
// Stage 2: Group by vendor
{
$group: {
_id: '$vendorId',
totalSpend: { $sum: '$amount' },
transactionCount: { $sum: 1 },
avgTransaction: { $avg: '$amount' },
maxTransaction: { $max: '$amount' },
minTransaction: { $min: '$amount' }
}
},
// Stage 3: Join with vendor collection
{
$lookup: {
from: 'vendors',
localField: '_id',
foreignField: '_id',
as: 'vendorDetails'
}
},
// Stage 4: Unwind vendor details
{
$unwind: '$vendorDetails'
},
// Stage 5: Project final fields
{
$project: {
vendorId: '$_id',
vendorName: '$vendorDetails.name',
totalSpend: 1,
transactionCount: 1,
avgTransaction: { $round: ['$avgTransaction', 2] },
maxTransaction: 1,
minTransaction: 1
}
},
// Stage 6: Sort by total spend
{
$sort: { totalSpend: -1 }
}
])
Advanced Pattern: Time-Based Analytics
Monthly Revenue Trend
const monthlyTrend = await Transaction.aggregate([
{
$match: {
date: { $gte: new Date(startYear, 0, 1) }
}
},
{
$group: {
_id: {
year: { $year: '$date' },
month: { $month: '$date' }
},
revenue: { $sum: '$amount' },
count: { $sum: 1 }
}
},
{
$sort: { '_id.year': 1, '_id.month': 1 }
},
{
$project: {
_id: 0,
year: '$_id.year',
month: '$_id.month',
revenue: 1,
count: 1,
avgOrderValue: { $divide: ['$revenue', '$count'] }
}
}
])
Complex Reporting: Multi-Level Grouping
Department and Category Breakdown
const departmentReport = await Expense.aggregate([
{
$match: { status: 'approved' }
},
{
$group: {
_id: {
department: '$department',
category: '$category'
},
total: { $sum: '$amount' },
count: { $sum: 1 }
}
},
{
$group: {
_id: '$_id.department',
categories: {
$push: {
name: '$_id.category',
total: '$total',
count: '$count'
}
},
departmentTotal: { $sum: '$total' }
}
},
{
$sort: { departmentTotal: -1 }
}
])
Performance Optimization
Indexing Strategy
- Index fields used in $match stages
- Compound indexes for common queries
- Use $match early in pipeline
- Limit fields with $project
Pipeline Optimization
// Create indexes
db.transactions.createIndex({ date: 1, status: 1 })
db.transactions.createIndex({ vendorId: 1, date: 1 })
// Put $match first to filter early
// Use $project to reduce data size
// Avoid unnecessary $lookup operations
Conditional Aggregations
Using $cond and $switch
const categorizedReport = await Transaction.aggregate([
{
$group: {
_id: '$vendorId',
highValueCount: {
$sum: {
$cond: [{ $gte: ['$amount', 10000] }, 1, 0]
}
},
mediumValueCount: {
$sum: {
$cond: [
{
$and: [
{ $gte: ['$amount', 1000] },
{ $lt: ['$amount', 10000] }
]
},
1,
0
]
}
},
lowValueCount: {
$sum: {
$cond: [{ $lt: ['$amount', 1000] }, 1, 0]
}
},
total: { $sum: '$amount' }
}
}
])
Window Functions (MongoDB 5.0+)
Running Totals and Rankings
const runningTotal = await Transaction.aggregate([
{ $sort: { date: 1 } },
{
$setWindowFields: {
sortBy: { date: 1 },
output: {
cumulativeTotal: {
$sum: '$amount',
window: { documents: ['unbounded', 'current'] }
},
rank: { $rank: {} }
}
}
}
])
Common Pitfalls to Avoid
- Memory Limits: Use $limit and $skip for pagination
- Early Filtering: Always put $match as early as possible
- Array Handling: Remember to $unwind before grouping array elements
- Type Conversion: Use $convert for mixed data types
- Timezone Issues: Use $dateToString with timezone parameter
Real-World Use Cases
Budget Utilization Report
const budgetReport = await Budget.aggregate([
{
$lookup: {
from: 'expenses',
let: { deptId: '$departmentId', budgetYear: '$year' },
pipeline: [
{
$match: {
$expr: {
$and: [
{ $eq: ['$department', '$deptId'] },
{ $eq: [{ $year: '$date' }, '$budgetYear'] }
]
}
}
},
{
$group: {
_id: null,
spent: { $sum: '$amount' }
}
}
],
as: 'spending'
}
},
{
$addFields: {
spent: { $ifNull: [{ $first: '$spending.spent' }, 0] },
remaining: {
$subtract: ['$allocated', { $ifNull: [{ $first: '$spending.spent' }, 0] }]
},
utilizationPercent: {
$multiply: [
{ $divide: [{ $ifNull: [{ $first: '$spending.spent' }, 0] }, '$allocated'] },
100
]
}
}
}
])
Testing Aggregation Pipelines
Development Tips
- Build pipeline stage by stage
- Use MongoDB Compass for visual testing
- Add explain() to check performance
- Test with production-size datasets
Performance Analysis
// Check execution stats
const results = await Transaction.aggregate(pipeline)
.explain('executionStats')
console.log('Execution time:', results.executionStats.executionTimeMillis)
console.log('Documents examined:', results.executionStats.totalDocsExamined)
Results
Using aggregation pipelines in my reporting system achieved:
- 95% faster report generation vs application-level calculations
- Reduced server memory usage by 70%
- Enabled real-time dashboard updates
- Simplified complex business logic
Conclusion
MongoDB aggregation pipelines are incredibly powerful for data analysis and reporting. Master them early in your development process, and you'll build more efficient, scalable applications. Always remember: filter early, aggregate wisely, and index strategically.


