3.22. MongoDB aggregation

Aggregation in MongoDB is mainly used to process data (for example, to compute averages or sums) and return the computed results.

It is roughly comparable to count(*) in an SQL statement.

3.22.1. `aggregate()` Method

Aggregation in MongoDB is performed with the aggregate() method.

Syntax

The basic syntax of the aggregate() method is as follows:

>db.COLLECTION_NAME.aggregate(AGGREGATE_OPERATION)

Example

The data in the collection is as follows:

{
   _id: ObjectId(7df78ad8902c),
   title: 'MongoDB Overview',
   description: 'MongoDB is no sql database',
   by_user: 'runoob.com',
   url: 'http://www.runoob.com',
   tags: ['mongodb', 'database', 'NoSQL'],
   likes: 100
},
{
   _id: ObjectId(7df78ad8902d),
   title: 'NoSQL Overview',
   description: 'No sql database is very fast',
   by_user: 'runoob.com',
   url: 'http://www.runoob.com',
   tags: ['mongodb', 'database', 'NoSQL'],
   likes: 10
},
{
   _id: ObjectId(7df78ad8902e),
   title: 'Neo4j Overview',
   description: 'Neo4j is no sql database',
   by_user: 'Neo4j',
   url: 'http://www.neo4j.com',
   tags: ['neo4j', 'database', 'NoSQL'],
   likes: 750
}

Now we use the above collection to count the number of articles written by each author. The result of aggregate() is as follows:

> db.mycol.aggregate([{$group : {_id : "$by_user", num_tutorial : {$sum : 1}}}])
{
   "result" : [
      {
         "_id" : "runoob.com",
         "num_tutorial" : 2
      },
      {
         "_id" : "Neo4j",
         "num_tutorial" : 1
      }
   ],
   "ok" : 1
}
>

The above example is similar to the SQL statement:

select by_user, count(*) from mycol group by by_user

In the above example, we group the documents by the by_user field and count how many documents share each by_user value.
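To make the grouping step concrete, here is a minimal plain-JavaScript sketch of what {$group : {_id : "$by_user", num_tutorial : {$sum : 1}}} computes. It runs in Node.js without a MongoDB server. The helper name groupCount is hypothetical and not part of any MongoDB API, and the array mycol stands in for the sample collection with only the fields needed here.

```javascript
// In-memory stand-in for the sample collection (only the relevant fields).
const mycol = [
  { title: "MongoDB Overview", by_user: "runoob.com", likes: 100 },
  { title: "NoSQL Overview", by_user: "runoob.com", likes: 10 },
  { title: "Neo4j Overview", by_user: "Neo4j", likes: 750 },
];

// Hypothetical helper: emulates $group with a {$sum: 1} accumulator.
function groupCount(docs, field) {
  const counts = new Map(); // a Map keeps keys in first-seen order
  for (const doc of docs) {
    const key = doc[field];
    counts.set(key, (counts.get(key) || 0) + 1);
  }
  // Turn each [key, count] entry into a result document.
  return Array.from(counts, ([key, n]) => ({ _id: key, num_tutorial: n }));
}

const result = groupCount(mycol, "by_user");
console.log(result);
```

The sketch returns groups in first-seen order only because of how the Map is built; MongoDB itself does not promise any particular order for $group output unless a $sort stage follows.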

The following table shows some aggregate expressions:

Expression	Description	Example
$sum	Calculates the sum of the values in each group.	db.mycol.aggregate([{$group : {_id : "$by_user", num_tutorial : {$sum : "$likes"}}}])
$avg	Calculates the average of the values in each group.	db.mycol.aggregate([{$group : {_id : "$by_user", num_tutorial : {$avg : "$likes"}}}])
$min	Gets the minimum of the corresponding values across the documents in each group.	db.mycol.aggregate([{$group : {_id : "$by_user", num_tutorial : {$min : "$likes"}}}])
$max	Gets the maximum of the corresponding values across the documents in each group.	db.mycol.aggregate([{$group : {_id : "$by_user", num_tutorial : {$max : "$likes"}}}])
$push	Appends the value to an array in the result document without checking for duplicates.	db.mycol.aggregate([{$group : {_id : "$by_user", url : {$push : "$url"}}}])
$addToSet	Adds the value to an array in the result document only if it is not already present, so the array contains no duplicates.	db.mycol.aggregate([{$group : {_id : "$by_user", url : {$addToSet : "$url"}}}])
$first	Gets the value from the first document in each group, according to the order of the input documents.	db.mycol.aggregate([{$group : {_id : "$by_user", first_url : {$first : "$url"}}}])
$last	Gets the value from the last document in each group, according to the order of the input documents.	db.mycol.aggregate([{$group : {_id : "$by_user", last_url : {$last : "$url"}}}])
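The difference between these accumulators is easier to see side by side. The following is a rough plain-JavaScript sketch, runnable in Node.js without a MongoDB server, that applies all of them to the sample data grouped by by_user. The helper groupStats and the output field names (sum, pushed, unique, and so on) are hypothetical and chosen only for this illustration.

```javascript
// In-memory stand-in for the sample collection (only the relevant fields).
const docs = [
  { by_user: "runoob.com", url: "http://www.runoob.com", likes: 100 },
  { by_user: "runoob.com", url: "http://www.runoob.com", likes: 10 },
  { by_user: "Neo4j", url: "http://www.neo4j.com", likes: 750 },
];

// Hypothetical helper: groups by by_user and applies several accumulators at once.
function groupStats(input) {
  const groups = new Map();
  for (const doc of input) {
    if (!groups.has(doc.by_user)) {
      groups.set(doc.by_user, {
        _id: doc.by_user,
        sum: 0, count: 0,
        min: Infinity, max: -Infinity,
        pushed: [], unique: [],
        first: doc.url, // like $first: value from the first document seen
        last: doc.url,
      });
    }
    const g = groups.get(doc.by_user);
    g.sum += doc.likes;                                       // like $sum
    g.count += 1;
    g.min = Math.min(g.min, doc.likes);                       // like $min
    g.max = Math.max(g.max, doc.likes);                       // like $max
    g.pushed.push(doc.url);                                   // like $push: keeps duplicates
    if (!g.unique.includes(doc.url)) g.unique.push(doc.url);  // like $addToSet
    g.last = doc.url;                                         // like $last
  }
  // Replace the internal counter with the average, like $avg.
  return [...groups.values()].map(({ count, ...g }) => ({ ...g, avg: g.sum / count }));
}

const stats = groupStats(docs);
console.log(JSON.stringify(stats, null, 2));
```

For the runoob.com group, pushed holds the same URL twice while unique holds it once, which is the practical difference between $push and $addToSet.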

3.22.2. The concept of pipeline

Pipes are commonly used in Unix and Linux to pass the output of one command as the input of the next command.

MongoDB's aggregation pipeline works the same way: documents pass through a sequence of stages, and the output of one stage becomes the input of the next. The same stage can appear more than once in a pipeline.

Expression: takes the input document and produces an output. Expressions are stateless: each one is evaluated only against the document currently passing through the pipeline and cannot refer to other documents.

Here we introduce several operations commonly used in the aggregation framework:

$project: modifies the structure of the input documents. Can be used to rename, add, or remove fields, or to create computed values and nested documents.
$match: filters the data and outputs only the documents that meet the criteria. $match uses MongoDB's standard query operators.
$limit: limits the number of documents returned by the aggregation pipeline.
$skip: skips the specified number of documents in the aggregation pipeline and returns the remaining documents.
$unwind: splits an array field in a document into multiple documents, each containing one value from the array.
$group: groups the documents in the collection and can be used to compute statistics per group.
$sort: sorts the input documents and outputs them in order.
$geoNear: outputs documents ordered by proximity to a geographic location.
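The way stages chain together can be sketched in plain JavaScript, where every stage is a function from an array of documents to an array of documents. This is only an in-memory illustration that runs in Node.js, not how MongoDB executes a pipeline. All names here (match, unwind, sortBy, skip, limit, runPipeline) are hypothetical helpers, and the data is the sample collection reduced to three fields.

```javascript
// Each hypothetical stage factory returns a function: documents in, documents out.
const match = (pred) => (docs) => docs.filter(pred);                  // like $match
const unwind = (field) => (docs) =>                                    // like $unwind
  docs.flatMap((d) => d[field].map((v) => ({ ...d, [field]: v })));
const sortBy = (field, dir) => (docs) =>                               // like $sort
  [...docs].sort((a, b) => dir * (a[field] - b[field]));
const skip = (n) => (docs) => docs.slice(n);                           // like $skip
const limit = (n) => (docs) => docs.slice(0, n);                       // like $limit

// Feed the output of each stage into the next one.
const runPipeline = (docs, stages) => stages.reduce((acc, stage) => stage(acc), docs);

const articles = [
  { title: "MongoDB Overview", tags: ["mongodb", "database", "NoSQL"], likes: 100 },
  { title: "NoSQL Overview", tags: ["mongodb", "database", "NoSQL"], likes: 10 },
  { title: "Neo4j Overview", tags: ["neo4j", "database", "NoSQL"], likes: 750 },
];

const output = runPipeline(articles, [
  match((d) => d.likes >= 100), // keeps the articles with 100 and 750 likes
  sortBy("likes", -1),          // descending by likes
  unwind("tags"),               // one document per tag
  skip(1),                      // drop the first unwound document
  limit(2),                     // keep the next two
]);
console.log(output);
```

Because each stage only sees what the previous stage emitted, the order of stages matters: placing the match step first keeps the later steps from touching documents that would be discarded anyway.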

Pipeline operator examples

1. $project example

db.article.aggregate([
    { $project : {
        title : 1 ,
        author : 1
    }}
]);

The result then contains only three fields: _id, title and author. The _id field is included by default. To exclude _id, write:

db.article.aggregate([
    { $project : {
        _id : 0 ,
        title : 1 ,
        author : 1
    }}
]);
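The effect of the two projections can be imitated in plain JavaScript. The sketch below runs in Node.js and handles only the simple inclusion form used above; the helper project and the sample document (including its field values) are made up for this illustration and do not come from the article collection.

```javascript
// Hypothetical helper: emulates an inclusion-style $project.
// spec maps field names to 1 (include); _id is kept unless spec._id === 0.
function project(docs, spec) {
  return docs.map((doc) => {
    const out = {};
    if (spec._id !== 0 && "_id" in doc) out._id = doc._id;
    for (const [field, flag] of Object.entries(spec)) {
      if (field !== "_id" && flag === 1 && field in doc) out[field] = doc[field];
    }
    return out;
  });
}

// Illustrative document; the values are assumptions for this sketch.
const article = [{ _id: 1, title: "MongoDB Overview", author: "runoob.com", likes: 100 }];

const withId = project(article, { title: 1, author: 1 });
const withoutId = project(article, { _id: 0, title: 1, author: 1 });
console.log(withId, withoutId);
```

In both cases the likes field is dropped because it is not listed; only the handling of _id differs.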

2. $match example

db.articles.aggregate( [
    { $match : { score : { $gt : 70, $lte : 90 } } },
    { $group: { _id: null, count: { $sum: 1 } } }
] );

$match selects the documents whose score is greater than 70 and less than or equal to 90, and then passes the matching documents to the next pipeline stage, $group, for processing.
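As a rough check of the boundary conditions, the same two steps can be written in plain JavaScript. The scores below are invented for this sketch, and the code runs in Node.js without a MongoDB server.

```javascript
// Illustrative scores; these values are made up for this sketch.
const scored = [{ score: 65 }, { score: 70 }, { score: 71 }, { score: 90 }, { score: 95 }];

// Stage 1, like { $match: { score: { $gt: 70, $lte: 90 } } }
const matched = scored.filter((d) => d.score > 70 && d.score <= 90);

// Stage 2, like { $group: { _id: null, count: { $sum: 1 } } }:
// every matched document falls into a single group whose _id is null.
// With no input documents, no group is produced at all.
const grouped = matched.length > 0 ? [{ _id: null, count: matched.length }] : [];
console.log(grouped);
```

A score of exactly 70 is excluded by $gt, while a score of exactly 90 is kept by $lte, so only 71 and 90 are counted here.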

3. $skip example

db.article.aggregate([
    { $skip : 5 }
]);

After processing by the $skip pipeline operator, the first five documents are dropped and only the remaining documents are passed on.
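In array terms, skipping five documents behaves like slicing off the first five elements. A small Node.js sketch with seven placeholder documents (invented for this illustration):

```javascript
// Seven placeholder documents, numbered 1..7 (made up for this sketch).
const sevenDocs = Array.from({ length: 7 }, (_, i) => ({ n: i + 1 }));

// Like { $skip: 5 }: drop the first five documents, pass the rest on.
const afterSkip = sevenDocs.slice(5);
console.log(afterSkip);
```

In a real pipeline, which documents count as "the first five" depends on the order in which they reach the stage, so $skip is usually placed after a $sort stage.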