3.39. MongoDB Map Reduce

Published: 2025-10-25 12:32:57 UTC

Map-Reduce is a computing model that, put simply, breaks a large amount of work (data) into pieces that are processed separately (Map) and then merges the partial results into a final result (Reduce).

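MongoDB's Map-Reduce is flexible and well suited to large-scale data analysis. Note, however, that map-reduce is deprecated starting in MongoDB 5.0; the MongoDB documentation recommends using an aggregation pipeline instead.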
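To make the split-then-merge idea concrete, here is a minimal in-memory sketch of the general Map-Reduce model in plain JavaScript. It is not MongoDB code: the word-count task, the text chunks, and the helper names `mapChunk`, `reduceCounts`, and `mapReduce` are hypothetical, chosen only to show the three phases (map each piece, group by key, reduce each group).

```javascript
// Generic Map-Reduce illustration (hypothetical example, not MongoDB's API).
// Task: count how often each word appears across several text chunks.

// Map: turn one chunk into a list of [key, value] pairs.
function mapChunk(chunk) {
  return chunk
    .split(/\s+/)
    .filter((word) => word.length > 0)
    .map((word) => [word.toLowerCase(), 1]);
}

// Reduce: merge all values that share a key into a single value.
function reduceCounts(key, values) {
  return values.reduce((sum, v) => sum + v, 0);
}

function mapReduce(chunks, mapFn, reduceFn) {
  // Map phase: each chunk is processed independently.
  const pairs = chunks.flatMap((chunk) => mapFn(chunk));

  // Group phase: collect every value emitted for the same key.
  const groups = new Map();
  for (const [key, value] of pairs) {
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(value);
  }

  // Reduce phase: collapse each group into one result.
  const result = {};
  for (const [key, values] of groups) {
    result[key] = reduceFn(key, values);
  }
  return result;
}

const chunks = ["the quick fox", "The lazy dog", "the end"];
console.log(mapReduce(chunks, mapChunk, reduceCounts)); // "the" is counted 3 times
```

Because each chunk is mapped independently, the map phase could run on different machines; only the grouped values need to be brought together for the reduce phase.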

3.39.1. MapReduce command

The following is the basic syntax of MapReduce:

>db.collection.mapReduce(
   function() {emit(key, value);},                  // map function
   function(key, values) {return reduceFunction},   // reduce function
   {
      out: collection,
      query: document,
      sort: document,
      limit: number
   }
)

Using MapReduce requires implementing two functions: a Map function and a Reduce function. The Map function is called once for each document in the collection (the current document is available as this) and calls emit(key, value); the emitted key-value pairs are grouped by key and passed to the Reduce function for processing.

The Map function must call emit(key, value) to output a key-value pair; it does not return its results.
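The calling convention described above can be imitated in plain JavaScript. This is a simplified sketch, not MongoDB's implementation: `emit` here is a hypothetical stand-in for the built-in that MongoDB provides to map functions, and `runMapPhase` and `currentGroups` are illustrative names. It shows the two points from the text: the document is bound to `this`, and output is produced by calling `emit` rather than by returning a value.

```javascript
// Hypothetical state: the groups for the map phase that is currently running.
let currentGroups = null;

// Stand-in for MongoDB's emit(): records one key-value pair under its key.
function emit(key, value) {
  if (!currentGroups.has(key)) currentGroups.set(key, []);
  currentGroups.get(key).push(value);
}

// Runs mapFn once per document with `this` bound to that document and
// returns a Map of key -> array of emitted values.
function runMapPhase(documents, mapFn) {
  currentGroups = new Map();
  for (const doc of documents) {
    mapFn.call(doc); // the map function reads fields via `this`
  }
  const groups = currentGroups;
  currentGroups = null;
  return groups;
}

// Same form as the map function used with the posts collection in this section.
// It must be a regular function (not an arrow function) so `this` is bound.
const map = function () { emit(this.user_name, 1); };

const docs = [
  { user_name: "mark" },
  { user_name: "runoob" },
  { user_name: "mark" },
];
console.log(Object.fromEntries(runMapPhase(docs, map)));
```

Here mark ends up with two emitted values and runoob with one; these grouped value arrays are what a reduce function would receive.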

Parameters:

  • map: the mapping function; it generates a sequence of key-value pairs that serve as input to the reduce function.

  • reduce: the aggregation (statistics) function; its task is to turn key-values into key-value, that is, to collapse the array of values for a key into a single value.

  • out: the collection in which the results are stored. In early MongoDB versions, omitting out wrote the results to a temporary collection that was deleted automatically when the client disconnected; current versions require out, and out: { inline: 1 } returns the results directly without creating a collection.

  • query: a filter condition; only documents that match it are passed to the map function. (query, limit, and sort can be combined freely.)

  • sort: sorts the input documents before they are sent to the map function; combined with limit, it can optimize the grouping mechanism.

  • limit: the maximum number of documents sent to the map function (sort on its own, without limit, is of little use).
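The order in which query, sort, and limit select the input can be sketched in plain JavaScript. This is an illustrative assumption, not how the MongoDB server evaluates these options: `selectInput` is a hypothetical helper, query supports only equality on top-level fields, sort supports a single field, and the `views` field in the sample data is invented for the example.

```javascript
// Sketch: filter (query), then order (sort), then truncate (limit),
// producing the documents that would be handed to the map function.
function selectInput(documents, { query = {}, sort = null, limit = 0 } = {}) {
  // query: keep documents whose fields equal every value in `query`.
  let input = documents.filter((doc) =>
    Object.entries(query).every(([field, value]) => doc[field] === value)
  );

  // sort: { field: 1 } for ascending, { field: -1 } for descending.
  if (sort) {
    const [field, direction] = Object.entries(sort)[0];
    input = [...input].sort((a, b) =>
      a[field] < b[field] ? -direction : a[field] > b[field] ? direction : 0
    );
  }

  // limit: in this sketch, 0 means "no limit".
  if (limit > 0) input = input.slice(0, limit);
  return input;
}

// Hypothetical sample posts; `views` exists only for this example.
const posts = [
  { user_name: "mark", status: "active", views: 3 },
  { user_name: "runoob", status: "disabled", views: 1 },
  { user_name: "mark", status: "active", views: 1 },
  { user_name: "runoob", status: "active", views: 2 },
];

const input = selectInput(posts, {
  query: { status: "active" },
  sort: { views: 1 },
  limit: 2,
});
console.log(input.map((d) => d.views)); // [ 1, 2 ]
```

Without limit, the sort step only changes the order in which documents reach the map function, which is why sort alone is of little use.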

The following example selects the documents with status: "A" from the orders collection, groups them by cust_id, and calculates the sum of the amount field for each customer.

[Figure: map-reduce operation on the orders collection]
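The orders example can be traced step by step in plain JavaScript. This is a sketch only: the order documents below are hypothetical sample data (not the data shown in the figure), and `totalAmountByCustomer` is an illustrative helper that performs the query, map, and reduce steps in memory instead of calling MongoDB.

```javascript
// Hypothetical sample orders.
const orders = [
  { cust_id: "A123", amount: 500, status: "A" },
  { cust_id: "A123", amount: 250, status: "A" },
  { cust_id: "B212", amount: 200, status: "A" },
  { cust_id: "A123", amount: 300, status: "D" },
];

function totalAmountByCustomer(orders) {
  // query: { status: "A" }
  const selected = orders.filter((o) => o.status === "A");

  // map: emit(this.cust_id, this.amount) for each selected order,
  // with the emitted values grouped by key.
  const groups = new Map();
  for (const o of selected) {
    if (!groups.has(o.cust_id)) groups.set(o.cust_id, []);
    groups.get(o.cust_id).push(o.amount);
  }

  // reduce: sum the amounts for each customer
  // (what Array.sum(values) does in the mongo shell).
  return [...groups].map(([custId, amounts]) => ({
    _id: custId,
    value: amounts.reduce((sum, a) => sum + a, 0),
  }));
}

console.log(totalAmountByCustomer(orders));
// A123 -> 750 (500 + 250), B212 -> 200; the status "D" order is excluded.
```

The result documents use the same { _id, value } shape that mapReduce writes to its output collection.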

Using MapReduce

Consider the following document structure for storing users' articles; each document stores the author's user_name and the article's status field:

>db.posts.insert({
   "post_text": "Runoob Tutorial: the most comprehensive technical documentation.",
   "user_name": "mark",
   "status":"active"
})
WriteResult({ "nInserted" : 1 })
>db.posts.insert({
   "post_text": "Runoob Tutorial: the most comprehensive technical documentation.",
   "user_name": "mark",
   "status":"active"
})
WriteResult({ "nInserted" : 1 })
>db.posts.insert({
   "post_text": "Runoob Tutorial: the most comprehensive technical documentation.",
   "user_name": "mark",
   "status":"active"
})
WriteResult({ "nInserted" : 1 })
>db.posts.insert({
   "post_text": "Runoob Tutorial: the most comprehensive technical documentation.",
   "user_name": "mark",
   "status":"active"
})
WriteResult({ "nInserted" : 1 })
>db.posts.insert({
   "post_text": "Runoob Tutorial: the most comprehensive technical documentation.",
   "user_name": "mark",
   "status":"disabled"
})
WriteResult({ "nInserted" : 1 })
>db.posts.insert({
   "post_text": "Runoob Tutorial: the most comprehensive technical documentation.",
   "user_name": "runoob",
   "status":"disabled"
})
WriteResult({ "nInserted" : 1 })
>db.posts.insert({
   "post_text": "Runoob Tutorial: the most comprehensive technical documentation.",
   "user_name": "runoob",
   "status":"disabled"
})
WriteResult({ "nInserted" : 1 })
>db.posts.insert({
   "post_text": "Runoob Tutorial: the most comprehensive technical documentation.",
   "user_name": "runoob",
   "status":"active"
})
WriteResult({ "nInserted" : 1 })

Now we run mapReduce on the posts collection to select the published articles (status: "active"), group them by user_name, and count the number of articles per user:

>db.posts.mapReduce(
   function() { emit(this.user_name, 1); },                // map: one pair per post
   function(key, values) { return Array.sum(values); },   // reduce: add up the 1s
   {
      query: { status: "active" },
      out: "post_total"
   }
)

The mapReduce call above produces the following output:

{
        "result" : "post_total",
        "timeMillis" : 23,
        "counts" : {
                "input" : 5,
                "emit" : 5,
                "reduce" : 1,
                "output" : 2
        },
        "ok" : 1
}

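The results show that five documents match the query condition (status: "active"), so the map function is called five times and emits five key-value pairs. These pairs are grouped by key into two groups (mark and runoob), which become the two output documents. The reduce function is called only once, for mark, because MongoDB does not call reduce for a key that has only a single value (runoob).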

  • result: the name of the collection that stores the results; here it is post_total, the collection named in out. (Where a temporary collection is used instead, it is deleted automatically when the connection is closed.)

  • timeMillis: the execution time in milliseconds

  • counts: an object holding the four statistics below

  • input: the number of documents that matched the condition and were sent to the map function

  • emit: the number of times emit was called in the map function, that is, the total number of key-value pairs emitted

  • reduce: the number of times the reduce function was called

  • output: the number of documents in the result collection (these counts are very helpful for debugging)

  • ok: whether the operation succeeded; 1 means success

  • err: if the operation fails, the reason may be reported here, but in practice these messages tend to be vague and of little use.
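The counters in the output above can be reproduced with a small in-memory emulation over the eight posts inserted earlier. This is a sketch under stated assumptions: `mapReduceWithCounts` is a hypothetical helper, the map function receives the document and emit as arguments instead of using `this` (a simplification), and reduce is skipped for single-value keys as MongoDB documents. On a real server, reduce may also be called more than once per key for large inputs.

```javascript
// The eight posts from the insert examples (post_text omitted).
const posts = [
  { user_name: "mark", status: "active" },
  { user_name: "mark", status: "active" },
  { user_name: "mark", status: "active" },
  { user_name: "mark", status: "active" },
  { user_name: "mark", status: "disabled" },
  { user_name: "runoob", status: "disabled" },
  { user_name: "runoob", status: "disabled" },
  { user_name: "runoob", status: "active" },
];

function mapReduceWithCounts(docs, query, mapFn, reduceFn) {
  const counts = { input: 0, emit: 0, reduce: 0, output: 0 };
  const groups = new Map();
  const emit = (key, value) => {
    counts.emit++; // every emit call is counted
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(value);
  };

  for (const doc of docs) {
    const matches = Object.entries(query).every(([f, v]) => doc[f] === v);
    if (!matches) continue;
    counts.input++; // documents that passed the query
    mapFn(doc, emit);
  }

  const results = [];
  for (const [key, values] of groups) {
    let value = values[0];
    if (values.length > 1) { // single-value keys skip reduce
      counts.reduce++;
      value = reduceFn(key, values);
    }
    results.push({ _id: key, value });
  }
  counts.output = results.length; // documents in the result set
  return { results, counts };
}

const { results, counts } = mapReduceWithCounts(
  posts,
  { status: "active" },
  (doc, emit) => emit(doc.user_name, 1),
  (key, values) => values.reduce((s, v) => s + v, 0)
);
console.log(counts);  // { input: 5, emit: 5, reduce: 1, output: 2 }
console.log(results);
```

The emulation arrives at the same input, emit, reduce, and output values as the server output shown earlier, and the same per-user totals as the post_total collection.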

Define the functions and options as variables, run mapReduce, and then use find to view the results:

> var map=function() { emit(this.user_name,1); }
> var reduce=function(key, values) {return Array.sum(values)}
> var options={query:{status:"active"},out:"post_total"}
> db.posts.mapReduce(map,reduce,options)
{ "result" : "post_total", "ok" : 1 }
> db.post_total.find();

The query returns the following results:

{ "_id" : "mark", "value" : 4 }
{ "_id" : "runoob", "value" : 1 }

In a similar way, MapReduce can be used to build large, complex aggregation queries.

The Map and Reduce functions are written in JavaScript, which makes MapReduce very flexible and powerful.
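That flexibility comes with one constraint worth checking: MongoDB's documentation requires a reduce function to be associative, i.e. reduce(key, [C, reduce(key, [A, B])]) must equal reduce(key, [C, A, B]), because the server may reduce partial groups and then reduce those results again. The sketch below tests that property in plain JavaScript; `sumReduce`, `lengthReduce`, and `checkReReduce` are hypothetical names for illustration.

```javascript
// Correct: summing gives the same answer for raw 1s or for partial sums.
const sumReduce = (key, values) => values.reduce((s, v) => s + v, 0);

// Incorrect: counting array elements ignores what the values contain,
// so a partial result of 2 would be counted as 1.
const lengthReduce = (key, values) => values.length;

// Compares reducing all values at once with reducing in two steps.
function checkReReduce(reduceFn) {
  const key = "mark";
  const A = 1, B = 1, C = 1; // values as emitted by emit(this.user_name, 1)
  const oneShot = reduceFn(key, [C, A, B]);
  const inTwoSteps = reduceFn(key, [C, reduceFn(key, [A, B])]);
  return oneShot === inTwoSteps;
}

console.log(checkReReduce(sumReduce));    // true  (3 === 3)
console.log(checkReReduce(lengthReduce)); // false (3 !== 2)
```

This is why the posts example sums the emitted values with Array.sum(values) instead of returning values.length.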

Principles, Technologies, and Methods of Geographic Information Systems  102

In recent years, Geographic Information Systems (GIS) have undergone rapid development in both theoretical and practical dimensions. GIS has been widely applied for modeling and decision-making support across various fields such as urban management, regional planning, and environmental remediation, establishing geographic information as a vital component of the information era. The introduction of the “Digital Earth” concept has further accelerated the advancement of GIS, which serves as its technical foundation. Concurrently, scholars have been dedicated to theoretical research in areas like spatial cognition, spatial data uncertainty, and the formalization of spatial relationships. This reflects the dual nature of GIS as both an applied technology and an academic discipline, with the two aspects forming a mutually reinforcing cycle of progress.