Map-Reduce is a computing model that breaks a large amount of work (data) into pieces (MAP) and then merges the partial results into a final result (REDUCE).
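The model can be illustrated without a database. The following is a minimal sketch in plain JavaScript, not MongoDB's implementation; the helper name mapReduceSketch and the word list are assumptions made up for illustration.

```javascript
// Plain-JavaScript sketch of the map-reduce model (no MongoDB involved).
// mapReduceSketch and the sample data are hypothetical names for illustration.
function mapReduceSketch(records, mapFn, reduceFn) {
  const groups = new Map();
  // MAP: each record may emit any number of (key, value) pairs.
  for (const record of records) {
    mapFn(record, (key, value) => {
      if (!groups.has(key)) groups.set(key, []);
      groups.get(key).push(value);
    });
  }
  // REDUCE: merge the values collected for each key into a single value.
  const result = {};
  for (const [key, values] of groups) {
    result[key] = reduceFn(key, values);
  }
  return result;
}

const words = ["a", "b", "a", "c", "a"];
const wordCounts = mapReduceSketch(
  words,
  (word, emit) => emit(word, 1),
  (key, values) => values.reduce((sum, v) => sum + v, 0)
);
console.log(wordCounts); // { a: 3, b: 1, c: 1 }
```

The map step only decides which key each record belongs to; all merging happens in the reduce step.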
The Map-Reduce provided by MongoDB is very flexible and useful for large-scale data analysis.

The basic syntax of MapReduce is given in the MapReduce command section. To use MapReduce you implement two functions, the Map function and the Reduce function. The Map function is run for every record in the collection; it calls emit(key, value), and the emitted keys and values are passed to the Reduce function for processing. The Map function must call emit(key, value) to return a key-value pair.

Parameter description:

map: the mapping function (generates a sequence of key-value pairs that become the arguments of the reduce function).
reduce: the statistics function. Its task is to turn key-values into key-value, that is, to turn the array of values collected for a key into a single value.
out: the collection that stores the statistical results (if not specified, a temporary collection is used, which is automatically deleted when the client disconnects).
query: a filter condition; only documents that meet the criteria are passed to the map function (query, limit, and sort can be combined freely).
sort: the sort parameter, used together with limit (documents are sorted before being sent to the map function); it can optimize the grouping mechanism.
limit: the upper limit on the number of documents sent to the map function (without limit, using sort alone is of little use).

As an example, consider finding the documents with status: "A" in the orders collection, grouping them by a key field, and calculating the sum of an amount for each group.

The example documents in the Use MapReduce section store a user's articles; each document stores the user's user_name and the article's status field. We then use the mapReduce function on the posts collection to select the published articles (status: "active") and count the articles per user.

The mapReduce output shows that five documents meet the query criteria (status: "active") and that five key-value pairs are generated by the map function. Finally, the reduce function combines the pairs that share a key, producing two groups.
The fields of the output are:

result: the name of the collection that stores the results. If this is a temporary collection, it is automatically deleted when the connection that ran MapReduce is closed.
timeMillis: the time taken to execute, in milliseconds.
input: the number of documents that meet the condition and are sent to the map function.
emit: the number of times emit is called in the map function, that is, the total number of key-value pairs emitted.
output: the number of documents in the result set (the counts are very helpful for debugging).
ok: whether the operation succeeded; 1 means success.
err: if the operation fails, the reason may be reported here, but in practice the messages are vague and of little use.

Querying the post_total collection with find returns one document per user, with the user name in _id and the article count in value.

In a similar manner, MapReduce can be used to build large and complex aggregate queries. The Map function and Reduce function are written in JavaScript, which makes MapReduce very flexible and powerful.

3.39.1. MapReduce command
>db.collection.mapReduce(
function() {emit(key,value);}, // map function
function(key,values) {return reduceFunction}, // reduce function
{
out: collection,
query: document,
sort: document,
limit: number
}
)
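How the options in this syntax interact can be sketched in memory. The following is an illustration under stated assumptions, not MongoDB's implementation: runMapReduce and the orders data are hypothetical, query is a predicate function and sort a comparator (MongoDB takes documents for both), and emit is passed to map as an argument rather than being a global.

```javascript
// In-memory sketch of the option order: query, then sort, then limit, then map/reduce.
// runMapReduce and the orders data are hypothetical; this is not the MongoDB API.
function runMapReduce(docs, map, reduce, options = {}) {
  const { query = () => true, sort, limit } = options;
  let input = docs.filter(query);                          // query: filter first
  if (sort) input = [...input].sort(sort);                 // sort: order the filtered docs
  if (limit !== undefined) input = input.slice(0, limit);  // limit: cap the input

  const groups = new Map();
  const emit = (key, value) => {
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(value);
  };
  for (const doc of input) map.call(doc, emit); // `this` is the current document

  const out = [];
  for (const [key, values] of groups) {
    out.push({ _id: key, value: reduce(key, values) });
  }
  return out;
}

const orders = [
  { cust_id: "A123", amount: 500, status: "A" },
  { cust_id: "A123", amount: 250, status: "A" },
  { cust_id: "B212", amount: 200, status: "A" },
  { cust_id: "A123", amount: 300, status: "D" },
];

const map = function (emit) { emit(this.cust_id, this.amount); };
const reduce = (key, values) => values.reduce((sum, v) => sum + v, 0);
const isStatusA = (doc) => doc.status === "A";

const totals = runMapReduce(orders, map, reduce, { query: isStatusA });
console.log(totals); // [ { _id: 'A123', value: 750 }, { _id: 'B212', value: 200 } ]

const limited = runMapReduce(orders, map, reduce, { query: isStatusA, limit: 2 });
console.log(limited); // [ { _id: 'A123', value: 750 } ]
```

With limit: 2, only the first two matching documents reach the map function, so the B212 group never appears in the result.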
In the orders example, the grouping key is cust_id and the value summed for each group is amount.

Use MapReduce
>db.posts.insert({
  "post_text": "Runoob tutorial, the most complete technical documentation.",
  "user_name": "mark",
  "status":"active"
})
WriteResult({ "nInserted" : 1 })
>db.posts.insert({
  "post_text": "Runoob tutorial, the most complete technical documentation.",
  "user_name": "mark",
  "status":"active"
})
WriteResult({ "nInserted" : 1 })
>db.posts.insert({
  "post_text": "Runoob tutorial, the most complete technical documentation.",
  "user_name": "mark",
  "status":"active"
})
WriteResult({ "nInserted" : 1 })
>db.posts.insert({
  "post_text": "Runoob tutorial, the most complete technical documentation.",
  "user_name": "mark",
  "status":"active"
})
WriteResult({ "nInserted" : 1 })
>db.posts.insert({
  "post_text": "Runoob tutorial, the most complete technical documentation.",
  "user_name": "mark",
  "status":"disabled"
})
WriteResult({ "nInserted" : 1 })
>db.posts.insert({
  "post_text": "Runoob tutorial, the most complete technical documentation.",
  "user_name": "runoob",
  "status":"disabled"
})
WriteResult({ "nInserted" : 1 })
>db.posts.insert({
  "post_text": "Runoob tutorial, the most complete technical documentation.",
  "user_name": "runoob",
  "status":"disabled"
})
WriteResult({ "nInserted" : 1 })
>db.posts.insert({
  "post_text": "Runoob tutorial, the most complete technical documentation.",
  "user_name": "runoob",
  "status":"active"
})
WriteResult({ "nInserted" : 1 })
Group by user_name to calculate the number of articles per user:
>db.posts.mapReduce(
function() { emit(this.user_name,1); },
function(key, values) {return Array.sum(values)},
{
query:{status:"active"},
out:"post_total"
}
)
{
"result" : "post_total",
"timeMillis" : 23,
"counts" : {
"input" : 5,
"emit" : 5,
"reduce" : 1,
"output" : 2
},
"ok" : 1
}
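The counts in this output can be reproduced by hand. The sketch below replays the eight inserted posts in plain JavaScript and tallies each stage. It assumes one reduce call per key that has more than one value, which matches the reduce count of 1 shown above; MongoDB does not call reduce for a key with a single value. The function name countStages is hypothetical.

```javascript
// Replays the posts example in memory and tallies input/emit/reduce/output.
// countStages is a hypothetical helper; this is not how MongoDB computes the counts.
const posts = [
  ...Array(4).fill({ user_name: "mark", status: "active" }),
  { user_name: "mark", status: "disabled" },
  { user_name: "runoob", status: "disabled" },
  { user_name: "runoob", status: "disabled" },
  { user_name: "runoob", status: "active" },
];

function countStages(docs) {
  const counts = { input: 0, emit: 0, reduce: 0, output: 0 };
  const groups = new Map();
  for (const doc of docs) {
    if (doc.status !== "active") continue; // query: {status: "active"}
    counts.input += 1;
    counts.emit += 1;                      // map: emit(this.user_name, 1)
    if (!groups.has(doc.user_name)) groups.set(doc.user_name, []);
    groups.get(doc.user_name).push(1);
  }
  const results = [];
  for (const [key, values] of groups) {
    let value = values[0];
    if (values.length > 1) {
      // Assumption: reduce runs once per key, and only when the key has several values.
      counts.reduce += 1;
      value = values.reduce((sum, v) => sum + v, 0);
    }
    results.push({ _id: key, value });
  }
  counts.output = results.length;
  return { counts, results };
}

const stats = countStages(posts);
console.log(stats.counts);  // { input: 5, emit: 5, reduce: 1, output: 2 }
console.log(stats.results); // [ { _id: 'mark', value: 4 }, { _id: 'runoob', value: 1 } ]
```

The key runoob has a single active post, so its emitted value is stored as-is and reduce is counted only once, for mark.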
Use the find operator to view the mapReduce query results:
> var map=function() { emit(this.user_name,1); }
> var reduce=function(key, values) {return Array.sum(values)}
> var options={query:{status:"active"},out:"post_total"}
> db.posts.mapReduce(map,reduce,options)
{ "result" : "post_total", "ok" : 1 }
> db.post_total.find();
{ "_id" : "mark", "value" : 4 }
{ "_id" : "runoob", "value" : 1 }
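One detail behind these results: MongoDB may call the reduce function more than once for the same key, feeding earlier reduce outputs back in as values. A summing reduce stays correct under that re-reduce step, while a reduce that counts array length does not. The sketch below shows the difference in plain JavaScript; the names reduceBySum and reduceByLength are hypothetical, and sum stands in for Array.sum, which is provided by MongoDB's JavaScript environment rather than by standard JavaScript.

```javascript
// Array.sum comes from MongoDB's JavaScript environment; plain JavaScript needs its own.
const sum = (values) => values.reduce((total, v) => total + v, 0);

// Hypothetical reduce functions for counting posts per user.
const reduceBySum = (key, values) => sum(values);
const reduceByLength = (key, values) => values.length; // breaks when re-reduced

const emitted = [1, 1, 1, 1]; // four emits for the key "mark"

// One pass over all emitted values.
const onePass = reduceBySum("mark", emitted);

// Two partial passes, followed by a re-reduce of the partial results.
const rereduced = reduceBySum("mark", [
  reduceBySum("mark", emitted.slice(0, 2)),
  reduceBySum("mark", emitted.slice(2)),
]);
const brokenRereduced = reduceByLength("mark", [
  reduceByLength("mark", emitted.slice(0, 2)),
  reduceByLength("mark", emitted.slice(2)),
]);

console.log(onePass, rereduced, brokenRereduced); // 4 4 2
```

This is why the example emits 1 per document and sums the values instead of returning values.length.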