Using $push With $group With Pymongo
Solution 1:
The $project step is redundant: the $group stage already produces exactly those three fields (_id, tweet_texts and count), so there is no need for a $project stage after it.
The correct pipeline is:
pipeline = [
    {
        "$group": {
            "_id": "$user.screen_name",
            "tweet_texts": { "$push": "$text" },
            "count": { "$sum": 1 }
        }
    },
    { "$sort" : { "count" : -1 } },
    { "$limit": 5 }
]
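As a rough sketch of how this pipeline might be run from pymongo, the helper below wraps it in a function that takes a collection and calls its aggregate() method. The function name top_users, the limit parameter, and the database and collection names in the usage comment are assumptions made for illustration; they do not come from the original question.

```python
def top_users(collection, limit=5):
    """Return the `limit` users with the most tweets, each with their tweet texts.

    `collection` is expected to behave like a pymongo Collection, i.e. to
    offer aggregate(pipeline) returning an iterable of documents.
    """
    pipeline = [
        {
            "$group": {
                "_id": "$user.screen_name",         # one output document per user
                "tweet_texts": {"$push": "$text"},  # collect every tweet text
                "count": {"$sum": 1},               # count tweets per user
            }
        },
        {"$sort": {"count": -1}},                   # most active users first
        {"$limit": limit},
    ]
    return list(collection.aggregate(pipeline))


# Usage, assuming a MongoDB server on localhost and hypothetical names:
# from pymongo import MongoClient
# tweets = MongoClient()["twitter"]["tweets"]
# for doc in top_users(tweets):
#     print(doc["_id"], doc["count"])
```

Passing the collection in keeps the function easy to try against a stub object without a running server.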
Your $project stage didn't work because the $group stage before it doesn't produce any field "user.screen_name", which you attempt to use as the _id field in the $project stage. After $group, the screen name is available only as _id.
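To make this concrete, here is a plain-Python imitation of what the $group stage computes. It is only a sketch: it does not use MongoDB at all, the sample documents are invented, and the helpers resolve and group_by_screen_name are hypothetical names. It shows that the grouped documents contain only _id, tweet_texts and count, so the path user.screen_name no longer resolves to anything.

```python
from collections import defaultdict

# Invented sample documents shaped like the tweets in the question.
tweets = [
    {"user": {"screen_name": "alice"}, "text": "first"},
    {"user": {"screen_name": "bob"}, "text": "hello"},
    {"user": {"screen_name": "alice"}, "text": "second"},
]


def resolve(doc, path):
    """Follow a dotted path such as 'user.screen_name'; None if any part is absent."""
    for key in path.split("."):
        if not isinstance(doc, dict) or key not in doc:
            return None
        doc = doc[key]
    return doc


def group_by_screen_name(docs):
    """Imitate the $group stage: $push each text and $sum 1 per screen name."""
    groups = defaultdict(lambda: {"tweet_texts": [], "count": 0})
    for doc in docs:
        bucket = groups[resolve(doc, "user.screen_name")]
        bucket["tweet_texts"].append(doc["text"])
        bucket["count"] += 1
    return [{"_id": key, **value} for key, value in groups.items()]


grouped = group_by_screen_name(tweets)
print(grouped[0])                               # only _id, tweet_texts, count
print(resolve(grouped[0], "user.screen_name"))  # None: the path is gone
print(resolve(grouped[0], "_id"))               # the screen name lives here now
```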
However, if you want to keep the $project step, the working pipeline is:
pipeline = [
    {
        "$group": {
            "_id": "$user.screen_name",
            "tweet_texts": { "$push": "$text" },
            "count": { "$sum": 1 }
        }
    },
    { "$project": { "count": 1, "tweet_texts": 1 } },
    { "$sort" : { "count" : -1 } },
    { "$limit": 5 }
]
Solution 2:
Reading comments
Reading the comments, I found out that
pipeline = [
    {"$group": {"_id": "$user.screen_name", "tweet_texts": {"$push": "$text"}, "count": {"$sum": 1}}},
    {"$project": {"_id": "$user.screen_name", "count": 1, "tweet_texts": 1}},
    {"$sort" : {"count" : -1}},
    {"$limit": 5}
]
should in fact be changed to:
pipeline = [
    {"$group": {"_id": "$user.screen_name", "tweet_texts": {"$push": "$text"}, "count": {"$sum": 1}}},
    {"$sort" : {"count" : -1}},
    {"$limit": 5}
]
Why?
The full answer and explanation can be seen in the answer:
The conclusion of the story is that I was using the $project stage wrongly. Not only was it not needed in the first place, but to make it idempotent (that is, to leave _id unchanged) it should be
{"$project": {"_id": "$_id", "count": 1, "tweet_texts": 1}},
I also highly recommend his answer:
Special Thanks
The following users deserve kudos++:
For directing me onto the right path!