Using $push With $group With Pymongo

Objective: fix my make_pipeline() function so that, using an aggregation query, it counts the number of tweets for each user, adds the tweet texts to an array, and returns the 5 users with the most tweets.

Solution 1:

The $project step is redundant: the $group stage already outputs just those three fields (_id, tweet_texts and count), so there is no need for an additional $project stage.

The correct pipeline should be

pipeline = [ 
    {
        "$group": {
            "_id": "$user.screen_name", 
            "tweet_texts": { "$push": "$text" }, 
            "count": { "$sum": 1 }
        }
    }, 
    { "$sort" : { "count" : -1 } }, 
    { "$limit": 5 } 
] 
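As a minimal sketch, the pipeline above could be wrapped in the make_pipeline() function from the question and run through pymongo. The helper name top_tweeters is hypothetical, and it assumes the argument is a pymongo Collection whose documents carry user.screen_name and text fields.

```python
def make_pipeline():
    """Build the aggregation pipeline: group tweets by user, keep the top 5."""
    return [
        {
            "$group": {
                "_id": "$user.screen_name",         # one output document per user
                "tweet_texts": {"$push": "$text"},  # collect every tweet text
                "count": {"$sum": 1},               # count tweets per user
            }
        },
        {"$sort": {"count": -1}},  # most active users first
        {"$limit": 5},
    ]


def top_tweeters(collection):
    """Hypothetical helper: `collection` is assumed to be a pymongo Collection."""
    # Collection.aggregate() returns a cursor, so materialise it as a list.
    return list(collection.aggregate(make_pipeline()))
```

With a running server this might be called as top_tweeters(MongoClient()["twitter"]["tweets"]), where the database and collection names are placeholders rather than anything taken from the question.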

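Your $project stage didn't work because the documents coming out of the preceding $group stage contain only _id, tweet_texts and count. They no longer have a user.screen_name field, so the "$user.screen_name" reference you use for _id in the $project stage has nothing to point at. The screen name is already stored in _id.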
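To make this concrete, here is a rough plain-Python imitation of what the $group stage hands to the next stage. It is not MongoDB code: the sample tweets are made up and emulate_group is a hypothetical stand-in, meant only to show which field names survive the grouping.

```python
from collections import defaultdict

# Made-up sample documents shaped like the tweets in the question.
tweets = [
    {"user": {"screen_name": "alice"}, "text": "first"},
    {"user": {"screen_name": "bob"}, "text": "hello"},
    {"user": {"screen_name": "alice"}, "text": "second"},
]


def emulate_group(docs):
    """Plain-Python stand-in for the $group stage (not MongoDB itself)."""
    texts_by_user = defaultdict(list)
    for doc in docs:
        texts_by_user[doc["user"]["screen_name"]].append(doc["text"])
    # Each output document has exactly the fields named in $group.
    return [
        {"_id": name, "tweet_texts": texts, "count": len(texts)}
        for name, texts in texts_by_user.items()
    ]


grouped = emulate_group(tweets)
for doc in grouped:
    # Field names available to the next stage; note that "user" is gone.
    print(sorted(doc))  # ['_id', 'count', 'tweet_texts']
```

Since no grouped document has a user key, a later stage can only reach the screen name through _id.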

However, if you do want to keep the $project step, a working pipeline would be:

pipeline = [ 
    {
        "$group": {
            "_id": "$user.screen_name", 
            "tweet_texts": { "$push": "$text" }, 
            "count": { "$sum": 1 }
        }
    }, 
    { "$project": { "count": 1, "tweet_texts": 1 } },
    { "$sort" : { "count" : -1 } }, 
    { "$limit": 5 } 
] 

Solution 2:

Reading comments

After reading the comments, I found out that my pipeline

pipeline = [
        {"$group": {"_id": "$user.screen_name", "tweet_texts": {"$push": "$text"}, "count": {"$sum": 1}}},
        {"$project": {"_id": "$user.screen_name", "count": 1, "tweet_texts": 1}},
        {"$sort" : {"count" : -1}},
        {"$limit": 5}
    ]

Should in fact be changed to:

pipeline = [ 
        {"$group": {"_id": "$user.screen_name", "tweet_texts": {"$push": "$text"}, "count": {"$sum": 1}}}, 
        {"$sort" : {"count" : -1}}, 
        {"$limit": 5}
    ]

Why?

The full answer and explanation can be seen in the answer:

The conclusion of the story is that I was using the $project stage wrongly. Not only was it not needed in the first place, but for it to leave the grouped documents unchanged it should be

{"$project": {"_id": "$_id", "count": 1, "tweet_texts": 1}},

I also highly recommend his answer:

Special Thanks

The following users deserve kudos++:

For directing me onto the right path!