Sort Within Group Without Changing Group Order?

June 25, 2023

Can't seem to find an up-to-date answer on this online. Problem I have is essentially the same as this question, that is, I want to sort by say revenue within group without changin

Solution 1:

You could create a temporary column that maps GroupB, GroupA and GroupC to consecutive numbers such as 1, 2 and 3, so that the groups keep their original (unsorted) order. Sort on that column and then on revenue, and finally drop the temporary column. Answer #1 is the more dynamic of the two and works even if the rows of a group are not consecutive. Answer #2 requires the rows of each group to be consecutive (the inputs for Answer #1 and Answer #2 are ordered differently).

Updated Answer #1 (per comment: the groups are not consecutive in rows, but we still want to order them by the first appearance of each group). The code [l for l in enumerate((df['group'].unique()))] assigns a number to each group according to where its first value appears in the group column of the dataframe.

In[1]:
    name    group   revenue
0   Name1   GroupB  1
3   Name4   GroupA  4
4   Name5   GroupA  5
8   Name7   GroupC  9
1   Name2   GroupB  2
2   Name3   GroupB  3
5   Name6   GroupA  6
6   Name7   GroupC  7
7   Name7   GroupC  8

dft = pd.DataFrame([l for l  in enumerate((df['group'].unique()))], columns=['group_number','group'])
df = pd.merge(df, dft, how='left', on='group').sort_values(['group_number', 'revenue'], ascending = [True, False])
df

Out[1]: 
    name   group  revenue  group_number
5  Name3  GroupB        3             0
4  Name2  GroupB        2             0
0  Name1  GroupB        1             0
6  Name6  GroupA        6             1
2  Name5  GroupA        5             1
1  Name4  GroupA        4             1
3  Name7  GroupC        9             2
8  Name7  GroupC        8             2
7  Name7  GroupC        7             2

To show what the enumerate line produces, here is dft on its own, before the merge and sort:

dft = pd.DataFrame([l for l  in enumerate((df['group'].unique()))], columns=['group_number','group'])
dft

Out[2]: 
   group_number   group
0             0  GroupB
1             1  GroupA
2             2  GroupC
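As a possible alternative to the enumerate and merge step, pd.factorize also numbers values in order of first appearance. The sketch below is not part of the original answer: it rebuilds the input shown under In[1], and the names codes, uniques and result are only illustrative.

```python
import pandas as pd

# Input from In[1]: the rows of each group are not consecutive.
df = pd.DataFrame(
    {
        'name': ['Name1', 'Name4', 'Name5', 'Name7', 'Name2',
                 'Name3', 'Name6', 'Name7', 'Name7'],
        'group': ['GroupB', 'GroupA', 'GroupA', 'GroupC', 'GroupB',
                  'GroupB', 'GroupA', 'GroupC', 'GroupC'],
        'revenue': [1, 4, 5, 9, 2, 3, 6, 7, 8],
    },
    index=[0, 3, 4, 8, 1, 2, 5, 6, 7],
)

# factorize returns integer codes numbered by first appearance:
# GroupB -> 0, GroupA -> 1, GroupC -> 2 for this input.
codes, uniques = pd.factorize(df['group'])

result = (
    df.assign(group_number=codes)                 # temporary helper column
      .sort_values(['group_number', 'revenue'],   # group order first,
                   ascending=[True, False])       # then revenue descending
      .drop(columns='group_number')               # remove the helper column
)
print(result)
```

Unlike the merge, this keeps the original index labels, so the rows can still be traced back to their positions in the input.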

Answer #2

import pandas as pd
df = pd.DataFrame({'name': ['Name1','Name2','Name3','Name4','Name5','Name6', 'Name7', 'Name7', 'Name7'], 
    'group':['GroupB','GroupB','GroupB','GroupA','GroupA','GroupA','GroupC','GroupC','GroupC'],'revenue':[1,2,3,4,5,6,7,8,9]})
df['cs'] = (df['group'] != df['group'].shift(1)).cumsum()
df = df.sort_values(['cs', 'revenue'], ascending = [True, False])
df
Out[11]: 
    name   group  revenue  cs
2  Name3  GroupB        3   1
1  Name2  GroupB        2   1
0  Name1  GroupB        1   1
5  Name6  GroupA        6   2
4  Name5  GroupA        5   2
3  Name4  GroupA        4   2
8  Name7  GroupC        9   3
7  Name7  GroupC        8   3
6  Name7  GroupC        7   3

For both answers, then just drop the temporary column ('cs' in Answer #2, 'group_number' in Answer #1):

df = df.drop('cs', axis=1)

Out[12]: 
    name   group  revenue
2  Name3  GroupB        3
1  Name2  GroupB        2
0  Name1  GroupB        1
5  Name6  GroupA        6
4  Name5  GroupA        5
3  Name4  GroupA        4
8  Name7  GroupC        9
7  Name7  GroupC        8
6  Name7  GroupC        7
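To see why Answer #2 needs the rows of each group to be consecutive, it may help to apply the same shift/cumsum expression to the non-consecutive group column from Answer #1. This is only an illustrative sketch; the standalone Series named group is an assumption made for the demonstration.

```python
import pandas as pd

# Group column from the Answer #1 input, where groups are interleaved.
group = pd.Series(['GroupB', 'GroupA', 'GroupA', 'GroupC', 'GroupB',
                   'GroupB', 'GroupA', 'GroupC', 'GroupC'])

# Same expression as in Answer #2: the counter increases at every
# row whose value differs from the previous row.
cs = (group != group.shift(1)).cumsum()

print(cs.tolist())
# [1, 2, 2, 3, 4, 4, 5, 6, 6]
```

Each run of identical values gets its own number, so the three groups are split into six blocks here and sorting on cs would not bring the rows of a group together.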

Solution 2:

Why use groupby at all? You could chain multiple sort_values calls to get the required order. For example, using data similar to the linked question, to sort by revenue descending while keeping the groups in ascending order (that is, GroupA, GroupB, GroupC rather than their order of appearance) you could do:

import pandas as pd

df = pd.DataFrame({'name': ['Name1','Name2','Name3','Name4','Name5','Name6', 'Name7', 'Name7', 'Name7'], 
    'group':['GroupB','GroupB','GroupB','GroupA','GroupA','GroupA','GroupC','GroupC','GroupC'],'revenue':[1,2,3,4,5,6,7,8,9]})

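# The second sort must be stable to keep the revenue order within each group;
# the default kind='quicksort' does not guarantee that.
df.sort_values(by='revenue', ascending=False).sort_values(by='group', kind='mergesort')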

Which would return:

    name   group  revenue
5  Name6  GroupA        6
4  Name5  GroupA        5
3  Name4  GroupA        4
2  Name3  GroupB        3
1  Name2  GroupB        2
0  Name1  GroupB        1
8  Name7  GroupC        9
7  Name7  GroupC        8
6  Name7  GroupC        7
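The chained calls can probably be collapsed into a single sort_values call with a list of columns, which avoids relying on the stability of the second sort. The sketch below compares the two on the same data; the variable names chained and single are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({
    'name': ['Name1', 'Name2', 'Name3', 'Name4', 'Name5',
             'Name6', 'Name7', 'Name7', 'Name7'],
    'group': ['GroupB', 'GroupB', 'GroupB', 'GroupA', 'GroupA',
              'GroupA', 'GroupC', 'GroupC', 'GroupC'],
    'revenue': [1, 2, 3, 4, 5, 6, 7, 8, 9],
})

# Chained version: sort by revenue, then stable-sort by group.
chained = (df.sort_values(by='revenue', ascending=False)
             .sort_values(by='group', kind='mergesort'))

# Single call: group ascending, revenue descending within each group.
single = df.sort_values(by=['group', 'revenue'], ascending=[True, False])

print(single)
print(chained.equals(single))
```

Both forms order the groups alphabetically, so they fit when alphabetical group order is acceptable; keeping the groups in their order of appearance still needs a helper column as in Solution 1.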
