Product Of Array Elements By Group In Numpy (Python)
Solution 1:
If you groups are already sorted (if they are not you can do that with np.argsort
), you can do this using the reduceat
functionality to ufunc
s (if they are not sorted, you would have to sort them first to do it efficiently):
# you could do the group_changes somewhat faster if you care a lot
group_changes = np.concatenate(([0], np.where(groups[:-1] != groups[1:])[0] + 1))
Vprods = np.multiply.reduceat(values, group_changes)
Or mgilson answer if you have few groups. But if you have many groups, then this is much more efficient. Since you avoid boolean indices for every element in the original array for every group. Plus you avoid slicing in a python loop with reduceat.
Of course pandas does these operations conveniently.
Edit: Sorry had prod
in there. The ufunc is multiply
. You can use this method for any binary ufunc
. This means it works for basically all numpy functions that can work element wise on two input arrays. (ie. multiply normally multiplies two arrays elementwise, add adds them, maximum/minimum, etc. etc.)
Solution 2:
First set up a mask for the groups such that you expand the groups in another dimension
mask=(groups==unique(groups).reshape(-1,1))
mask
array([[ True, True, True, False, False, False],
[False, False, False, True, False, False],
[False, False, False, False, True, True]], dtype=bool)
now we multiply with val
mask*val
array([[1, 2, 3, 0, 0, 0],
[0, 0, 0, 4, 0, 0],
[0, 0, 0, 0, 5, 6]])
now you can already do prod along the axis 1 except for those zeros, which is easy to fix:
prod(where(mask*val,mask*val,1),axis=1)
array([ 6, 4, 30])
Solution 3:
As suggested in the comments, you can also use the Pandas module. Using the grouby()
function, this task becomes an one-liner:
import numpy as np
import pandas as pd
values = np.array([1, 2, 3, 4, 5, 6])
groups = np.array([1, 1, 1, 2, 3, 3])
df = pd.DataFrame({'values': values, 'groups': groups})
So df
then looks as follows:
groups values
0 1 1
1 1 2
2 1 3
3 2 4
4 3 5
5 3 6
Now you can groupby()
the groups
column and apply
numpy's prod()
function to each of the groups like this
df.groupby(groups)['values'].apply(np.prod)
which gives you the desired output:
1 6
2 4
3 30
Solution 4:
Well, I doubt this is a great answer, but it's the best I can come up with:
np.array([np.product(values[np.flatnonzero(groups == x)]) for x in np.unique(groups)])
Solution 5:
It's not a numpy solution, but it's fairly readable (I find that sometimes numpy solutions aren't!):
from operator import itemgetter, mul
from itertools import groupby
grouped = groupby(zip(groups, values), itemgetter(0))
groups = [reduce(mul, map(itemgetter(1), vals), 1) for key, vals in grouped]
print groups
# [6, 4, 30]
Post a Comment for "Product Of Array Elements By Group In Numpy (Python)"