How To Create A List Of Random Integer Vector Whose Sum Is X
Solution 1:
I would suggest not doing this recursively:
When you sample recursively, the value from the first index has a much greater possible range, whereas values in subsequent indices will be constrained by the first value. This will yield something resembling an exponential distribution.
Instead, what I'd recommend is sampling from the multinomial distribution. This will treat each index equally, constrain the sum, force all values to be integers, and sample uniformly from all possible configurations that follow these rules (note: configurations that can happen multiple ways will be weighted by the number of ways that they can occur).
To help merge your question with the multinomial notation, total sum is n (an integer), and so each of the k values (one for each index, also integers) must be between 0 and n. Then follow the recipe here.
(Or use numpy.random.multinomial as @Dougal helpfully suggested).
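For concreteness, the NumPy route looks like this (a sketch; each of the k indices gets equal weight 1/k so no position is favored):

```python
import numpy as np

n, k = 10, 3  # target sum and vector length
# distribute n units across k bins, each bin equally likely per unit;
# the result is a length-k integer vector that always sums to n
vec = np.random.multinomial(n, [1.0 / k] * k)
print(vec)  # e.g. [3 4 3]
```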
Solution 2:
I just ran both @Oliver's multinomial approach and @mgilson's code a million times each, for a length-3 vector summing to 10, and looked at the number of times each possible outcome came up. Both are extremely nonuniform:
(Histogram omitted; the third set of bars is the indexing approach I'm about to show.)
Does this matter? Depends on whether you want "an arbitrary vector with this property that's usually different each time" vs each valid vector being equally likely.
In the multinomial approach, of course 3 3 4 is going to be much more likely than 0 0 10 (4200 times more likely, as it turns out). mgilson's biases are less obvious to me, but 0 0 10 and its permutations were the least likely by far (only ~750 times each out of a million); the most common were 1 4 5 and its permutations, followed by 1 3 6; not sure why, but they were certainly the most common. His method typically starts with a sum that's too high in this configuration (expectation 15), though I'm not sure why the reduction works out that way.
One way to get a uniform output over the possible vectors would be a rejection scheme. To get a vector of length K with sum N, you'd:

- Sample a vector of length K with integer elements uniformly and independently between 0 and N.
- Repeat until the sum of the vector is N.

Obviously this is going to be extremely slow for non-tiny K and N.
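A minimal sketch of that rejection scheme (the function name is mine), workable only for tiny K and N:

```python
import random

def rejection_sample(total, length):
    # Draw independent uniform integer vectors until one sums to `total`.
    while True:
        vec = [random.randint(0, total) for _ in range(length)]
        if sum(vec) == total:
            return vec

print(rejection_sample(10, 3))  # e.g. [2, 7, 1]
```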
Another approach would be to assign a numbering to all the possible vectors; there are (N + K - 1) choose (K - 1) such vectors, so just choose a random integer in that range to decide which one you want. One reasonable way to number them is lexicographic ordering: (0, 0, 10), (0, 1, 9), (0, 2, 8), (0, 3, 7), ...
Note that the last (Kth) element of the vector is uniquely determined by the sum of the first K-1.
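As a quick check of that count for the sum-10, length-3 case (math.comb is Python 3.8+):

```python
from math import comb

N, K = 10, 3
# (N + K - 1) choose (K - 1): the number of length-K nonnegative
# integer vectors summing to N
print(comb(N + K - 1, K - 1))  # 66
```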
I'm sure there's a nice way to jump immediately to any given index in this list, but I can't think of it right now; enumerating the possible outcomes and walking over them will work, but will probably be slower than necessary. Here's some code for that (though we actually use reverse lexicographic ordering here).
from itertools import islice, combinations_with_replacement
from functools import reduce
from math import factorial
from operator import mul
import random

def _enum_cands(total, length):
    # get all possible ways of choosing `total` of our indices;
    # for example, the first one might be 0000000000,
    # meaning we picked index 0 ten times, for [10, 0, 0]
    for t in combinations_with_replacement(range(length), total):
        cand = [0] * length
        for i in t:
            cand[i] += 1
        yield tuple(cand)

def int_vec_with_sum(total, length):
    # (total + length - 1) choose (length - 1), using integer division
    num_outcomes = reduce(mul, range(total + 1, total + length)) // factorial(length - 1)
    idx = random.choice(range(num_outcomes))
    return next(islice(_enum_cands(total, length), idx, None))
As shown in the histogram above, this is actually uniform over possible outcomes. It's also easily adaptable to upper/lower bounds on any individual element; just add the condition to _enum_cands.
This is slower than either of the other answers: for sum 10, length 3, I get

- 14.7 us using np.random.multinomial,
- 33.9 us using mgilson's,
- 88.1 us with this approach.
I'd expect that the difference would get worse as the number of possible outcomes increases.
If someone comes up with a nifty formula for indexing into these vectors somehow, it'd be much better....
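One such formula is the standard "stars and bars" unranking: walk one position at a time, and use binomial coefficients to skip past whole blocks of outcomes that share a prefix. A sketch (the function name is mine; this uses the lexicographic order above, and math.comb needs Python 3.8+):

```python
from math import comb

def unrank_composition(idx, total, length):
    # Return the idx-th vector (0-based, lexicographic order) of
    # `length` nonnegative integers summing to `total`.
    vec = []
    remaining = total
    for pos in range(length - 1):
        slots_left = length - pos - 1  # positions after this one
        for v in range(remaining + 1):
            # number of vectors with value v at this position: the rest
            # is a free distribution of remaining - v over slots_left bins
            block = comb(remaining - v + slots_left - 1, slots_left - 1)
            if idx < block:
                vec.append(v)
                remaining -= v
                break
            idx -= block
    vec.append(remaining)  # last element is forced by the sum
    return vec

print(unrank_composition(0, 10, 3))   # [0, 0, 10]
print(unrank_composition(65, 10, 3))  # [10, 0, 0]
```

Pairing this with random.randrange(comb(N + K - 1, K - 1)) would give a uniform draw without enumerating anything.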
Solution 3:
Here's a pretty straightforward implementation.
import random

def randvec(vecsum, N, maxval, minval):
    if N * minval > vecsum or N * maxval < vecsum:
        raise ValueError('Cannot create desired vector')
    indices = list(range(N))
    vec = [random.randint(minval, maxval) for i in indices]
    diff = sum(vec) - vecsum  # we were off by this amount.
    # Iterate, incrementing/decrementing a random index by 1
    # for each unit we were off.
    while diff != 0:
        addthis = 1 if diff > 0 else -1  # +/- 1 depending on whether we were above or below target.
        diff -= addthis

        ### IMPLEMENTATION 1 ###
        idx = random.choice(indices)  # Pick a random index to modify, check if it's OK to modify
        while not (minval < (vec[idx] - addthis) < maxval):  # operator chaining. If you don't know it, look it up. It's pretty cool.
            idx = random.choice(indices)  # Not OK to modify. Pick another.
        vec[idx] -= addthis  # Update that index.

        ### IMPLEMENTATION 2 ###
        # random.shuffle(indices)
        # for idx in indices:
        #     if minval < (vec[idx] - addthis) < maxval:
        #         vec[idx] -= addthis
        #         break
        #
        # In situations where (based on choices of N, minval, maxval and vecsum)
        # many of the values in vec MUST BE minval or maxval, Implementation 2
        # may be superior.
    return vec

a = randvec(1000, 20, 100, 1)
print(sum(a))
Solution 4:
The most efficient way to sample uniformly from the set of partitions of N elements into K bins is to use a dynamic programming algorithm, which is O(KN). There are a multichoose (http://mathworld.wolfram.com/Multichoose.html) number of possibilities, so enumerating every one will be very slow. Rejection sampling and other Monte Carlo methods will also likely be very slow.
Other methods people propose, like sampling from a multinomial do not draw samples from a uniform distribution.
Let T(n,k) be the number of partitions of n elements into k bins, then we can compute the recurrence
T(n, 1) = 1  for all n >= 0
T(n, k) = sum_{m <= n} T(n - m, k - 1)
To sample K elements that sum to N, sample from K multinomial distributions going "backward" in the recurrence. Edit: the T's in the multinomials below should be normalized to sum to one before drawing each sample.
n1 = multinomial([T(N,K-1),T(N-1,K-1),...,T(0,K-1)])
n2 = multinomial([T(N-n1,K-1),T(N-n1-1,K-1),...,T(0,K-1)])
...
nK = multinomial([T(N-sum([n1,...,n{k-1}]),1),T(N-sum([n1,...,n{k-1}])-1,1),...,T(0,1)])
Note: I am allowing 0's to be sampled.
This procedure is similar to sampling a set of hidden state from a segmental semi-markov model (http://www.gatsby.ucl.ac.uk/%7Echuwei/paper/icml103.pdf).
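A minimal sketch of this dynamic program (the function names are mine; lru_cache memoizes T, and random.choices performs the normalized categorical draw that the answer writes as a multinomial):

```python
import random
from functools import lru_cache

@lru_cache(maxsize=None)
def T(n, k):
    # number of ways to write n as an ordered sum of k nonnegative integers
    if k == 1:
        return 1
    return sum(T(n - m, k - 1) for m in range(n + 1))

def sample_uniform(N, K):
    vec, remaining = [], N
    for k in range(K, 1, -1):
        # P(this element = m) is proportional to the number of ways
        # to finish the vector afterwards, T(remaining - m, k - 1)
        weights = [T(remaining - m, k - 1) for m in range(remaining + 1)]
        m = random.choices(range(remaining + 1), weights=weights)[0]
        vec.append(m)
        remaining -= m
    vec.append(remaining)  # 0's are allowed, and the last element is forced
    return vec

print(sample_uniform(10, 3))  # always sums to 10
```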
Solution 5:
Just to give you another approach: implement a partition_function(X), then randomly choose a number between 0 and the length of partition_function(1000), and there you have it. Now you just need to find an efficient way to calculate a partition function. This link might help:
http://code.activestate.com/recipes/218332-generator-for-integer-partitions/
EDIT: Here is some simple code:
import itertools
import random

all_partitions = {0: set([(0,)]), 1: set([(1,)])}

def partition_merge(a, b):
    c = set()
    for t in itertools.product(a, b):
        c.add(tuple(sorted(t[0] + t[1])))
    return c

def my_partition(n):
    if n in all_partitions:
        return all_partitions[n]
    a = set([(n,)])
    for i in range(1, n // 2 + 1):
        a = partition_merge(my_partition(i), my_partition(n - i)).union(a)
    all_partitions[n] = a
    return a

if __name__ == '__main__':
    n = 30
    # if you have a few years to wait, uncomment the next line
    # n = 1000
    a = my_partition(n)
    i = random.randint(0, len(a) - 1)
    print(list(a)[i])