
Efficient Way Of Constructing A Matrix Of Pair-wise Distances Between Many Vectors?

First, thanks for reading and taking the time to respond. Second, the question: I have a P×N matrix X, where P is on the order of 10^6 and N is on the order of 10^3. So, X is relat…

Solution 1:

You can use pdist and squareform from scipy.spatial.distance:

import numpy as np
from scipy.spatial.distance import pdist, squareform

# Pairwise distances between rows: ||sqrt(x_i) - sqrt(x_j)||_2 / sqrt(2)
out = squareform(pdist(np.sqrt(X))) / np.sqrt(2)
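The pdist expression computes, for every pair of rows, the Euclidean distance between their element-wise square roots divided by √2. Writing x_i for row i of X (and assuming X is non-negative), this is the Hellinger distance when the rows are discrete probability distributions:

```latex
H(x_i, x_j) = \frac{1}{\sqrt{2}} \left\lVert \sqrt{x_i} - \sqrt{x_j} \right\rVert_2
            = \frac{1}{\sqrt{2}} \sqrt{\sum_{k=1}^{N} \left( \sqrt{X_{ik}} - \sqrt{X_{jk}} \right)^2}
```

Note the size of the result: pdist returns the P(P-1)/2 condensed distances between rows and squareform expands them into a full P×P array, so with P ≈ 10^6 the full matrix holds 10^12 float64 values (about 8 TB). If the vectors of interest are the N columns instead, passing np.sqrt(X).T gives an N×N result of about 10^6 entries.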

Or use cdist from the same module:

import numpy as np
from scipy.spatial.distance import cdist

sX = np.sqrt(X)
out = cdist(sX, sX) / np.sqrt(2)
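Another option, not taken from either answer, is to expand ||a - b||^2 = ||a||^2 + ||b||^2 - 2 a·b so that most of the work becomes a single matrix product, which NumPy hands off to BLAS. This is a minimal sketch assuming X is a non-negative NumPy array and that distances are wanted between rows; hellinger_gram is a hypothetical helper name. The expansion can lose a few digits of precision for nearly identical rows compared with cdist, which is why the result is clipped before the square root.

```python
import numpy as np


def hellinger_gram(X):
    """Pairwise distances ||sqrt(x_i) - sqrt(x_j)|| / sqrt(2) between rows of X.

    Hypothetical helper. Uses ||a - b||^2 = ||a||^2 + ||b||^2 - 2 a.b so the
    bulk of the work is one matrix product. Assumes X is non-negative.
    """
    S = np.sqrt(np.asarray(X, dtype=np.float64))
    # Squared norms of the square-rooted rows (equal to the row sums of X).
    sq = np.einsum("ij,ij->i", S, S)
    # Squared Euclidean distances via the Gram matrix S @ S.T.
    d2 = sq[:, None] + sq[None, :] - 2.0 * (S @ S.T)
    # Rounding can make tiny values slightly negative; clip before sqrt.
    np.maximum(d2, 0.0, out=d2)
    # Each row's distance to itself is exactly zero.
    np.fill_diagonal(d2, 0.0)
    return np.sqrt(d2) / np.sqrt(2.0)
```

If the rows are probability vectors, each squared norm is 1 and the formula reduces to sqrt(1 - √x_i·√x_j), which avoids even the norm computation.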

Solution 2:

In addition to Divakar's answer, I found that scikit-learn has an implementation of this that supports parallel processing:

import math
import numpy as np
from sklearn.metrics.pairwise import pairwise_distances

njobs = 3  # number of parallel jobs
H = pairwise_distances(np.sqrt(X), n_jobs=njobs, metric='euclidean') / math.sqrt(2)
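Whatever library computes the distances, a full matrix over P ≈ 10^6 rows will not fit in memory, so it may help to produce it a block of rows at a time. scikit-learn offers this idea as sklearn.metrics.pairwise_distances_chunked, which yields row blocks sized by its working_memory parameter. The NumPy-only sketch below shows the mechanics under the same assumptions as before (non-negative X, distances between rows); hellinger_chunks and chunk_rows are hypothetical names.

```python
import numpy as np


def hellinger_chunks(X, chunk_rows=1024):
    """Yield (start, block) pairs, where block[r, j] is the distance between
    row start + r and row j of X. Only one block is in memory at a time."""
    S = np.sqrt(np.asarray(X, dtype=np.float64))
    sq = np.einsum("ij,ij->i", S, S)  # squared norms of the square-rooted rows
    n = S.shape[0]
    for start in range(0, n, chunk_rows):
        stop = min(start + chunk_rows, n)
        # Squared distances for this block of rows against all rows.
        d2 = sq[start:stop, None] + sq[None, :] - 2.0 * (S[start:stop] @ S.T)
        np.maximum(d2, 0.0, out=d2)  # clip tiny negatives caused by rounding
        r = np.arange(stop - start)
        d2[r, start + r] = 0.0  # exact zeros for each row's distance to itself
        yield start, np.sqrt(d2) / np.sqrt(2.0)
```

Each block could then be reduced (for example to nearest neighbours or a thresholded sparse matrix) or written to a memory-mapped file before the next one is computed. With P ≈ 10^6 and chunk_rows=1024, one block is 1024 × 10^6 float64 values, roughly 8 GB, so a smaller chunk_rows would likely be needed in practice.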

I will do some benchmarking and post the results later.
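For that benchmarking, a small harness like the following could time each candidate on a subsample of X before scaling up. It is a sketch only: benchmark, broadcast_version and row_loop_version are hypothetical names, the random array is a stand-in for real data, and the SciPy and scikit-learn expressions above can be added to the variants dictionary as further entries (for example lambda X: cdist(np.sqrt(X), np.sqrt(X)) / np.sqrt(2)).

```python
import timeit

import numpy as np


def benchmark(variants, X, repeat=3):
    """Return the best-of-`repeat` wall-clock time (seconds) for each variant."""
    return {
        name: min(timeit.repeat(lambda f=f: f(X), number=1, repeat=repeat))
        for name, f in variants.items()
    }


def broadcast_version(X):
    # One vectorised call, but builds a P x P x N temporary array.
    S = np.sqrt(X)
    return np.linalg.norm(S[:, None, :] - S[None, :, :], axis=-1) / np.sqrt(2)


def row_loop_version(X):
    # Python loop over rows; only a P x N temporary per iteration.
    S = np.sqrt(X)
    out = np.empty((S.shape[0], S.shape[0]))
    for i in range(S.shape[0]):
        out[i] = np.linalg.norm(S - S[i], axis=1)
    return out / np.sqrt(2)


if __name__ == "__main__":
    X = np.random.default_rng(0).random((300, 50))  # small random stand-in for X
    timings = benchmark({"broadcast": broadcast_version, "row_loop": row_loop_version}, X)
    for name, seconds in timings.items():
        print(f"{name}: {seconds:.4f} s")
```

Taking the minimum over repeats is a common way to reduce noise from other processes; number=1 is used because each call is already expensive at realistic sizes.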
