
Convert Numpy Object Array To Sparse Matrix

I would like to convert a numpy array with dtype=object to a sparse array, e.g. csr_matrix. However, this fails.

x = np.array(['a', 'b', 'c'], dtype=object)
csr_matrix(x) # This fa

Solution 1:

It is possible to create a coo format matrix from your x:

In [22]: x = np.array([['a', 'b', 'c']], dtype=object)
In [23]: M=sparse.coo_matrix(x)
In [24]: M
Out[24]: 
<1x3 sparse matrix of type '<class 'numpy.object_'>'
    with 3 stored elements in COOrdinate format>
In [25]: M.data
Out[25]: array(['a', 'b', 'c'], dtype=object)

coo has just flattened the input array and assigned it to its data attribute. (row and col have the indices).

In [31]: M=sparse.coo_matrix(x)
In [32]: print(M)
  (0, 0)    a
  (0, 1)    b
  (0, 2)    c
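To make the row/col/data layout concrete, here is a minimal sketch using a small numeric array (the array values are made up for illustration, and a numeric dtype is used so the result does not depend on how a given SciPy version treats object arrays):

```python
import numpy as np
from scipy import sparse

# Illustrative numeric input; zeros are the entries that will not be stored.
x = np.array([[0, 5, 0],
              [7, 0, 9]])

M = sparse.coo_matrix(x)

# COO keeps three parallel 1-D arrays: one entry per stored (nonzero) value.
rows = M.row.tolist()    # row index of each stored value
cols = M.col.tolist()    # column index of each stored value
vals = M.data.tolist()   # the stored values themselves

for r, c, v in zip(rows, cols, vals):
    print((r, c), v)
```

The printed pairs correspond to the `(row, col)    value` lines that `print(M)` shows above.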

But displaying it as an array produces an error.

In [26]: M.toarray()
ValueError: unsupported data types in input

Trying to convert it to other formats produces your TypeError.

dok sort of works:

In [28]: M=sparse.dok_matrix(x)
/usr/local/lib/python3.5/dist-packages/scipy/sparse/sputils.py:114: UserWarning: object dtype is not supported by sparse matrices
  warnings.warn("object dtype is not supported by sparse matrices")
In [29]: M
Out[29]: 
<1x3 sparse matrix of type '<class 'numpy.object_'>'
    with 3 stored elements in Dictionary Of Keys format>

String dtype works a little better, x.astype('U1'), but still has problems with conversion to csr.

Sparse matrices were developed for large linear algebra problems. The ability to do matrix multiplication and solve linear equations was what mattered most. Their application to non-numeric tasks is recent and incomplete.
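Since the sparse formats are built around numeric data, one possible workaround (not something the answer above proposes, just a sketch) is to store integer codes in the sparse matrix and keep the original values in a separate lookup array. The names `labels` and `codes` are illustrative, and code 0 is reserved here to mean "no entry":

```python
import numpy as np
from scipy import sparse

x = np.array([['a', 'b', 'c']], dtype=object)

# Assumption: every element can be represented as a string.
flat = x.ravel().astype(str)

# labels holds the distinct values; inverse maps each element to its label index.
labels, inverse = np.unique(flat, return_inverse=True)

# Shift by 1 so that 0 stays free to mean "empty" in the sparse matrix.
codes = (inverse + 1).reshape(x.shape)

M = sparse.csr_matrix(codes)

# Decode only the stored entries back to the original values.
decoded = labels[M.data - 1].tolist()
print(decoded)
```

This keeps the matrix itself numeric, so format conversions and `toarray()` behave as usual; the cost is carrying `labels` alongside it.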

Solution 2:

I don't think this is supported. The documentation is a bit sparse on this point, but this part of the source should show it:

# List of the supported data typenums and the corresponding C++ types
#
T_TYPES = [
    ('NPY_BOOL', 'npy_bool_wrapper'),
    ('NPY_BYTE', 'npy_byte'),
    ('NPY_UBYTE', 'npy_ubyte'),
    ('NPY_SHORT', 'npy_short'),
    ('NPY_USHORT', 'npy_ushort'),
    ('NPY_INT', 'npy_int'),
    ('NPY_UINT', 'npy_uint'),
    ('NPY_LONG', 'npy_long'),
    ('NPY_ULONG', 'npy_ulong'),
    ('NPY_LONGLONG', 'npy_longlong'),
    ('NPY_ULONGLONG', 'npy_ulonglong'),
    ('NPY_FLOAT', 'npy_float'),
    ('NPY_DOUBLE', 'npy_double'),
    ('NPY_LONGDOUBLE', 'npy_longdouble'),
    ('NPY_CFLOAT', 'npy_cfloat_wrapper'),
    ('NPY_CDOUBLE', 'npy_cdouble_wrapper'),
    ('NPY_CLONGDOUBLE', 'npy_clongdouble_wrapper'),
]

Asking for object-based types sounds like a lot. Even some more basic types like float16 are missing.
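If you would rather check a dtype empirically than read the source, a small probe along these lines may help. `can_build_csr` is a hypothetical helper written for this sketch, not a SciPy function, and the exact exception type raised for unsupported dtypes is assumed to be either TypeError or ValueError depending on the SciPy version:

```python
import warnings

import numpy as np
from scipy import sparse


def can_build_csr(dtype):
    """Return True if a small csr_matrix can be built from an array of this dtype.

    Hypothetical helper for illustration only.
    """
    arr = np.array([[1, 0, 2]], dtype=dtype)
    try:
        with warnings.catch_warnings():
            # Older SciPy versions warn about object dtype before failing.
            warnings.simplefilter("ignore")
            sparse.csr_matrix(arr)
    except (TypeError, ValueError):
        return False
    return True


for dt in (np.float64, np.int32, object):
    print(np.dtype(dt), can_build_csr(dt))
```

The result for a borderline dtype such as float16 is deliberately not asserted here; run the probe against your installed version to see how it is handled.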
