
Tensorflow Dense Tensor To Sparse Binarized Hash Trick Tensor

I want to transform this dataset in such a way that each tensor has a given size n and that a feature at index i of this new tensor is set to 1 if and only if there is an i in the original tensor.

Solution 1:

Here is a possible implementation for that:

import tensorflow as tf

def binarization_sparse(t, n):
    # Input size
    t_shape = tf.shape(t)
    t_rows = t_shape[0]
    t_cols = t_shape[1]
    # Make sparse row indices for each value
    row_idx = tf.tile(tf.range(t_rows)[: ,tf.newaxis], [1, t_cols])
    # Sparse column indices
    col_idx = t % n
    # "Flat" indices - needed to discard repetitions
    total_idx = row_idx * n + col_idx
    # Remove repeated elements
    out_idx, _ = tf.unique(tf.reshape(total_idx, [-1]))
    # Back to row and column indices
    sparse_idx = tf.stack([out_idx // n, out_idx % n], axis=-1)
    # Sparse values
    sparse_values = tf.ones([tf.shape(sparse_idx)[0]], dtype=t.dtype)
    # Make sparse tensor
    out = tf.sparse.SparseTensor(tf.cast(sparse_idx, tf.int64),
                                 sparse_values,
                                 [t_rows, n])
    # Reorder indices
    out = tf.sparse.reorder(out)
    return out
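The key step in the function is the "flat" index row * n + col. Because every column index lies in [0, n), each (row, col) pair maps to a distinct integer, so removing repeated flat indices removes repeated pairs, and // n and % n recover the pairs. Sorting the flat indices gives row-major order. Below is a minimal plain-Python sketch of the same bookkeeping, assuming the input is a list of equal-length lists of ints; binarization_sparse_py is a hypothetical name, not part of TensorFlow.

```python
def binarization_sparse_py(rows, n):
    """Plain-Python mirror of binarization_sparse, returning (indices, dense_shape) in COO form.

    Hypothetical helper for illustration only; `rows` is a list of equal-length lists of ints.
    """
    num_rows = len(rows)
    # Flat index row * n + (value % n) is unique per (row, column) since 0 <= value % n < n.
    # Collecting into a set drops repeats, playing the role of tf.unique.
    flat = {r * n + value % n for r, row in enumerate(rows) for value in row}
    # Sorting flat indices yields row-major order (the job of tf.sparse.reorder);
    # divmod undoes the flattening, like out_idx // n and out_idx % n.
    indices = [list(divmod(idx, n)) for idx in sorted(flat)]
    return indices, [num_rows, n]


indices, shape = binarization_sparse_py([[0, 3, 4], [12, 2, 4]], 9)
print(indices)  # [[0, 0], [0, 3], [0, 4], [1, 2], [1, 3], [1, 4]]
print(shape)    # [2, 9]
```

Python's % with a positive n always returns a value in [0, n), which matches the floor-mod behavior of TensorFlow's % on integer tensors, so negative inputs land in the same buckets.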

# Test (graph mode; tf.compat.v1.Session also works in TensorFlow 2.x)
with tf.Graph().as_default(), tf.compat.v1.Session() as sess:
    t = tf.constant([
        [ 0,  3,  4],
        [12,  2,  4]
    ])
    # Sparse result
    t_m1h_sp = binarization_sparse(t, 9)
    # Convert to dense to check output
    t_m1h = tf.sparse.to_dense(t_m1h_sp)
    print(sess.run(t_m1h))

Output:

[[1 0 0 1 1 0 0 0 0]
 [0 0 1 1 1 0 0 0 0]]
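To double-check this output without TensorFlow, here is a small dense reference in plain Python that builds the binary matrix directly, assuming the same rule as the code (column value % n is set to 1 for every value in the row). The name binarize_dense is hypothetical. Its rows should match the output above.

```python
def binarize_dense(rows, n):
    """Dense reference: out[r][c] == 1 iff some value v in rows[r] satisfies v % n == c."""
    out = [[0] * n for _ in rows]
    for r, row in enumerate(rows):
        for value in row:
            out[r][value % n] = 1  # a repeated bucket just sets the same bit again
    return out


for line in binarize_dense([[0, 3, 4], [12, 2, 4]], 9):
    print(line)
# [1, 0, 0, 1, 1, 0, 0, 0, 0]
# [0, 0, 1, 1, 1, 0, 0, 0, 0]
```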

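I added the logic to remove repeated elements because repetitions can happen in principle (for example, 3 and 12 both map to column 3 when n = 9), but if you have a guarantee that there are no repetitions (including after taking the modulo), you may skip that step. I also reorder the sparse tensor at the end, because sparse_idx keeps the order in which values first appear, which is not necessarily the row-major order that many sparse operations expect. In this example the second row produces columns 3, 2, 4 in that order, and tf.sparse.to_dense checks by default (validate_indices=True) that the indices are sorted, so the reorder step is actually needed here.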
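Both problems can be seen with plain Python. The sketch below lists the (row, column) pairs in the order the values appear, then drops repeats while keeping first-occurrence order, which is what tf.unique does (it returns unique elements in the order they occur). The helper names raw_indices and first_occurrences are hypothetical, and the input [[3, 12, 0], [12, 2, 4]] is an illustrative example, not from the question.

```python
def raw_indices(rows, n):
    """(row, column) pairs in the order the values appear, before de-duplication or sorting."""
    return [(r, value % n) for r, row in enumerate(rows) for value in row]


def first_occurrences(pairs):
    """Drop repeated pairs but keep first-occurrence order, mimicking tf.unique."""
    seen = set()
    kept = []
    for pair in pairs:
        if pair not in seen:
            seen.add(pair)
            kept.append(pair)
    return kept


raw = raw_indices([[3, 12, 0], [12, 2, 4]], 9)
print(raw)                       # [(0, 3), (0, 3), (0, 0), (1, 3), (1, 2), (1, 4)]
unique = first_occurrences(raw)
print(unique)                    # [(0, 3), (0, 0), (1, 3), (1, 2), (1, 4)]
print(unique == sorted(unique))  # False: (0, 3) precedes (0, 0), so a reorder is still needed
```

The first row shows a collision (3 and 12 share column 3), and even after de-duplication the pairs are not in row-major order, which is exactly the situation tf.sparse.reorder fixes.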

Also, this solution is specific to 2D inputs. For 1D inputs it would be simpler, and it can be written for higher-dimensional inputs as well if necessary. I think a completely general solution is possible, but it would be more complicated (especially if you want to handle tensors with an unknown number of dimensions).
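In the 1D case the indices are just the sorted distinct values of v % n. For more dimensions, one option is to keep the positions along the leading axes (the role row_idx plays in the 2D function) and hash only the last axis. The following is a plain-Python sketch of that index logic, not a TensorFlow implementation; binarize_indices_nd is a hypothetical name, and the input is assumed to be a regular nested list with all values at the same depth.

```python
def binarize_indices_nd(values, n):
    """Sorted sparse indices for hashing the last axis of a nested list into n buckets.

    Hypothetical helper: works for 1D, 2D, 3D, ... inputs given as regular nested lists.
    Each index is the tuple of leading positions followed by the bucket value % n.
    """
    result = set()  # a set drops repeated indices, like tf.unique

    def walk(node, prefix):
        if node and isinstance(node[0], list):  # above the last axis: recurse into children
            for i, child in enumerate(node):
                walk(child, prefix + (i,))
        else:  # last axis: hash every value into one of n buckets
            for value in node:
                result.add(prefix + (value % n,))

    walk(values, ())
    return sorted(result)  # tuples sort lexicographically, i.e. row-major order


print(binarize_indices_nd([0, 3, 12], 9))                 # [(0,), (3,)]
print(binarize_indices_nd([[0, 3, 4], [12, 2, 4]], 9))    # same pairs as the 2D example
print(binarize_indices_nd([[[0, 3], [12, 2]], [[4, 4], [9, 1]]], 9))
```

The dense shape of the result would be the shape of the leading axes followed by n.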
