How To Convert Tensorflow Dataset To 2d Numpy Array
Solution 1:
You could try eager execution; I previously gave an answer using session run (shown below). During eager execution, calling .numpy() on a tensor converts that tensor to a NumPy array. Example code (from my use case):
# enable eager execution
from __future__ import absolute_import, division, print_function, unicode_literals
import tensorflow as tf
# TF 1.x only; eager execution is already on by default in TF 2.x
tf.enable_eager_execution()
print('Is executing eagerly?',tf.executing_eagerly())
# load datasets
import tensorflow_datasets as tfds
dataset, metadata = tfds.load('cycle_gan/horse2zebra',
                              with_info=True, as_supervised=True)
train_horses, train_zebras = dataset['trainA'], dataset['trainB']
#load dataset in to numpy array
train_A=train_horses.batch(1000).make_one_shot_iterator().get_next()[0].numpy()
print(train_A.shape)
# preview one of the images
import matplotlib.pyplot as plt
# notebook magic; only valid in Jupyter/IPython
%matplotlib inline
import numpy as np
print(train_A.shape)
plt.imshow(train_A[1])
plt.show()
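If the goal is a strictly 2-D array (one row per image), the 4-D batch that .numpy() returns can be flattened with a reshape. A minimal sketch, assuming the batch has shape (num_images, height, width, channels); the zero-filled array below is a hypothetical stand-in for train_A, not the real dataset:

```python
import numpy as np

# Hypothetical stand-in for train_A: 5 RGB images of 256x256 pixels.
batch = np.zeros((5, 256, 256, 3), dtype=np.uint8)

# Keep the first axis (one row per image) and flatten everything else.
flat = batch.reshape(batch.shape[0], -1)

print(flat.shape)  # (5, 196608)
```

Using -1 lets NumPy work out the row length, so the same line works for any image size.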
Old, session run, answer:
I recently had this problem, and I did it like this:
# load datasets
import numpy as np
import tensorflow as tf
import tensorflow_datasets as tfds
dataset, metadata = tfds.load('cycle_gan/horse2zebra',
                              with_info=True, as_supervised=True)
train_horses, train_zebras = dataset['trainA'], dataset['trainB']
#load dataset in to numpy array
sess = tf.compat.v1.Session()
tra=train_horses.batch(1000).make_one_shot_iterator().get_next()
train_A=np.array(sess.run(tra)[0])
print(train_A.shape)
sess.close()
# preview one of the images
import matplotlib.pyplot as plt
# notebook magic; only valid in Jupyter/IPython
%matplotlib inline
import numpy as np
print(train_A.shape)
plt.imshow(train_A[1])
plt.show()
Solution 2:
It doesn't sound like you set things up using the TensorFlow Dataset pipeline. Here is the guide for doing so:
https://www.tensorflow.org/programmers_guide/datasets
You can either follow that (it's the right approach, but there's a small learning curve to get used to it), or you can just pass the numpy array to sess.run as part of the feed_dict parameter. If you go this way, you should create a tf.placeholder, which will be populated by the value in feed_dict. Many of the basic tutorial examples follow this approach.
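The feed_dict route can be sketched roughly as follows. This is an illustration only, written against the tf.compat.v1 API so that it also runs on TF 2.x; the input array and the row-sum operation are made up for the example:

```python
import numpy as np
import tensorflow as tf

# Placeholders only work in graph mode, so eager execution is turned off
# (this call is needed on TF 2.x, where eager is the default).
tf.compat.v1.disable_eager_execution()

# Hypothetical input data: 3 rows of 2 features.
data = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]], dtype=np.float32)

# The placeholder stands in for the array until sess.run supplies it.
x = tf.compat.v1.placeholder(tf.float32, shape=(None, 2))
row_sums = tf.reduce_sum(x, axis=1)

with tf.compat.v1.Session() as sess:
    # feed_dict maps the placeholder to the actual NumPy array.
    result = sess.run(row_sums, feed_dict={x: data})

print(type(result))
print(result)
```

Whatever sess.run returns for a tensor is already a NumPy array, so no further conversion is needed.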
Solution 3:
I also needed to accomplish this task (Dataset to array), but without turning on eager mode. I managed to come up with the following:
import tensorflow as tf

dataset = tf.data.Dataset.from_tensor_slices([[1,2],[3,4]])
tensor_array = tf.TensorArray(dtype=dataset.element_spec.dtype,
                              size=0,
                              dynamic_size=True,
                              element_shape=dataset.element_spec.shape)
tensor_array = dataset.reduce(tensor_array, lambda a, t: a.write(a.size(), t))
tensor = tf.reshape(tensor_array.concat(), (-1,)+tuple(dataset.element_spec.shape))
array = tf.Session().run(tensor)
print(type(array))
# <class 'numpy.ndarray'>
print(array)
# [[1 2]
# [3 4]]
What this does:
We start with a dataset containing 2 tensors of shape (2,).
Since eager is off, we need to run the dataset through a TensorFlow session. And since a session requires a tensor, we have to convert the dataset into a tensor.
To accomplish this, we use Dataset.reduce() to put all the elements into a TensorArray (symbolically).
We now use TensorArray.concat() to convert the whole array into a single tensor. However, when we do this, the whole dataset becomes flattened into a 1-D array. So we need tf.reshape() to get it back into our original tensor's shape, plus an extra dimension to stack them all.
Finally we take the tensor and run it through a session. This gives us our numpy ndarray.
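The concat-then-reshape step is easier to see outside the graph. A rough NumPy analogy (not the TensorFlow code itself), assuming the same two elements of shape (2,): np.concatenate plays the role of TensorArray.concat(), and the reshape restores the per-element shape with a leading stacking dimension.

```python
import numpy as np

# The same two elements as the dataset above, each of shape (2,).
elements = [np.array([1, 2]), np.array([3, 4])]
element_shape = elements[0].shape

# Concatenating along the first axis merges everything into one 1-D array.
flat = np.concatenate(elements)
print(flat.shape)  # (4,)

# Reshape to (-1,) + element_shape to recover one row per element.
stacked = flat.reshape((-1,) + tuple(element_shape))
print(stacked.shape)  # (2, 2)
```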
Solution 4:
This was the simplest method for me for a supervised problem with (X, y).
import tensorflow_datasets as tfds

def dataset_to_numpy(ds):
    """
    Convert tensorflow dataset to numpy arrays
    """
    images = []
    labels = []
    # Iterate over a dataset
    for i, (image, label) in enumerate(tfds.as_numpy(ds)):
        images.append(image)
        labels.append(label)
    # Print the shape and label of the first three samples
    for i, img in enumerate(images):
        if i < 3:
            print(img.shape, labels[i])
    return images, labels
Usage:
ds = tfds.load('mnist', split='train', as_supervised=True)
images, labels = dataset_to_numpy(ds)
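Note that dataset_to_numpy returns Python lists of per-sample arrays rather than single arrays. If every image has the same shape, np.stack can combine them. A minimal sketch, using small made-up arrays in place of the MNIST images so that it runs without downloading anything:

```python
import numpy as np

# Made-up stand-ins for the lists returned by dataset_to_numpy.
images = [np.zeros((28, 28, 1), dtype=np.uint8) for _ in range(4)]
labels = [np.int64(i) for i in range(4)]

# Stack the per-sample arrays along a new leading axis.
X = np.stack(images)
y = np.asarray(labels)

print(X.shape, y.shape)  # (4, 28, 28, 1) (4,)
```

np.stack raises a ValueError if the samples differ in shape, so variable-sized images would need resizing or padding first.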
Solution 5:
You can use the following methods to get the images and the corresponding captions:
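def separate_dataset(dataset):
    # Batch the whole dataset into one element and pull it out.
    # In eager mode these are tensors; call .numpy() on each for NumPy arrays.
    images, labels = tf.compat.v1.data.make_one_shot_iterator(dataset.batch(len(dataset))).get_next()
    return images, labels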