Load np.memmap Without Knowing Shape

Is it possible to load a numpy.memmap without knowing the shape and still recover the shape of the data?

data = np.arange(12, dtype='float32')
data.resize((3,4))
fp = np.memmap(fil

Solution 1:

Not unless that information has been explicitly stored in the file somewhere. As far as np.memmap is concerned, the file is just a flat buffer.
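
To illustrate the "flat buffer" point, here is a minimal sketch. The temporary path and the variable names are my own choices, not part of the question. It writes a (3, 4) array through np.memmap and reopens the file without a shape:

```python
import os
import tempfile

import numpy as np

data = np.arange(12, dtype='float32').reshape(3, 4)
path = os.path.join(tempfile.mkdtemp(), 'raw.dat')  # hypothetical scratch file

# Write the array through a memmap that knows its shape
fp = np.memmap(path, dtype='float32', mode='w+', shape=data.shape)
fp[:] = data
fp.flush()
del fp

# Reopen without a shape: NumPy can only infer the element count
flat = np.memmap(path, dtype='float32', mode='r')
print(flat.shape)             # (12,)
print(os.path.getsize(path))  # 48 bytes: 12 values * 4 bytes, no header
```

The file holds nothing but the 48 bytes of raw values, so there is nowhere for the (3, 4) shape to have been recorded.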

I would recommend using np.save to persist numpy arrays, since this also preserves the metadata specifying their dimensions, dtypes etc. You can also load an .npy file as a memmap by passing the mmap_mode= parameter to np.load.
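
A short sketch of that round trip, assuming a scratch file in a temporary directory (the path is illustrative only):

```python
import os
import tempfile

import numpy as np

data = np.arange(12, dtype='float32').reshape(3, 4)
path = os.path.join(tempfile.mkdtemp(), 'data.npy')  # hypothetical scratch file

# np.save writes a small header (shape, dtype, memory order) before the data
np.save(path, data)

# mmap_mode='r' maps the file read-only instead of reading it into memory
loaded = np.load(path, mmap_mode='r')
print(type(loaded).__name__, loaded.shape, loaded.dtype)
```

Here the shape and dtype come back from the .npy header, so the reader does not need to know them in advance.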

joblib.dump uses a combination of pickling to store generic Python objects and np.save to store numpy arrays.


To initialize an empty memory-mapped array backed by a .npy file you can use numpy.lib.format.open_memmap:

import numpy as np
from numpy.lib.format import open_memmap

# initialize an empty 10TB memory-mapped array
x = open_memmap('/tmp/bigarray.npy', mode='w+', dtype=np.ubyte, shape=(10**13,))

You might be surprised by the fact that this succeeds even if the array is larger than the total available disk space (my laptop only has a 500GB SSD, but I just created a 10TB memmap). This is possible because the file that's created is sparse.
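
If you want to check this on your own machine, one possible way is to compare the file's apparent size with the space actually allocated for it. This is only a sketch: it uses a much smaller 64 MiB array and a temporary path of my choosing, and st_blocks is only available on Unix-like systems, so the code falls back gracefully elsewhere. Whether the file really is sparse depends on the filesystem.

```python
import os
import tempfile

import numpy as np
from numpy.lib.format import open_memmap

path = os.path.join(tempfile.mkdtemp(), 'sparse.npy')  # hypothetical scratch file
n_bytes = 64 * 1024**2  # 64 MiB, small enough to be harmless if not sparse

x = open_memmap(path, mode='w+', dtype=np.ubyte, shape=(n_bytes,))

st = os.stat(path)
apparent = st.st_size                  # logical size: .npy header + data
blocks = getattr(st, 'st_blocks', None)  # 512-byte blocks; Unix only

print('apparent size (bytes):', apparent)
if blocks is not None:
    print('allocated on disk (bytes):', blocks * 512)
else:
    print('st_blocks is not available on this platform')
```

On a filesystem that supports sparse files, the allocated figure should be far smaller than the apparent size until you actually write to the array.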

Credit for discovering open_memmap should go to kiyo's previous answer here.

Solution 2:

The answer from @ali_m is perfectly valid. I would like to mention my personal preference, in case it helps anyone. I always begin my memmap arrays with the shape as the first 2 elements. Doing this is as simple as:

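# Writing the memmap array
fp = np.memmap(filename, dtype='float32', mode='w+', shape=(3,4))
fp[:] = data[:]
del fp  # flush and close before reopening
# Reopen as a flat array: 2 elements for the shape + 12 for the data
fp = np.memmap(filename, dtype='float32', mode='r+', shape=(14,))
fp[2:] = fp[:-2].copy()  # shift the data right by 2 elements
fp[:2] = [3, 4]          # store the shape in the first 2 elements
del fp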

Or simpler still:

# Writing the memmap array
fp = np.memmap(filename, dtype='float32', mode='w+', shape=(14,))
fp[2:] = data.ravel()  # flatten the (3,4) data to fit the 1-D memmap
fp[:2] = [3, 4]        # store the shape in the first 2 elements
del fp

Then you can easily read the array as:

# Reading the memmap array
newfp = np.memmap(filename, dtype='float32', mode='r')
# The shape was stored as float32, so convert it back to int before reshaping
row_size, col_size = (int(n) for n in newfp[:2])
newfp = newfp[2:].reshape((row_size, col_size))
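
A variation on the same idea, offered only as a sketch and not as what the answer above does: keep the shape in a separate int64 header and point np.memmap at the data with its offset= parameter. This avoids storing the shape in the data's own dtype and works for any number of dimensions. The helper names write_with_header and read_with_header are hypothetical, and the reader still has to know the dtype.

```python
import os
import tempfile

import numpy as np


def write_with_header(path, data):
    """Write [ndim, *shape] as int64, followed by the raw array bytes."""
    header = np.array([data.ndim, *data.shape], dtype='int64')
    with open(path, 'wb') as f:
        f.write(header.tobytes())
        f.write(np.ascontiguousarray(data).tobytes())


def read_with_header(path, dtype):
    """Recover the shape from the header, then memory-map the data after it."""
    ndim = int(np.fromfile(path, dtype='int64', count=1)[0])
    header = np.fromfile(path, dtype='int64', count=1 + ndim)
    shape = tuple(int(n) for n in header[1:])
    # offset skips the header: 8 bytes per int64 entry
    return np.memmap(path, dtype=dtype, mode='r',
                     offset=8 * (1 + ndim), shape=shape)


data = np.arange(12, dtype='float32').reshape(3, 4)
path = os.path.join(tempfile.mkdtemp(), 'with_header.dat')  # hypothetical file
write_with_header(path, data)
restored = read_with_header(path, 'float32')
print(restored.shape)  # (3, 4)
```

Storing the shape as integers also sidesteps the precision limit of float32, which can only represent integers exactly up to 2**24.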

Solution 3:

An alternative to numpy.memmap is tifffile.memmap, which stores the array in a TIFF file so that the shape and dtype are kept alongside the data:

from tifffile import memmap
newArray = memmap("name", shape=(3,3), dtype='uint8')
newArray[1,1] = 11
del newArray

The file is created with the following values:

0  0  0
0  11 0
0  0  0

Now let's read it back:

array = memmap("name", dtype='uint8')
print(array.shape) # prints (3,3)
print(array)

prints:

0  0  0
0  11 0
0  0  0
