
How To Use Numpy.save In Append Mode

I use numpy.save and numpy.load to read and write large datasets in my project. I realized that numpy.save does not support an append mode. For instance (Python 3):

import numpy as np
n = 5

Solution 1:

The npy file format doesn't work that way. An npy file encodes a single array, with a header specifying shape, dtype, and other metadata. You can see the npy file format spec in the NumPy docs.
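To make the header concrete, the sketch below saves one small array to an in-memory buffer and reads the metadata back with the helpers in np.lib.format (read_magic, read_array_header_1_0 and read_array_header_2_0). The io.BytesIO buffer is only a stand-in for a real file, and the sketch assumes np.save picks format version 1.0 or 2.0, which is what it does for an ordinary numeric array like this one.

```python
import io

import numpy as np

# Save one small array into an in-memory buffer (a stand-in for a real .npy file).
buf = io.BytesIO()
np.save(buf, np.arange(12, dtype=np.int32).reshape(3, 4))
buf.seek(0)

# Every npy file starts with a magic string and a format version number.
major, minor = np.lib.format.read_magic(buf)

# Next comes the header dict describing the single array stored in the file.
if (major, minor) == (1, 0):
    shape, fortran_order, dtype = np.lib.format.read_array_header_1_0(buf)
else:
    shape, fortran_order, dtype = np.lib.format.read_array_header_2_0(buf)

# The raw array bytes start right after the header and run to the end.
data_offset = buf.tell()
n_data_bytes = len(buf.getvalue()) - data_offset

print("version:", (major, minor))
print("shape:", shape, "dtype:", dtype, "fortran_order:", fortran_order)
print("data bytes:", n_data_bytes)  # 12 items * 4 bytes each = 48
```

Because the shape lives in that header, adding more data bytes at the end of the file does not change what the header says the file contains.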

Support for appending data was not a design goal of the npy format. Even if you managed to get numpy.save to append to an existing file instead of overwriting the contents, the result wouldn't be a valid npy file. Producing a valid npy file with additional data would require rewriting the header, and since this could require resizing the header, it could shift the data and require the whole file to be rewritten.

NumPy comes with no tools to append data to existing npy files, beyond reading the data into memory, building a new array, and writing the new array to a file. If you want to save more data, consider writing a new file or picking a different file format.
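The read, rebuild, rewrite approach can be sketched as below. append_rows is a hypothetical helper, not a NumPy function, and the sketch assumes the arrays grow along axis 0 and fit comfortably in memory, using a file in a temporary directory.

```python
import os
import tempfile

import numpy as np


def append_rows(path, new_rows):
    """Hypothetical helper: 'append' to an .npy file by rewriting it completely."""
    if os.path.exists(path):
        old = np.load(path)                          # read the whole array into memory
        combined = np.concatenate([old, new_rows])   # build a new, larger array
    else:
        combined = np.asarray(new_rows)
    np.save(path, combined)                          # overwrite with a fresh header + data


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "data.npy")
    for i in range(3):
        append_rows(path, np.full((2, 4), i))        # three batches of two rows each
    result = np.load(path)

print(result.shape)  # (6, 4)
```

The result is always a valid npy file, but every call rereads and rewrites all existing data, so the cost of each "append" grows with the size of the file.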

Solution 2:

In Python 3, repeated saves and loads on the same open file work:

In [113]: f = open('test.npy', 'wb')
In [114]: np.save(f, np.arange(10))
In [115]: np.save(f, np.zeros(10))
In [116]: np.save(f, np.ones(10))
In [117]: f.close()
In [118]: f = open('test.npy', 'rb')
In [119]: for _ in range(3):
     ...:     print(np.load(f))
     ...:     
[0 1 2 3 4 5 6 7 8 9]
[0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
[1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
In [120]: np.load(f)
OSError: Failed to interpret file <_io.BufferedReader name='test.npy'> as a pickle

Each save writes a self-contained block of data to the file. That block consists of a header followed by an image of the data buffer. The header records how long the data buffer is.

Each load reads one header block and then exactly the number of data bytes that header specifies.
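The transcript above knows in advance that there are three arrays. A hypothetical load_all helper (not part of NumPy) can instead keep loading blocks until the read position reaches the end of the file, which avoids the error at the end. This is a sketch that relies on the undocumented behavior described above, with an io.BytesIO buffer standing in for a real file.

```python
import io
import os

import numpy as np


def load_all(f):
    """Hypothetical helper: read every array that repeated np.save calls wrote to f."""
    f.seek(0, os.SEEK_END)
    end = f.tell()                  # total size of the file in bytes
    f.seek(0)
    arrays = []
    while f.tell() < end:           # stop cleanly at EOF instead of hitting an error
        arrays.append(np.load(f))   # each call consumes one header plus its data bytes
    return arrays


buf = io.BytesIO()
for a in (np.arange(10), np.zeros(10), np.ones(10)):
    np.save(buf, a)                 # each call writes one self-contained block

arrays = load_all(buf)
print(len(arrays))  # 3
```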

As far as I know this is not documented, but it has been demonstrated in previous Stack Overflow questions. It is also evident from the save and load code.

Note that these are separate arrays, both when saving and when loading. But we can combine the loaded arrays into one array if their shapes are compatible:

In [122]: f = open('test.npy', 'rb')
In [123]: np.stack([np.load(f) for _ in range(3)])
Out[123]: 
array([[0., 1., 2., 3., 4., 5., 6., 7., 8., 9.],
       [0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
       [1., 1., 1., 1., 1., 1., 1., 1., 1., 1.]])
In [124]: f.close()
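np.stack requires every block to have exactly the same shape. If each save instead appends a batch of rows of varying length, np.concatenate along axis 0 fits better, since only the trailing dimensions must agree. A sketch, assuming the number of blocks (three here) is known and using an in-memory buffer in place of a file:

```python
import io

import numpy as np

buf = io.BytesIO()
# Simulate appending batches of rows over time; only the first axis varies.
for n_rows in (2, 3, 1):
    np.save(buf, np.full((n_rows, 4), n_rows, dtype=np.float64))

buf.seek(0)
blocks = [np.load(buf) for _ in range(3)]

# np.stack would fail here because the blocks differ in their first dimension.
combined = np.concatenate(blocks, axis=0)
print(combined.shape)  # (6, 4)
```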

Append multiple numpy files to one big numpy file in python

loading arrays saved using numpy.save in append mode

Solution 3:

The file builtin was removed in Python 3. Though I won't guarantee that it works, the Python 3 equivalent of the code in the link in your question would be:

import numpy as np

with open('myfile.npy', 'ab') as f_handle:  # 'ab' = append, binary
    np.save(f_handle, Matrix)

This should then append Matrix to 'myfile.npy' as an additional, self-contained array block.
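To check what append mode actually produces, the sketch below appends a second array to an existing file and reads it back two ways. It uses a throwaway file in a temporary directory and np.eye(2) as a stand-in for the question's Matrix. As far as I can tell from how np.load reads a header and then only the bytes that header specifies, a single np.load on the file name returns just the first block; the appended block is reached by calling np.load again on the same open handle.

```python
import os
import tempfile

import numpy as np

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "myfile.npy")
    np.save(path, np.arange(3))             # creates the file with one block

    matrix = np.eye(2)                      # stand-in for the question's Matrix
    with open(path, "ab") as f_handle:
        np.save(f_handle, matrix)           # appends a second, separate block

    first_only = np.load(path)              # reads the first header and its data only

    with open(path, "rb") as f:
        first = np.load(f)
        second = np.load(f)                 # continues where the first load stopped

print(first_only)
print(second)
```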
