Apache Arrow, Alignment And Padding
Solution 1:
Arrow memory is 64-byte aligned, but in your example code the conversion to Pandas/NumPy copies the data, because a nested array of lists is represented differently in Arrow and in NumPy. Arrow uses one buffer that holds the values of all lists, plus a second buffer that holds the offsets marking where each list starts and ends within that array. NumPy has no native list type, so the result is a NumPy array whose elements are other NumPy arrays, stored in the outer array as Python objects.
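To make the offsets-plus-values layout concrete, here is a minimal pure-Python sketch. It is only a conceptual model: the helper names to_offsets_and_values and get_list are made up for illustration, and real Arrow buffers are contiguous typed memory (with an optional validity bitmap), not Python lists.

```python
def to_offsets_and_values(nested):
    """Flatten a list of lists into (offsets, values), mimicking Arrow's list layout."""
    offsets = [0]   # offsets[i] is where list i starts in `values`
    values = []     # the elements of all lists, stored back to back
    for sub in nested:
        values.extend(sub)
        offsets.append(len(values))  # end of list i, which is the start of list i + 1
    return offsets, values


def get_list(offsets, values, i):
    """Recover list i by slicing the shared values between two adjacent offsets."""
    return values[offsets[i]:offsets[i + 1]]


nested = [[1, 2], [3, 4, 5], []]
offsets, values = to_offsets_and_values(nested)
print(offsets)                       # one more entry than there are lists
print(values)                        # all elements in a single flat sequence
print(get_list(offsets, values, 1))  # the second list, rebuilt from the two buffers
```

Because every list lives in one shared values buffer, there is no per-list allocation on the Arrow side, which is why converting to an array of separate NumPy arrays changes what the per-element memory addresses refer to.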
So when you use the NumPy functions, you are looking at memory allocated by NumPy, not by Arrow. If one of those addresses happens to fall on a 64-byte boundary, it is only by chance.
In the next version (0.9) of pyarrow there will be a buffers property to access the underlying memory addresses. You should then be able to check directly whether the Arrow memory is allocated on a 64-byte-aligned address (it always should be).