In Managed Code, How Do I Achieve Good Locality Of Reference?
Solution 1:
- In .NET, elements of an array are certainly contiguous. In Java I'd expect them to be in most implementations, but it appears not to be guaranteed.
- I think it's reasonable to assume that the memory used by an instance for fields is in a single block... but don't forget that some of those fields may be references to other objects.
For the Java array part, Sun's JNI documentation includes this comment, tucked away in a discussion about strings:
For example, the Java virtual machine may not store arrays contiguously.
For your last question, if you have two int[]
then each of those arrays will be a contiguous block of memory, but they could be very "far apart" in memory. If you have an array of objects with two int fields, then each object could be a long way from each other, but the two integers within each object will be close together. Potentially more importantly, you'll end up taking a lot more memory with the "lots of objects" solution due to the per-object overhead. In .NET you could use a custom struct with two integers instead, and have an array of those - that would keep all the data in one big block.
I believe that in both Java and .NET, if you allocate a lot of smallish objects in quick succession within a single thread then those objects are likely to have good locality of reference. When the GC compacts a heap, this may improve - or it may potentially become worse, if a heap with
AB C D E
is compacted to
A D E B
(where C is collected) - suddenly A and B, which may have been "close" before, are far apart. I don't know whether this actually happens in any garbage collector (there are loads around!) but it's possible.
Basically in a managed environment you don't usually have as much control over locality of reference as you do in an unmanaged environment - you have to trust that the managed environment is sufficiently good at managing it, and that you'll have saved enough time by coding to a higher level platform to let you spend time optimising elsewhere.
Solution 2:
First, your title is implying C#. "Managed code" is a term coined by Microsoft, if I'm not mistaken.
Java primitive arrays are guaranteed to be a continuous block of memory. If you have a
int[] array = newint[4];
you can from JNI (native C) get a int *p
to point to the actual array. I think this goes for the Array* class of containers as well (ArrayList, ArrayBlockingQueue, etc).
Early implementations of the JVM had objects as contiuous struct, I think, but this cannot be assumed with newer JVMs. (JNI abstracts away this).
Two integers in the same object will as you say probably be "closer", but they may not be. This will probably vary even using the same JVM.
An object with two int fields is an object and I don't think any JVM makes any guarantee that the members will be "close". An int-array with two elements will very likely be backed by a 8 byte long array.
Solution 3:
With regards to arrays here is an excerpt from CLI (Common Language Infrastructure) specification:
Array elements shall be laid out within the array object in row-major order (i.e., the elements associated with the rightmost array dimension shall be laid out contiguously from lowest to highest index). The actual storage allocated for each array element can include platform-specific padding. (The size of this storage, in bytes, is returned by the sizeof instruction when it is applied to the type of that array’s elements.
Solution 4:
Good question! I think I would resort to writing extensions in C++ that handle memory in a more carefully managed way and just exposing enough of an interface to allow the rest of the application to manipulate the objects. If I was that concerned about performance I would probably resort to a C++ extension anyway.
Solution 5:
I don't think anyone has talked about Python so I'll have a go
Can I expect an array to be an adjacent block of memory (yes)?
In python arrays are more like arrays of pointers in C. So the pointers will be adjacent, but the actual objects are unlikely to be.
Are two integers in the same instance closer than two in different instances of the same class (probably)?
Probably not for the same reason as above. The instance will only hold pointers to the objects which are the actual integers. Python doesn't have native int (like Java), only boxed Int (in Java-speak).
Does an object occupy a contigous region in memory (no)?
Probably not. However if you use the __slots__
optimisation then some parts of it will be contiguous!
What's the difference between an array of objects with only two int fields and a single object with two int[] fields? (this example is probably Java specific)
In python, in terms of memory locality, they are both pretty much the same! One will make an array of pointers to objects which will in turn contain two pointers to ints, the other will make two arrays of pointers to integers.
Post a Comment for "In Managed Code, How Do I Achieve Good Locality Of Reference?"