My goal is to ensure that an array allocated in Java is laid out across contiguous physical memory. The issue I've run into is that the pages an array spans tend not to be contiguous in physical memory unless I allocate a really large array.

My questions are:

  • Why does allocating a really large array result in pages that are contiguous in physical memory?
  • Is there any way to ensure an array is allocated across contiguous physical memory that doesn't involve making the array really large?
  • How can I tell what page or physical address a Java object/array exists in, without measuring cache hits/cache misses?

I'm not looking for answers asking why I am doing this in Java. I understand that C would "solve my problem", and that I'm going against the fundamental nature of Java. Nevertheless, I have a good reason for doing this.

The answers need not be guaranteed to work all the time. I am looking for answers that work most of the time. Extra points for creative, out-of-the-box answers that no reasonable Java programmer would ever write. It's OK to be platform-specific (x86, 32-bit or 64-bit).

+1  A: 

I would think that you would want to use sun.misc.Unsafe.
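
A rough sketch of what that might look like: getting hold of the sun.misc.Unsafe singleton via reflection and allocating raw off-heap memory. This is an unsupported internal API, and it only gives virtually contiguous memory; nothing here pins physical pages.

    import java.lang.reflect.Field;
    import sun.misc.Unsafe;

    public class UnsafeAlloc {
        public static void main(String[] args) throws Exception {
            // The Unsafe singleton is not meant to be obtained directly; reflection
            // on the private "theUnsafe" field is the usual (unsupported) workaround.
            Field f = Unsafe.class.getDeclaredField("theUnsafe");
            f.setAccessible(true);
            Unsafe unsafe = (Unsafe) f.get(null);

            long bytes = 16 * 1024 * 1024;             // 16 MB off-heap
            long base = unsafe.allocateMemory(bytes);  // returns the starting (virtual) address
            try {
                unsafe.setMemory(base, bytes, (byte) 0);
                System.out.printf("allocated at 0x%x%n", base);
            } finally {
                unsafe.freeMemory(base);
            }
        }
    }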

McWafflestix
There's one slight problem here: Sun tells you to never, EVER use sun.* because they can (and do) change classes between versions.
R. Bemrose
Look, the OP stated that he was going against the fundamental nature of Java. If you're going to do that, might as well use sun.*.
McWafflestix
Thanks, McWafflestix, I think you were the only person who read the question and provided a thoughtful answer, rather than just piling on.
e5
I try... :-) Glad to help.
McWafflestix
The Unsafe class is used by the ByteBuffer class, and in this case Unsafe doesn't give you anything you can't do using ByteBuffer.
Peter Lawrey
A: 

Note this answer to a related question, which discusses System.identityHashCode() and identification of the memory address of the object. The bottom line is that you can use the default array hashCode() implementation to identify the original memory address of the array (subject to it fitting in an int/32 bits).
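
A minimal sketch of the idea, assuming a HotSpot-style JVM where the identity hash may be derived from the object's address at the time it is first computed (an implementation detail, not something the specification guarantees):

    public class IdentityHashDemo {
        public static void main(String[] args) {
            double[] data = new double[1024];
            // The default Object.hashCode()/System.identityHashCode() value; on some
            // JVMs this is derived from the object's address when first requested,
            // but that mapping is not guaranteed and may be absent entirely.
            System.out.printf("identityHashCode = 0x%08x%n", System.identityHashCode(data));
        }
    }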

Brian Agnew
If you're going to downvote this, it might be nice to indicate *why* you disagree with it so strongly. I believe it offers some pointers towards issue #3
Brian Agnew
I have no idea why someone downvoted you. It is interesting that hash codes might leak implementation details.
e5
Thank you :-) I feel slightly less aggrieved :-)
Brian Agnew
I think this question and your helpful answer to it offended someone. I got 5 votes down on the question and 5 votes up, hence the 0. I knew people would be annoyed that I was asking a "how can I make Java really ugly and hardware-specific" question, but I didn't expect quite this level of downvoting. I guess it's like going to a baseball forum and asking what sort of match one should use to burn rare cards.
e5
+2  A: 

There may be ways to trick a specific JVM into doing what you want, but these would probably be fragile, complicated and most likely very specific to the JVM, its version, the OS it runs on, etc. In other words, wasted effort.

So without knowing more about your problem, I don't think anyone will be able to help. There certainly is no way to do it in Java in general, at most on a specific JVM.

To suggest an alternative:

If you really need to store data in contiguous memory, why not do it in a small C library and call that via JNI?
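
A sketch of what the Java side of such a binding could look like; the library name, the method, and the assumption that the C code hands back a direct ByteBuffer are all hypothetical, and the actual allocation tricks would live in the native code:

    import java.nio.ByteBuffer;

    public class ContiguousBuffer {
        static {
            // Hypothetical native library; it would use OS-specific calls
            // (e.g. huge pages or a kernel driver) to obtain physically contiguous memory.
            System.loadLibrary("contig");
        }

        // Implemented in C via JNI; the native side would wrap its allocation in a
        // direct ByteBuffer using NewDirectByteBuffer and return it here.
        public static native ByteBuffer allocateContiguous(long bytes);
    }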

sleske
+4  A: 

No. Physically contiguous memory requires direct interaction with the OS. Most applications, the JVM included, only get virtually contiguous addresses. And a JVM cannot give you what it doesn't get from the OS.

Besides, why would you want it? If you're setting up DMA transfers, you probably are using techniques besides Java anyway.

Bit of background:

Physical memory in a modern PC is typically a flexible amount, on replaceable DIMM modules. Each byte of it has a physical address, so the operating system determines during boot which physical addresses are available. It turns out applications are better off not using these addresses directly. Instead, all modern CPUs (and their caches) use virtual addresses. There is a mapping table to physical addresses, but it need not be complete - swapping to disk is enabled by the use of virtual addresses not mapped to physical addresses. Another level of flexibility is gained from having one table per process, with incomplete mappings. If process A has a virtual address that maps to physical address X, but process B doesn't, then there is no way process B can write to physical address X, and we can consider that memory to be exclusive to process A. Obviously, for this to be safe the OS has to protect access to the mapping table, but all modern OSes do.

The mapping table works at the page level. A page, a contiguous subset of physical addresses, is mapped to a contiguous subset of virtual addresses. The trade-off between overhead and granularity has resulted in 4 KB pages being a common page size. But as each page has its own mapping, one cannot assume contiguity beyond that page size. In particular, when pages are evicted from physical memory, swapped to disk, and restored, it's quite possible that they end up at a new physical memory address. The program doesn't notice, as the virtual address does not change; only the OS-managed mapping table does.
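
To make the page granularity concrete, a toy calculation assuming the common 4 KB x86 page size (the address below is made up):

    public class PageMath {
        static final long PAGE_SIZE = 4096; // 4 KB, the usual x86 page size

        public static void main(String[] args) {
            long virtualAddress = 0x7f3a12345678L;          // hypothetical virtual address
            long pageNumber = virtualAddress / PAGE_SIZE;   // which page the address falls in
            long offsetInPage = virtualAddress % PAGE_SIZE; // position within that page
            // Two addresses share one page mapping, and hence guaranteed physical
            // contiguity, only if they fall in the same page.
            System.out.printf("page=%d offset=%d%n", pageNumber, offsetInPage);
        }
    }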

MSalters
I mentioned in my question that I wasn't interested in answers debating why it would be useful to me. Physically contiguous memory DOES NOT require direct interaction with the OS. As I stated in the question, I can already force Java to use physically contiguous addresses; I was looking for a better way to do it.
e5
Sorry, you are wrong, and your question does not prove anything. Cache hits are not relevant to the virtual/physical distinction, as caches are designed to resolve addresses in virtual memory. The page table determines the mapping between physical and virtual memory. On x86, it is managed at Ring 0. JVMs run in Ring 3 and therefore are simply unable to alter it.
MSalters
+3  A: 

Given that the garbage collector moves objects around in (logical) memory, I think you are going to be out of luck.

About the best you could do is use ByteBuffer.allocateDirect. That will (typically) not get moved around in (logical) memory by the GC, but it may be moved in physical memory or even paged out to disk. If you want any better guarantees, you'll have to hit the OS.

Having said that, if you can set the page size to be as big as your heap, then all arrays will necessarily be physically contiguous (or swapped out).
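
A rough sketch of reading a direct buffer's starting (virtual) address, assuming the non-public java.nio.Buffer.address field used by common JVMs; this is an implementation detail, and on recent JDKs it also needs --add-opens java.base/java.nio=ALL-UNNAMED:

    import java.lang.reflect.Field;
    import java.nio.Buffer;
    import java.nio.ByteBuffer;

    public class DirectBufferAddress {
        public static void main(String[] args) throws Exception {
            ByteBuffer buf = ByteBuffer.allocateDirect(4 * 1024 * 1024); // 4 MB off-heap
            // Internal field holding the native (virtual) start address of direct buffers.
            Field address = Buffer.class.getDeclaredField("address");
            address.setAccessible(true);
            System.out.printf("buffer starts at virtual address 0x%x%n", address.getLong(buf));
        }
    }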

Tom Hawtin - tackline
Yeah, large pages are really the only way to make it work, the trouble is that most OS's have fixed increments, with the maximum far smaller than the heap size (the reference I read on SPARC said 4M was the biggest size). Of course, doing this is very OS-specific; here are a few links that I turned up: http://linuxgazette.net/155/krishnakumar.html for Linux, and http://www.solarisinternals.com/wiki/index.php/Multiple_Page_Size_Support for Solaris. I suspect that the method call overhead from ByteBuffer will far outweigh any savings from having contiguous data.
kdgregory
ByteBuffer works beautifully: all the pages are contiguous in virtual memory, and with a little tweaking one can find the actual virtual memory address that the buffer starts at. =)
e5
Solaris 10 supports 256 MB page sizes. This is done to reduce TLB misses.
Peter Lawrey
Peter Lawrey: Of course it depends upon the underlying hardware. IIRC, AMD x64 can only support around 4 MB pages. Niagara can support 256 MB pages, which is important because Niagara 1 (but not 2) has a small TLB for so many hardware threads.
Tom Hawtin - tackline
+1  A: 

As I see it, you have yet to explain why:

  • primitive arrays are not contiguous in memory. I don't see why they wouldn't be contiguous in virtual memory. (cf. arrays of Objects are unlikely to have their Objects contiguous in memory.)
  • an array which is not contiguous in physical memory (RAM, i.e. Random Access Memory) would make a significant performance difference, e.g. a measurable difference in the performance of your application.

What it appears is that you are really looking for a low-level way to allocate arrays because you are used to doing this in C, and performance is the claimed reason for needing to do this.

BTW: Accessing a ByteBuffer.allocateDirect() buffer with, say, getDouble()/putDouble() can be slower than just using a double[], as the former involves JNI calls and the latter can be optimised to no call at all.
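
A naive timing sketch of that comparison (not a rigorous benchmark; warm-up and JIT effects are ignored):

    import java.nio.ByteBuffer;

    public class BufferVsArray {
        static final int N = 1_000_000;

        public static void main(String[] args) {
            ByteBuffer buf = ByteBuffer.allocateDirect(N * Double.BYTES);
            double[] arr = new double[N];

            long t0 = System.nanoTime();
            for (int i = 0; i < N; i++) buf.putDouble(i * Double.BYTES, i); // bounds/address checks per call
            long t1 = System.nanoTime();
            for (int i = 0; i < N; i++) arr[i] = i;                         // plain array store, easily JIT-optimised
            long t2 = System.nanoTime();

            System.out.printf("direct buffer: %d us, double[]: %d us%n",
                    (t1 - t0) / 1_000, (t2 - t1) / 1_000);
        }
    }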

The reason it is used is for exchanging data between the Java and C spaces, e.g. NIO calls. It only performs well when reads/writes are kept to a minimum. Otherwise you are better off using something in the Java space.

i.e. unless you are clear about what you are doing and why you are doing it, you can end up with a solution which might make you feel better, but is actually more complicated and performs worse than the simple solution.

Peter Lawrey
My concern is not performance. I was experimenting with the cache, trying to learn more about memory management and the capabilities of Java. I had a way that worked (namely allocating an array), but I was worried that it might fail sometimes, as I've been told Java will sometimes move things around in memory.
e5
The why is that I was experimenting and had some unanswered questions about Java. I googled for a while and didn't find an answer. Stack Overflow is a good place to ask questions, so I asked.
e5
What sort of failure were you concerned about? The GCs have a phase for compacting memory; that is more than most developers need to know about memory management. The whole point of the GC is to reduce how much developers need to worry about memory management. I suggest you try to develop the simplest code you can and only make it more complicated when you have determined you actually need to.
Peter Lawrey
Which means don't use ByteBuffer, unless you have determined you actually need to and a plain array is not enough. (Performance is the *only* reason I can think of)
Peter Lawrey