PAE (Physical Address Extension) was introduced in CPUs back in 1995 with the Pentium Pro. It allows a 32-bit processor to address up to 64 GB of physical memory instead of 4 GB. Linux kernels have offered support for it since 2.3.23. Assume I am booting one of these kernels, and want to write an application in C that will access more than 3 GB of memory (why 3 GB? See this).

How would I go about accessing more than 3 GB of memory? Certainly, I could fork off multiple processes; each one would get access to 3 GB, and the processes could communicate with one another. But that's not a realistic solution for most use cases. What other options are available?

Obviously, the best solution in most cases would be to simply boot in 64-bit mode, but my question is strictly about how to make use of physical memory above 4 GB in an application running on a PAE-enabled 32-bit kernel.

+2  A: 

A 32-bit pointer can't address more than 4 GB of address space, so you'd have to resort to a lot of tricks.

It should be possible to switch a block of address space between different physical pages by using mmap to map bits of a large file; you can change the mapping at any time by another call to mmap to change the offset into the file (in multiples of the OS page size).
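
A minimal sketch of that sliding-window idea, assuming a backing file on a tmpfs mount (the /dev/shm path, window size, and file size are all illustrative, not a definitive implementation):

    #define _FILE_OFFSET_BITS 64          /* 64-bit off_t on 32-bit Linux */
    #include <stdio.h>
    #include <stdlib.h>
    #include <fcntl.h>
    #include <sys/mman.h>
    #include <unistd.h>

    #define WINDOW_SIZE (256UL * 1024 * 1024)   /* one 256 MB view */

    /* Map (or remap) a fixed-size window of the file at a given offset. */
    static void *map_window(int fd, void *addr, off_t offset)
    {
        /* MAP_FIXED replaces the old mapping at the same address, so the
         * window stays put in the address space while the offset changes. */
        void *p = mmap(addr, WINDOW_SIZE, PROT_READ | PROT_WRITE,
                       MAP_SHARED | (addr ? MAP_FIXED : 0), fd, offset);
        if (p == MAP_FAILED) {
            perror("mmap");
            exit(1);
        }
        return p;
    }

    int main(void)
    {
        /* Backing file on a tmpfs mount so its pages stay in RAM. */
        int fd = open("/dev/shm/bigdata", O_RDWR | O_CREAT, 0600);
        if (fd < 0 || ftruncate(fd, (off_t)32 * WINDOW_SIZE) < 0) {  /* 8 GB */
            perror("open/ftruncate");
            return 1;
        }

        char *win = map_window(fd, NULL, 0);                 /* view chunk 0 */
        win[0] = 'a';

        win = map_window(fd, win, (off_t)20 * WINDOW_SIZE);  /* slide past 4 GB */
        win[0] = 'b';

        munmap(win, WINDOW_SIZE);
        close(fd);
        return 0;
    }

The file is sparse, so only pages you actually touch consume RAM; the process still never has more than one window's worth of it mapped at a time.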

However, this is a really nasty technique and should be avoided. What are you planning on using the memory for? Surely there is an easier way?

MarkR
Yes, the easier way is to simply boot a 64-bit kernel. I expect any solution would involve nasty hackery, I'm just interested in how nasty it is.
ChrisInEdmonton
It really depends on your use case. If you are using the RAM as a glorified disc cache, then you can mmap blocks as necessary, but it creates a lot of overhead: mmap needs to mess around with page tables and so on, even if the required pages are already in RAM. The approach also falls to pieces if you're using threads, since they share an address space, so mmap is basically unsafe without excessive amounts of locking, which would probably make it extremely inefficient.
MarkR
+4  A: 

You don't, directly -- as long as you're running on 32-bit, each process is subject to the VM split the kernel was built with (2 GB, 3 GB, or, if you have a patched kernel with the 4GB/4GB split, 4 GB).

One of the simplest ways to have a process work with more data and still keep it in RAM is to create a shmfs and then put your data in files on that fs, accessing them with the ordinary seek/read/write primitives, or mapping them into memory one at a time with mmap (which is basically equivalent to doing your own paging). But whatever you do, it's going to take more work than using the first 3 GB.
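
For illustration, a hedged sketch of the file-on-shmfs approach using pread/pwrite with 64-bit offsets, so no pointer ever has to span the whole data set (the /dev/shm path and the 5 GB offset are assumptions for the example):

    #define _FILE_OFFSET_BITS 64   /* 64-bit file offsets on 32-bit Linux */
    #include <fcntl.h>
    #include <stdio.h>
    #include <string.h>
    #include <unistd.h>

    int main(void)
    {
        /* A data file on a tmpfs/shmfs mount, so it lives in RAM. */
        int fd = open("/dev/shm/chunk0", O_RDWR | O_CREAT, 0600);
        if (fd < 0) { perror("open"); return 1; }

        const char msg[] = "hello";
        off_t where = (off_t)5 * 1024 * 1024 * 1024;   /* 5 GB in: beyond 4 GB */

        /* The file is sparse; only pages actually written consume RAM. */
        if (ftruncate(fd, where + (off_t)sizeof msg) < 0 ||
            pwrite(fd, msg, sizeof msg, where) < 0) {
            perror("ftruncate/pwrite");
            return 1;
        }

        char buf[sizeof msg];
        if (pread(fd, buf, sizeof buf, where) != (ssize_t)sizeof buf) {
            perror("pread");
            return 1;
        }
        printf("read back: %s\n", buf);
        close(fd);
        return 0;
    }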

hobbs
+2  A: 

On Unix, one way to access more memory than is 32-bit addressable in user space is to use mmap/munmap if/when you want to touch a subset of the memory that you aren't currently using. It's kind of like manual paging. Another (easier) way is to use the memory implicitly, by having multiple processes each work on a different subset of it (if you have a multi-process architecture for your code).

The mmap method is essentially the same trick Commodore 128 programmers used for bank switching. In these post-Commodore-64 days, with 64-bit support so readily available, there aren't many good reasons to even think about it. ;)

I had fun deleting all the hideous PAE code from our product a number of years ago.

Peeter Joot
Thank you for noting how things were back in the Commodore 64 and 128 days. That alone was worth +1 to me. :)
ChrisInEdmonton
+1  A: 

PAE is an extension of the hardware's address bus, plus some page-table modifications to handle it. It doesn't change the fact that a pointer is still 32 bits, limiting you to 4 GB of address space in a single process. Honestly, in the modern world, the proper way to write an application that needs more than 2 GB (Windows) or 3 GB (Linux) of address space is simply to target a 64-bit platform.

Andy Ross
True, but there are ways to access the additional memory, both on Windows and Linux. Sure, jumping through the hoops is probably not worth the effort.
ChrisInEdmonton
+2  A: 

Or you could fire up as many instances of memcached as needed until all physical memory is mapped. Each memcached instance can make 3 GiB available on a 32-bit machine.

Then access the memory in chunks via the APIs and language bindings for memcached. Depending on the application, it might be almost as fast as working on a 64-bit platform directly. For some applications you get the added benefit of creating a scalable program. Not many motherboards handle more than 64 GiB of RAM, but with memcached you have easy access to as much RAM as you can pay for.
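
A rough sketch of what the client side can look like from C, using libmemcached (link with -lmemcached); the two local instances on ports 11211 and 11212 are assumptions, start as many as you need:

    #include <libmemcached/memcached.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    int main(void)
    {
        memcached_st *memc = memcached_create(NULL);
        /* libmemcached hashes each key to one of the servers, so the data
         * set is spread across the instances (and their address spaces). */
        memcached_server_add(memc, "127.0.0.1", 11211);
        memcached_server_add(memc, "127.0.0.1", 11212);

        const char *key = "chunk:42";   /* illustrative key and value */
        const char *val = "some large block of data";
        memcached_return_t rc = memcached_set(memc, key, strlen(key),
                                              val, strlen(val),
                                              (time_t)0, (uint32_t)0);
        if (rc != MEMCACHED_SUCCESS)
            fprintf(stderr, "set failed: %s\n", memcached_strerror(memc, rc));

        size_t len;
        uint32_t flags;
        char *out = memcached_get(memc, key, strlen(key), &len, &flags, &rc);
        if (out != NULL) {
            printf("got %zu bytes back\n", len);
            free(out);
        }
        memcached_free(memc);
        return 0;
    }

Since each instance is its own process, the PAE kernel is free to back each one with physical pages above 4 GB; no single process ever needs a pointer past its own 3 GB.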

Amigable Clark Kant
+1 because this is a novel approach to the problem! Very nice!
PP
A: 

Thanks @Clark, that was useful. Has anyone reported success with this technique?
I suppose it will work well for a typical webapp, where the different processes are isolated and do not communicate with each other.
Even then, there will be some overhead with the mmap call. How would you measure it? One rough way to time it is sketched after the quote below.

> It really depends on your use case. If you are using the RAM as a glorified disc cache,
> then you can mmap blocks as necessary, but it creates a lot of overhead: mmap needs to
> mess around with page tables and so on, even if the required pages are already in RAM.
> The approach also falls to pieces if you're using threads, since they share an address
> space, so mmap is basically unsafe without excessive amounts of locking, which would
> probably make it extremely inefficient.
> @MarkR (commented above)
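
A hedged sketch of one way to measure that overhead: time a loop of mmap/munmap cycles over a tmpfs file with clock_gettime (the /dev/shm/probe path, window size, and iteration count are illustrative; older glibc may need -lrt):

    #define _FILE_OFFSET_BITS 64
    #include <fcntl.h>
    #include <stdio.h>
    #include <sys/mman.h>
    #include <time.h>
    #include <unistd.h>

    #define WINDOW (4UL * 1024 * 1024)   /* 4 MB window, illustrative */
    #define ITERS  10000

    int main(void)
    {
        int fd = open("/dev/shm/probe", O_RDWR | O_CREAT, 0600);
        if (fd < 0 || ftruncate(fd, (off_t)(2 * WINDOW)) < 0) {
            perror("setup");
            return 1;
        }

        struct timespec t0, t1;
        clock_gettime(CLOCK_MONOTONIC, &t0);
        for (int i = 0; i < ITERS; i++) {
            /* Alternate between two offsets so every pass is a real remap. */
            void *p = mmap(NULL, WINDOW, PROT_READ | PROT_WRITE,
                           MAP_SHARED, fd, (i & 1) ? (off_t)WINDOW : 0);
            if (p == MAP_FAILED) { perror("mmap"); return 1; }
            *(volatile char *)p;          /* touch one page to fault it in */
            munmap(p, WINDOW);
        }
        clock_gettime(CLOCK_MONOTONIC, &t1);

        double ns = (t1.tv_sec - t0.tv_sec) * 1e9 + (t1.tv_nsec - t0.tv_nsec);
        printf("about %.0f ns per map/touch/unmap cycle\n", ns / ITERS);
        close(fd);
        return 0;
    }

Comparing that number against plain accesses to an already-mapped region would show how much of the cost is the page-table work MarkR mentions.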

deepak