Our service tends to fall asleep at night on our client's server, and then has a hard time waking up. What seems to happen is that the process heap, which is sometimes several hundred MB, is moved to the swap file. This happens at night, when our service is not used and other jobs are scheduled to run (DB backups, AV scans etc). When this happens, after a few hours of inactivity the first call to the service takes up to a few minutes (subsequent calls take seconds).

I'm quite certain it's an issue of virtual memory management, and I really hate the idea of forcing the OS to keep our service in physical memory. I know doing that will hurt other processes on the server and decrease the overall server throughput. That said, our clients just want our app to be responsive. They don't care if nightly jobs take longer.

I vaguely remember there's a way to force Windows to keep pages in physical memory, but I really hate that idea. I'm leaning more towards some internal or external watchdog that will initiate higher-level functionality (there is already an internal scheduler, but it does very little and makes no difference). If there were a 3rd party tool that provided that kind of service, it would be just as good.

I'd love to hear any comments, recommendations and common solutions to this kind of problem. The service is written in VC2005 and runs on Windows servers.

+6  A: 

As you mentioned, forcing the app to stay in memory isn't the best way to share resources on the machine. A quick solution that may work well is to simply schedule an event that wakes your service up at a specific time each morning, before your clients start to use it. You can schedule it in the Windows Task Scheduler with a simple script or EXE call.

Paul Alexander
A: 

In terms of cost, the cheapest and easiest solution is probably just to buy more RAM for that server, and then you can disable the page file entirely. If you're running 32-bit Windows, just buy 4GB of RAM. Then the entire address space will be backed with physical memory, and the page file won't be doing anything anyway.

kquinn
That would be true if only my service was running on the server. However, it also runs a web server, a database, an antivirus and other services. To eliminate the need of the page file will require 4GB for each process, wouldn't it? That said, adding more RAM is a good solution, but it's up to the client, and I'm looking for a one-time development job that will not force many clients to spend more money.
eran
"To eliminate the need of the page file will require 4GB for each process, wouldn't it?" Not as I understand it. For 32-bit Windows, 4GB is the grand total for everything, and you're lucky if you get all four gigabytes. This machine, for instance, has 4GB of physical memory but only 2.75GB available to programs, since other stuff eats up address space.
kquinn
But each process gets its /own/ 4GB of address space (not counting whatever the kernel takes), so if you have 30 processes each of them could use 1GB of RAM and just have most of it swapped out, in which case you still need a page file...
Steven Schlansker
-1 for recommending disabling the page file. Bad idea, as it will only result in processes getting terminated when the system otherwise would have written data to the page file. Simply adding more RAM should help keep more working set in RAM.
Andrew Medico
+1  A: 

A third approach could be to have your service run a thread that does something trivial, like incrementing a counter, and then sleeps for a fairly long period, say 10 seconds. This should have minimal effect on other applications but keep at least some of your pages available.

anon
I think this will keep the pages that hold this piece of code in memory, but the heap will still be swapped out. Since far more memory is used by the heap, that is probably where the problem lies. I could walk the heap instead of incrementing the counter, but that would not have a minimal effect...
eran
A: 

I'm not saying you want to do this, or that it is best practice, but you may find it works well enough for you. It seems to match what you've asked for.

Summary: Touch every page in the process, one page at a time, on a regular basis.

What about a thread that runs in the background and wakes up once every N seconds? Each time the thread wakes up, it attempts to read from address X. The attempt is protected with an exception handler in case you read a bad address. Then it increments X by the size of a page.

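With the 4 KB pages used on x86 Windows, there are 1,048,576 pages in 4GB, 786,432 pages in 3GB, 524,288 pages in 2GB (the figures would be 65536, 49152 and 32768 only for 64 KB units). Divide your idle time (the overnight dead time) by the number of pages to find how often the thread has to touch a page in order to attempt each one once.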
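
The division described above can be sketched in a few lines of portable C++. This is only an illustration: the helper names pageCount and touchIntervalMs are made up for this example, and the page size is passed in as a parameter because the real value should come from the machine itself.

```cpp
#include <cassert>
#include <cstdint>

// Number of pages needed to cover `bytes` of address space, rounded up.
std::uint64_t pageCount(std::uint64_t bytes, std::uint64_t pageSize) {
    assert(pageSize != 0);
    return (bytes + pageSize - 1) / pageSize;
}

// Milliseconds to wait between touches so that every page is visited
// once within an idle window of `idleSeconds`.
double touchIntervalMs(std::uint64_t idleSeconds, std::uint64_t pages) {
    assert(pages != 0);
    return static_cast<double>(idleSeconds) * 1000.0 / static_cast<double>(pages);
}
```

For instance, assuming a 4096-byte page and an 8-hour idle window, pageCount(2 GB, 4096) gives 524,288 pages and touchIntervalMs(28800, 524288) comes to roughly 55 ms per page, so a sleep measured in whole seconds would not cover a 2GB address space in one night.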

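#include <windows.h>

// Fragment: this goes inside the background thread's function.
SYSTEM_INFO si;
GetSystemInfo(&si);
DWORD sizeOfVMPage = si.dwPageSize;

// volatile so the compiler cannot optimise the read away
volatile BYTE *ptr;

ptr = NULL;
while(TRUE)
{
    __try
    {
        BYTE b;

        b = *ptr;   // touch one byte of the page
    }
    __except(EXCEPTION_EXECUTE_HANDLER)
    {
        // ignore, some pages won't be accessible
    }

    ptr += sizeOfVMPage;

    Sleep(N * 1000);
}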

You can get the sizeOfVMPage value from the dwPageSize value in the returned result from GetSystemInfo().

Don't try to avoid the exception handler by using if (!IsBadReadPtr(ptr, 1)), because other threads in the app may be modifying memory protections at the same time. If you come unstuck because of this it will be almost impossible to identify why (it will most likely be a non-repeatable race condition), so don't waste time with it.

Of course, you'd want to turn this thread off during the day and only run it during your dead-time.
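
If the service already knows which large buffers it needs warm, a narrower variant of the same idea is to touch only those instead of walking the whole address space. The following is a minimal sketch in portable C++, not the loop above: touchPages is a hypothetical helper, the caller is assumed to own the buffer (so no exception handler is involved), and the page size is passed in rather than queried.

```cpp
#include <cassert>
#include <cstddef>

// Reads one byte per page-sized step across [base, base + length), plus
// the last byte in case the buffer is not page-aligned and spills into
// one more page. Returns the number of reads performed.
std::size_t touchPages(const void* base, std::size_t length, std::size_t pageSize) {
    assert(pageSize != 0);
    if (!base || length == 0) return 0;

    const volatile unsigned char* p =
        static_cast<const volatile unsigned char*>(base);
    std::size_t reads = 0;

    for (std::size_t offset = 0; offset < length; offset += pageSize) {
        unsigned char b = p[offset];  // volatile read: faults the page in if needed
        (void)b;
        ++reads;
    }

    unsigned char last = p[length - 1];  // covers a trailing partial page
    (void)last;
    ++reads;

    return reads;
}
```

On Windows the pageSize argument would be the dwPageSize value mentioned above. Touching a buffer this way still competes with the nightly jobs for physical memory, so the same advice about running it only during dead time applies.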

Stephen Kellett
A: 

The other thing to ensure is that your data is localized.

In other words: do you really need all 300 MiB of the memory before you can do anything? Can the data structures you use be rearranged so that any particular request could be satisfied with only a few megabytes?

For example

  • if your 300 MiB of heap memory contains facial recognition data, can the data be arranged internally so that male and female face data are stored together? Or so that big noses are kept separate from small noses?

  • if it has some sort of logical structure to it, can it be sorted, so that a binary search can be used to skip over a lot of pages?

  • if it's a proprietary, in-memory database engine, can the data be better indexed/clustered so that it doesn't require so many memory page hits?

  • if they're image textures, can commonly used textures be located near each other?

So again: do you really need all 300 MiB of the memory before you can do anything? Can you really not service a request without all that data back in memory?
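
To make the sorting point concrete, here is a rough sketch that counts how many page-sized chunks of an array a lookup has to read. Everything in it is illustrative: the LookupCost struct, both function names and the itemsPerPage parameter (standing in for page size divided by element size) are assumptions for this example, not part of any real service.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <vector>

// Whether the key was found and how many distinct page-sized chunks of
// the array were read to decide.
struct LookupCost {
    bool found;
    std::size_t pagesTouched;
};

// Binary search over a sorted vector, recording the chunk each probe lands in.
LookupCost binarySearchCost(const std::vector<int>& sorted, int key,
                            std::size_t itemsPerPage) {
    assert(itemsPerPage != 0);
    std::set<std::size_t> pages;
    std::size_t lo = 0, hi = sorted.size();
    bool found = false;
    while (lo < hi) {
        std::size_t mid = lo + (hi - lo) / 2;
        pages.insert(mid / itemsPerPage);
        if (sorted[mid] == key) { found = true; break; }
        if (sorted[mid] < key) lo = mid + 1; else hi = mid;
    }
    return LookupCost{found, pages.size()};
}

// Linear scan for comparison: reads every element up to the match.
LookupCost linearScanCost(const std::vector<int>& data, int key,
                          std::size_t itemsPerPage) {
    assert(itemsPerPage != 0);
    for (std::size_t i = 0; i < data.size(); ++i) {
        if (data[i] == key) return LookupCost{true, i / itemsPerPage + 1};
    }
    return LookupCost{false, (data.size() + itemsPerPage - 1) / itemsPerPage};
}
```

With about a million sorted ints and 1024 ints per page, the binary search makes at most about 20 probes and so faults in at most about 20 pages, while a scan for a key near the end reads almost all of the roughly 1000 pages. After a night in the swap file, that is the difference between a handful of page faults and paging in the whole array.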


Otherwise: scheduled task at 6 ᴀᴍ to wake it up.

Ian Boyd