I have written a converter that takes OpenStreetMap XML files and converts them to a binary runtime rendering format that is typically about 10% of the original size. Input file sizes are typically 3 GB and larger. The input files are not loaded into memory all at once, but streamed as points and polys are collected; then a BSP is run on them and the file is output. Recently, on larger files, it runs out of memory and dies (the one in question has 14 million points and 1 million polygons). Typically my program is using about 1 GB to 1.2 GB of RAM when this happens. I've tried increasing virtual memory from 2 to 8 GB (on XP), but this change had no effect. Also, since this code is open source, I would like it to work regardless of the available RAM (albeit slower); it runs on Windows, Linux and Mac.

What techniques can I use to avoid having it run out of memory? Processing the data in smaller subsets and then merging the final results? Using my own virtual-memory-style handler? Any other ideas?

+2  A: 

It sounds like you're already taking a SAX-based approach to the XML processing (loading the XML as you go instead of all at once).

The solution is almost always to change the algorithm so that it cuts the problem into smaller parts. Don't allocate as much memory at one time; read in only what you need, process it, then write it out.

You can sometimes extend memory by using the hard drive when your algorithm needs more than fits in RAM.

If you can't split up your algorithm, you probably want something like memory mapped files.
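
For the memory-mapped-file route, here is a minimal sketch on POSIX systems (the file name is hypothetical; Windows would use CreateFileMapping/MapViewOfFile instead):

    #include <fcntl.h>
    #include <sys/mman.h>
    #include <sys/stat.h>
    #include <unistd.h>
    #include <cstdio>

    int main()
    {
        int fd = open("input.osm", O_RDONLY);   // hypothetical input file name
        if (fd < 0) { std::perror("open"); return 1; }

        struct stat st;
        if (fstat(fd, &st) != 0) { std::perror("fstat"); return 1; }

        // Map the file read-only; the OS pages pieces in and out as they are
        // touched, so physical RAM use stays bounded even for a 3 GB input.
        void* mem = mmap(0, st.st_size, PROT_READ, MAP_SHARED, fd, 0);
        if (mem == MAP_FAILED) { std::perror("mmap"); return 1; }
        const char* data = static_cast<const char*>(mem);

        // ... scan 'data' sequentially and emit binary records as you go ...
        (void)data;

        munmap(mem, st.st_size);
        close(fd);
        return 0;
    }

Note that a 32-bit process cannot map a 3 GB file in a single view; you would instead map a sliding window of the file using the offset argument.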

In the worst case you can try to use something like VirtualAlloc if you are on a Windows system. If you are on a 32-bit system, you can try to use something like Physical Address Extension (PAE).

You could also consider placing input size limits on your program, with different limits for 32-bit and 64-bit systems.

Brian R. Bondy
+3  A: 

Have you checked to ensure you aren't leaking memory anywhere?

Since your program is portable to Linux, I suggest running it under Valgrind to make sure.

Douglas Leeder
Yes I have checked for leaks, there are none.
KPexEA
+13  A: 

First, on a 32-bit system, you will always be limited to 4 GB of virtual address space, regardless of pagefile settings. (And of that, only 2 GB will be available to your process on Windows. On Linux, you'll typically have around 3 GB available.)

So the first obvious solution is to switch to a 64-bit OS, and compile your application for 64-bit. That gives you a huge virtual memory space to use, and the OS will swap data in and out of the pagefile as necessary to keep things working.

Second, allocating smaller chunks of memory at a time may help. It's often easier to find four 256 MB chunks of free memory than one 1 GB chunk.

Third, split up the problem. Don't process the entire dataset at once, but try to load and process only a small section at a time.
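
As a rough sketch of that third point, assuming (hypothetically) that the data can be partitioned by geographic tile and the per-tile results merged afterwards:

    #include <string>
    #include <vector>

    // Hypothetical outline: each pass streams the input, keeps only one tile's
    // worth of points/polys in memory, builds that tile's BSP, writes it out,
    // and frees everything before moving on to the next tile.
    std::vector<std::string> convertInTiles(const std::string& input, int tiles)
    {
        std::vector<std::string> partFiles;
        for (int t = 0; t < tiles; ++t)
        {
            std::string part = "part_" + std::to_string(t) + ".bin";
            // collectTile(input, t);      // stream input, keep tile t only (placeholder)
            // buildBspAndWrite(part);     // memory use bounded by one tile (placeholder)
            partFiles.push_back(part);
        }
        // The final output is then stitched together from partFiles in a
        // cheap merge pass (e.g. mergeParts(partFiles, "output.bin")).
        return partFiles;
    }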

jalf
Windows can have 3GB of virtual space with /LARGEADDRESSAWARE
Shay Erlichmen
The flag allows the process to use up to 4GB *if the OS can provide it*. Normally, Windows is still set up to only give 2GB to each process. That can be changed too, at the risk of driver instability, to give you 3GB, yes. With PAE, you can get even more. But 64-bit is probably a better bet.
jalf
Imho, the third option is the most important one. Apart from giving control over memory, it also allows for parallel processing.
xtofl
RE - Jalf "With PAE": As far as I know, Windows XP doesn't really support the 4 extra address bits that most hardware comes with (see PAE). I get this from Wikipedia's PAE page, which has a link to a Microsoft webpage. Can anyone confirm from experience that Windows XP ignores the extra address bits?
Trevor Boyd Smith
@Trevor: I haven't tried, but I'm pretty sure XP supports it (although of course you have to manually enable it). But does it matter? These days, switching to 64-bit is likely a far better solution.
jalf
A: 

It sounds like you are doing a text-to-binary conversion, so why do you need to have the entire data set in memory?
Can't you just read a primitive from the text (XML) and then write it to the binary stream?
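
A minimal sketch of that idea, assuming (purely for illustration) that a node element fits on one line and that the binary format uses a fixed-size record; a real converter would use a proper SAX parser such as expat or libxml2:

    #include <cstdint>
    #include <fstream>
    #include <string>

    #pragma pack(push, 1)
    struct PointRecord { int32_t id; double lat; double lon; };
    #pragma pack(pop)

    int main()
    {
        std::ifstream in("input.osm");            // hypothetical OSM XML input
        std::ofstream out("points.bin", std::ios::binary);

        std::string line;
        while (std::getline(in, line))            // never more than one line in RAM
        {
            // Extremely simplified "parsing" just to show the streaming shape;
            // a real converter would parse the attributes properly.
            if (line.find("<node") == std::string::npos)
                continue;

            PointRecord rec = {};                 // fill from the parsed attributes
            // rec.id = ...; rec.lat = ...; rec.lon = ...;
            out.write(reinterpret_cast<const char*>(&rec), sizeof rec);
        }
        return 0;
    }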

Shay Erlichmen
+2  A: 

Assuming you are using Windows XP, if you are only just over your memory limit and do not desire or have the time to rework the code as suggested above, you can add the /3GB switch to your boot.ini file; then it is just a matter of setting a linker switch to get an extra 1 GB of memory.
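
For reference, the two pieces involved look roughly like this (the boot.ini entry varies per machine, and the executable name is hypothetical):

    ; boot.ini - append /3GB to the existing boot entry, e.g.:
    multi(0)disk(0)rdisk(0)partition(1)\WINDOWS="Microsoft Windows XP" /fastdetect /3GB

    REM Link the program as large-address aware (Visual C++):
    link /LARGEADDRESSAWARE ...
    REM or patch an existing binary:
    editbin /LARGEADDRESSAWARE converter.exe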

Stephen Nutt
It's not that simple to use 3 GB. You should make sure all your pointer arithmetic operations are safe, or else you'll crash when memory usage gets high. See http://blogs.msdn.com/oldnewthing/archive/2004/08/12/213468.aspx for more.
eran
+3  A: 

I suspect your memory issues come from keeping the BSP tree in memory, so keep the BSP on disk and hold only some chunks in memory. This should be fairly easy with a BSP, as the structure lends itself to this better than some other tree structures, and the logic should be simple. To be both efficient and memory-friendly you could have a cache with dirty flags, with the cache size set to the available memory less a bit of breathing room.
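
A minimal sketch of such a cache (the node layout, file format and eviction policy are all hypothetical; it assumes fixed-size nodes stored by index in a scratch file that already exists):

    #include <cstdio>
    #include <unordered_map>

    struct BspNode {            // fixed-size on-disk layout assumed for simplicity
        int left, right;
        float plane[4];
    };

    class NodeCache {
    public:
        explicit NodeCache(const char* path) { file_ = std::fopen(path, "r+b"); }
        ~NodeCache() { flush(); if (file_) std::fclose(file_); }

        BspNode& get(int index) {               // load a node on demand
            auto it = cache_.find(index);
            if (it == cache_.end()) {
                Entry e;
                long offset = long(index) * long(sizeof(BspNode));
                std::fseek(file_, offset, SEEK_SET);
                std::fread(&e.node, sizeof(BspNode), 1, file_);
                e.dirty = false;
                it = cache_.emplace(index, e).first;
                // A real cache would evict clean entries here once it grows
                // past the memory budget, writing dirty ones back first.
            }
            return it->second.node;
        }

        void markDirty(int index) { get(index); cache_[index].dirty = true; }

        void flush() {                          // write modified nodes back to disk
            for (auto& kv : cache_)
                if (kv.second.dirty) {
                    long offset = long(kv.first) * long(sizeof(BspNode));
                    std::fseek(file_, offset, SEEK_SET);
                    std::fwrite(&kv.second.node, sizeof(BspNode), 1, file_);
                    kv.second.dirty = false;
                }
        }

    private:
        struct Entry { BspNode node; bool dirty; };
        std::FILE* file_ = nullptr;
        std::unordered_map<int, Entry> cache_;
    };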

dwc
+1  A: 

You have to understand that virtual memory is different from "RAM" in that the amount of virtual memory you're using is the total amount you've reserved, while real memory (on Windows it's called the working set) is memory that you've actually modified or locked.

As someone else pointed out, on 32-bit Windows platforms the limit on virtual memory is 2 GB unless you set the special flag for 3 GB and can ensure that pointer arithmetic, both in your code and in any libraries you use, treats pointers as unsigned.

So either forcing users to 64-bit, or monitoring your virtual memory use and capping your maximum block size to something that comfortably fits inside the limits imposed by 32-bit operating systems, would be my advice.

I've slammed into the 32-bit wall on Windows, but have no experience working around these limitations on Linux, so I've only talked about the Windows side of things.

Doug Boone
+1  A: 

On 32-bit XP your maximum program address space is 2 GB. Then you have fragmentation due to DLLs and drivers loading into your address space. Finally, you have the problem of your heap fragmenting.

Your best move is just to get it over with and run as a 64-bit process (on a 64-bit system). Suddenly all these problems go away. You can use a better heap to mitigate heap fragmentation effects, and you can try using VirtualAlloc to grab your memory in one big contiguous chunk (and then you get to manage it from there!) to discourage DLL's/drivers from fragmenting it.

Finally, you can split your BSP across processes. Complicated and painful, and frankly just putting it on disk would be easier, but in theory you could get better performance by having a group of processes exchanging information, if you can keep everything resident (and assuming you can manage memory smarter than the OS handles file buffering... which is a big if). Each process would need far less memory and therefore shouldn't run into the 2 GB address-space limit. Of course, you'll burn through RAM/swap a lot faster.

You can mitigate the effects of address-space fragmentation by allocating smaller chunks. This will have other nasty side effects, but you could follow a backoff policy where you grab smaller and smaller chunks of memory if an allocation fails. Frequently this simple approach will get you a program that works when it otherwise wouldn't, while the rest of the time it performs as well as it could.
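
A sketch of what such a backoff policy could look like (the chunk sizes are arbitrary and just for illustration):

    #include <cstddef>
    #include <new>
    #include <utility>
    #include <vector>

    // Try to reserve 'total' bytes as a list of chunks, halving the chunk size
    // whenever a contiguous allocation fails, down to some minimum.
    std::vector<std::pair<char*, std::size_t>> reserve(std::size_t total)
    {
        std::vector<std::pair<char*, std::size_t>> chunks;
        std::size_t chunk = 256 * 1024 * 1024;       // start by asking for 256 MB
        std::size_t remaining = total;

        while (remaining > 0 && chunk >= 1024 * 1024)
        {
            std::size_t want = remaining < chunk ? remaining : chunk;
            char* p = new (std::nothrow) char[want];
            if (p) {
                chunks.push_back({p, want});
                remaining -= want;
            } else {
                chunk /= 2;                          // back off and try smaller pieces
            }
        }
        return chunks;                               // caller frees with delete[]
    }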

Boy, doesn't 64-bit computing just sound so much nicer than the other choices?

Christopher Smith
+1  A: 

How are you allocating memory for points? Are you allocating points one at a time (e.g. pt = new Point)? Then, depending on the size of a point, some memory may get wasted. For example, on Windows memory is allocated in multiples of 16 bytes, so even if you try to allocate 1 byte, the OS will actually allocate 16 bytes.

If this is the case, using a custom memory allocator may help. You can do a quick check using an STL allocator (overload the new operator for the Point class and use the STL allocator to allocate memory rather than 'malloc' or the default new operator).
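
A rough sketch of what a class-level operator new backed by a trivial slab pool might look like (the sizes and names are hypothetical, and nothing is returned to the OS until the process exits):

    #include <cstddef>
    #include <vector>

    struct Point
    {
        double x, y;

        static void* operator new(std::size_t size)
        {
            // Hand out Points from 1 MB slabs instead of one heap call per Point,
            // so the per-allocation bookkeeping overhead disappears.
            static std::vector<char*> slabs;
            static std::size_t used = kSlabSize;     // force a slab on first call
            if (used + size > kSlabSize) {
                slabs.push_back(new char[kSlabSize]);
                used = 0;
            }
            void* p = slabs.back() + used;
            used += size;
            return p;
        }

        static void operator delete(void*) { }       // pool is released at process exit

    private:
        static const std::size_t kSlabSize = 1024 * 1024;
    };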

Nitin Bhide
I am allocating points and polys using a heap manager, so they only take as much space as necessary and have almost no overhead, since my heap allocates (in this case) 1 MB chunks and doles out requests from each chunk.
KPexEA
Another reason could be memory fragmentation (i.e. memory is available in small chunks, but when you ask for 1 MB, a contiguous 1 MB chunk is not available). Does the XML parser use the heap manager? Maybe the XML parser is using standard memory allocation and causing fragmentation?
Nitin Bhide
A: 

If you want to be memory-size independent, you need a size-independent algorithm. No matter how much RAM you have, if you don't have memory usage under control, you're going to bump into the limit.

Take a look at the smallest chunk of information you could possibly use to produce a bit of output. Then think of a way to divide the input into chunks of this size.

Now that sounds easy, doesn't it? (Glad I don't have to do it :) )

xtofl
+1  A: 

You may not be allocating and deallocating memory in an optimum manner. As others have pointed out, you may be leaking memory and not knowing it. Debugging and optimizing memory allocation will take time.

If you don't want to spend time optimizing memory usage, why not try the Conservative Garbage Collector? It's a drop-in replacement for malloc()/new and free(). In fact, free() becomes a no-op, so you can just remove those calls from your program. If, instead, you hand-optimize your program and manage a pool of memory as previously suggested, you'll end up doing a lot of the work that the CGC already does for you.
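
For illustration, a minimal use of the Boehm-Demers-Weiser collector (libgc) could look like this; treat it as a sketch, since the exact headers and link flags depend on how the library was installed (there is also a gc_cpp.h header aimed at replacing operator new for C++ classes):

    #include <gc.h>        // Boehm-Demers-Weiser conservative collector; link with -lgc

    int main()
    {
        GC_INIT();

        for (int i = 0; i < 1000000; ++i)
        {
            // Allocate through the collector instead of malloc/new.
            void* p = GC_MALLOC(64);
            (void)p;
            // No free(): unreachable blocks are reclaimed automatically.
        }

        return 0;
    }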

Barry Brown
+1  A: 

You need to stream your output as well as your input. If your output format is not stream-oriented, consider doing a second pass. For example, if the output file starts with a checksum/size of the data, leave space for it on the first pass and seek/write to that space later.
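
A small sketch of the "leave space, seek back later" idea for a header that stores the record count (the record layout is made up for illustration):

    #include <cstdint>
    #include <fstream>

    int main()
    {
        std::ofstream out("output.bin", std::ios::binary);

        // Pass 1: reserve space for the header value we can't compute yet.
        uint64_t count = 0;
        out.write(reinterpret_cast<const char*>(&count), sizeof count);

        // Stream the records out as they are produced.
        for (uint64_t i = 0; i < 1000; ++i)
        {
            double record[2] = { 1.0 * i, 2.0 * i };
            out.write(reinterpret_cast<const char*>(record), sizeof record);
            ++count;
        }

        // Pass 2: jump back and fill in the real value.
        out.seekp(0);
        out.write(reinterpret_cast<const char*>(&count), sizeof count);
        return 0;
    }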

Arkadiy
A: 

You may have a "memory leak" even if your code does not leak memory. The CRT, which sometimes has its own memory allocator on top of the OS memory manager, may be keeping memory you freed, assuming you will ask for it again soon. If this is the case and you don't use that memory as the CRT expects, you'll see your memory usage constantly growing.

When this is the case, you might want to use your own custom allocator where possible, which could ask the CRT for a large chunk of memory and then divide it into smaller pieces. If you can reuse those smaller pieces, it might take the burden off the CRT allocator. Another thing that might help the CRT allocator is making more use of the stack. I don't know if this is feasible in your case, but if so it will reduce heap fragmentation.

As a last resort, you may also try calling _heapmin every once in a while and see if it helps. I don't like relying on such calls and it's probably not portable, but there is a remote chance it would save some memory.

eran
A: 

You don't need to switch to 64-bit machines, nor do you need most of the 1000 things suggested by others. What you need is a more thoughtful algorithm.

Here are some things you can do to help out with this situation:

  • If you're on Windows, utilize File Maps (sample code). This will give you access to the file via a single buffer pointer as though you had read the whole file into memory, just without actually doing that; a minimal sketch appears after this list. Recent versions of the Linux kernel have a similar mechanism.
  • If you can, and it looks like you could, scan the file sequentially and avoid creating an in-memory DOM. This will greatly decrease your load-time as well as memory requirements.
  • Use pooled memory! You will probably have many tiny objects, such as nodes, points and whatnot. Use a memory pool to help out (I'm assuming you're using an unmanaged language; search for pooled allocation and memory pools).
  • If you're using a managed language, at least move this particular part into an unmanaged language and take control of the memory and file reading. Managed languages have a non-trivial overhead both in memory footprint and performance. (Yes, I know this is tagged "C++"...)
  • Attempt to design an in-place algorithm, where you read and process only the minimum amount of data at a time, so your memory requirements would go down.
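
A minimal sketch of the Windows file-mapping approach mentioned in the first bullet (the file name is hypothetical and error handling is omitted for brevity):

    #include <windows.h>

    int main()
    {
        HANDLE file = CreateFileA("input.osm", GENERIC_READ, FILE_SHARE_READ,
                                  NULL, OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, NULL);
        HANDLE mapping = CreateFileMappingA(file, NULL, PAGE_READONLY, 0, 0, NULL);

        // The pointer behaves as if the whole file were in memory, but pages are
        // only brought in as they are touched and can be discarded under pressure.
        const char* data = static_cast<const char*>(
            MapViewOfFile(mapping, FILE_MAP_READ, 0, 0, 0));

        // ... parse 'data' sequentially, emitting binary records as you go ...

        UnmapViewOfFile(data);
        CloseHandle(mapping);
        CloseHandle(file);
        return 0;
    }

On a 32-bit process a multi-gigabyte view will not fit in the address space, so in practice you would map a window of the file at a time via the offset arguments to MapViewOfFile.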

Finally, let me point out that complex tasks require complex measures. If you think you can afford a 64-bit machine with 8GB of RAM, then just use "read file into memory, process data, write output" algorithm, even if it takes a day to finish.

Ash