Hi, this is a "hard" question; I've found nothing useful about it on the web.

I'm developing a Memory Management module for my company. We develop games for next-gen consoles (Xbox 360, PS3 and PC... we consider PC a console!).

For our next games we will need to handle texture streaming for large game worlds that cannot be loaded entirely into main console memory (not talking about the PC for now).

To start with, we are going to stream the hi-res mipmaps of textures (which account for about 70% of the world data). In the future we may also have to stream geometry, smaller mipmaps, audio, etc.

I'm developing a Memory Manager for this, focused on the X360 (on the PS3 we can use host memory and its auto-defragmenting GMM allocator).

The problem I'm facing is the following: we have decided to reserve a specific memory area for texture streaming (for example 64 megabytes) and we want to handle all allocations and deallocations in that area. We allocate the area at application startup, and it is guaranteed to be physically contiguous (not just virtually, because we need to store textures there).

I've implemented an auto-defragmenting allocator that uses handles instead of pointers. Time is not the issue; the problem is memory fragmentation. In game we continuously load and unload streaming targets, so we'd like to use as much of our 64-megabyte buffer as possible.
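
To make things concrete, here is a minimal sketch of the kind of handle indirection I mean (names and sizes are illustrative, not our real code):

    // Minimal sketch of handle-based indirection (illustrative only).
    // Clients keep a Handle, never a raw pointer; the defragmenter can
    // move the block and just patch the table entry.
    #include <cstdint>
    #include <cstddef>

    struct BlockEntry {
        void*       ptr;   // current address inside the 64 MB area
        std::size_t size;  // block size in bytes
        bool        used;  // is this slot allocated?
    };

    class StreamingHeap {
    public:
        typedef std::uint32_t Handle;

        // Resolve the handle every time the address is needed (never cache
        // it across frames, since the block may have moved in the meantime).
        void* Resolve(Handle h) const { return m_table[h].ptr; }

        // Called by the defragmenter after it has memmoved a block.
        void Relocate(Handle h, void* newPtr) { m_table[h].ptr = newPtr; }

    private:
        static const std::size_t kMaxBlocks = 4096; // illustrative limit
        BlockEntry m_table[kMaxBlocks];
    };

With that in place the whole problem reduces to deciding which blocks to memmove, and when; that copying is where all the time goes.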

With this allocator we can use all of the allocated space, but the defragmentation routine takes an unacceptable amount of time (sometimes 60 milliseconds, more than a frame!), even though the algorithm itself is not too bad... there are simply too many unavoidable memcpys!

I'm looking for a solution to this issue. I'd like to find at least a good paper, or a post-mortem, or someone who has faced the same problem as me.

Right now I'm choosing between two strategies:

1) Move the defragmentation routine onto a dedicated thread (good for the X360 with its 6 hardware threads, bad for the PS3 with just one hardware thread... and don't tell me to use the SPUs!), with all the usual multithreading problems: locking regions, accessing a region that is being moved, and so on.

2) Find an "incremental" solution to the defragmentation problem: give the Memory Manager a time budget each frame (for example up to 1 millisecond) and let it do whatever defragmentation it can within that budget.

Can someone share their experience with this?

+10  A: 

I did a lot of reading recently on memory management, and this is the most informative and helpful article I found on the net.

http://www.ibm.com/developerworks/linux/library/l-memory/

Based on that article, the best and fastest result you will get is to divide your 64 MB into equal-sized chunks (the chunk size will depend on your object sizes) and to allocate or deallocate a full chunk at a time. It's:

  1. Faster than incremental garbage collection.
  2. Simpler.
  3. And it reduces the "too much fragmentation" problem to some extent.

Read it; you will find excellent information on every possible solution there is, with the merits and demerits of each.
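
As a rough illustration of the chunk idea (my own sketch, not from the article; names and sizes are made up), a fixed-size chunk pool over the preallocated area can be as simple as a free stack of chunk indices:

    // Sketch of a fixed-size chunk pool over a preallocated, contiguous area.
    // The free list is kept in a side array so the (possibly write-combined)
    // texture memory itself is never read back by the allocator.
    #include <cstddef>
    #include <cassert>
    #include <vector>

    class ChunkPool {
    public:
        // 'area' is the contiguous, physically allocated buffer (e.g. 64 MB).
        ChunkPool(void* area, std::size_t areaSize, std::size_t chunkSize)
            : m_base(static_cast<char*>(area)), m_chunkSize(chunkSize)
        {
            std::size_t count = areaSize / chunkSize;
            assert(count > 0);
            m_freeStack.reserve(count);
            for (std::size_t i = count; i > 0; --i)
                m_freeStack.push_back(i - 1);      // chunk 0 gets handed out first
        }

        void* Allocate() {
            if (m_freeStack.empty()) return 0;     // pool exhausted
            std::size_t index = m_freeStack.back();
            m_freeStack.pop_back();
            return m_base + index * m_chunkSize;
        }

        void Free(void* p) {
            std::size_t index = (static_cast<char*>(p) - m_base) / m_chunkSize;
            m_freeStack.push_back(index);
        }

    private:
        char*                    m_base;
        std::size_t              m_chunkSize;
        std::vector<std::size_t> m_freeStack;      // indices of currently free chunks
    };

With one such pool per texture footprint there is nothing left to defragment; the price is some internal waste whenever a texture does not exactly fill its chunk.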

CDR
As a game dev I second this idea. We're lucky that game textures tend to have nice sizes by default, so it should work just fine, but you may also want to use a couple of bucket sizes instead of a single one. Another optimization to consider is an algorithm that places textures used in the same render pass next to each other, but it won't make a huge difference; just keep it in mind and optimize later if you really need to, depending on the game and renderer.
Robert Gould
Great link, great answer! Thanks a lot!
ugasoft
+1  A: 

Since you're using handles, you have a lot of freedom to move memory around. I think that using a separate thread is probably not the best (safest or fastest) way -- my guess is you'd be better off using a type of incremental copying allocator, where on each malloc() or free() you compact (copy forwards or backwards in memory) some number of allocated blocks, with the number of bytes that you copy depleting a "budget" that is periodically reset to its initial value (e.g. on each screen refresh). (Of course only whole blocks are copied.)

The idea is that copying a given number of bytes takes a fairly predictable amount of time, so you can estimate how many bytes of copying you can safely perform per screen refresh and limit yourself to that. If there's enough time in the budget, a call to malloc() or free() will completely defragment memory; otherwise it will defragment it as much as possible within the given time constraints.

There are some questions I'm leaving unresolved here -- e.g. exactly how to compact the memory. A standard non-incremental copying allocator can just start allocating from the front, then copy everything to the back (freeing up memory at the front) when memory runs out, but you don't have that freedom here. You might need some heuristics to decide whether to move blocks forward or backwards. The important thing is to avoid oscillations (the same block being moved forwards then backwards on successive calls to malloc() or free()).
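
A sketch of the budgeted step I have in mind (names are hypothetical; the block list is assumed to be kept sorted by address, and alignment handling is omitted):

    // One budgeted compaction step: slide blocks toward the front of the area
    // until the byte budget for this frame/refresh is spent. Only whole blocks
    // are copied; the handle table is patched after each move.
    #include <cstring>
    #include <cstddef>
    #include <vector>

    struct Block {
        char*       ptr;     // current address inside the managed area
        std::size_t size;    // size in bytes
        std::size_t handle;  // index into the handle table
    };

    // 'blocks' must be sorted by address. Returns the bytes actually copied.
    std::size_t CompactStep(std::vector<Block>& blocks, char* areaStart,
                            std::size_t budgetBytes,
                            void (*relocate)(std::size_t handle, void* newPtr))
    {
        std::size_t copied = 0;
        char* cursor = areaStart;                    // where the next block should start
        for (std::size_t i = 0; i < blocks.size(); ++i) {
            Block& b = blocks[i];
            if (b.ptr != cursor) {
                if (copied + b.size > budgetBytes)
                    break;                           // out of budget, resume next frame
                std::memmove(cursor, b.ptr, b.size); // source/destination may overlap
                b.ptr = cursor;
                relocate(b.handle, cursor);          // patch the handle table
                copied += b.size;
            }
            cursor += b.size;
        }
        return copied;
    }

One caveat: a single block larger than the whole budget will never move with this scheme, so you would either let it through occasionally or split its copy across frames (in which case its handle must not be resolved mid-copy).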

j_random_hacker
+1  A: 

I would recommend an incremental approach. Each frame, find a contiguous block of in-use memory that has free space on both sides and move it in whichever direction allows it to fit. Or you could just move all blocks in one direction: find a gap, find the in-use block that is the best fit for it, and move that one. On the 360 you should probably use a thread to do the move, while on the PS3 it would be best to use the GPU to move the data around for you.
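
A sketch of one per-frame step of the gap/best-fit variant (illustrative names; the region list is assumed sorted by address, and handle patching is only noted in a comment):

    // One per-frame move of the gap/best-fit variant: find the first free gap,
    // then move the in-use region that fits it most tightly into that gap.
    #include <cstring>
    #include <cstddef>
    #include <vector>

    struct Region {
        char*       ptr;   // current address inside the managed area
        std::size_t size;  // size in bytes
    };

    // 'used' must be sorted by address; 'areaStart' is the start of the buffer.
    void MoveBestFitIntoFirstGap(std::vector<Region>& used, char* areaStart)
    {
        // Find the first gap (before the first region or between two regions).
        char* gapStart = areaStart;
        std::size_t gapSize = 0;
        std::size_t firstAfterGap = used.size();
        for (std::size_t i = 0; i < used.size(); ++i) {
            if (used[i].ptr > gapStart) {
                gapSize = static_cast<std::size_t>(used[i].ptr - gapStart);
                firstAfterGap = i;
                break;
            }
            gapStart = used[i].ptr + used[i].size;
        }
        if (gapSize == 0) return;                  // already compact up to the end

        // Pick the region beyond the gap whose size fills the gap most tightly.
        std::size_t best = used.size();
        for (std::size_t i = firstAfterGap; i < used.size(); ++i)
            if (used[i].size <= gapSize &&
                (best == used.size() || used[i].size > used[best].size))
                best = i;
        if (best == used.size()) return;           // nothing fits this frame

        std::memmove(gapStart, used[best].ptr, used[best].size);
        used[best].ptr = gapStart;
        // NOTE: patch the corresponding handle here and keep 'used' sorted by address.
    }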

jwaker
+4  A: 

Why not use multiple memory areas for the streamed textures and pool by texture size?

Insomniac has a paper about their texture streaming implementation on PS3. I suppose it could be helpful: link.

For general allocation strategies to minimize fragmentation, maybe Doug Lea can help.

But from my reading of your question, it sounds like you're overthinking it, and I highly recommend a pooled approach. (Also, running a defragmentation pass on write-combined memory doesn't sound particularly safe or fun.)
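
Roughly what pooling by size looks like (the bucket sizes, formats and the DXT math below are just examples, not a recommendation):

    // Illustration of "pool by texture size": route each texture to one of a
    // few pools based on its compressed footprint (base mip level only here).
    #include <cstddef>

    std::size_t Dxt1Bytes(std::size_t w, std::size_t h) { return (w / 4) * (h / 4) * 8;  }
    std::size_t Dxt5Bytes(std::size_t w, std::size_t h) { return (w / 4) * (h / 4) * 16; }

    // Returns the index of the pool the texture should come from, or -1 if it
    // has an unexpected footprint (fall back to a general allocator then).
    int SelectPool(std::size_t w, std::size_t h, bool isDxt5)
    {
        const std::size_t bytes = isDxt5 ? Dxt5Bytes(w, h) : Dxt1Bytes(w, h);
        static const std::size_t kPoolChunkSizes[] = {
            32 * 1024, 128 * 1024, 512 * 1024, 2048 * 1024   // example buckets
        };
        for (int i = 0; i < 4; ++i)
            if (bytes <= kPoolChunkSizes[i])
                return i;                          // smallest bucket that fits
        return -1;
    }

Each pool is then a trivial fixed-chunk allocator over its own slice of the reserved area, so fragmentation can only occur within a pool, where every chunk is the same size, i.e. not at all.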

Dan Olson
+1  A: 

We have almost exactly the system you described, except that we allocate in fixed-size slots (256x256, 512x512, 1024x1024 and 2048x2048 textures, in two formats each, DXT1 and DXT5) precisely to avoid memory management.

Don't you have rectangular textures? We also have 512x1024, 256x512 and 256x1024 (but we avoided 2048 textures). That would lead to a lot of "fixed sizes"...
ugasoft