ansaurus

Question

How Can I Get Around this EOutOfMemory Exception When Encoding a Very Large File?

Answer 1

+5 A:

Read a chunk from the file, encode and write to another file, repeat.

Romain Hippeau 2010-06-29 02:23:16

@Romain: I originally had code to do that. But it was tricky at the boundary where you break it up because you might split a multi-byte input character. Also, the Encoding routine is so darned fast, it's a shame not to do it all at once.

lkessler 2010-06-29 02:32:25
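For what it's worth, the boundary problem described above is usually handled with an incremental (stateful) decoder, which holds back an incomplete multi-byte sequence at the end of one chunk and finishes it with the next. Here is a minimal sketch in Python rather than Delphi, assuming the encoding step is a conversion from one text encoding to another. The function name `transcode_file`, the UTF-8 to UTF-16LE pairing, and the 4 KB default are illustrative choices, not anything taken from the original question.

```python
import codecs

def transcode_file(src_path, dst_path, src_encoding="utf-8",
                   dst_encoding="utf-16-le", chunk_size=4096):
    """Convert src_path to dst_encoding, reading chunk_size bytes at a time."""
    # The incremental decoder buffers an incomplete multi-byte sequence
    # found at the end of a chunk and completes it with the next chunk.
    decoder = codecs.getincrementaldecoder(src_encoding)()
    encoder = codecs.getincrementalencoder(dst_encoding)()
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        while True:
            chunk = src.read(chunk_size)
            if not chunk:
                break
            dst.write(encoder.encode(decoder.decode(chunk)))
        # final=True raises an error if the file ends mid-character.
        tail = decoder.decode(b"", final=True)
        dst.write(encoder.encode(tail, final=True))
```

Only one chunk and its converted form are in memory at any time, so peak memory use depends on `chunk_size` rather than on the file size.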

@lkessler - sometimes you have to go with either a time or space compromise. Performance should not be that bad if you read in 4k at a time or more.

Romain Hippeau 2010-06-29 02:44:50

...or even 40 MB at a time, since you seem to be able to handle that.

Mason Wheeler 2010-06-29 03:06:04

The thing to do is to make sure it works with chunks of 100 bytes at a time, which makes debugging easy and exercises the boundary conditions, and then set it to something really large (perhaps dynamically) for the production code.

mj2008 2010-06-29 07:55:34
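One way to act on this advice is to make the chunk size a parameter and compare the chunked result against a one-shot conversion for several small sizes. The sketch below is in Python and works on in-memory bytes to keep it short; `decode_in_chunks` and `check_chunk_sizes` are hypothetical helper names, and UTF-8 is an assumed input encoding.

```python
import codecs

def decode_in_chunks(data, chunk_size, encoding="utf-8"):
    """Decode bytes chunk_size bytes at a time, as a file reader would."""
    decoder = codecs.getincrementaldecoder(encoding)()
    parts = []
    for start in range(0, len(data), chunk_size):
        parts.append(decoder.decode(data[start:start + chunk_size]))
    # Flush the decoder; this raises if the data ends mid-character.
    parts.append(decoder.decode(b"", final=True))
    return "".join(parts)

def check_chunk_sizes(text, sizes=(1, 2, 3, 100, 1 << 20)):
    """Return the chunk sizes whose chunked result differs from one-shot."""
    data = text.encode("utf-8")
    return [n for n in sizes if decode_in_chunks(data, n) != text]
```

Tiny sizes such as 1, 2 and 3 force almost every multi-byte character to straddle a chunk boundary, so an empty list from `check_chunk_sizes` is reasonable evidence that the boundary handling works before the size is raised for production.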

I wouldn't read "chunks", I would use a Stream. A fast Unicode stream with a readline should be much faster than 300 MB of VM.

Warren P 2010-06-30 20:00:51

@Warren P A Stream with a readline is a chunk.

Romain Hippeau 2010-06-30 21:04:18
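To illustrate the stream-with-readline variant, here is a minimal Python sketch in which the text layer does the decoding and hands back one line at a time. The name `transcode_lines` and the UTF-8 to UTF-16LE pairing are assumptions made for the example.

```python
def transcode_lines(src_path, dst_path, src_encoding="utf-8",
                    dst_encoding="utf-16-le"):
    """Copy a text file line by line, re-encoding as it goes.

    Returns the number of lines copied.
    """
    count = 0
    # newline="" turns off newline translation, so the original
    # line endings are written out unchanged.
    with open(src_path, "r", encoding=src_encoding, newline="") as src, \
         open(dst_path, "w", encoding=dst_encoding, newline="") as dst:
        for line in src:
            dst.write(line)
            count += 1
    return count
```

Each line is effectively a chunk whose size is set by the data, which is the point made in the comment above. A file with no line breaks would still come back as one very large line, so a fixed-size chunk is the safer bound when the input is not known to be line-oriented.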

Answer 2

+6 A:

FillChar isn't allocating any memory, so that's not your problem. Try tracing into it and placing breakpoints at the RET statements, and you'll see that the FillChar finishes. Whatever the problem is, it's probably in a later step.

Mason Wheeler 2010-06-29 02:27:11

@Mason: Thanks for this. Yes, you are correct. The RET statement in the middle of the FillChar routine is where it leaves from, so the breakpoint I had at the end of the routine didn't catch it. It does then get to MemoryManager.GetMem and signals the OutOfMemory error. I'll have to split the Encoding into chunks like @Romain says. You helped me out, but Romain answered my question, so I'll have to give him the accepted answer.

lkessler 2010-06-29 03:00:04

+1 for helping him out

Romain Hippeau 2010-06-29 03:10:40

Answer 3

+1 A:

A wild guess: Could the problem be that memory is overcommitted, and when FillChar actually accesses the memory the OS can't find a page to give you? I don't know whether Windows will even overcommit memory, but I do know that some OSes do: you don't find out about it until you actually try to make use of the memory.

If this is the case it could cause the blowup in FillChar.

Loren Pechtel 2010-06-29 02:55:23

@Loren: Thanks for the response, but FillChar wasn't the problem after all, as @Mason was correct in pointing out.

lkessler 2010-06-29 03:03:33

Answer 4

+1 A:

Programs are great at looping. They loop tirelessly without complaining.

Allocating a huge amount of memory takes time. There will be many calls to the heap manager. Your OS won't even know ahead of time whether it has the amount of contiguous memory that you need. Your OS says, yeah, I have 1 GB free. But as soon as you go to use it, your OS says, wait, you want all of it in one chunk? Let me make sure I have enough all in one place. If it doesn't, you get the error.

If it does have the memory, well, there's still a lot of work for the heap manager in preparing the memory and marking it as used.

So, obviously, it makes some sense to allocate less memory and simply loop through it. This saves the computer from doing a lot of work that it will only have to undo when it's done. Why not have it do just a little bit of work in setting aside your memory, then just keep re-using it?

Stack memory is allocated much faster than heap memory. If you keep your buffer small (the default stack size is 1 MB), you can declare it as a fixed-size local variable so that it lives on the stack rather than the heap, which will make your loops even faster. In addition, local variables that are held in registers are very fast.

Factors such as hard drive cluster and cache sizes and CPU cache sizes offer hints about the best chunk size. The key is to find a good number. I like to use 64 KB chunks.

Marcus Adams 2010-06-29 04:31:00

@Marcus: That's a good comment. I'll try using both 40 MB and 1 MB as blocking sizes and test to see whether more stack allocations are faster than fewer heap allocations.

lkessler 2010-06-29 05:28:21

The idea is to keep the memory allocated while you use it, but allocated on the stack. If you call a function repeatedly, which allocates the memory on the stack then frees it, you're still doing extra work. Loop with a for or while loop inside a function to reuse the memory.

Marcus Adams 2010-06-29 14:19:47
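The "allocate once, then loop" idea can be sketched as follows. This is Python, which has no stack-allocated buffers, so it only demonstrates reusing a single preallocated buffer inside one loop, not the stack-versus-heap difference. The name `copy_with_reused_buffer` is made up for the example, the 64 KB default follows the size suggested in the answer, and a plain copy stands in for the per-chunk encoding step.

```python
def copy_with_reused_buffer(src_path, dst_path, buffer_size=64 * 1024):
    """Copy a file through one preallocated buffer; returns bytes copied."""
    buffer = bytearray(buffer_size)   # allocated once, outside the loop
    view = memoryview(buffer)         # allows slicing without copying
    total = 0
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        while True:
            n = src.readinto(buffer)  # refills the same buffer each pass
            if not n:
                break
            # Only the first n bytes are valid on this pass.
            dst.write(view[:n])
            total += n
    return total
```

Because the buffer is created before the loop and refilled in place, no per-iteration allocation is needed for the input data, whatever the size of the file.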
