I am parsing using a pretty large grammar (1.1 GB, it's data-oriented parsing). The parser I use (bitpar) is said to be optimized for highly ambiguous grammars. I'm getting this error:

terminate called after throwing an instance of 'std::bad_alloc'
  what():  St9bad_alloc
dotest.sh: line 11: 16686 Aborted                 bitpar -p -b 1 -s top -u unknownwordsm -w pos.dfsa /tmp/gsyntax.pcfg /tmp/gsyntax.lex arbobanko.test arbobanko.results

Is there hope? Does it mean that it has run out of memory? It uses about 15 GB before it crashes. The machine I'm using has 32 GB of RAM, plus swap. It crashes before outputting a single parse tree; I think it crashes after reading the grammar, while attempting to construct a chart parse for the first sentence.

The parser is an efficient CYK chart parser using bit-vector representations, so I presume it is already pretty memory-efficient. If it really requires too much memory I could sample from the grammar rules, but that would decrease parse accuracy, of course.

I think the problem is probably that I have a very large number of non-terminals; perhaps I should look for a different parser (any suggestions?).

A: 

If your application uses the 32-bit memory model, then each process gets 4 GB of virtual address space, of which only 2 GB is available to user space.

I suspect your parser might be trying to allocate more than the available virtual memory. I am not sure whether the parser provides a mechanism for custom memory allocation. If so, you could try backing allocations with memory-mapped files, bringing data into physical memory only when it is needed.

aJ
As I wrote, the parser has already allocated about 15 GB of virtual memory before it crashes, according to htop, so I don't think the 4 GB limit applies here. Any custom allocation would have to be hacked into the code, and I don't speak C++ ...
Andreas
That limit is for a 32-bit OS. If it is 64-bit, the limit does not apply.
aJ
Yep, I just checked: it's running x64, which makes sense for a machine with more than 4 GB of memory.
Andreas
+2  A: 

It is possible that memory becomes fragmented. That means your program can fail to allocate even 1 KB while 17 GB of memory is free, if those 17 GB are fragmented into 34 million free chunks of 512 bytes each.

There's of course the possibility that your program miscalculates a memory allocation. A common bug is trying to allocate -1 bytes of memory. Since memory sizes are unsigned, -1 is interpreted as size_t(-1), far more than 32 GB. But there's really no evidence pointing in that direction.

To solve this problem, you will need someone who does speak C++. If it is indeed memory fragmentation, a good C++ programmer can tailor the memory allocation strategy to your specific needs. Strategies include keeping same-sized objects together and replacing string with lighter-weight shims.

MSalters
This sounds possible but implausible to me; is there a quick way to measure memory fragmentation? I did a quick grep for alloc in the source, and there are about 40 matching lines. Perhaps counting how many times alloc is actually called would help; a very high count would be suspicious.
Andreas
@Andreas: You tagged the question C++ rather than C, and there's lots of ways to allocate memory in C++ that don't involve any function with "alloc" in its name.
David Thornley