Hi,

I have an auto-generated C++ source file, around 40 MB in size. It largely consists of push_back() commands for some vectors, along with the string constants that are to be pushed.

When I try to compile this file, g++ exits and says that it couldn't reserve enough virtual memory (around 3 GB). Googling this problem, I found that using the command line switches

--param ggc-min-expand=0 --param ggc-min-heapsize=4096

may solve the problem. They, however, only seem to work when optimization is turned on.

1) Is this really the solution that I am looking for?

2) Or is there a faster, better way to do this? (Compiling takes ages with these options activated.)

Best wishes,

Alexander

Update: Thanks for all the good ideas. I tried most of them. Using an array instead of several push_back() operations reduced memory usage, but since the file I was trying to compile was so big, it still crashed, only later. In a way this behaviour is really interesting, as there is not much to optimize in such a setting -- what does GCC do behind the scenes that costs so much memory? (I compiled with all optimizations deactivated as well and got the same results.)

The solution I have switched to now is reading the original data from a binary object file that I created from the original file using objcopy. This is what I originally wanted to avoid, because creating the data structures in a higher-level language (in this case Perl) was more convenient than doing it in C++.

However, getting this running under Win32 was more complicated than expected. objcopy seems to generate files in the ELF format by default, and some of the problems I had disappeared when I manually set the output format to pe-i386. The symbols in the object file are by default named after the file name, e.g. converting the file inbuilt_training_data.bin results in these two symbols: binary_inbuilt_training_data_bin_start and binary_inbuilt_training_data_bin_end. I found some tutorials on the web which claim that these symbols should be declared as extern char _binary_inbuilt_training_data_bin_start; (with a leading underscore), but this does not seem to be right -- only extern char binary_inbuilt_training_data_bin_start; worked for me.
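
For reference, the setup ended up looking roughly like the sketch below (the objcopy flags and the presence of a leading underscore on the symbols vary between platforms and binutils builds, and load_inbuilt_data is just a placeholder name):

// Conversion step (roughly; the exact flags depend on the binutils build):
//   objcopy -I binary -O pe-i386 -B i386 inbuilt_training_data.bin inbuilt_training_data.o
//
// C++ side -- the symbol names are derived from the input file name:
#include <cstddef>

extern char binary_inbuilt_training_data_bin_start;
extern char binary_inbuilt_training_data_bin_end;

void load_inbuilt_data() {
    const char *begin = &binary_inbuilt_training_data_bin_start;
    const char *end   = &binary_inbuilt_training_data_bin_end;
    std::size_t size  = static_cast<std::size_t>(end - begin);
    // ... parse the size bytes in [begin, end) into the vectors ...
    (void)size;
}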

A: 

To better assist you, please post the reason why you have the auto-generated C++ source file and what target you are compiling for (i.e. executable, library, etc.).

Zyris Development Team
This isn't an answer. For future reference, if you want to ask for clarification on a question, leave a comment on that question. Unfortunately, you can't leave comments until you have at least 50 reputation. But when you do get more rep, remember that StackOverflow standard practice is to discuss the question in comments, but provide actual answers in the answers section.
A. Levy
Thank you; however, if I don't have the ability to add comments, how else can I ask this question? Nobody else has asked it. I don't see why I was given -1 reputation for that.
Zyris Development Team
If generating 40 MB of C++ is the optimal solution, we don't want to know.
Jurily
Am I to take it that this is site policy? That we don't want to know the implementation in order to possibly suggest a better answer? If so, as much as I disagree with it, I sincerely apologize for my post.
Zyris Development Team
+1  A: 

You may be better off using a constant data table instead. For example, instead of doing this:

void f() {
    a.push_back("one");
    a.push_back("two");
    a.push_back("three");
    // ...
}

try doing this:

const char *data[] = {
    "one",
    "two",
    "three",
    // ...
};

void f() {
    for (size_t i = 0; i < sizeof(data)/sizeof(data[0]); i++) {
        a.push_back(data[i]);
    }
}

The compiler will likely be much more efficient at generating a large constant data table than at generating huge functions containing many push_back() calls.

Greg Hewgill
+1  A: 

Can you solve the same problem without generating 40 MB worth of C++? That's more than some operating systems I've used. A loop and some data files, perhaps?

Jurily
A loop and data files... could you expand on your answer? Maybe with a simple example. It's possible that he won't be able to get anywhere with a one-sentence hint...
A. Levy
+1  A: 

It sounds like your autogenerated app looks like this:

push_back(data00001);
...
push_back(data99999);

Why don't you put the data into an external file and let the program read this data in a loop?
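
Something along these lines, for example (a minimal sketch; the function name and the one-entry-per-line format are just placeholders):

#include <fstream>
#include <string>
#include <vector>

// Read one entry per line from an external data file into a vector.
std::vector<std::string> load_data(const char *path) {
    std::vector<std::string> result;
    std::ifstream in(path);
    std::string line;
    while (std::getline(in, line)) {
        result.push_back(line);
    }
    return result;
}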

codymanix
A: 

If you're just generating a bunch of calls to push_back() in a row, you can refactor it into something like this:

// Old code:
v.push_back("foo");
v.push_back("bar");
v.push_back("baz");

// Change that to this:
{
    static const char *stuff[] = {"foo", "bar", "baz"};
    v.insert(v.end(), stuff, stuff + ARRAYCOUNT(stuff));
}

Where ARRAYCOUNT is a macro defined as follows:

#define ARRAYCOUNT(a) (sizeof(a) / sizeof(a[0]))

The extra level of braces is just to avoid name conflicts if you have many such blocks; alternatively, you can just generate a new unique name for the `stuff` placeholder.

If that still doesn't work, I suggest breaking your source file up into many smaller source files. That's easy if you have many separate functions; if you have one enormous function, you'll have to work a little harder, but it's still very doable.
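
For instance, the generator could emit several smaller translation units, each filling part of the vector (a sketch; the file and function names are placeholders):

// data_part1.cpp -- one of many generated files
#include <string>
#include <vector>

void init_part1(std::vector<std::string> &v) {
    static const char *stuff[] = {"foo", "bar", "baz" /* ... */};
    v.insert(v.end(), stuff, stuff + sizeof(stuff) / sizeof(stuff[0]));
}

// main.cpp then calls init_part1(v), init_part2(v), ... in turn, so each
// translation unit stays small enough for the compiler to handle.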

Adam Rosenfield
A: 

To complement some of the answers here, you may be better off generating a binary object file and linking it directly -- as opposed to compiling files consisting of const char[] arrays.

I had a similar problem working with gcc lately. (Around 60 MB of PNG data split into some 100 header files.) Including them all was the worst option: the amount of memory needed seems to grow exponentially with the size of the compilation unit.

aib
You should have kept the PNG data in source files, not headers. Header files should just have `extern const char img_data[]; extern const size_t img_data_size;` and the source files should have `const char img_data[] = {...}; const size_t img_data_size = sizeof(img_data);`. It's much easier for the compiler to handle, and files using the image data don't need to be recompiled when the images change.
Adam Rosenfield
@Adam Rosenfield: That would have worked, yes, but it would have been a hack in that it would not have solved the actual problem, which is the binary stream going through the compiler in the first place. (Binary data -> C source -> compiler -> binary data -- doesn't really sound right, does it?) By the way, the 'linker' solution ended up looking exactly like yours, with headers just containing extern char* + extern size.
aib
...and I think I did that when compiling on Mac OS X, whose linker was different and whose compiler suite had no obvious way of converting binary data into an object file. But as long as you have an object file containing the two symbols for data start + data size (or data start + data end, it might have been), it doesn't matter who created it and how, does it?
aib
A: 

If you cannot refactor your code, you could try increasing the amount of swap space you have, provided your operating system supports a large address space. This should work on 64-bit computers, but 3 gigabytes might be too much for a 32-bit system.

aaa