I have a C library, which I build as a shared object for Linux and a DLL for Windows with MinGW32. The API depends on a couple of data files (statistical models) which I'd really like to roll in with the SO/DLL so that deployment is just one file.

It looks like I can achieve this for Windows with a "resource file" compiled with windres, but then I've got to write a bunch of resource-handling code for Windows, and I'm still stuck with the files on Linux.

Is there a way to achieve the same functionality on Linux?

Even better, is there a portable solution?

A: 

Nothing so automatic as Windows resource files. I can offer two options:

  1. Declare the data as a file-scope variable in C. That's traditional. A Big Array Of Ints.

  2. Learn about the wonders of ld control files, see the GNU ld documentation.
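Option 1 might look roughly like the sketch below. It is only an illustration: the names `model_bytes`, `model_data`, and `model_size` are hypothetical, and the eight bytes stand in for a real data file whose contents would normally be generated by a script.

```c
#include <stddef.h>

/* Hypothetical stand-in for a real data file: in practice this array
   would be generated from the file's bytes rather than typed by hand. */
static const unsigned char model_bytes[] = {
    0x4d, 0x4f, 0x44, 0x4c, 0x01, 0x00, 0x00, 0x00
};

/* Accessors exported by the library; callers never see a file path. */
const unsigned char *model_data(void) { return model_bytes; }
size_t model_size(void) { return sizeof model_bytes; }
```

Because the array is `static const`, it ends up in the read-only data of the SO or DLL, and the same source compiles unchanged on both platforms.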

bmargulies
+3  A: 

Two potential solutions:

  • Phong Vo's sfio library, which is part of the AT&T Advanced Software Technology toolset, is a wonderful replacement for C stdio.h, and it will allow you to open either files or memory blocks using a single API. So you can easily convert your existing files to C initialized data to include in your DLL or SO file.

    This is a good cross-platform solution, but the penalty is that the learning curve to get started is pretty high. They don't make it easy to figure out how stuff works or to take one part of their toolset and split it out for use independent of the other parts. But the good news is that if you want to adopt their U/Win system for running Unix codes on Windows (all part of the same toolset), you can create DLLs and SOs using the same system.

  • For this kind of problem I often fall back on Lua; I can store Lua data either in external files or within C as initialized data. This is great for distributing everything in one .so file; I do this for my students.

    Again the downside is that you have to master and incorporate a new technology.
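To illustrate the "one API for files and memory blocks" idea without sfio or Lua, here is a minimal sketch using POSIX `fmemopen` instead. This is a swapped-in technique, not something either option above provides, and it assumes a POSIX.1-2008 C library such as glibc; do not assume a Windows C runtime offers it. The names `embedded_text`, `open_embedded`, and `read_pair` are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L  /* request the fmemopen declaration */
#include <stdio.h>
#include <stddef.h>

/* Hypothetical embedded blob standing in for a data file. */
static const char embedded_text[] = "mean 0.5\nvar 2.0\n";

/* Open the embedded bytes as a read-only stream. The cast drops const
   because fmemopen takes void *; with mode "r" the buffer is only read. */
FILE *open_embedded(void)
{
    return fmemopen((void *)embedded_text, sizeof embedded_text - 1, "r");
}

/* Parser written against FILE *, so it works for real files too.
   Returns 1 if a "name value" pair was read, 0 otherwise. */
int read_pair(FILE *in, char name[16], double *value)
{
    return fscanf(in, "%15s %lf", name, value) == 2;
}
```

The existing file-reading code keeps its `FILE *` interface; only the function that opens the stream changes.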

In my own work I use Lua over the AT&T stuff for these reasons:

  • Lua has a much smaller footprint and is designed to play well with others; with AST you really have to adopt their way of doing things.

  • The learning curve with Lua is much less steep; you can be productive very quickly.

  • Lua is dead easy to install and it's easy to get information about it. AST has its own quirky installation process shared by nobody else in the world; it's often hard to make the installation work; and it's harder to get information about it.

  • Using Lua has a lot of other payoffs, so the effort spent learning Lua and learning how to incorporate Lua into C codes is easy to amortize over multiple projects.

Norman Ramsey
I don't see how embedding a scripting language solves the problem of embedding arbitrary file data, and if I was going to embed a scripting language I'd probably go with Python (since I've been using Boost.Python a lot recently anyway). sfio looks handy, but looks like it probably wouldn't be worth the effort of refactoring everything to use it.
cibyr
@cibyr: My answer bears on what to embed, not how to embed it. How to embed a file as C code is very, very easy; all the thought goes into the question of what API you want for the bytes. If you look in http://www.cs.tufts.edu/~nr/drop/lua you will see a script called `lua2c` that may give you some ideas. Run with the `-s` option and it will embed any binary file, although it's not super useful since the API I generated tries to compile the file as Lua code. But once you decide what API you want I'm sure you can do something in Python. If not, post another question.
Norman Ramsey
+9  A: 

It's actually quite simple on Linux and other ELF systems: http://www.linuxjournal.com/content/embedding-file-executable-aka-hello-world-version-5967

OS X has bundles, so you just build your library as a framework and put the file in the bundle.

Andrew McGregor
Does this approach work for building Windows DLLs as well? It seems like a good, simple option.
cibyr
I have no idea, I've never built anything for Windows and know nothing about how the linker works.
Andrew McGregor
You're still left with the problem that the API for access to data in a file is different from the API for access to a named block in memory. But if you can live with that, you don't need `objcopy`; you can easily write a short script that generates a .c file from any binary file. E.g., http://www.cs.tufts.edu/~nr/drop/lua/lua2c; run the script with the -s option.
Norman Ramsey
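A generator of the kind described in the comment above can be very small. The sketch below is not `lua2c`; it is a hypothetical C function, `emit_c_array`, that turns any byte stream into a C array definition plus a size constant, with the output layout chosen purely for illustration.

```c
#include <stdio.h>

/* Read all bytes from `in` and write a C array definition named `name`
   to `out`. Returns the number of bytes converted, or -1 on a write error. */
long emit_c_array(FILE *in, FILE *out, const char *name)
{
    long count = 0;
    int byte;

    fprintf(out, "const unsigned char %s[] = {", name);
    while ((byte = getc(in)) != EOF) {
        /* Twelve bytes per line keeps the generated file readable. */
        fprintf(out, "%s0x%02x,", (count % 12 == 0) ? "\n    " : " ",
                (unsigned)byte);
        count++;
    }
    if (count == 0) {
        /* An empty initializer is not valid before C23, so emit one
           placeholder element; the size constant below still says 0. */
        fputs(" 0", out);
    }
    fprintf(out, "\n};\nconst unsigned long %s_size = %ldUL;\n", name, count);
    return ferror(out) ? -1 : count;
}
```

Typical use would be to open the data file with `fopen(path, "rb")`, pass `stdout` as `out`, and redirect the result into a `.c` file that is compiled into the library.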
@Norman: The downside to that is the extra binary→escaped C→binary steps take time and memory; this becomes noticeable with big binary blobs. (Not that you should be embedding them; my preference in all cases would be to stick stuff in `$PREFIX/share` but apparently OP doesn't like that too much.)
ephemient
The `$PREFIX/share` solution is fine for UNIX where everybody is used to installing dependencies, but on Windows people really just want a DLL they can redistribute with their app (and there isn't a good equivalent of `$PREFIX/share` anyway).
cibyr
Just thought I should report back that this works perfectly with MinGW32, the only difference being that you have to omit the underscore on the front of your variable in the C(++) code. I achieved this with an `#ifdef WIN32` macro, but you could probably make it even more portable by testing with/without the underscore in your configure script (I'm using CMake). If I could edit this answer I'd include more details (what I put in CMakeLists.txt, constructing `stringstream`s from the embedded data, etc).
cibyr
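The underscore difference reported above could be hidden behind a macro along these lines. This is a sketch, not the poster's actual code: `EMBEDDED_SYM`, `DECLARE_EMBEDDED`, and `embedded_size` are hypothetical names, `model_bin` assumes `objcopy` was run with a binary input target on a file called `model.bin` (which yields `_binary_model_bin_start` and `_binary_model_bin_end` symbols), and excluding 64-bit Windows from the no-underscore case is my own assumption.

```c
#include <stddef.h>

/* On 32-bit MinGW the leading underscore is dropped in C source, per the
   report above. Excluding _WIN64 here is an assumption, not from the thread. */
#if defined(_WIN32) && !defined(_WIN64)
#  define EMBEDDED_SYM(name) binary_##name
#else
#  define EMBEDDED_SYM(name) _binary_##name
#endif

/* Declares the start/end symbols that objcopy generates for one file. */
#define DECLARE_EMBEDDED(name) \
    extern const unsigned char EMBEDDED_SYM(name##_start)[]; \
    extern const unsigned char EMBEDDED_SYM(name##_end)[]

DECLARE_EMBEDDED(model_bin);  /* hypothetical: objcopy run on "model.bin" */

/* Size in bytes of a blob delimited by its start and end addresses. */
size_t embedded_size(const unsigned char *start, const unsigned char *end)
{
    return (size_t)(end - start);
}
```

Once the objcopy-generated object is linked in, the data would be reached as `EMBEDDED_SYM(model_bin_start)` and its length as `embedded_size(EMBEDDED_SYM(model_bin_start), EMBEDDED_SYM(model_bin_end))`. A configure-time check, as suggested above, would be more robust than the preprocessor test.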