I'm working on embedding Python 2.6 into an existing C++ application. So far I have the libraries linked in, I can initialize the Python interpreter, and I can pass data to Python. I'm having trouble getting data back out, and hope someone can steer me in the right direction. I'm working with this:

Py_Initialize();

pModule = PyImport_ImportModule("cBuffers"); // This crashes after 1st call.
pDict = PyModule_GetDict(pModule);
pClass = PyDict_GetItemString(pDict, "rf_pdf");
pMeth = PyString_FromString("main");

if (PyCallable_Check(pClass) && PyClass_Check(pClass)) {
  pInstance = PyInstance_New(pClass, NULL, NULL);
  pOutput = PyObject_CallMethodObjArgs(pInstance, pMeth, pOpts, pInput, NULL);
}

if (pOutput != NULL) {
  string pPdf = PyString_AsString(pOutput);
  Py_DECREF(pOutput);
} else {
  PyErr_Print();
}

// Cleanup
Py_DECREF(pModule);
Py_DECREF(pModule); // Has an extra reference, not positive why.
Py_DECREF(pMeth);
Py_DECREF(pInstance);
Py_DECREF(pOpts);
Py_DECREF(pInput);

Py_Finalize();

pOpts and pInput are both generated using PyString_FromString earlier in the code. The trouble is that when I retrieve the output using PyString_AsString, the result is treated as a NUL-terminated string. Unfortunately, because I'm generating PDF documents, NULs are not only allowed, they're almost guaranteed. Can anyone tell me how to return string data from Python back to C++ without it ending at the first NUL it encounters?

As an additional question, this code can be called multiple times as part of a background service that creates PDF documents from incoming print data. The first time this code is called, it works as expected. Any subsequent call fails at the indicated line just after Py_Initialize(). Help on how to determine what's going on there would be most appreciated as well. Thanks in advance,

+1  A: 

A few points:

  • Don't use strings. You might even be able to make them work here with some contortions on *_StringAndSize() functions, but it won't be what you want. You should store your data in a custom data structure (or a buffer) that is just a sequence of bytes (do you really want clients performing string operations on this data in Python?). If your object really is a buffer object, you should use the Buffer API.

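  • Your imported module has a refcount of 2 because it's also being held in sys.modules (for efficiency the next time you try to import it). PyImport_ImportModule gives you one new reference, so a single Py_DECREF(pModule) is correct; the second one releases the reference that belongs to sys.modules. Never decref references you don't own or you'll crash your program. The Importing Modules section of the documentation should really cover this, but it doesn't.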

  • It's pretty expensive to initialize Python and tear it down every time you do these operations. You should try to reorganize your use case such that you can call Py_Initialize only once when your application starts (or the first time it needs Python), and then only call Py_Finalize when your application is definitely done with Python, or when it quits.

  • You're being very lazy with error checking - most of the Python/C API functions can return NULL to indicate that an exception has been raised, and you're almost never checking for it. If something fails you're going to start crashing in very odd places. You can read about this in the Exception Handling section of the Python/C API manual.
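
The third point (initialize once, finalize once) might be structured roughly like the sketch below. To keep it compilable without the Python headers, `start_interpreter` and `stop_interpreter` are hypothetical stand-ins that only count calls; in the real application they are where `Py_Initialize()` and `Py_Finalize()` would go. `InterpreterSession`, `interpreter()` and `render_job` are likewise made-up names, not part of any API.

```cpp
#include <string>

// Hypothetical stand-ins: in the real application these would call
// Py_Initialize() and Py_Finalize(). Here they only count invocations.
static int g_start_calls = 0;
static int g_stop_calls = 0;
void start_interpreter() { ++g_start_calls; }
void stop_interpreter() { ++g_stop_calls; }

// Owns the interpreter for the lifetime of the process.
class InterpreterSession {
public:
    InterpreterSession() { start_interpreter(); }
    ~InterpreterSession() { stop_interpreter(); }

    // Non-copyable: there is only ever one interpreter session.
    InterpreterSession(const InterpreterSession&) = delete;
    InterpreterSession& operator=(const InterpreterSession&) = delete;
};

// Function-local static: constructed on first use, destroyed at program exit.
InterpreterSession& interpreter() {
    static InterpreterSession session;
    return session;
}

// Hypothetical per-document entry point; safe to call many times.
std::string render_job(const std::string& input) {
    interpreter();  // starts the interpreter only on the first call
    // ... import the module, call into Python, copy the result out ...
    return input;   // placeholder: echoes the input instead of producing a PDF
}
```

With this shape, calling `render_job` repeatedly from the background service starts the interpreter only once. Tearing down in a static destructor is only one option; an explicit shutdown hook may be preferable if other globals still hold Python objects at exit.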

Nick Bastin
g.d.d.c
Nick - Also on 1 - The library that's doing PDF Generation expects a file handle to put data into. I'm using a StringIO Instance in Python to stay off the disk, then when I've completed the PDF Generation I retrieve the buffer contents with `outfile.getvalue()`. I am so far unable to convert this to a bytearray - I get an encoding error.
g.d.d.c
You could implement a very small object in C++ that would look like a file-like object to Python (just by implementing the proper protocol methods) and use that instead of a StringIO instance. That way you can keep the data in memory, but not have to worry about null bytes or API handling, as it'll already be a native C++ data structure when the client returns it to you.
Nick Bastin
Also, for testing purposes I'd use a real file and make sure you can make that work (and make it robust), before complicating the situation by avoiding writing to disk.
Nick Bastin
I have working code that uses real files and does not involve any embedding - the PDF generation is a stand-alone executable compiled with Py2Exe. Using this approach we have bottlenecks related to disk I/O, hence the attempt to keep things in memory. Passing the data in as a string seems to work fine; it's only retrieving it that gives me fits. I've tried converting the PyString `pOutput` into a PyByteArray as well. `PyByteArray_Size` returns the correct size, but I still can't get it back to a char * to hand to the C++ code downstream - I only get about 2044 bytes of the 35000 present.
g.d.d.c
This isn't a problem with Python (and is why you should generally avoid the string API for non-string data on the C side). The problem is that the `std::string` constructor that takes only a `const char *` stops reading the `char` array at the first NUL, because that's what the spec says it should do. The entire data is *there*, you just need to use the proper constructor: `std::string(const char *str, size_type length);` (I'm not 100% sure that the C++ spec promises that this conversion will work - it's entirely possible that there is no way to get this data into a C++ string without constructing an iterator)
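
To illustrate the constructor point, here is a minimal sketch in plain C++ with no Python headers. The `kPdfLike` buffer and the helper names `from_c_string`, `from_buffer` and `assign_buffer` are made up for the example; in the embedding code the pointer and length would presumably come from a call such as `PyString_AsStringAndSize`.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical stand-in for binary output such as PDF data:
// 9 bytes with embedded NULs at positions 4 and 6.
static const char kPdfLike[] = {'%', 'P', 'D', 'F', '\0', 'x', '\0', 'y', 'z'};
static const std::size_t kPdfLikeSize = sizeof(kPdfLike);

// Treats the buffer as a C string: copying stops at the first NUL byte.
std::string from_c_string(const char* data) {
    assert(data != nullptr);
    return std::string(data);
}

// Pointer-plus-length constructor: copies exactly `length` bytes, NULs included.
std::string from_buffer(const char* data, std::size_t length) {
    assert(data != nullptr || length == 0);
    return std::string(data, length);
}

// Same idea for a string that already exists, via assign().
void assign_buffer(std::string& out, const char* data, std::size_t length) {
    assert(data != nullptr || length == 0);
    out.assign(data, length);
}
```

Here `from_c_string(kPdfLike)` keeps only the bytes before the first NUL, while `from_buffer(kPdfLike, kPdfLikeSize)` keeps all of them. The length has to come from whoever produced the buffer; it cannot be recovered with `strlen`.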
Nick Bastin
I was able to get this working by calling PyString_AsStringAndSize and then using assign to place the data into the variable that gets passed downstream. Also, pulling the call to Py_Initialize() out into global scope, so the interpreter isn't set up and torn down on every call, resolved the crash on files after the first. Appreciate the help.
g.d.d.c