I am writing a C++ program that reads the plain text of a .docx file. My plan of attack is to rename the .docx to .zip and unzip it, then rename the .xml file containing the document's text to .txt and parse it.

Right now I have figured out the renaming, which was easy enough. I am now struggling with the unzipping. I am very proficient in C++, but this is the first time I have applied it to a real-world application and gone beyond the STL.

At first I tried several C++ wrappers around the zlib library, but I have not been able to get any of them to compile or work properly (possibly because my environment is Cygwin). For that reason it seems I have to fall back on using the messy zlib code directly. But all the documentation and examples I can find only show zlib reading a .zip that is a compression of one file, not multiple files. I don't know where to go from here and, as I said earlier, being completely new to anything outside the STL, I am feeling quite lost.

Any help or guidance is much appreciated!

Thanks, Michael
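
For the last step of the plan above (parsing the text out of the XML), here is a minimal sketch. It assumes the unzipped document body is the archive member word/document.xml, that the visible text sits inside <w:t> elements, and that each paragraph ends with </w:p>. The function name extract_docx_text is made up for illustration. It is a plain string scan, not a real XML parser: it does not decode entities such as &amp; and it would be confused by a '>' inside an attribute value.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Collect the character data of every <w:t> element, one line per paragraph.
// Assumption: `xml` holds the contents of word/document.xml, already unzipped.
std::string extract_docx_text(const std::string& xml) {
    std::string text;
    std::size_t pos = 0;
    while ((pos = xml.find('<', pos)) != std::string::npos) {
        if (xml.compare(pos, 6, "</w:p>") == 0) {
            text += '\n';                      // end of a paragraph
            pos += 6;
            continue;
        }
        // Match <w:t> and <w:t ...>, but not <w:tab/>, <w:tbl>, <w:tc>, etc.
        const bool is_text_tag =
            xml.compare(pos, 4, "<w:t") == 0 && pos + 4 < xml.size() &&
            (xml[pos + 4] == '>' || xml[pos + 4] == ' ');
        const std::size_t tag_end = xml.find('>', pos);
        if (tag_end == std::string::npos) break;   // truncated input
        assert(tag_end > pos);                     // xml[pos] is '<', so '>' comes later
        if (is_text_tag && xml[tag_end - 1] != '/') {
            const std::size_t close = xml.find("</w:t>", tag_end);
            if (close == std::string::npos) break; // unterminated element
            text.append(xml, tag_end + 1, close - tag_end - 1);
            pos = close + 6;
        } else {
            pos = tag_end + 1;                     // some other tag: skip it
        }
    }
    return text;
}
```

To use it, read the whole XML file into a std::string and pass it in. Renaming the .xml to .txt should not be necessary, since the file can be opened under its original name.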

A: 

zlib is for gzip-style compression of a single data stream, not for reading ZIP archives (see here for details).

As a result, you may be better off shelling out to the unzip utility, which is provided in Cygwin and available on many other platforms.

Mike McQuaid
I'm not sure what you mean by the first question. I can of course use C code if that's what you're implying. Could you provide some more detail on how I would use the "unzip" executable in Cygwin (though I would prefer the code to be portable, if this would make it not so)? Also, I'm pretty sure zlib can do both zip and gzip compression/decompression.
mcFreid
I've cleared up the first question and added a link to show that zlib can't handle zip compression. You'd need to use the minizip library provided with zlib.
Mike McQuaid
Mike, thanks for the link. I misread what you meant. I thought you were talking about .zip files, not .zip archives/directories. I have tried Info-ZIP as well and had trouble getting it to compile in Cygwin. Later on, when I am home, I'll post the errors I'm getting.
mcFreid
You don't need to compile it on Cygwin; just install the "unzip" tool from the Cygwin installer. That is supplied by Info-ZIP.
Mike McQuaid
Hmm, I still don't understand then. How am I supposed to use a command line tool inside my code?
mcFreid
Use one of the exec* functions (these come from POSIX), or std::system from the standard library.
Mike McQuaid
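
As a rough sketch of what the comments above describe (one possible approach, not the only one): popen can run the unzip tool and hand its output back to the program. The sketch assumes Info-ZIP's unzip is on the PATH and uses its -p option, which writes the extracted member to stdout instead of to disk. The function names shell_quote, unzip_command and read_zip_member are hypothetical. popen and pclose are POSIX rather than ISO C++, so this is not portable to non-POSIX toolchains as written, and on Cygwin a strict -std=c++NN mode may hide their declarations (a -std=gnu++NN mode may be needed).

```cpp
#include <cassert>
#include <stdio.h>      // popen, pclose (POSIX), fread
#include <stdexcept>
#include <string>

// Wrap `arg` in single quotes for a POSIX shell, escaping embedded quotes.
std::string shell_quote(const std::string& arg) {
    std::string out = "'";
    for (char c : arg) {
        if (c == '\'') out += "'\\''";   // close quote, escaped quote, reopen
        else out += c;
    }
    out += "'";
    return out;
}

// Build the command line: unzip -p <archive> <member>
std::string unzip_command(const std::string& archive, const std::string& member) {
    // An empty argument would change what unzip is being asked to do.
    assert(!archive.empty() && !member.empty());
    return "unzip -p " + shell_quote(archive) + " " + shell_quote(member);
}

// Run unzip and return the bytes of one archive member.
std::string read_zip_member(const std::string& archive, const std::string& member) {
    FILE* pipe = popen(unzip_command(archive, member).c_str(), "r");
    if (!pipe) throw std::runtime_error("popen failed");
    std::string data;
    char buf[4096];
    size_t n;
    while ((n = fread(buf, 1, sizeof buf, pipe)) > 0) data.append(buf, n);
    if (pclose(pipe) != 0) throw std::runtime_error("unzip reported an error");
    return data;
}
```

A call such as read_zip_member("report.docx", "word/document.xml") would then return the XML as a string ("report.docx" is only an example name). unzip does not look at the file extension, so the .docx should not need renaming to .zip first.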
+1  A: 

I don't think zlib supports multi-file zips directly (I could be wrong), so you may want to look for alternatives. As an aside, you might also want to consider switching from Cygwin to MinGW, unless you really need the POSIX/UNIX compatibility that Cygwin provides.

anon
If by "multi-file" zips you mean zip files containing multiple files, then there's minizip. I'm not entirely clear how minizip relates to zlib (apart from that it requires zlib), but it works. See http://www.winimage.com/zLibDll/minizip.html.
Dominic Rodger
I'm a little confused about how to install minizip. From the site you linked, it seems that it now comes with zlib, but I don't see it. Remember that I am new to using additional libraries with C++.
mcFreid
+1  A: 

I've been dealing with a similar issue, but don't really have a great solution yet.

zlib itself does not currently support archives containing multiple files.

See: http://stackoverflow.com/questions/1224464/c-c-packing-and-compression

Justin
Thanks for the reference. I looked at http://nih.at/libzip/index.html from your question and it seems it may provide my answer. I'm going to try it out as soon as I can.
mcFreid