I have a concatenated file made up of some number of bzip2 archives. I also know the sizes of the individual bzip2 chunks in that file.
I would like to decompress the bzip2 stream in one individual chunk and write the output to standard output.
First I use fseek to move the file cursor to the first byte of the desired archive, and then I read a chunk of the known size through a BZ2_bzRead call:
int headerSize = 1234;
int firstChunkSize = 123456;
FILE *fp = fopen("pathToConcatenatedFile", "r+b");
char *bzBuf = malloc(sizeof(char) * firstChunkSize);
int bzError, bzNBuf;
BZFILE *bzFp = BZ2_bzReadOpen(&bzError, fp, 0, 0, NULL, 0);

/* move cursor past header of known size, to the first bzip2 "chunk" */
fseek(fp, headerSize, SEEK_SET);

while (bzError != BZ_STREAM_END) {
    /* read the first chunk of known size, decompress it */
    bzNBuf = BZ2_bzRead(&bzError, bzFp, bzBuf, firstChunkSize);
    fprintf(stdout, bzBuf);
}

BZ2_bzReadClose(&bzError, bzFp);
free(bzBuf);
fclose(fp);
The problem is that when I compare the output of the fprintf statement with the output of running bzip2 on the command line, the two differ.
Specifically, this code produces less output: what it prints is a subset of the command-line output, and the tail end of the bzip2 chunk of interest is missing.
I have verified through another technique that the command-line bzip2 is providing the correct answer, and, therefore, some problem with my C code is causing output at the end of the chunk to go missing. I just don't know what that problem is.
If you are familiar with bzip2 or libbzip2, can you provide any advice on what I am doing wrong in the code sample above? Thank you for your advice.