ansaurus

Question

Answer 1

+5 A:

Have you looked at libtar?

From the fink package info:

libtar-1.2-1: Tar file manipulation API

libtar is a C library for manipulating POSIX tar files. It handles adding and extracting files to/from a tar archive. libtar offers the following features:
* Flexible API - you can manipulate individual files or just extract a whole archive at once.
* Allows user-specified read() and write() functions, such as zlib's gzread() and gzwrite().
* Supports both POSIX 1003.1-1990 and GNU tar file formats.

Not C++ per se, but you can link to C pretty easily...

dmckee 2010-03-24 02:55:38

The documentation kind of sucks but I'm checking it out..

Brendan Long 2010-03-24 02:58:40

Answer 2

+5 A:

I figured this out myself after a bit of work. The tar file spec actually tells you everything you need to know.

First off, every file in the archive starts with a 512-byte header, so you can represent it with a char[512] or a char* pointing somewhere in your larger char array (if you have the entire tar file loaded into one array, for example).

The header looks like this:

location  size  field
0         100   File name
100       8     File mode
108       8     Owner's numeric user ID
116       8     Group's numeric user ID
124       12    File size in bytes
136       12    Last modification time in numeric Unix time format
148       8     Checksum for header block
156       1     Link indicator (file type)
157       100   Name of linked file
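One way to avoid sprinkling raw offsets through the code is to mirror the layout above in a struct. This is only a sketch: the name TarHeader and its member names are made up for illustration, and the trailing member simply covers the rest of the 512-byte block, which the table above does not describe. Because every member is a char or a char array, the compiler should not insert padding, and the static_assert lines check that assumption at compile time.

```cpp
#include <cstddef>

// Hypothetical struct mirroring the header table (names are illustrative).
struct TarHeader {
    char name[100];      // offset 0:   file name, null padded
    char mode[8];        // offset 100: file mode, ASCII octal
    char uid[8];         // offset 108: owner's numeric user ID
    char gid[8];         // offset 116: group's numeric user ID
    char size[12];       // offset 124: file size in bytes, ASCII octal
    char mtime[12];      // offset 136: last modification time
    char checksum[8];    // offset 148: checksum for header block
    char typeflag;       // offset 156: link indicator (file type)
    char linkname[100];  // offset 157: name of linked file
    char rest[255];      // remainder of the block, not covered by the table
};

// Compile-time checks that the struct really matches the table.
static_assert(sizeof(TarHeader) == 512, "header must be exactly one block");
static_assert(offsetof(TarHeader, size) == 124, "size field offset");
static_assert(offsetof(TarHeader, typeflag) == 156, "type field offset");
```

With this in place, a 512-byte header can be copied into a TarHeader with memcpy and the fields read by name instead of by index.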

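So if you want the file name, you grab it right here with string filename(&buffer[0], 100); (note the &: the constructor needs a pointer to the first byte, not the byte itself). This keeps the null padding in the string. The file name is null padded, so you could check that there's at least one null in the field and then leave off the size, as in string filename(&buffer[0]);, which stops at the first null. A name that fills all 100 bytes has no null, so don't skip the check.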
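The null check described above can be folded into a small helper. This is a minimal sketch, not part of the original answer: readField is a hypothetical name, and it assumes header points at a full 512-byte header. It uses memchr to find the first null inside the field, so it never reads past the field even when the name uses all 100 bytes.

```cpp
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical helper: copies a fixed-width, null-padded header field into a
// std::string, stopping at the first null or at the end of the field.
std::string readField(const char *header, std::size_t offset, std::size_t width) {
    const char *start = header + offset;
    // memchr returns a pointer to the first null in the field, or nullptr
    // if the field is completely filled.
    const void *end = std::memchr(start, '\0', width);
    std::size_t length = end
        ? static_cast<std::size_t>(static_cast<const char *>(end) - start)
        : width;
    return std::string(start, length);
}
```

For the file name this would be called as readField(buffer, 0, 100), and for the name of a linked file as readField(buffer, 157, 100).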

Now we want to know if it's a file or a folder. The "link indicator" field has this information, so:

// Note that we're comparing to ascii numbers, not ints
switch(buffer[156]){
  case '0': // intentionally dropping through
  case '\0':
    // normal file
    break;
  case '1':
    // hard link
    break;
  case '2':
    // symbolic link
    break;
  case '3':
    // character device (special file)
    break;
  case '4':
    // block device
    break;
  case '5':
    // directory
    break;
  case '6':
    // named pipe
    break;
}

At this point, we already have all of the information we need about directories, but we need one more thing from normal files: the actual file contents. The length of the file is stored in ASCII octal at offset 124. Important note: the field is 12 bytes wide, but only the first 11 bytes are octal digits; the 12th is a terminator (usually a null or a space), not a digit. I used my own function for converting this, but it assumes a well-formed file:

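// in one function
// 11 octal digits can describe sizes up to 8 GiB - 1, too big for a 32-bit int
unsigned long long sizeOfFile = octalStringToInt(&buffer[124], 11);

// elsewhere
// Converts `size` ASCII octal digits to a number.
// Assumes every character is in the range '0'-'7'.
unsigned long long octalStringToInt(const char *string, unsigned int size){
  unsigned long long output = 0;
  while(size > 0){
    output = output*8 + (*string - '0');
    string++;
    size--;
  }
  return output;
}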
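If you can't assume a well-formed file, a more forgiving converter is easy to write. The following is a sketch under my own assumptions rather than anything the spec mandates: parseOctalField is a hypothetical name, it skips leading spaces, and it stops at the first byte that is not an octal digit (such as the null or space terminator) instead of folding it into the result.

```cpp
#include <cstddef>

// Hypothetical tolerant variant: reads at most `width` bytes, skips leading
// spaces, and stops at the first byte that is not an octal digit.
unsigned long long parseOctalField(const char *field, std::size_t width) {
    std::size_t i = 0;
    while (i < width && field[i] == ' ') {
        ++i;  // some writers pad numeric fields with spaces
    }
    unsigned long long value = 0;
    for (; i < width && field[i] >= '0' && field[i] <= '7'; ++i) {
        value = value * 8 + static_cast<unsigned long long>(field[i] - '0');
    }
    return value;
}
```

It can be given the full 12-byte width, as in parseOctalField(&buffer[124], 12), because the terminator ends the loop on its own.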

Ok, so now we have everything except the actual file contents. All we have to do is grab the next size bytes of data from the tar file and we'll have our file contents:

location += 512; // Get to the next block after the header ends
fileContents = new char[sizeOfFile+1]; // Adding 1 since we need space for one null char
memcpy(fileContents, &buffer[location], sizeOfFile);
fileContents[sizeOfFile] = '\0'; // Null terminate our file
// Skip the data, rounded up to a multiple of 512. (Adding 1 to sizeOfFile/512
// would skip one block too many when the size is an exact multiple of 512.)
location += ((sizeOfFile + 511) / 512) * 512;

Brendan Long 2010-03-24 21:25:48


How to parse a tar file in C++
