I'm writing some quick code to try and extract data from an mp3 file header.

The objective is to extract information from the header, such as the bitrate, so that I can stream the file to an MP3 decoder with the necessary arguments.

Here is a wikipedia image showing the mp3header information: http://upload.wikimedia.org/wikipedia/commons/0/01/Mp3filestructure.svg

My question is, am I attacking this correctly? Printing the data received is worthless -- I just get a bunch of random characters. I need to get to the binary so that I can decode it and determine vital information.

Here is my baseline code:

// mp3 Header File IO.cpp : Defines the entry point for the console application.
//

#include "stdafx.h"
#include "stdio.h"
#include "string.h"
#include "stdlib.h"

// Main function
int main (void)
{
    // Declare variables
    FILE *mp3file;
    char *mp3syncword; // we will need to allocate memory to this!!
    char requestedFile[255] = "";
    unsigned long fileLength;

    // Counters
    int i;

    // Memory allocation with malloc
    mp3syncword=(char *)malloc(2000);

    // Let's get the name of the requested file (hard-coded for now)
    strcpy(requestedFile,"testmp3.mp3");

    // Open the file with mode read, binary
    mp3file = fopen(requestedFile, "rb"); 
    if (!mp3file){
         // If we can't open the file, notify the user and stop
         printf("Not found!");
         return 1;
    }

    // Let's get some header data from the file
    fseek(mp3file,1,SEEK_SET);
    fread(mp3syncword,32,1,mp3file);

    // For debug purposes, lets print the received data
     for(i = 0; i < 32; ++i)
        printf("%c", ((char *)mp3syncword)[i]);
    return 0;
}

Help appreciated.

+1  A: 

You are printing the bytes out using %c as the format specifier. You need to use an unsigned numeric format specifier (e.g. %u for a decimal number or %x or %X for hexadecimal) to print the byte values.

You should also declare your byte arrays as unsigned char, because plain char is signed by default with Visual C++.

You might also want to print out a space (or other separator) after each byte value to make the output clearer.
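
As a minimal sketch of the suggestions above (unsigned bytes, a hexadecimal format specifier, and a separator between values), the helper below formats a buffer as space-separated hex. The name `format_hex` and its parameters are made up for illustration; they are not part of the original code.

```c
#include <stdio.h>
#include <stddef.h>

/* Hypothetical helper: writes the n bytes in buf to out as two-digit hex
 * values separated by single spaces, for example "ff fb 90 64".
 * Output that does not fit in out_size bytes is dropped whole, never
 * truncated mid-value. Returns the number of characters written,
 * not counting the terminating '\0'. */
static size_t format_hex(const unsigned char *buf, size_t n,
                         char *out, size_t out_size)
{
    size_t used = 0;
    size_t i;

    if (out_size == 0)
        return 0;
    out[0] = '\0';

    for (i = 0; i < n; ++i) {
        const char *sep = (i == 0) ? "" : " ";
        /* %02x prints an unsigned value as two lowercase hex digits */
        int w = snprintf(out + used, out_size - used, "%s%02x",
                         sep, (unsigned)buf[i]);
        if (w < 0 || (size_t)w >= out_size - used) {
            out[used] = '\0';   /* out of room: discard the partial value */
            break;
        }
        used += (size_t)w;
    }
    return used;
}
```

With the 32 bytes read into an `unsigned char` buffer, something like `format_hex(buffer, 32, text, sizeof text); puts(text);` would print them in a readable form, provided `text` has room for three characters per byte.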

The standard printf does not provide a binary representation type specifier. Some implementations do have this but the version supplied with Visual Studio does not. In order to output this you will need to perform bit operations on the number to extract the individual bits and print each of them in turn for each byte. For example:

unsigned char byte = (unsigned char)mp3syncword[0]; // A byte read from the file
unsigned char mask = 1; // Bit mask
unsigned char bits[8];

// Extract the bits
for (int i = 0; i < 8; i++) {
    // Mask each bit in the byte and store it
    bits[i] = (byte & (mask << i)) >> i;
}

// The bits array now contains eight 1 or 0 values
// bits[0] contains the least significant bit
// bits[7] contains the most significant bit
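
If a printable binary string is wanted, the same masking idea can fill a character buffer, most significant bit first. This is only a sketch; `byte_to_binary` is a hypothetical name, not a standard function.

```c
/* Hypothetical helper: writes the 8 bits of value into out as '0' and '1'
 * characters, most significant bit first, followed by a terminating '\0'.
 * out must have room for 9 characters. */
static void byte_to_binary(unsigned char value, char out[9])
{
    int i;

    for (i = 0; i < 8; ++i) {
        /* Bit 7 goes to out[0], bit 0 goes to out[7] */
        out[i] = ((value >> (7 - i)) & 1) ? '1' : '0';
    }
    out[8] = '\0';
}
```

Calling it with a `char text[9];` buffer and then `printf("%s\n", text);` prints one byte as eight binary digits.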
Matthew Murdoch
Changing to %d in the printf statement does give me a bunch of numerical values. However, I doubt they are binary: 68513000147118658073670083-33000030-1-40-1-32016747073. There are dashes within the result. This doesn't appear correct. I have considered saving the data to something other than a char, but I don't believe that would work as this is binary.
BSchlinker
Oh -- and of course, it can't be binary! The values are other than 1 and 0.
BSchlinker
Apologies, you should be treating them as `unsigned` numbers (that's why you are seeing the dashes - minus signs).
Matthew Murdoch
Oh... I think I understand. You want to print out the binary representation of the byte, right?
Matthew Murdoch
Well, I need to parse the first 32 bits of the file in some manner, and then be able to go in and, for instance, analyze bits 17-20 to determine what the bit rate is. The graphic from Wikipedia helps. You're correct that changing it to unsigned numbers helps -- but I am not getting a binary value. Honestly, I will take the first 32 bits in binary, or hex, or any format.
BSchlinker
I don't need to even be able to print out the binary representation -- I just need to be able to parse it.
BSchlinker
Are you comfortable with bit operations to extract the individual bits?
Matthew Murdoch
I'm afraid I'm unfamiliar with that operation. Any resources you can point me to would be very helpful.
BSchlinker
Updated answer with an example.
Matthew Murdoch
I still don't seem to be getting a true binary interpretation:

unsigned char byte = 49; // Read from file
unsigned char mask = 1; // Bit mask
unsigned char bits[8];

// Extract the bits
for (int i = 0; i < 8; i++) {
    // Mask each bit in the byte and store it
    bits[i] = byte
}

// For debug purposes, lets print the received data
for (int i = 0; i < 8; i++) {
    printf("Bit: %d\n", bits[i]);
}

Prints: Bit: 1, Bit: 0, Bit: 0, Bit: 0, Bit: 16, Bit: 32, Bit: 0, Bit: 0
BSchlinker
My mistake. I've updated the code. Thanks.
Matthew Murdoch
+1  A: 

C does not have a printf() specifier to print in binary. Most people print in hex instead, which will give you (typically) eight bits at a time:

printf("the first eight bits are %02x\n", (unsigned char) mp3syncword[0]);

You will need to interpret this manually to figure out the values of individual bits. The cast to unsigned char on the argument is to avoid surprises if it's negative.

To test bits, you can use the & operator together with the bitwise left shift operator, <<:

if(mp3syncword[2] & (1 << 2))
{
  /* The third bit from the right of the third byte was set. */
}

If you want to be able to use "big" (larger than 7) indexes for bits, i.e. treat the data as a 32-bit word, it might be good to read it into e.g. an unsigned int, and then inspect that. Be careful with endian-ness when you do this reading, however.
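
One way to sidestep the endianness concern is to read the four header bytes into an `unsigned char` array and assemble the word with explicit shifts. The sketch below assumes the bytes are in file order with the sync word first; `load_be32` and `get_bits` are hypothetical helper names.

```c
#include <stdint.h>

/* Assemble four bytes, in file order, into one 32-bit word.
 * Shifting each byte into place makes the result independent of the
 * host's endianness: the first byte always lands in bits 31..24. */
static uint32_t load_be32(const unsigned char b[4])
{
    return ((uint32_t)b[0] << 24) | ((uint32_t)b[1] << 16) |
           ((uint32_t)b[2] << 8)  |  (uint32_t)b[3];
}

/* Return `width` bits of `word` whose lowest bit sits at position `shift`,
 * where bit 0 is the least significant bit.
 * width must be between 1 and 31. */
static uint32_t get_bits(uint32_t word, unsigned shift, unsigned width)
{
    return (word >> shift) & ((UINT32_C(1) << width) - 1u);
}
```

Counting from the start of the header as in the Wikipedia diagram, the 17th to 20th bits are bits 15..12 of the assembled word, so the bit rate field would be `get_bits(word, 12, 4)`.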

unwind
A: 

Warning: there are probably errors with memory layout and/or endianness with this approach. It is not guaranteed that the struct members match the same bits from computer to computer.
In short: don't rely on this (I'll leave the answer, it might be useful for something else)

You can define a struct with bit fields:

struct MP3Header {
    unsigned SyncWord : 12;
    unsigned Version : 1;
    unsigned Layer : 2;
    unsigned ErrorProtection : 1;
    unsigned BitRate : 4;
    unsigned Frequency : 2;
    unsigned PadBit : 1;
    unsigned PrivBit : 1;
    unsigned Mode : 2;
    unsigned ModeExtension : 2;
    unsigned Copy : 1;
    unsigned Original : 1;
    unsigned Emphasis : 2;
};

and then use each member as an isolated value:

struct MP3Header h;
/* ... */
fread(&h, sizeof h, 1, mp3file); /* error check!! */
printf("Frequency: %u\n", h.Frequency);
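
Given the warning at the top of this answer, a layout-independent variant can fill a plain struct using explicit shifts and masks instead of bit fields. This is a sketch only: `MP3HeaderFields` and `decode_header` are hypothetical names, and the field widths are copied from the bit-field struct above, so they are only as accurate as that struct.

```c
#include <stdint.h>

/* Plain (non-bit-field) struct using the same member names as above. */
struct MP3HeaderFields {
    unsigned SyncWord, Version, Layer, ErrorProtection, BitRate, Frequency,
             PadBit, PrivBit, Mode, ModeExtension, Copy, Original, Emphasis;
};

/* Decode four header bytes, given in file order, into separate fields.
 * The widths (12, 1, 2, 1, 4, 2, 1, 1, 2, 2, 1, 1, 2) add up to 32 bits. */
static struct MP3HeaderFields decode_header(const unsigned char b[4])
{
    /* Build the word byte by byte so host endianness does not matter */
    uint32_t w = ((uint32_t)b[0] << 24) | ((uint32_t)b[1] << 16) |
                 ((uint32_t)b[2] << 8)  |  (uint32_t)b[3];
    struct MP3HeaderFields h;

    h.SyncWord        = (w >> 20) & 0xFFF;
    h.Version         = (w >> 19) & 0x1;
    h.Layer           = (w >> 17) & 0x3;
    h.ErrorProtection = (w >> 16) & 0x1;
    h.BitRate         = (w >> 12) & 0xF;
    h.Frequency       = (w >> 10) & 0x3;
    h.PadBit          = (w >> 9)  & 0x1;
    h.PrivBit         = (w >> 8)  & 0x1;
    h.Mode            = (w >> 6)  & 0x3;
    h.ModeExtension   = (w >> 4)  & 0x3;
    h.Copy            = (w >> 3)  & 0x1;
    h.Original        = (w >> 2)  & 0x1;
    h.Emphasis        =  w        & 0x3;
    return h;
}
```

The four bytes would come from something like `fread(bytes, 1, 4, mp3file)` into an `unsigned char bytes[4]`, with the return value checked before decoding.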
pmg
This is pretty dangerous; you don't know that the bitfields will be laid out in memory to match the file format. You also depend on endianness, and might be bitten by padding.
unwind
@unwind: you're right about memory layout and endianness, but not padding (the only thing I thought about when I answered) -- see 6.7.2.1/10 in the Standard. Thank you for the warning.
pmg