The first step is to decompress the MP3. Since you're doing this as a batch job, don't bother using LAME as a library; just use an existing command-line program to convert the MP3 to a temporary WAV file, which will be much easier. Then find a library to read WAV files. It's a relatively simple format and you should find lots of sample code online, or you could write your own reader in an afternoon.
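As a rough sketch of that step in Python: this assumes the `lame` executable is on your PATH (its `--decode` option writes a WAV file) and that the decoded file is 16-bit PCM. The function names are just illustrative, and the standard-library `wave` module stands in for "a library to read WAV files".

```python
import array
import subprocess
import sys
import wave


def decode_mp3_to_wav(mp3_path, wav_path):
    # Assumption: the `lame` command-line tool is installed and on PATH.
    subprocess.run(["lame", "--decode", mp3_path, wav_path], check=True)


def read_wav_samples(wav_path):
    """Return (sample_rate, channels, samples) for a 16-bit PCM WAV file.

    Samples are interleaved signed integers (L, R, L, R, ... for stereo).
    """
    with wave.open(wav_path, "rb") as wav:
        if wav.getsampwidth() != 2:
            raise ValueError("this sketch only handles 16-bit PCM")
        rate = wav.getframerate()
        channels = wav.getnchannels()
        raw = wav.readframes(wav.getnframes())
    samples = array.array("h")  # signed 16-bit integers
    samples.frombytes(raw)
    if sys.byteorder == "big":
        samples.byteswap()  # WAV sample data is little-endian
    return rate, channels, samples
```

Any other decoder that can write a WAV file would work just as well in place of `lame`.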
Suppose your song is 60 minutes long: 60 minutes * 60 seconds/minute * 44100 samples/second = 158,760,000 samples. (Twice that if it's a stereo song.) If your image is 1000 pixels wide, each column of pixels has to represent 158,760 samples.
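The same arithmetic as a tiny helper, in case you want to plug in other durations or image widths (the function name is made up for illustration):

```python
def samples_per_pixel(duration_seconds, sample_rate, image_width):
    # Number of mono samples that one pixel column has to summarize.
    total_samples = duration_seconds * sample_rate
    return total_samples // image_width


print(samples_per_pixel(60 * 60, 44100, 1000))  # 158760
```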
(As an aside, you won't see much detail at that resolution. Perhaps a better solution would be to show a waveform of just the first 5 minutes, or render a larger image that the user can scroll?)
Anyway, you want to read the audio samples in blocks of 158,760 samples (in this example), and render each block as a vertical line representing the strength of the signal over that portion of the audio. There are two common ways to measure that:
- The maximum absolute value over that region
- The root-mean-squared (RMS) value over that region
Maximum will show you peaks, while RMS will show you the overall perceived loudness. Both are easy to implement; try both and see which one looks best.
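Here is a minimal sketch of both measures, assuming the samples are already in a plain list (or `array`) of mono integers. The names `block_peak`, `block_rms`, and `reduce_to_columns` are my own, and any leftover samples at the very end that don't fill a whole block are simply ignored.

```python
import math


def block_peak(block):
    # Largest absolute sample value in the block.
    return max(abs(s) for s in block)


def block_rms(block):
    # Square root of the mean of the squared samples.
    return math.sqrt(sum(s * s for s in block) / len(block))


def reduce_to_columns(samples, width, measure):
    """Summarize `samples` into `width` values, one per pixel column."""
    block_size = max(1, len(samples) // width)
    columns = []
    for x in range(width):
        block = samples[x * block_size:(x + 1) * block_size]
        # Columns past the end of very short audio are treated as silence.
        columns.append(measure(block) if len(block) else 0)
    return columns


print(reduce_to_columns([0, 3, -4, 1, 1, -1, 1, -1], 2, block_peak))  # [4, 1]
```

Passing `block_rms` instead of `block_peak` gives the RMS version, so it's easy to render both and compare.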
Then you just need to turn the resulting image into a GIF. Since this is a batch job anyway, if I were you, I would write out a BMP file (a really easy file format) and then use a command-line program like ImageMagick's "convert" to turn that into a GIF.
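To give an idea of how little there is to BMP, here is a sketch that writes an uncompressed 24-bit BMP with one vertical bar per column, centered on the middle of the image. It assumes the column values are 16-bit magnitudes (0 to 32768); the colors and function names are arbitrary choices of mine. The conversion step assumes ImageMagick's `convert` is on your PATH (newer ImageMagick versions call the command `magick` instead).

```python
import struct
import subprocess


def write_waveform_bmp(path, columns, height, full_scale=32768):
    width = len(columns)
    row_size = (width * 3 + 3) & ~3  # each row is padded to a multiple of 4 bytes
    pixel_bytes = row_size * height
    mid = height // 2
    # Half-height of each bar in pixels, clamped to the image.
    half = [min(mid, int(c / full_scale * mid)) for c in columns]
    white = b"\xff\xff\xff"
    blue = b"\xc0\x60\x00"  # BMP stores pixels in blue, green, red order
    padding = b"\x00" * (row_size - width * 3)
    with open(path, "wb") as f:
        # 14-byte file header: magic, file size, two reserved fields, pixel data offset.
        f.write(struct.pack("<2sIHHI", b"BM", 54 + pixel_bytes, 0, 0, 54))
        # 40-byte BITMAPINFOHEADER: 1 plane, 24 bits per pixel, no compression.
        f.write(struct.pack("<IiiHHIIiiII", 40, width, height, 1, 24, 0,
                            pixel_bytes, 2835, 2835, 0, 0))
        # Rows are stored bottom-up, which doesn't matter for a symmetric picture.
        for y in range(height):
            row = bytearray()
            for x in range(width):
                row += blue if abs(y - mid) <= half[x] else white
            f.write(bytes(row) + padding)


def convert_to_gif(bmp_path, gif_path):
    # Assumption: ImageMagick is installed and provides the `convert` command.
    subprocess.run(["convert", bmp_path, gif_path], check=True)
```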
Finally, one last note: if you're really tricky, you could read the MP3 frames and extract the gain directly from the bitstream without decoding the whole thing. That's what I did here, and you're welcome to use it - but it's not for the faint of heart. It's roughly 100x faster than decoding the full MP3, but the waveform you get will be a crude approximation.