Check out Winamp's AVS - Advanced Visualization Studio. It is similar (though probably considered inferior) to Milkdrop. What's good about it, though, is that you can make your own visualizations. You can also take apart those made by someone else, e.g. the ones that come with Winamp.
Note that I have not used Winamp beyond version 2 and have seen Milkdrop only a couple of times. But I think AVS should get you started - assuming they still ship it with Winamp 5.
Here's how AVS works, simply: You have two types of components, "input" and... let's say, "filter".
Input components are responsible for drawing an initial image on the screen. They are driven by time-varying data, such as the spectrum or waveform of the audio currently being played.
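To give a feel for what an input component does, here is a minimal Python sketch of my own (not AVS code). It plots one buffer of audio samples as dots in a small greyscale frame. The name `draw_waveform`, the frame layout (a list of rows of 0..255 values) and the sine wave standing in for real audio are all assumptions made for illustration.

```python
import math

def draw_waveform(samples, width, height):
    """Return a height x width frame (rows of 0..255 ints) with the waveform plotted as dots."""
    frame = [[0] * width for _ in range(height)]
    for x in range(width):
        # Pick the sample that corresponds to this column.
        s = samples[x * len(samples) // width]
        # Map an amplitude in [-1, 1] to a row; +1 lands on the top row.
        y = round((1.0 - s) / 2.0 * (height - 1))
        y = min(max(y, 0), height - 1)  # clamp in case a sample is out of range
        frame[y][x] = 255
    return frame

# One cycle of a sine wave stands in for the audio currently playing.
samples = [math.sin(2 * math.pi * i / 64) for i in range(64)]
frame = draw_waveform(samples, width=64, height=17)
```

A real visualizer would call something like this once per frame with fresh audio data, which is what makes the picture move.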
Filter components do all the "fun" work: they take the image generated by the input components and distort it. Examples include remapping pixels in rectangular coordinates (y=y, x=x+1 --> shift everything to the right by 1 pixel) or in polar coordinates (r=r, θ=θ+1 --> rotate everything by 1 degree). Most of these effects only make sense if you don't clear the screen every frame - then each frame's distortion is applied on top of the previous ones, and you get rotating, twirling, shifting images. Of course, so that old frames don't fill up the screen, you need to gradually fade them out. There may be a "blur" or "fade out" filter component for that.
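The "don't clear the screen" idea is easiest to see in a tiny feedback loop. The following Python sketch is my own illustration under simple assumptions, not how AVS is implemented: a frame is a list of rows of 0..255 values, and `shift_right`, `fade`, `step` and `draw_dot` are hypothetical names. Each step distorts and dims the previous frame, then lets the input draw on top.

```python
def shift_right(frame):
    """x = x + 1: move every pixel one column right; the left column becomes black."""
    return [[0] + row[:-1] for row in frame]

def fade(frame, factor=0.5):
    """Dim every pixel so that old frames gradually disappear."""
    return [[int(p * factor) for p in row] for row in frame]

def step(prev_frame, draw_input):
    """One rendering step: filter the previous frame, then draw the new input on top."""
    frame = fade(shift_right(prev_frame))
    draw_input(frame)
    return frame

def draw_dot(frame):
    # Stand-in for an input component: a bright dot at the left edge.
    frame[0][0] = 255

frame = [[0] * 8]  # a 1 x 8 "screen", never cleared between steps
for _ in range(4):
    frame = step(frame, draw_dot)

print(frame[0])  # [255, 127, 63, 31, 0, 0, 0, 0]
```

The dot leaves a trail that moves right and gets dimmer with age. Swapping `shift_right` for a rotation in polar coordinates would produce a fading spiral in the same way.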
On top of that, you can group these components in layers and specify how they blend together. (Think of the layer blending modes in Photoshop, or just look up "blend mode.")
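As a rough illustration of what a blend mode does, here is a short Python sketch of three common ones applied per pixel to two greyscale layers. These are generic textbook definitions with names I made up (`blend_additive`, `blend_maximum`, `blend_average`, `blend_layers`); I'm not claiming they match the exact set or formulas AVS uses.

```python
def blend_additive(a, b):
    """Add the two pixels, clamping at white."""
    return min(a + b, 255)

def blend_maximum(a, b):
    """Keep whichever pixel is brighter."""
    return max(a, b)

def blend_average(a, b):
    """50/50 mix of the two pixels."""
    return (a + b) // 2

def blend_layers(bottom, top, mode):
    """Combine two equally sized layers pixel by pixel using the given blend function."""
    return [[mode(p, q) for p, q in zip(row_b, row_t)]
            for row_b, row_t in zip(bottom, top)]

bottom = [[200, 10]]
top = [[100, 30]]
print(blend_layers(bottom, top, blend_additive))  # [[255, 40]]
print(blend_layers(bottom, top, blend_maximum))   # [[200, 30]]
print(blend_layers(bottom, top, blend_average))   # [[150, 20]]
```

For colour images the same function would simply be applied to each of the red, green and blue channels.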