You can use imagettfbbox (see also http://ruquay.com/sandbox/imagettf/) to get the bounding box of each tag's text for a given font, rotation angle and size (the size depends on the number of occurrences of that tag).
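As a rough sketch of that measuring step: imagettfbbox returns eight numbers (the x/y of the four corners), so the width and height come from the min/max over those corners, which also covers rotated text. The helper names (fontSizeForCount, boxSize, measureTag), the linear scaling and the 10 to 40 point range are my own assumptions for illustration, not something the function prescribes.

```php
<?php
// Hypothetical helper: map a tag's occurrence count to a font size in points.
// The linear scale and the default 10..40 range are assumptions.
function fontSizeForCount(int $count, int $minCount, int $maxCount,
                          float $minSize = 10.0, float $maxSize = 40.0): float
{
    if ($maxCount <= $minCount) {
        return $minSize; // all tags equally frequent
    }
    $t = ($count - $minCount) / ($maxCount - $minCount);
    return $minSize + $t * ($maxSize - $minSize);
}

// Turn the 8-element corner array from imagettfbbox() into a width/height.
// Using min/max over all four corners also handles rotated text.
function boxSize(array $bbox): array
{
    $xs = [$bbox[0], $bbox[2], $bbox[4], $bbox[6]];
    $ys = [$bbox[1], $bbox[3], $bbox[5], $bbox[7]];
    return ['width' => max($xs) - min($xs), 'height' => max($ys) - min($ys)];
}

// Measure one tag. Needs the GD extension with FreeType and a real font file.
function measureTag(string $text, float $size, float $angle, string $fontFile): array
{
    $bbox = imagettfbbox($size, $angle, $fontFile, $text);
    if ($bbox === false) {
        throw new RuntimeException("Could not measure '$text' with $fontFile");
    }
    return boxSize($bbox);
}
```

You would call it as something like measureTag('php', fontSizeForCount(12, 1, 40), 0, '/path/to/font.ttf'), where the font path is a placeholder for whatever TTF file you have.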
From that point you can start placing the tag words (randomly? - see edit) on the cloud canvas until all of them are placed. You just have to make sure they don't overlap (i.e. you can store the pixel coordinates of each placed word in an array and check every new position against it).
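The placement loop could look roughly like this. It is only a sketch of the "try random spots until one is free" idea: rectsOverlap and placeBoxes are names I made up, the boxes are plain axis-aligned rectangles (so rotated words are approximated by their bounding box), and the cap of 500 tries per word is arbitrary.

```php
<?php
// True when two axis-aligned rectangles (keys x, y, w, h) share any pixels.
function rectsOverlap(array $a, array $b): bool
{
    return $a['x'] < $b['x'] + $b['w'] && $b['x'] < $a['x'] + $a['w']
        && $a['y'] < $b['y'] + $b['h'] && $b['y'] < $a['y'] + $a['h'];
}

// Try random positions for each box until one collides with nothing placed so far.
// $sizes: array of ['w' => int, 'h' => int]; the result keeps the same keys.
// A box that finds no free spot within $maxTries is simply left out of the result.
function placeBoxes(array $sizes, int $canvasW, int $canvasH, int $maxTries = 500): array
{
    $placed = [];
    foreach ($sizes as $key => $size) {
        if ($size['w'] > $canvasW || $size['h'] > $canvasH) {
            continue; // larger than the canvas, can never fit
        }
        for ($try = 0; $try < $maxTries; $try++) {
            $candidate = [
                'x' => mt_rand(0, $canvasW - $size['w']),
                'y' => mt_rand(0, $canvasH - $size['h']),
                'w' => $size['w'],
                'h' => $size['h'],
            ];
            $free = true;
            foreach ($placed as $other) {
                if (rectsOverlap($candidate, $other)) {
                    $free = false;
                    break;
                }
            }
            if ($free) {
                $placed[$key] = $candidate;
                break;
            }
        }
    }
    return $placed;
}
```

Placing the biggest words first tends to help, since small words can fill the leftover gaps. If the result has fewer entries than the input, the canvas is probably too small for the current font sizes.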
You also need to make the image canvas big enough (or the font sizes small enough) to accommodate all the tags. To do that, precalculate the exact pixel size of each word (again using imagettfbbox) and adjust until you reach a size where all the words fit inside the canvas. Only then start drawing them with imagettftext.
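One possible way to do that precalculation is to shrink a global scale factor until the measured boxes pass a fit check. This is just a heuristic sketch: findScale is a made-up name, the measuring function is passed in as a callback (in real use it would wrap imagettfbbox), and the rule "boxes may cover at most half the canvas area" is my own assumption. Passing the area check does not guarantee that random placement will succeed, it only makes it likely.

```php
<?php
// Return the first (largest tried) scale at which every box fits the canvas
// on its own and all boxes together cover at most $fill of the canvas area.
// $counts:  tag => number of occurrences
// $measure: function (string $tag, int $count, float $scale): array
//           returning ['width' => ..., 'height' => ...] in pixels
// Returns null when even $minScale is not small enough.
function findScale(array $counts, callable $measure, int $canvasW, int $canvasH,
                   float $fill = 0.5, float $step = 0.9, float $minScale = 0.05): ?float
{
    for ($scale = 1.0; $scale >= $minScale; $scale *= $step) {
        $area = 0;
        $fits = true;
        foreach ($counts as $tag => $count) {
            $box = $measure((string) $tag, $count, $scale);
            if ($box['width'] > $canvasW || $box['height'] > $canvasH) {
                $fits = false; // this word alone is too big for the canvas
                break;
            }
            $area += $box['width'] * $box['height'];
        }
        if ($fits && $area <= $fill * $canvasW * $canvasH) {
            return $scale;
        }
    }
    return null;
}
```

When you finally draw with imagettftext, keep in mind that its x/y arguments give the baseline position of the first character, not the top-left corner of the box, so you need to offset by the corner values you got from imagettfbbox.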
EDIT: You can also learn a lot (and maybe contact the developer) by taking a look at the credits, for example:
Thank you, Martin Wattenberg, for the
central idea of just throwing stuff at
the screen until it fits. I raise my
glass to the philosophy of "the
dumbest possible thing that works."
And much more...