The Unicode standard has enough code points in it that you need 4 bytes to store them all. That's what the UTF-32 encoding does. Yet the UTF-8 encoding somehow squeezes these into much smaller spaces by using something called "variable-width encoding".
In fact, it manages to represent the first 128 characters of US-ASCII in just one byte each, byte-for-byte identical to real ASCII, so you can interpret lots of ASCII text as if it were UTF-8 without doing anything to it. Neat trick. So how does it work?
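To make that concrete, here's a quick Python sketch (just an illustration, not anything from the standard itself) showing that ASCII text encodes to the exact same bytes in UTF-8, that higher code points take 2 to 4 bytes, and that UTF-32 spends 4 bytes on everything:

```python
# UTF-8 keeps the ASCII range as single bytes, uses 2-4 bytes for higher
# code points; UTF-32 always uses 4 bytes per code point.
ascii_text = "Hello, world!"
assert ascii_text.encode("utf-8") == ascii_text.encode("ascii")  # identical bytes

for ch in ["A", "é", "€", "𝄞"]:  # U+0041, U+00E9, U+20AC, U+1D11E
    utf8 = ch.encode("utf-8")
    utf32 = ch.encode("utf-32-be")
    print(f"{ch!r}: UTF-8 = {utf8.hex(' ')} ({len(utf8)}B), "
          f"UTF-32 = {utf32.hex(' ')} ({len(utf32)}B)")

# 'A': UTF-8 = 41 (1B), UTF-32 = 00 00 00 41 (4B)
# 'é': UTF-8 = c3 a9 (2B), UTF-32 = 00 00 00 e9 (4B)
# '€': UTF-8 = e2 82 ac (3B), UTF-32 = 00 00 20 ac (4B)
# '𝄞': UTF-8 = f0 9d 84 9e (4B), UTF-32 = 00 01 d1 1e (4B)
```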
I'm going to ask and answer my own question here because I just did a bit of reading to figure it out and I thought it might save somebody else some time. Plus maybe somebody can correct me if I've got some of it wrong.