Ok - there are some very good answers here but almost all of them seem to make the same mistake and it's one that is pervading common usage.
Informally, we write that f(n) = O( g(n) ) if, up to a scaling factor and for all n larger than some n0, g(n) is larger than f(n). That is, f(n) grows no quicker than, or is bounded from above by, g(n). This tells us nothing about how fast f(n) grows, save for the fact that it is guaranteed not to be any worse than g(n).
A concrete example: n = O( 2^n ). We all know that n grows much less quickly than 2^n, so that entitles us to say that it is bounded by above by the exponential function. There is a lot of room between n and 2^n, so it's not a very tight bound, but it's still a legitimate bound.
Why do we (computer scientists) use bounds rather than being exact? Because a) bounds are often easier to prove and b) it gives us a short-hand to express properties of algorithms. If I say that my new algorithm is O(n.log n) that means that in the worst case its run-time will be bounded from above by n.log n on n inputs, for large enough n (although see my comments below on when I might not mean worst-case).
If instead, we want to say that a function grows exactly as quickly as some other function, we use theta to make that point (I'll write T( f(n) ) to mean \Theta of f(n) in markdown). T( g(n) ) is short hand for being bounded from above and below by g(n), again, up to a scaling factor and asymptotically.
That is f(n) = T( g(n) ) <=> f(n) = O(g(n)) and g(n) = O(f(n)). In our example, we can see that n != T( 2^n ) because 2^n != O(n).
Why get concerned about this? Because in your question you write 'would someone have to smoke crack to write an O(x!)?' The answer is no - because basically everything you write will be bounded from above by the factorial function. The run time of quicksort is O(n!) - it's just not a tight bound.
There's also another dimension of subtlety here. Typically we are talking about the worst case input when we use O( g(n) ) notation, so that we are making a compound statement: in the worst case running time it will not be any worse than an algorithm that takes g(n) steps, again modulo scaling and for large enough n. But sometimes we want to talk about the running time of the average and even best cases.
Vanilla quicksort is, as ever, a good example. It's T( n^2 ) in the worst case (it will actually take at least n^2 steps, but not significantly more), but T(n.log n) in the average case, which is to say the expected number of steps is proportional to n.log n. In the best case it is also T(n.log n) - but you could improve that for, by example, checking if the array was already sorted in which case the best case running time would be T( n ).
How does this relate to your question about the practical realisations of these bounds? Well, unfortunately, O( ) notation hides constants which real-world implementations have to deal with. So although we can say that, for example, for a T(n^2) operation we have to visit every possible pair of elements, we don't know how many times we have to visit them (except that it's not a function of n). So we could have to visit every pair 10 times, or 10^10 times, and the T(n^2) statement makes no distinction. Lower order functions are also hidden - we could have to visit every pair of elements once, and every individual element 100 times, because n^2 + 100n = T(n^2). The idea behind O( ) notation is that for large enough n, this doesn't matter at all because n^2 gets so much larger than 100n that we don't even notice the impact of 100n on the running time. However, we often deal with 'sufficiently small' n such that constant factors and so on make a real, significant difference.
For example, quicksort (average cost T(n.log n)) and heapsort (average cost T(n.log n)) are both sorting algorithms with the same average cost - yet quicksort is typically much faster than heapsort. This is because heapsort does a few more comparisons per element than quicksort.
This is not to say that O( ) notation is useless, just imprecise. It's quite a blunt tool to wield for small n.
(As a final note to this treatise, remember that O( ) notation just describes the growth of any function - it doesn't necessarily have to be time, it could be memory, messages exchanged in a distributed system or number of CPUs required for a parallel algorithm.)