The average case is indeed O(N * N!):
Observe that there are exactly N! permutations of N elements, and the probability of picking the right one on any given shuffle is exactly 1/N!. The number of shuffles is therefore geometrically distributed, and the expectation of a geometric random variable is the reciprocal of its success probability: N! shuffles on average.
Where does the other factor of N come from? At every step, you have to check whether the permutation you picked is the sorted one. That can be done in linear time by comparing adjacent elements, hence the extra factor of N.
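For concreteness, here is a minimal sketch of the shuffle-until-sorted algorithm being analyzed (commonly called bogosort). The function names are mine, and it assumes distinct elements so that exactly one of the N! permutations is sorted:

```python
import random

def is_sorted(a):
    """The linear check: compare adjacent elements (the extra factor N)."""
    return all(a[i] <= a[i + 1] for i in range(len(a) - 1))

def bogosort(a):
    """Draw uniformly random permutations until one is sorted.

    Each draw succeeds with probability 1/N!, so the expected
    number of iterations is N! (geometric distribution).
    """
    while not is_sorted(a):
        random.shuffle(a)
    return a
```

Each iteration costs O(N) for the check (the shuffle is also O(N)), and there are N! iterations in expectation, which multiplies out to the O(N * N!) average.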
Comments above have suggested that O(g(n)) notation means "worst case":
1) That's not true. The definition of O(g(n)) is:
f(n) is O(g(n)) if there exist constants c and d such that f(n) < c * g(n) + d for all sufficiently large n. For example, f(n) = 2n^2 + 7n is O(n^2): take c = 3 and d = 0, since 2n^2 + 7n < 3n^2 for all n > 7. There's nothing about a "worst case" in the definition. It just so happens that g(n) eventually dominates f(n), but the purely mathematical definition says nothing about any "case".
2) For randomized algorithms, a "worst case" analysis is pointless anyway: whatever bound you propose, there is some execution that exceeds it (here, a run that keeps drawing the wrong permutation for as long as you like).
3) The really bad executions occur on a set of measure 0 (a probabilist would say that the algorithm terminates "almost surely"). They are, in practice, impossible to observe: witnessing a non-terminating run would take infinite time.
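If you want to see the N! expectation and point 3) empirically, a quick simulation (a rough sketch; `count_shuffles` is an illustrative helper, not something from the discussion above) shows the sample mean landing near N!, with the rare very long runs too thin to drag it away:

```python
import math
import random

def count_shuffles(n):
    """One shuffle-until-sorted run on [0..n-1]; returns how many
    permutations were drawn up to and including the sorted one."""
    a = list(range(n))
    shuffles = 0
    while True:
        random.shuffle(a)  # draw one of the n! permutations uniformly
        shuffles += 1
        if all(a[i] <= a[i + 1] for i in range(n - 1)):
            return shuffles

N, TRIALS = 4, 20_000
mean = sum(count_shuffles(N) for _ in range(TRIALS)) / TRIALS
print(f"empirical mean: {mean:.1f}  vs  N! = {math.factorial(N)}")
# Typically prints a mean close to 24 = 4!: the pathologically long
# runs are so rare that they barely move the average.
```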