The primary distinction in machine learning is between "supervised" and "unsupervised" modeling.
Usually the words "statistical classification" refer to supervised models, but not always.
With supervised methods the training set contains a "ground-truth" label that you build a model to predict. When you evaluate the model, the goal is to predict the best guess at (or probability distribution of) the true label, which you will not have at time of evaluation. Often there's a performance metric and it's quite clear what the right vs wrong answer is.
Unsupervised classification methods attempt to cluster a large number of data points which may appear to vary in complicated ways into a smaller number of "similar" categories. Data in each category ought to be similar in some kind of 'interesting' or 'deep' way. Since there is no "ground truth" you can't evaluate 'right or wrong', but 'more' vs 'less' interesting or useful.
Similarly evaluation time you can place new examples into potentially one of the clusters (crisp classification) or give some kind of weighting quantifying how similar or different looks like the "archetype" of the cluster.
So in some ways supervised and unsupervised models can yield something which is a "prediction", prediction of class/cluster label, but they are intrinsically different.
Often the goal of an unsupervised model is to provide more intelligent and powerfully compact inputs for a subsequent supervised model.