Strictly speaking, the intrinsic representation of an enum shouldn't matter because, by definition, an enum is an enumerated type. What this means is that
public enum PrimaryColor { Red, Blue, Yellow }
represents a set of values.
Firstly, some sets are small and others are large. The .NET CLR therefore lets you base an enum on an integral type, so that the domain of enumerated values can be made larger or smaller: an enum based on a byte cannot contain more than 256 distinct values, whereas one based on a long can contain 2^64 distinct values. The difference comes from storage width: a byte is 8 bits (2^8 = 256 values), while a long is 64 bits, i.e. 8 bytes.
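As a rough illustration of choosing the underlying type, here is a minimal sketch. The enum names SmallColor and BigId are made up for this example; Enum.GetUnderlyingType and sizeof are standard. An enum declared without a base type, like PrimaryColor, defaults to int.

```csharp
using System;

// Hypothetical enums for illustration; only the declared base types matter.
public enum PrimaryColor { Red, Blue, Yellow }           // no base type given: defaults to int
public enum SmallColor : byte { Red, Blue, Yellow }      // at most 256 distinct values
public enum BigId : long { First = 1, Huge = long.MaxValue }

public static class UnderlyingTypeDemo
{
    public static void Main()
    {
        // Ask the runtime which integral type backs each enum.
        Console.WriteLine(Enum.GetUnderlyingType(typeof(PrimaryColor))); // System.Int32
        Console.WriteLine(Enum.GetUnderlyingType(typeof(SmallColor)));   // System.Byte
        Console.WriteLine(Enum.GetUnderlyingType(typeof(BigId)));        // System.Int64

        // Storage width of the two extremes mentioned above, in bytes.
        Console.WriteLine(sizeof(byte)); // 1
        Console.WriteLine(sizeof(long)); // 8

        // The enum value converts to and from its underlying integer by explicit cast.
        Console.WriteLine((long)BigId.Huge == long.MaxValue); // True
    }
}
```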
Secondly, an added benefit of restricting the base type of enums to integral types is that one can perform bitwise operations on enum values, and combine them into bit flags that represent more than one value at once.
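The bit-flag idea might look something like the sketch below. The ColorMix enum is hypothetical and not part of the question; it assumes each member is given its own bit, which is the usual convention when applying the [Flags] attribute.

```csharp
using System;

// Hypothetical flags enum: each member occupies a distinct bit.
[Flags]
public enum ColorMix : byte
{
    None   = 0,
    Red    = 1 << 0,
    Blue   = 1 << 1,
    Yellow = 1 << 2
}

public static class FlagsDemo
{
    public static void Main()
    {
        // Bitwise OR combines several values into one variable.
        ColorMix purple = ColorMix.Red | ColorMix.Blue;

        Console.WriteLine(purple);                          // Red, Blue
        Console.WriteLine((purple & ColorMix.Blue) != 0);   // True
        Console.WriteLine(purple.HasFlag(ColorMix.Yellow)); // False
        Console.WriteLine((byte)purple);                    // 3
    }
}
```

None of this would be possible if the underlying representation were, say, a string: the combination works precisely because each value is a bit pattern in an integer.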
Finally, integral types are among the cheapest data types for a processor to work with, so there is a performance advantage when it comes to comparing enum values: the comparison is just an integer comparison.
For the most part, I would say representing enums by integral types seems to be a CLR and/or CLS design choice, though one that is probably not very difficult to arrive at.