A character set is just that - a set of characters. Here's a set of three characters:
- LATIN CAPITAL LETTER A
- LATIN CAPITAL LETTER B
- LATIN CAPITAL LETTER C
Unicode (the set of all* characters) calls each of these things a code point and assigns each one a number: U+0041, U+0042, U+0043. Go see the PDF charts for the assignments.
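As a rough illustration of the character-to-number assignment, here is a minimal Python sketch that looks up the code point and the official name of each character in the set above. The helper name `describe` is made up for this example; `ord` and `unicodedata.name` are standard library calls.

```python
import unicodedata

def describe(ch):
    """Return 'U+XXXX NAME' for a string holding a single code point."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch)}"

# The three-character set from the text.
for ch in "ABC":
    print(describe(ch))
```

Note that nothing here involves bytes yet: a code point is just a number attached to an abstract character.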
A character encoding maps these code points to the numerical byte sequences used in RAM or on disc. Anywhere characters are used, they need to be in an encoding of some form. The number of bytes used to encode each character varies (usually between 1 and 4). Different encodings use different sequences of bytes for their mappings. You can use a utility like this one to inspect the mappings.
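To make the "varying number of bytes" point concrete, here is a small Python sketch, using UTF-8 as the example encoding. The function name `utf8_length` and the sample characters are my own choices for illustration, not anything from a particular tool.

```python
def utf8_length(ch):
    """Number of bytes UTF-8 uses to encode a single code point."""
    return len(ch.encode("utf-8"))

# One sample code point for each UTF-8 length, written as escapes so the
# example does not depend on the encoding of this source file.
samples = ["\u0041", "\u00e9", "\u20ac", "\U0001D50A"]

for ch in samples:
    encoded = ch.encode("utf-8")
    # Decoding with the same encoding gets the original code point back.
    assert encoded.decode("utf-8") == ch
    print(f"U+{ord(ch):04X} -> {utf8_length(ch)} byte(s)")
```

Encoding goes from code points to bytes, decoding goes from bytes back to code points, and both directions need to agree on which encoding is in use.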
The thing you see on the screen is a grapheme, drawn using glyphs from a font. It may be made up of more than one code point.
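A quick Python sketch of one grapheme built from more than one code point: an "e" followed by a combining acute accent. Unicode also has a single precomposed code point (U+00E9) that displays the same way, and the normalization step shown here (`unicodedata.normalize`) is one way to convert between the two forms. Normalization is not covered in this answer, so treat that part as an aside.

```python
import unicodedata

decomposed = "e\u0301"   # LATIN SMALL LETTER E + COMBINING ACUTE ACCENT
precomposed = "\u00e9"   # LATIN SMALL LETTER E WITH ACUTE

# Both are intended to display as the same grapheme, but they are different
# sequences of code points.
print(len(decomposed), len(precomposed))
print(decomposed == precomposed)

# NFC composes where possible, NFD decomposes.
composed = unicodedata.normalize("NFC", decomposed)
expanded = unicodedata.normalize("NFD", precomposed)
print(composed == precomposed, expanded == decomposed)
```

So counting code points is not the same thing as counting what a user would call characters.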
Back in olden times, a character set and a character encoding were pretty much the same thing, and anyone who wanted their data to work on a computer in another country had major headaches. The Windows "ANSI" encoding 1252 uses a single byte for each character and can only support 256 values. The development of the Unicode standard separated the concepts of character set and character encoding. Unicode is supported by multiple encodings (Unicode Transformation Formats) and has room for over a million characters.
Some examples of the byte representations of various characters in different encodings (where they're supported):
Grapheme: A
Code point: U+0041 LATIN CAPITAL LETTER A
ASCII 41
Windows-1252 41
ISO-8859-15 41
UTF-8 41
UTF-16BE 00 41
Grapheme: €
Code point: U+20AC EURO SIGN
ASCII -
Windows-1252 80
ISO-8859-15 A4
UTF-8 E2 82 AC
UTF-16BE 20 AC
Grapheme: 𝔊
Code point: U+1D50A MATHEMATICAL FRAKTUR CAPITAL G
ASCII -
Windows-1252 -
ISO-8859-15 -
UTF-8 F0 9D 94 8A
UTF-16BE D8 35 DD 0A
Grapheme: é
Code points: U+0065 LATIN SMALL LETTER E U+0301 COMBINING ACUTE ACCENT
ASCII 65 - (doesn't support the combining accent)
Windows-1252 65 - (doesn't support the combining accent)
ISO-8859-15 65 - (doesn't support the combining accent)
UTF-8 65 CC 81
UTF-16BE 00 65 03 01
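If you want to check these byte values yourself, something like the following Python sketch should reproduce them. The helper `hex_bytes` is a name I made up; the codec names (`ascii`, `cp1252`, `iso8859_15`, `utf-8`, `utf-16-be`) are Python's standard names for the encodings listed above. It encodes one code point at a time so that an unsupported code point shows up as `-` without hiding the ones that do work. `bytes.hex(" ")` needs Python 3.8 or later.

```python
ENCODINGS = ["ascii", "cp1252", "iso8859_15", "utf-8", "utf-16-be"]

def hex_bytes(text, encoding):
    """Hex bytes for each code point in text, or '-' where unsupported."""
    parts = []
    for ch in text:
        try:
            parts.append(ch.encode(encoding).hex(" ").upper())
        except UnicodeEncodeError:
            # This encoding has no mapping for this code point.
            parts.append("-")
    return " ".join(parts)

# A, EURO SIGN, MATHEMATICAL FRAKTUR CAPITAL G, e + COMBINING ACUTE ACCENT
for sample in ["A", "\u20ac", "\U0001D50A", "e\u0301"]:
    print("Code points:", " ".join(f"U+{ord(c):04X}" for c in sample))
    for enc in ENCODINGS:
        print(f"  {enc:<12}{hex_bytes(sample, enc)}")
```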
Most issues with character sets arise when a programmer:
- doesn't know when or how to transform from one set of mappings to another
- chooses the wrong mapping
- chooses a mapping that results in data loss
- doesn't see that such transformations are being made by a library or tool
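Two of these failure modes can be sketched in a few lines of Python. This is only an illustration with the euro sign as sample data, not a catalogue of everything that can go wrong: the first half decodes UTF-8 bytes with the wrong mapping (Windows-1252), the second half encodes to an encoding that cannot represent the character at all.

```python
original = "\u20ac"                      # EURO SIGN
utf8_bytes = original.encode("utf-8")    # E2 82 AC

# Wrong mapping: the bytes are UTF-8 but get decoded as Windows-1252.
# Each byte becomes its own (wrong) character.
garbled = utf8_bytes.decode("cp1252")
print(ascii(garbled))

# Nothing has been lost yet, so reversing the mistake recovers the text.
recovered = garbled.encode("cp1252").decode("utf-8")
print(recovered == original)

# Data loss: ASCII has no euro sign, so it is replaced with '?'.
lossy = original.encode("ascii", errors="replace")
print(lossy)

# The original character cannot be recovered from the '?'.
print(lossy.decode("ascii") == original)
```

The last bullet is the sneaky one. In Python, for example, `open()` without an `encoding` argument has traditionally picked a platform-dependent default, so the same script can write different bytes on different machines. Naming the encoding explicitly wherever text crosses a byte boundary avoids that.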
*OK, not all characters, but a lot.
You'll have to forgive any historical inaccuracies on my part - I know there are/were rival encodings to Unicode and I haven't done any research on who thought up what when. I recently wrote a post comparing character handling in different languages if you want to see some specifics.