There are the standard A-Z, a-z characters, but also there are hyphens, em dashes, quotes, etc.
Plus, there are all of the international characters, like umlauts, etc.
So, for an English-based system, what's the complete set? What about sets for other languages? What about UTF8, UTF16, etc?
Bonus question: How many name fields are needed, and what are their maximum lengths?
EDIT: There are definitely two different types of characters involved in people's names, those that are there as part of the context, and those that are there for structural reasons. I don't want to limit or interfere with the context characters, but I do need to deal with the structural ones.
For example, I had a name come in that was separated by an em dash, but it was hard to distinguish that from the minus character. To make the system easier for searching, I want to take all five different types of dashes, and map them onto one unique character (minus), that way the searcher doesn't need to know specifically which symbol was initially entered.
The problem exists for dashes, probably quotes as well, but also how many other symbols?
Paul.