An existing app passes XML to a stored procedure (sproc) in SQL Server 2000; the input parameter's data type is TEXT. The XML comes from DataSet.GetXml(), but I notice the output doesn't specify an encoding.
So when a user gets an unexpected character into the dataset, specifically character code 146 (which appears to be a curly apostrophe, and lies outside the 0 to 127 ASCII range) instead of ASCII 39 (the plain single quote), the sproc fails.
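To make the character in question concrete, here is a small illustrative sketch in Python (not the app's .NET code). It assumes the pasted text originated as Windows-1252, which is a guess, and shows how the single byte 146 is interpreted under three encodings. A lone byte 146 is not valid UTF-8, which would be consistent with the failure if the parser assumes UTF-8 when no encoding is declared; that part is an assumption, not something confirmed here.

```python
# Byte value 146 (0x92), as it might arrive from a Windows-1252 source.
# Assumption: the pasted text was Windows-1252 encoded.
raw = bytes([146])

# Windows-1252 maps 0x92 to U+2019, RIGHT SINGLE QUOTATION MARK.
as_cp1252 = raw.decode("cp1252")

# ISO-8859-1 maps 0x92 to U+0092, a C1 control character rather than a quote.
# Decoding succeeds, but the result is not the character the user typed.
as_latin1 = raw.decode("iso-8859-1")

# A lone 0x92 is a UTF-8 continuation byte with no lead byte, so it is invalid.
try:
    raw.decode("utf-8")
    valid_utf8 = True
except UnicodeDecodeError:
    valid_utf8 = False

print(hex(ord(as_cp1252)), hex(ord(as_latin1)), valid_utf8)
# prints: 0x2019 0x92 False
```

Note that declaring ISO-8859-1 makes the byte acceptable to a parser, but strictly speaking it then stands for a control character and not for the curly apostrophe.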
One approach is to prefix the result of GetXml() with:
<?xml version="1.0" encoding="ISO-8859-1"?>
It works in this case, but what would be a more correct approach to ensure the sproc does not crash (if other unforeseen characters pop up)?
PS. I suspect the user is typing text into MS Word or a similar editor, then copying and pasting it into the app's input fields. I would probably want to let the user keep working this way; I just need to prevent the crashes.
EDIT: I am looking for answers that confirm or deny a few aspects, for example:
- As per the title, what's the default encoding if none is specified in the XML?
- Is ISO-8859-1 the right encoding to use?
- Is there a better encoding that would cover more of the characters used in the English-speaking world, and thus be less likely to cause an error in the sproc?
- would you filter at the app's UI level for standard ASCII (0 to 127 only), and not allow extended ASCII?
- any other pertinent details.
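
For the UI-level filtering idea in the list above, one alternative to rejecting input outright would be to replace the usual Word "smart" punctuation with plain ASCII and then report anything else outside 0 to 127. The following is only a rough sketch in Python (the app itself is .NET); the mapping table and both function names are hypothetical and the list of characters is not exhaustive.

```python
# Hypothetical mapping of common Word "smart" punctuation to ASCII equivalents.
# The selection is illustrative, not a complete list.
SMART_TO_ASCII = {
    "\u2018": "'",    # left single quotation mark
    "\u2019": "'",    # right single quotation mark (byte 146 in Windows-1252)
    "\u201c": '"',    # left double quotation mark
    "\u201d": '"',    # right double quotation mark
    "\u2013": "-",    # en dash
    "\u2014": "-",    # em dash
    "\u2026": "...",  # horizontal ellipsis
}

_TABLE = str.maketrans(SMART_TO_ASCII)


def normalize_punctuation(text):
    """Replace known smart punctuation with plain ASCII equivalents."""
    return text.translate(_TABLE)


def non_ascii_chars(text):
    """Return the distinct characters with code points above 127, sorted."""
    return sorted({ch for ch in text if ord(ch) > 127})


sample = "It\u2019s a \u201ctest\u201d for the caf\u00e9"
cleaned = normalize_punctuation(sample)
print(cleaned)                   # quotes are now plain ASCII
print(non_ascii_chars(cleaned))  # what remains for the UI to flag or allow
```

Whether the remaining characters (accented letters, for instance) should be rejected, replaced, or passed through with a suitable encoding is exactly the open question.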