I've used MS Word automation to save a .doc as a .htm file. Bullet characters in the .doc are written to the .htm correctly, but when I read the .htm file into a string (so I can store it in a database as a string rather than a blob), the bullets turn into question marks or other characters, depending on the encoding used to load the file.
I'm using this to read the text:
string html = File.ReadAllText(myFileSpec);
I've also tried using StreamReader, but I get the same results (maybe File.ReadAllText uses it internally).
I've also tried passing an explicit Encoding to the two-argument overload of File.ReadAllText, for example:
string html = File.ReadAllText(myFileSpec, Encoding.ASCII);
I've tried every encoding exposed as a static property on the Encoding class, and none of them preserves the bullets.
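For reference, here is a minimal sketch of the symptom. It assumes the bullet is stored in the file as a single byte above 0x7F (0x95 is used here as a stand-in, which is the bullet in Windows-1252); I have not confirmed which bytes Word actually writes, and the class and member names are made up for the example.

```csharp
using System;
using System.Text;

static class BulletRepro
{
    // Hypothetical sample bytes: 'a', an assumed single-byte bullet (0x95), 'b'.
    public static readonly byte[] Sample = { 0x61, 0x95, 0x62 };

    // ASCII only covers 0x00-0x7F, so the 0x95 byte cannot be decoded
    // and is replaced with '?'.
    public static string DecodeAscii(byte[] bytes)
    {
        return Encoding.ASCII.GetString(bytes);
    }

    // 0x95 on its own is not a valid UTF-8 sequence, so it is replaced
    // with U+FFFD (the Unicode replacement character).
    public static string DecodeUtf8(byte[] bytes)
    {
        return Encoding.UTF8.GetString(bytes);
    }

    static void Main()
    {
        Console.WriteLine(DecodeAscii(Sample));               // a?b
        Console.WriteLine((int)DecodeUtf8(Sample)[1] == 0xFFFD); // True
    }
}
```

If the file really does contain bytes like this, it would explain why the result changes with the encoding I pick, but not which encoding is the right one.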
Any ideas?