I have a UTF-8 text file. Its content is extracted from rich text documents: MS Word, PDF, HTML, or anything else. I have to pass this content to a web service, but most of the time it contains invalid characters such as form feed or null. When I pass content containing an invalid character to the web service, it throws an exception (not a valid XML character).

I have found a few characters that are not valid in XML, but is there a proper .NET function to clean the string and remove all invalid characters? Alternatively, is there a list of invalid characters from an authoritative source?

Thanks for your help in advance.

A: 

Probably the best way is to encode the whole text, for example in Base64.

http://en.wikipedia.org/wiki/Base64
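
A minimal sketch of that idea, written in Python only to show the round trip (the question targets .NET, so treat this as an illustration rather than drop-in code). The function names `encode_for_transport` and `decode_from_transport` are hypothetical, and it assumes the web service can be changed to Base64-decode the payload on its side:

```python
import base64


def encode_for_transport(text: str) -> str:
    """Encode text as UTF-8 bytes, then Base64, yielding plain ASCII."""
    return base64.b64encode(text.encode("utf-8")).decode("ascii")


def decode_from_transport(payload: str) -> str:
    """Reverse of encode_for_transport: Base64-decode, then read as UTF-8."""
    return base64.b64decode(payload).decode("utf-8")


# Form feed (\x0c) and null (\x00) survive the round trip untouched,
# because the XML layer only ever sees Base64 letters, digits, '+', '/', '='.
raw = "page one\x0cpage two\x00"
encoded = encode_for_transport(raw)
restored = decode_from_transport(encoded)
```

The trade-off is that the payload grows by roughly a third and is no longer human-readable inside the XML.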

Regards,

ykatchou
Thanks for your reply, but can I keep things in UTF-8 and just clean the text, using a regular expression or whatever? A built-in function would be best; otherwise I am happy to write my own function, but for that I need the list of invalid characters.
Sakhawat Ali
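
For the list asked about above: the XML 1.0 specification defines a legal character (the `Char` production) as #x9, #xA, #xD, #x20-#xD7FF, #xE000-#xFFFD, or #x10000-#x10FFFF. Everything else, including form feed and null, is invalid. Below is a minimal sketch of a regex-based cleaner built from those ranges. It is in Python purely for illustration, and `strip_invalid_xml_chars` is a hypothetical name. The same ranges should carry over to .NET, with the caveat that .NET strings are UTF-16, so the supplementary range has to be handled as surrogate pairs there.

```python
import re

# Matches any character that is NOT allowed by the XML 1.0 "Char" production:
#   #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
_INVALID_XML_CHARS = re.compile(
    "[^\t\n\r\u0020-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]"
)


def strip_invalid_xml_chars(text: str) -> str:
    """Return text with every character that XML 1.0 forbids removed."""
    return _INVALID_XML_CHARS.sub("", text)


# Form feed (\x0c) and null (\x00) are dropped; tab, CR, LF and
# ordinary non-ASCII text are kept.
cleaned = strip_invalid_xml_chars("caf\u00e9\x0c\tmenu\x00\r\n")
```

Removing characters silently changes the text, so replacing them with a space instead may be preferable when a form feed separates words.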