I am trying to convert a rich text string to plain text or HTML. I am currently using the RichTextBox.Text property, which works correctly in almost all cases. The exception is text that contains backslashes: some of the text is stripped out because the converter treats it as part of the RTF formatting. Does anyone have any ideas on how to keep the backslashes in that case? Here is an example of a string I would have:

{\rtf1\ansi\ansicpg1252\deff0\deflang1033{\fonttbl{\f0\fnil\fcharset0 Arial;}}\viewkind4\uc1\pard\fs17 Testing Export \with comments\par}

The text I need is "Testing Export \with comments", but the text I get back from the RTF converter is "Testing Export comments". Any help would be greatly appreciated. Please respond if you have further questions.

A: 

I think the converter is right. A literal backslash in RTF text should be escaped (e.g. as \\). What you have been given is, I believe, not valid RTF at all.

Whilst you could try fixing it up by doing a regex replace over the input to double up any backslashes that are not part of valid control words, this seems very fragile and will go wrong if the text happens to contain a sequence that is a valid control word. The only way to be safe would be to fix whatever is producing the RTF so that it escapes backslashes properly.
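For what it's worth, here is a rough sketch of that regex fix-up, written in Python only for illustration (the same pattern should carry over to .NET's Regex). The function name `escape_stray_backslashes` and the `KNOWN_CONTROL_WORDS` whitelist are made up for this example; the whitelist covers only the control words in the sample string above, so real legacy data would need a much fuller list.

```python
import re

# Hypothetical whitelist: only the control words that appear in the sample
# string. This is an assumption, not a complete list of RTF control words.
KNOWN_CONTROL_WORDS = {
    "rtf", "ansi", "ansicpg", "deff", "deflang", "fonttbl", "f", "fnil",
    "fcharset", "viewkind", "uc", "pard", "fs", "par",
}

# First alternative: an already-escaped backslash (\\), consumed as a unit so
# it is never doubled again. Second alternative: a backslash followed by
# letters and an optional signed numeric parameter, i.e. a control-word shape.
TOKEN = re.compile(r"\\\\|\\([a-zA-Z]+)(-?\d+)?")


def escape_stray_backslashes(rtf: str) -> str:
    """Double any backslash that starts an unrecognised control word."""

    def fix(match: re.Match) -> str:
        word = match.group(1)
        if word is None or word in KNOWN_CONTROL_WORDS:
            # Escaped backslash or recognised control word: leave untouched.
            return match.group(0)
        # Unrecognised word: assume the backslash was meant literally.
        return "\\" + match.group(0)

    return TOKEN.sub(fix, rtf)


sample = (
    r"{\rtf1\ansi\ansicpg1252\deff0\deflang1033"
    r"{\fonttbl{\f0\fnil\fcharset0 Arial;}}"
    r"\viewkind4\uc1\pard\fs17 Testing Export \with comments\par}"
)
fixed = escape_stray_backslashes(sample)
```

With the sample string, only `\with` is doubled to `\\with`, so the converter should then keep it as text. The fragility is still there: if the legacy text had contained a literal `\par` or `\f0`, this would leave it alone and the converter would still swallow it. Backslashes followed by a digit or a space are also left untouched by this sketch.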

bobince
That is what I was afraid of. Unfortunately, I cannot fix whatever is creating the data, as this is all old legacy data that we are trying to convert. Thank you for the help.
Allison