I have a database of items with XHTML content and I want to display the items with the HTML stripped off (done) and then truncate each item to a maximum length of 100 characters. If the string exceeds 100 characters, I cut it off and insert … (an ellipsis) at the end.

The problem is that my program doesn't understand HTML entities that are already in the string. E.g. if the string is something &amp;amp; something, my function may truncate it to something &amp;am..., resulting in invalid XHTML.

What is the best way to go about this problem in ASP.NET/C#?

+3  A: 

You could use HtmlDecode to convert the HTML entities to plain characters, truncate the decoded string, and then encode the result:

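// Turn entities such as &amp;amp; back into single characters.
var decoded = HttpUtility.HtmlDecode(theEncodedString);
// Truncate is your existing function that cuts to 100 characters and adds the ellipsis.
decoded = Truncate(decoded);
// Re-encode so the output is valid XHTML again.
var result = HttpUtility.HtmlEncode(decoded);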
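For completeness, here is a minimal sketch of the whole decode, truncate, encode round trip in one place. The class and method names (HtmlTruncator, TruncateEncoded) are made up for illustration, and it uses System.Net.WebUtility instead of HttpUtility only so that it compiles without a reference to System.Web. It appends the ellipsis as the numeric entity &#8230; after encoding, which is an assumption about how you want the ellipsis rendered.

```csharp
using System;
using System.Net;

public static class HtmlTruncator
{
    // Hypothetical helper, not part of any framework API.
    // Decodes entities, cuts to at most maxLength characters,
    // re-encodes, and appends an ellipsis entity if text was cut.
    public static string TruncateEncoded(string encoded, int maxLength)
    {
        string decoded = WebUtility.HtmlDecode(encoded);
        if (decoded.Length <= maxLength)
        {
            return WebUtility.HtmlEncode(decoded);
        }

        string cut = decoded.Substring(0, maxLength);

        // Avoid leaving half of a surrogate pair at the end of the cut.
        if (cut.Length > 0 && char.IsHighSurrogate(cut[cut.Length - 1]))
        {
            cut = cut.Substring(0, cut.Length - 1);
        }

        // &#8230; is the numeric entity for the ellipsis character.
        return WebUtility.HtmlEncode(cut) + "&#8230;";
    }
}
```

Calling HtmlTruncator.TruncateEncoded("something &amp; something", 11) gives "something &amp;&#8230;", so the ampersand counts as one character and is never split.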
Darin Dimitrov
A: 

You could use a regular expression that matches either an HTML entity or a single character, repeated up to the length that you want. Something like:

^(&#?\w+;|.){0,100}
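A rough sketch of how a pattern like this could be applied in C#. The names (EntityAwareTruncator, Truncate, maxUnits) are invented for the example. The pattern here is written with an explicit lower bound, {0,n}, because as far as I know .NET does not treat {,n} as a quantifier, and with an optional # so that numeric entities such as &#160; count as one unit too. The ellipsis is again appended as &#8230;, which is my assumption.

```csharp
using System;
using System.Text.RegularExpressions;

public static class EntityAwareTruncator
{
    // Hypothetical helper: keeps at most maxUnits "units" from the start
    // of an already-encoded string, where a unit is either a whole
    // entity (&name; or &#number;) or a single character.
    public static string Truncate(string encoded, int maxUnits)
    {
        // The entity alternative comes first so "&amp;" is consumed whole.
        string pattern = @"^(?:&#?\w+;|.){0," + maxUnits + "}";

        // Singleline lets '.' match newlines as well.
        string head = Regex.Match(encoded, pattern, RegexOptions.Singleline).Value;

        // Only add the ellipsis entity when something was actually cut off.
        return head.Length < encoded.Length ? head + "&#8230;" : head;
    }
}
```

With this, EntityAwareTruncator.Truncate("something &amp; something", 11) returns "something &amp;&#8230;". Unlike the decode/encode approach, the string is never decoded, so entities that were in the input come out exactly as they went in.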
Guffa