ansaurus

Question

How can I implement a word-breaking function that splits text by length without splitting inside HTML-encoded special characters (entities)?

Answer 1

A:

You need to walk through the whole text character by character. When you find a `&`, examine what comes next: if it is a `#`, it is almost certain that a sequence of digits follows up to a semicolon (you can verify that too). In that case, move your iterator to the position of the nearest semicolon and increment the counter once, so the whole entity counts as a single character.

In Java:

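int count = 0;

for (int i = 0; i < text.length(); i++) {
    if (text.charAt(i) == '&') {
        int semicolon = text.indexOf(';', i); // search for ';' starting at i
        if (semicolon != -1) {
            i = semicolon; // the loop's i++ then steps past the ';'
        }
    }
    count++; // an entity counts as one visible character
}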

This is a very simplified version.
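Building on the simplified loop above, here is a minimal sketch that actually splits the text into chunks of at most `maxLength` visible characters. The class and method names (`EntityAwareSplitter`, `entityEnd`, `split`) are hypothetical. It assumes an entity is `&` followed by ASCII letters/digits, by `#` plus decimal digits, or by `#x` plus hex digits, and then `;`; any other `&` is treated as an ordinary character. It does not check whether a named entity really exists, and it does not handle surrogate pairs in the raw text.

```java
import java.util.ArrayList;
import java.util.List;

public class EntityAwareSplitter {

    static boolean isAsciiDigit(char c) { return c >= '0' && c <= '9'; }

    static boolean isAsciiHex(char c) {
        return isAsciiDigit(c) || (c >= 'a' && c <= 'f') || (c >= 'A' && c <= 'F');
    }

    static boolean isAsciiAlnum(char c) {
        return isAsciiDigit(c) || (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z');
    }

    // Given text.charAt(start) == '&', returns the index just past the entity,
    // or -1 if no well-formed entity starts there.
    // Accepted forms (an assumption): &name;  &#123;  &#x7B;
    static int entityEnd(String text, int start) {
        int i = start + 1;
        boolean numeric = i < text.length() && text.charAt(i) == '#';
        boolean hex = false;
        if (numeric) {
            i++;
            if (i < text.length() && (text.charAt(i) == 'x' || text.charAt(i) == 'X')) {
                hex = true;
                i++;
            }
        }
        int bodyStart = i;
        while (i < text.length()) {
            char c = text.charAt(i);
            boolean ok = hex ? isAsciiHex(c) : numeric ? isAsciiDigit(c) : isAsciiAlnum(c);
            if (!ok) break;
            i++;
        }
        // Require a non-empty body terminated by ';'
        if (i == bodyStart || i >= text.length() || text.charAt(i) != ';') {
            return -1;
        }
        return i + 1;
    }

    // Splits text into chunks of at most maxLength visible characters,
    // counting each entity as one character and never cutting through one.
    static List<String> split(String text, int maxLength) {
        if (maxLength < 1) throw new IllegalArgumentException("maxLength must be >= 1");
        List<String> chunks = new ArrayList<>();
        int chunkStart = 0;
        int visible = 0;
        int i = 0;
        while (i < text.length()) {
            int next = i + 1;
            if (text.charAt(i) == '&') {
                int end = entityEnd(text, i);
                if (end != -1) next = end; // skip the whole entity as one unit
            }
            if (visible == maxLength) { // current chunk is full: close it before this unit
                chunks.add(text.substring(chunkStart, i));
                chunkStart = i;
                visible = 0;
            }
            visible++;
            i = next;
        }
        if (chunkStart < text.length()) chunks.add(text.substring(chunkStart));
        return chunks;
    }

    public static void main(String[] args) {
        System.out.println(split("ab&amp;cd&#233;fg", 3)); // [ab&amp;, cd&#233;, fg]
    }
}
```

Validating the entity before jumping means a stray `&` (as in `a & b`) is counted as one ordinary character instead of swallowing the text up to some later semicolon.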

Vash 2010-07-21 14:13:07

Answer 2

+1 A:

One solution would be to decode the entities into the Unicode characters they represent and work with those. To do that, use System.Net.WebUtility.HtmlDecode() if you're on .NET 4, or System.Web.HttpUtility.HtmlDecode() otherwise.

But be aware that not all Unicode characters fit in one char: characters outside the Basic Multilingual Plane are stored as a surrogate pair of two chars.
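Java has the same issue, because a Java `char` is also a single UTF-16 code unit. Here is a minimal sketch with a hypothetical class `CodePointSplitter`, assuming the entities have already been decoded. It splits by code points using the standard `String.codePointCount` and `String.offsetByCodePoints` methods, so a surrogate pair is never cut in half.

```java
import java.util.ArrayList;
import java.util.List;

public class CodePointSplitter {

    // Splits already-decoded text into chunks of at most maxLength code points,
    // so a surrogate pair is never split across two chunks.
    static List<String> split(String text, int maxLength) {
        if (maxLength < 1) throw new IllegalArgumentException("maxLength must be >= 1");
        List<String> chunks = new ArrayList<>();
        int start = 0;
        int remaining = text.codePointCount(0, text.length());
        while (remaining > 0) {
            int take = Math.min(maxLength, remaining);
            int end = text.offsetByCodePoints(start, take); // char index after 'take' code points
            chunks.add(text.substring(start, end));
            start = end;
            remaining -= take;
        }
        return chunks;
    }

    public static void main(String[] args) {
        String s = "a\uD83D\uDE00b"; // 'a', U+1F600 (a surrogate pair), 'b'
        System.out.println(s.length());                      // 4 chars
        System.out.println(s.codePointCount(0, s.length())); // 3 code points
        System.out.println(split(s, 2).size());              // 2 chunks
    }
}
```

Counting code points is still not the same as counting what a reader perceives as one character (a letter plus a combining accent is two code points); that would need grapheme segmentation, for example with `java.text.BreakIterator.getCharacterInstance()`.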

svick 2010-07-21 14:22:31

The `HtmlEncode` and `HtmlDecode` methods aren't symmetrical: decoding converts the entities into single characters, but encoding won't convert all of those characters back into entities. Also, if the source text contains both characters such as `<` and entities such as `&lt;`, there's no way to distinguish them after decoding.
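To make the ambiguity concrete, here is a tiny sketch with a hypothetical, deliberately incomplete decoder (`DecodeAmbiguity.decode`). It handles only `&lt;`, `&gt;`, `&quot;` and `&amp;`, and it is neither a real HTML decoder nor the .NET API. Once decoded, an encoded `&lt;b&gt;` and a literal `<b>` become the same string.

```java
import java.util.Map;

public class DecodeAmbiguity {

    // Hypothetical minimal decoder: only four named entities, for illustration only.
    static final Map<String, String> ENTITIES = Map.of(
            "&lt;", "<", "&gt;", ">", "&quot;", "\"", "&amp;", "&");

    static String decode(String text) {
        StringBuilder out = new StringBuilder();
        int i = 0;
        outer:
        while (i < text.length()) {
            for (Map.Entry<String, String> e : ENTITIES.entrySet()) {
                if (text.startsWith(e.getKey(), i)) { // entity found at position i
                    out.append(e.getValue());
                    i += e.getKey().length();
                    continue outer;
                }
            }
            out.append(text.charAt(i)); // ordinary character, copied as-is
            i++;
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Both inputs decode to "<b>", so the original form can no longer be recovered.
        System.out.println(decode("&lt;b&gt;").equals(decode("<b>"))); // true
    }
}
```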

Niels van der Rest 2010-07-21 14:31:49

I meant that he shouldn't use `HtmlDecode` at all. But that would require the output to be Unicode.

svick 2010-07-21 15:16:39

It works perfectly. Characters like < are forbidden.

Lord Vader 2010-07-22 10:37:55
