I would like to implement functionality that inserts a word-break tag if a word is too long to fit on a single line.

    protected string InstertWBRTags(string text, int interval)
    {
        if (String.IsNullOrEmpty(text) || interval < 1 || text.Length < interval)
        {
            return text;
        }
        int pS = 0, pE = 0, tLength = text.Length;
        StringBuilder sb = new StringBuilder(tLength * 2);

        while (pS < tLength)
        {
            pE = pS + interval;
            if (pE > tLength)
                sb.Append(text.Substring(pS));
            else
            {
                sb.Append(text.Substring(pS, pE - pS));
                sb.Append("&#8203;"); // <wbr> is not supported by IE 8
            }
            pS = pE;
        }
        return sb.ToString();
    }

The problem is: what can I do if the text contains HTML-encoded special characters? How can I prevent a tag from being inserted inside an entity such as &szlig;? And how can I count the real string length, as it appears in the browser? A string like &#9825;&#9829; (rendered as ♡♥) shows only 2 characters (hearts) in the browser, but its length is 14.

A: 

You need to walk through the whole text character by character. When you find an &, examine what comes next: if it is a #, you can be fairly sure that everything up to the next semicolon is a run of digits (you can check that as well). In that situation, move your iterator to the position of the nearest semicolon and increment the counter once.

In Java:

    int count = 0;

    for (int i = 0; i < text.length(); i++) {
        if (text.charAt(i) == '&') {
            int end = text.indexOf(';', i); // indexOf(what, from)
            if (end != -1) {
                i = end; // jump to the ';'; the loop's i++ then steps past it
            }
        }
        count++;
    }

This is a very simplified version.
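The counting loop can be extended into something that also inserts the break marker without splitting an entity. What follows is only a rough sketch under some assumptions: the class name `EntityAwareBreaker` and its methods are hypothetical, an entity is taken to be `&`, an optional `#`, then at most 32 ASCII letters or digits, then `;`, and anything else starting with `&` is treated as an ordinary character. Like the code in the question, it breaks every `interval` visible characters and ignores word boundaries.

```java
final class EntityAwareBreaker {
    // Zero-width space, the same marker the question inserts.
    private static final String BREAK = "&#8203;";
    // Assumption: entity bodies longer than this are not real entities.
    private static final int MAX_ENTITY_SPAN = 32;

    private static boolean isAsciiAlnum(char c) {
        return (c >= '0' && c <= '9') || (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z');
    }

    /** Index just past the visible unit (entity or code point) starting at pos. */
    static int nextUnitEnd(String text, int pos) {
        if (text.charAt(pos) == '&') {
            int semi = text.indexOf(';', pos + 1);
            if (semi > pos + 1 && semi - pos <= MAX_ENTITY_SPAN) {
                boolean valid = true;
                for (int k = pos + 1; k < semi && valid; k++) {
                    char c = text.charAt(k);
                    // '#' is only allowed directly after '&' (numeric entity).
                    valid = isAsciiAlnum(c) || (c == '#' && k == pos + 1);
                }
                if (valid) {
                    return semi + 1; // whole entity counts as one unit
                }
            }
        }
        // Plain character: step over one code point (1 or 2 chars).
        return pos + Character.charCount(text.codePointAt(pos));
    }

    /** Number of characters the browser would show, counting each entity once. */
    static int visibleLength(String text) {
        int count = 0;
        for (int i = 0; i < text.length(); i = nextUnitEnd(text, i)) {
            count++;
        }
        return count;
    }

    /** Inserts BREAK after every `interval` visible characters, never inside an entity. */
    static String insertBreaks(String text, int interval) {
        if (text == null || text.isEmpty() || interval < 1) {
            return text;
        }
        StringBuilder sb = new StringBuilder(text.length() * 2);
        int visible = 0;
        int i = 0;
        while (i < text.length()) {
            int end = nextUnitEnd(text, i);
            sb.append(text, i, end);
            i = end;
            visible++;
            if (visible % interval == 0 && i < text.length()) {
                sb.append(BREAK); // no trailing break at the very end
            }
        }
        return sb.toString();
    }
}
```

With this sketch, `EntityAwareBreaker.visibleLength("&#9825;&#9829;")` gives 2, and `EntityAwareBreaker.insertBreaks("a&szlig;b&#9825;c", 2)` gives `a&szlig;&#8203;b&#9825;&#8203;c`.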

Vash
A: 

One solution would be to decode the entities into the Unicode characters they represent and work with that. To do that use System.Net.WebUtility.HtmlDecode() if you're in .NET 4 or System.Web.HttpUtility.HtmlDecode() otherwise.

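But be aware that not all Unicode characters fit in one char: characters outside the Basic Multilingual Plane are stored as a surrogate pair, i.e. two char values.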
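To illustrate the surrogate-pair caveat, here is a small sketch in Java, whose `String` uses UTF-16 code units just as .NET strings do. The class name `CodePointDemo` and the helper `codePointLength` are made up for this example; `String.codePointCount` is a standard Java method.

```java
final class CodePointDemo {
    /** Counts Unicode code points rather than UTF-16 code units. */
    static int codePointLength(String s) {
        return s.codePointCount(0, s.length());
    }

    public static void main(String[] args) {
        String heart = "\u2661";       // U+2661 WHITE HEART SUIT, fits in one char
        String clef = "\uD834\uDD1E";  // U+1D11E MUSICAL SYMBOL G CLEF, a surrogate pair

        System.out.println(heart.length() + " " + codePointLength(heart)); // prints "1 1"
        System.out.println(clef.length() + " " + codePointLength(clef));   // prints "2 1"
    }
}
```

So after decoding, counting by code points (rather than by `length()`) is the safer way to measure what the browser will display, and a break should never be inserted between the two halves of a surrogate pair.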

svick
The `HtmlEncode` and `HtmlDecode` methods aren't symmetrical; decoding will convert the entities into single characters, but encoding won't convert all of these characters back into entities. Also, if the source text contains characters such as `<` and entities such as `&lt;`, then there's no way of distinguishing those after decoding.
Niels van der Rest
I meant that he shouldn't use `HtmlEncode` at all. But that would require the output to be Unicode.
svick
It works perfectly. Characters like < are forbidden.
Lord Vader