ansaurus

Question

Answer 1

+1 A:

Cutting the HTML down to size isn't a good idea because, as you've stated, you end up breaking otherwise valid HTML. Instead, what you want to do is cut down the size of the text description. To do that, you'll need to extract the text you want to display and then cut it down to the size you want.

On the other hand, why not have whatever generates the HTML limit the size of the text to begin with? That way you don't need to worry about getting the text out of the HTML and cutting it down.
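As a rough illustration of that second suggestion, here is a minimal JavaScript sketch that shortens the plain text before it is wrapped in markup. The names `escapeHtml` and `renderDescription` are hypothetical, and the `abc-class` paragraph wrapper is only an assumption about what the generated HTML looks like.

```javascript
// Hypothetical helper: escape the characters that are special in HTML text.
function escapeHtml(text) {
    return text
        .replace(/&/g, "&amp;")
        .replace(/</g, "&lt;")
        .replace(/>/g, "&gt;")
        .replace(/"/g, "&quot;")
}

// Hypothetical generator: truncate the raw text first, then build the markup,
// so the resulting HTML is always well-formed.
function renderDescription(text, limit) {
    var shortened = text.length > limit ? text.slice(0, limit) : text
    return '<p class="abc-class">' + escapeHtml(shortened) + '</p>'
}
```

With this sketch, `renderDescription("0123456789", 5)` gives `<p class="abc-class">01234</p>`. Truncating before escaping also avoids cutting an entity such as `&amp;` in half.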

That said, it's difficult to say any more without a code sample.

mezoid 2009-10-23 04:52:00

Answer 2

A:

It can be done (and I have done it), but it still leaves potential for oddly rendered markup, especially when CSS styles are applied. When I wrote it, I did so in JavaScript, but the same approach can still be used elsewhere; it involves working with the DOM and not a string.

As you can see from the code below, it simply goes through the nodes and counts the text found. Once the limit is reached, it truncates any remaining text in the current node (adding ellipses as desired), stops processing further child nodes, and removes all subsequent siblings of that node and of each of its ancestors (the "uncles", "great-uncles", and so on). This could (and arguably should) be adapted to use a non-mutating approach.

You are free to use any ideas/strategies/code from below you see fit.

/*
    Given a DOM Node truncate the contained text at a certain length.
    The truncation happens in a depth-first manner.

    Any nodes that come after the point where the limit is reached are removed
    (this includes all later children, siblings, cousins and whatever else)
    and the text node in which the limit is reached is truncated.

    NOTES:
    - This modifies the original node.
    - This only supports ELEMENT and TEXT node types (other types are ignored)

    This function returns the amount of text found, which equals
    the limit if the limit was reached.
*/
truncateNode : function (rootNode, limit, ellipses) {
    if (arguments.length < 3) {
        ellipses = "..."
    }

    // returns the length found so far.
    // if found >= limit then all FUTURE nodes should be removed
    function truncate (node, found) {
        var ELEMENT_NODE = 1
        var TEXT_NODE = 3

        switch (node.nodeType) {
            case ELEMENT_NODE:
                var child = node.firstChild
                while (child) {
                    found = truncate(child, found)
                    if (found >= limit) {
                        // remove all FUTURE elements
                        while (child.nextSibling) {
                            child.parentNode.removeChild(child.nextSibling)
                        }
                    }
                    child = child.nextSibling
                }
                return found
            case TEXT_NODE:
                var remaining = limit - found
                if (node.nodeValue.length < remaining) {
                    // still room for more (at least one more letter)
                    return found + node.nodeValue.length
                }
                node.nodeValue = node.nodeValue.substr(0, remaining) + ellipses
                return limit
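            default:
                // other node types (comments, etc.) contribute no text;
                // return the running count so the caller does not get undefined
                return found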
        }
    }

    return truncate(rootNode, 0)    
},
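One possible shape for the non-mutating variant mentioned above is sketched here. It is not the code I originally used: `truncatedCopy` is a hypothetical name, and the sketch assumes nodes expose only `nodeType`, `nodeValue`, `childNodes`, `cloneNode(false)` and `appendChild`, as DOM nodes do. It builds a truncated copy and leaves the input tree untouched.

```javascript
// Returns a truncated copy of rootNode; rootNode itself is not modified.
function truncatedCopy(rootNode, limit, ellipses) {
    if (ellipses === undefined) {
        ellipses = "..."
    }
    var ELEMENT_NODE = 1
    var TEXT_NODE = 3
    var found = 0 // amount of text copied so far

    function copy(node) {
        // shallow clone: attributes (or text) only, no children
        var clone = node.cloneNode(false)
        if (node.nodeType === TEXT_NODE) {
            var remaining = limit - found
            if (node.nodeValue.length < remaining) {
                // still room for more
                found += node.nodeValue.length
            } else {
                clone.nodeValue = node.nodeValue.substr(0, remaining) + ellipses
                found = limit
            }
        } else if (node.nodeType === ELEMENT_NODE) {
            // copy children only until the limit is reached;
            // everything after that point is simply never copied
            for (var i = 0; i < node.childNodes.length && found < limit; i++) {
                clone.appendChild(copy(node.childNodes[i]))
            }
        }
        return clone
    }

    return copy(rootNode)
}
```

Because later nodes are skipped rather than removed, there is no sibling clean-up step. The truncation rule is kept the same as in the mutating version.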

Well, I really must be bored. Here it is in C#. Almost the same. Still should be updated to be non-mutative. Exercise to the reader, blah, blah...

class Util
{

    public static string
    LazyWrapper (string html, int limit) {
        var d = new XmlDocument();
        d.InnerXml = html;
        var e = d.FirstChild;
        Truncate(e, limit);
        return d.InnerXml;
    }

    public static void
    Truncate(XmlNode node, int limit) {
        TruncateHelper(node, limit, 0);
    }

    public static int
    TruncateHelper(XmlNode node, int limit, int found) {
        switch (node.NodeType) {
        case XmlNodeType.Element:
            var child = node.FirstChild;
            while (child != null) {
                found = TruncateHelper(child, limit, found);
                if (found >= limit) {
                    // remove all FUTURE elements
                    while (child.NextSibling != null) {
                        child.ParentNode.RemoveChild(child.NextSibling);
                    }
                }
                child = child.NextSibling;
            }
            return found;
        case XmlNodeType.Text:
            var remaining = limit - found;
            if (node.Value.Length < remaining) {
                // still room for more (at least one more letter)
                return found + node.Value.Length;
            }
            node.Value = node.Value.Substring(0, remaining);
            return limit;
        default:
            return found;
        }
    }

}

Usage and result:

Util.LazyWrapper(@"<p class=""abc-class"">01<x/>23456789<y/></p>", 5)
// => <p class="abc-class">01<x />234</p>

pst 2009-10-23 05:08:25

What I really want is a C# solution but anyway thanks for your solution.

ldsenow 2009-10-23 05:15:16

Answer 3

A:

For example:

The following HTML text is the description: <p class="abc-class">0123456789</p>

If I want to display at most 5 characters, the result I want to see is <p class="abc-class">01234</p>

So what would you do to get the correct result?

PS: remember this is the simplest case.

ldsenow 2009-10-23 05:09:11

Opt to delete. See my reply.

pst 2009-10-23 06:12:37

Answer 4

A:

I would do it like this:

  string value = "<p class=\"abc-class\">0123456789</p>";
  char[] delimiters = new char[] { '<', '>' };
  string[] parts = value.Split(delimiters, StringSplitOptions.RemoveEmptyEntries);
  string value2 = parts[1];
  //
  // here you do what you want to value2
  //

  Console.WriteLine(delimiters[0]+parts[0]+delimiters[1]+value2+delimiters[0]+parts[2]+delimiters[1]);
  Console.WriteLine(value);

You split your string, work on the part you are interested in, and then build the string again. You may be able to reuse this snippet elsewhere.

Splitting the string in this way is faster than using string.Split(' ').

Hope it fits your needs!

Sunrising 2009-10-23 07:40:54

Answer 5

A:

Hi,

Are you generating the description yourself, or do you receive the whole HTML from another source? If you are generating the product description, I think you should do your trimming before putting it into the HTML and returning it.

Your question doesn't explicitly state that you get the HTML like that from another source, which is why I believe the above suggestion is the easiest solution.

Gboyega S. 2009-10-23 08:16:22


How to shrink html string size
