It can be done (and I have done it), but it still leaves potential for oddly rendered markup, especially when CSS styles are applied. When I wrote it, I did so in JavaScript, but the same approach can still be used elsewhere; it involves working with the DOM and not a String.
As you can see, it simply walks the tree and counts the text it finds. Once the limit is reached, it truncates any remaining text in the current node (adding ellipses as desired), stops processing further child nodes, and removes every subsequent sibling at each ancestor level (the uncles, great-uncles, and so on). This could (and arguably should) be adapted to use a non-mutating approach.
You are free to use any ideas/strategies/code from below as you see fit.
/*
  Given a DOM Node, truncate the contained text at a certain length.
  The truncation happens in a depth-first manner.
  Any nodes that exist past the point where the limit is reached are removed
  (this includes all future children, siblings, cousins and whatever else)
  and the text in the node in which the limit is reached is truncated.
  NOTES:
  - This modifies the original node.
  - This only supports ELEMENT and TEXT node types (other types are ignored)
  This function returns the number of characters counted, which equals
  limit if the limit was reached.
*/
truncateNode : function (rootNode, limit, ellipses) {
  if (arguments.length < 3) {
    ellipses = "..."
  }
  // returns the length found so far.
  // if found >= limit then all FUTURE nodes should be removed
  function truncate (node, found) {
    var ELEMENT_NODE = 1
    var TEXT_NODE = 3
    switch (node.nodeType) {
    case ELEMENT_NODE:
      var child = node.firstChild
      while (child) {
        found = truncate(child, found)
        if (found >= limit) {
          // remove all FUTURE elements
          while (child.nextSibling) {
            child.parentNode.removeChild(child.nextSibling)
          }
        }
        child = child.nextSibling
      }
      return found
    case TEXT_NODE:
      var remaining = limit - found
      if (node.nodeValue.length < remaining) {
        // still room for more (at least one more letter)
        return found + node.nodeValue.length
      }
      // limit reached inside this text node: cut it and append the ellipses
      node.nodeValue = node.nodeValue.substring(0, remaining) + ellipses
      return limit
    default:
      // do nothing: other node types do not count towards the limit
      return found
    }
  }
  return truncate(rootNode, 0)
},
Well, I really must be bored. Here it is in C#. It is almost the same, except that it does not append ellipses. It still should be updated to be non-mutating. Exercise for the reader, blah, blah...
using System.Xml;

class Util
{
    // Parses the HTML as XML, truncates it in place and serializes it back.
    public static string LazyWrapper(string html, int limit) {
        var d = new XmlDocument();
        d.InnerXml = html;
        var e = d.FirstChild;
        Truncate(e, limit);
        return d.InnerXml;
    }

    public static void Truncate(XmlNode node, int limit) {
        TruncateHelper(node, limit, 0);
    }

    // Returns the length found so far.
    public static int TruncateHelper(XmlNode node, int limit, int found) {
        switch (node.NodeType) {
        case XmlNodeType.Element:
            var child = node.FirstChild;
            while (child != null) {
                found = TruncateHelper(child, limit, found);
                if (found >= limit) {
                    // remove all FUTURE elements
                    while (child.NextSibling != null) {
                        child.ParentNode.RemoveChild(child.NextSibling);
                    }
                }
                child = child.NextSibling;
            }
            return found;
        case XmlNodeType.Text:
            var remaining = limit - found;
            if (node.Value.Length < remaining) {
                // still room for more (at least one more letter)
                return found + node.Value.Length;
            }
            node.Value = node.Value.Substring(0, remaining);
            return limit;
        default:
            // other node types do not count towards the limit
            return found;
        }
    }
}
Usage and result:
Util.LazyWrapper(@"<p class=""abc-class"">01<x/>23456789<y/></p>", 5)
// => <p class="abc-class">01<x />234</p>
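For what it's worth, the non-mutating variant mentioned above could look roughly like the JavaScript sketch below. It is not the code I actually used: `truncatedCopy` is a hypothetical name, and the copy it returns is a tree of plain objects that carries only `nodeType`, `nodeName`, `nodeValue` and `childNodes` (attributes are left out to keep it short). It reads only those properties from its input, so it should accept either real DOM nodes or plain objects like the sample tree here.

```javascript
// Hypothetical helper (not from the original answer): returns a truncated
// COPY of the tree and leaves the tree it was given untouched.
var ELEMENT_NODE = 1
var TEXT_NODE = 3

function truncatedCopy (rootNode, limit, ellipses) {
  if (ellipses === undefined) {
    ellipses = "..."
  }
  var found = 0 // number of text characters copied so far

  function copy (node) {
    if (node.nodeType === TEXT_NODE) {
      var remaining = limit - found
      if (node.nodeValue.length < remaining) {
        // still room for more: copy the text unchanged
        found += node.nodeValue.length
        return { nodeType: TEXT_NODE, nodeValue: node.nodeValue }
      }
      // limit reached inside this text node: copy only the part that fits
      found = limit
      return {
        nodeType: TEXT_NODE,
        nodeValue: node.nodeValue.substring(0, remaining) + ellipses
      }
    }
    if (node.nodeType === ELEMENT_NODE) {
      var children = []
      for (var i = 0; i < node.childNodes.length; i++) {
        children.push(copy(node.childNodes[i]))
        if (found >= limit) {
          break // all FUTURE siblings are simply never copied
        }
      }
      return { nodeType: ELEMENT_NODE, nodeName: node.nodeName, childNodes: children }
    }
    // other node types are copied shallowly and do not count towards the limit
    return { nodeType: node.nodeType, nodeValue: node.nodeValue }
  }

  return copy(rootNode)
}

// Same sample as the C# usage: <p>01<x/>23456789<y/></p>, limit 5, no ellipses
var original = {
  nodeType: ELEMENT_NODE, nodeName: "p", childNodes: [
    { nodeType: TEXT_NODE, nodeValue: "01" },
    { nodeType: ELEMENT_NODE, nodeName: "x", childNodes: [] },
    { nodeType: TEXT_NODE, nodeValue: "23456789" },
    { nodeType: ELEMENT_NODE, nodeName: "y", childNodes: [] }
  ]
}
var result = truncatedCopy(original, 5, "")
// result has three children: "01", <x>, "234"; original still has all four
```

With a real DOM there is an even lazier route: call the mutating `truncateNode` on `rootNode.cloneNode(true)`, since a deep clone leaves the original tree alone.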