The .NET web system I'm working on allows the end user to input HTML-formatted text in some situations. In some of those places, we want to leave all the tags, but strip off any trailing break tags (while leaving any breaks inside the body of the text).

What's the best way to do this? (I can think of ways to do this, but I'm sure they're not the best.)

+3  A: 

You can use a regex anchored at the end of the string to find the trailing break tags and remove them.

Mitchel Sellers
+2  A: 

I'm sure this isn't the best way either, but it should work unless you have trailing spaces or something.

while (myHtmlString.EndsWith("<br>"))
{
    myHtmlString = myHtmlString.Substring(0, myHtmlString.Length - 4);
}
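
A rough sketch of the same loop that also copes with the other spellings of the tag and with trailing whitespace. The class name TrailingBreaks and the method Strip are made up for illustration, and it assumes the break tags carry no attributes. Note that it trims trailing whitespace even when there is no break tag to remove.

```csharp
using System;

static class TrailingBreaks
{
    // Hypothetical helper: the three spellings of the break tag, compared case-insensitively.
    static readonly string[] BreakTags = { "<br>", "<br/>", "<br />" };

    public static string Strip(string html)
    {
        // Drop trailing whitespace first so it cannot hide a trailing break tag.
        string result = html.TrimEnd();
        bool removed = true;
        while (removed)
        {
            removed = false;
            foreach (string tag in BreakTags)
            {
                if (result.EndsWith(tag, StringComparison.OrdinalIgnoreCase))
                {
                    // Cut the tag off, then trim again in case whitespace preceded it.
                    result = result.Substring(0, result.Length - tag.Length).TrimEnd();
                    removed = true;
                    break;
                }
            }
        }
        return result;
    }
}
```

For example, TrailingBreaks.Strip("a<br>b<br /> <BR/>\n") should give "a<br>b": the break in the middle stays, the two at the end go.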
Max Schmeling
Keep in mind that <br> isn't valid XHTML. It's a badly formed tag.
Will
True. So it should probably check for the possibilities <br>, <br/> and <br />. But still, this isn't the most elegant solution.
Max Schmeling
A: 

You can use a regex, or check whether the string ends with a break tag and remove it.

Leon Tayson
This only accepts characters to trim.
Max Schmeling
yep... tested it and edited my reply.
Leon Tayson
+8  A: 

As @Mitch said,

//  using System.Text.RegularExpressions;

/// <summary>
///  Regular expression built for C# on: Thu, Sep 25, 2008, 02:01:36 PM
///  Using Expresso Version: 2.1.2150, http://www.ultrapico.com
///  
///  A description of the regular expression:
///  
///  Match expression but don't capture it. [\<br\s*/?\>], any number of repetitions
///      \<br\s*/?\>
///          <
///          br
///          Whitespace, any number of repetitions
///          /, zero or one repetitions
///          >
///  End of line or string
///  
///  
/// </summary>
public static Regex regex = new Regex(
    @"(?:\<br\s*/?\>)*$",
    RegexOptions.IgnoreCase
    | RegexOptions.CultureInvariant
    | RegexOptions.IgnorePatternWhitespace
    | RegexOptions.Compiled
    );
// Replace returns a new string, so assign the result.
text = regex.Replace(text, string.Empty);
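
A small usage sketch of the pattern above, wrapped in a helper so the result of Replace is not lost. The names BreakTrimDemo and Strip are hypothetical; the pattern and options are the ones from this answer.

```csharp
using System.Text.RegularExpressions;

static class BreakTrimDemo
{
    // Same pattern as above: any run of <br>, <br/> or <br /> tags sitting at the end of the string.
    static readonly Regex TrailingBreaks = new Regex(
        @"(?:\<br\s*/?\>)*$",
        RegexOptions.IgnoreCase
        | RegexOptions.CultureInvariant
        | RegexOptions.IgnorePatternWhitespace
        | RegexOptions.Compiled);

    public static string Strip(string text)
    {
        // Strings are immutable: Replace returns the stripped copy.
        return TrailingBreaks.Replace(text, string.Empty);
    }
}
```

BreakTrimDemo.Strip("Line one<br>Line two<br /><BR/>") should return "Line one<br>Line two", and a string with no trailing breaks comes back unchanged.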
bdukes
Thanks for the implementation of my recommendation!
Mitchel Sellers
+1  A: 

You could also try (if the markup is likely to be a valid tree) something similar to:

        string s = "<markup><div>Text</div><br /><br /></markup>";

        XmlDocument doc = new XmlDocument();
        doc.LoadXml(s);

        Console.WriteLine(doc.InnerXml);

        XmlElement markup = doc["markup"];
        int childCount = markup.ChildNodes.Count;
        for (int i = childCount -1; i >= 0; i--)
        {
            if (markup.ChildNodes[i].Name.ToLower() == "br")
            {
                markup.RemoveChild(markup.ChildNodes[i]);
            }
            else
            {
                break;
            }
        }
        Console.WriteLine("---");
        Console.WriteLine(markup.InnerXml); 
        Console.ReadKey();

The code above is a bit "scratch-pad" but if you cut and paste it into a Console application and run it, it does work :=)

Rob
+3  A: 

I'm trying to ignore the ambiguity in your original question, and read it literally. Here is an extension method that overloads TrimEnd to take a string.

static class StringExtensions
{
    public static string TrimEnd(this string s, string remove)
    {
        if (s.EndsWith(remove))
        {
            return s.Substring(0, s.Length - remove.Length);
        }
        return s;
    }
}

Here are some tests to show that it works:

        Debug.Assert("abc".TrimEnd("<br>") == "abc");
        Debug.Assert("abc<br>".TrimEnd("<br>") == "abc");
        Debug.Assert("<br>abc".TrimEnd("<br>") == "<br>abc");

I want to point out that this solution is easier to read than regex, probably faster than regex (you should use a profiler, not speculation, if you're concerned about performance), and useful for removing other things from the ends of strings.

Regex becomes more appropriate if your problem is more general than you stated (e.g., if you want to remove <BR> and </BR> and deal with trailing spaces or whatever).

Jay Bazuzi
+2  A: 

Small change to bdukes' code, which should be faster as it doesn't backtrack.

public static Regex regex = new Regex(
    @"(?:\<br[^>]*\>)*$",
    RegexOptions.IgnoreCase
    | RegexOptions.CultureInvariant
    | RegexOptions.IgnorePatternWhitespace
    | RegexOptions.Compiled
);
// Replace returns a new string, so assign the result.
text = regex.Replace(text, string.Empty);
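
One thing to watch with [^>]*: it accepts break tags with attributes, but it also accepts any trailing tag whose name merely starts with "br" (a made-up <broken> tag, say). If that matters, a word boundary after "br" is one possible guard. This is only a sketch; StrictBreakTrim and Strip are hypothetical names.

```csharp
using System.Text.RegularExpressions;

static class StrictBreakTrim
{
    // \b demands a word boundary right after "br", so <br>, <br/>, <br /> and
    // <br class="x"> match, while a tag such as <broken> does not.
    static readonly Regex TrailingBreaks = new Regex(
        @"(?:<br\b[^>]*>)*$",
        RegexOptions.IgnoreCase | RegexOptions.CultureInvariant);

    public static string Strip(string text)
    {
        return TrailingBreaks.Replace(text, string.Empty);
    }
}
```

With this, StrictBreakTrim.Strip("text<br class=\"x\"><br/>") should return "text", while "text<broken>" is left as it is.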
Ray Hayes