Is the data proper xml, or does it just look like it?
If it is html, then the HTML Agility Pack is worth investigating: it provides a DOM (similar to XmlDocument) that you can use to query the data:
string input = @"<html>...some html content <b> etc </b> ...
<user> hello <b>mitch</b> </user>
...some html content <b> etc </b> ...
<message> some html <i>message</i> <a href....>bla</a> </message>
...some html content <b> etc </b> ...</html>";
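HtmlDocument doc = new HtmlDocument();
doc.LoadHtml(input);
// SelectNodes returns null (not an empty collection) when nothing matches
HtmlNodeCollection nodes = doc.DocumentNode.SelectNodes("//user | //message");
if (nodes != null)
{
    foreach (HtmlNode node in nodes)
    {
        Console.WriteLine("{0}: {1}", node.Name, node.InnerText);
        // or node.InnerHtml to keep the formatting within the content
    }
}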
This outputs:
user: hello mitch
message: some html message bla
If you want the formatting tags, then use .InnerHtml instead of .InnerText.
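If it is xml, then to cope with the full spectrum of xml (attributes, namespaces, nesting, entities, CDATA), it would be better to use an xml parser. For small-to-mid size xml, loading it into a DOM such as XmlDocument would be fine - then query the nodes with XPath (for example, "//*" selects every element). For huge xml, XmlReader might be an option, since it streams the data rather than holding it all in memory.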
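As a rough sketch of those two xml routes (the class and method names here are made up for illustration, and the "//user | //message" query simply mirrors the html example above), something along these lines should work:

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;
using System.Xml.Linq;

// Hypothetical helper; the names are illustrative and not part of any library.
public static class XmlTagDump
{
    // DOM route: load the whole document, then query it with XPath.
    public static List<string> WithDom(string xml)
    {
        var doc = new XmlDocument();
        doc.LoadXml(xml); // throws XmlException if the input is not well-formed

        var result = new List<string>();
        foreach (XmlNode node in doc.SelectNodes("//user | //message"))
        {
            result.Add(node.Name + ": " + node.InnerText.Trim());
        }
        return result;
    }

    // Streaming route: forward-only read, materialising only the elements of interest.
    public static List<string> WithReader(string xml)
    {
        var result = new List<string>();
        using (XmlReader reader = XmlReader.Create(new StringReader(xml)))
        {
            reader.MoveToContent();
            while (!reader.EOF)
            {
                if (reader.NodeType == XmlNodeType.Element &&
                    (reader.Name == "user" || reader.Name == "message"))
                {
                    // XNode.ReadFrom consumes the whole element and leaves the
                    // reader on the node that follows it, so no Read() here.
                    var element = (XElement)XNode.ReadFrom(reader);
                    result.Add(element.Name.LocalName + ": " + element.Value.Trim());
                }
                else
                {
                    reader.Read();
                }
            }
        }
        return result;
    }
}
```

Both methods expect well-formed xml and will throw on tag soup, which is the main reason to reach for the HTML Agility Pack when the input is really html. The reader version only ever holds one matched element in memory at a time, at the cost of slightly fiddlier loop logic.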
If you don't need to handle the full xml spec, then some simple regex shouldn't be too tricky... a simplified example (no attributes, no namespaces, no nested xml) might be:
string input = @"blah <tag1> content for tag 1 </tag1> blop
<tag2> content for tag 2 </tag2> bloop
<tag3> content for tag 3 </tag3> blip";
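// group 1 = tag name, group 2 = content; \1 forces the closing tag to match the opening one.
// The content group is lazy ([^<>]*?) so the trailing \s* can strip whitespace before the closing tag.
const string pattern = @"<(\w+)>\s*([^<>]*?)\s*</(\1)>";
Console.WriteLine(Regex.IsMatch(input, pattern));
foreach (Match match in Regex.Matches(input, pattern))
{
    Console.WriteLine("{0}: {1}", match.Groups[1], match.Groups[2]);
}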