I'm working in Microsoft Visual C# 2008 Express.

Let's say I have a string and the contents of the string is: "This is my <myTag myTagAttrib="colorize">awesome</myTag> string."

I'm telling myself that I want to do something to the word "awesome" - possibly call a function that does something called "colorize".

What is the best way in C# to go about detecting that this tag exists and getting that attribute? I've worked a little with XElements and such in C#, but mostly to do with reading in and out XML files.

Thanks!

-Adeena

A: 

I'm a little confused about your example, because you switch between the string (text content), tags, and attributes. But I think what you want is XPath.

So if your XML stream looks like this:

<adeena/><parent><child x="this is my awesome string">This is another awesome string<child/><adeena/>

You'd use an XPath expression that looks like this to find the attribute:

//child/@x

and one like this to find the text value under the child tag:

//child

I'm a Java developer, so I don't know what XML libraries you'd use to do this. But you'll need a DOM parser to create a W3C Document class instance for you by reading in the XML file and then using XPath to pluck out the values.

There's a good XPath tutorial at W3Schools if you need it.

UPDATE:

If you're saying that you already have an XML stream as String, then the answer is to not read it from a file but from the String itself. Java has abstractions called InputStream and Reader that handle streams of bytes and chars, respectively. The source can be a file, a string, etc. Check your C# DOM API to see if it has something similar. You'll pass the string to a parser that will give back a DOM object that you can manipulate.
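For C#, the closest counterpart to Java's Reader in this situation is probably System.IO.StringReader, and LINQ to XML can evaluate XPath through the extension methods in System.Xml.XPath. A minimal sketch, assuming a well-formed variant of the sample XML above (the class name, method names, and sample string are made up for illustration):

```csharp
using System;
using System.Collections;
using System.IO;
using System.Linq;
using System.Xml.Linq;
using System.Xml.XPath;

public static class XPathFromString
{
    // Hypothetical, well-formed variant of the sample XML in this answer.
    public const string Sample =
        "<parent><child x=\"this is my awesome string\">This is another awesome string</child></parent>";

    // Parses XML held in a string (no file involved) through a TextReader.
    public static XDocument Load(string xml)
    {
        using (var reader = new StringReader(xml))
        {
            return XDocument.Load(reader);
        }
    }

    // //child/@x selects the x attribute of every child element.
    public static string FirstAttribute(XDocument doc)
    {
        var attributes = ((IEnumerable)doc.XPathEvaluate("//child/@x")).Cast<XAttribute>();
        return attributes.Select(a => a.Value).FirstOrDefault();
    }

    // //child selects the child elements themselves; Value is their text.
    public static string FirstText(XDocument doc)
    {
        XElement element = doc.XPathSelectElement("//child");
        return element == null ? null : element.Value;
    }

    public static void Main()
    {
        XDocument doc = Load(Sample);
        Console.WriteLine(FirstAttribute(doc));
        Console.WriteLine(FirstText(doc));
    }
}
```

XDocument.Parse(xml) would do the same job in one call; the StringReader form is shown only because it mirrors the Reader abstraction described above.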

duffymo
In my C# code, this: "This is my <myTag myTagAttrib="colorize">awesome</myTag> string." is really, truly saved as a string... that's my problem. How do I turn it into an XElement or XPath or...?
adeena
My example XML isn't well-formed; sorry about that. I typed it into SO using escaped values, and didn't do a very good job of it.
duffymo
A: 

Since the input is not well-formed XML, you won't be able to parse it with any of the built-in XML libraries. You'd need a regular expression to extract the well-formed piece. You could probably also use one of the more forgiving HTML parsers, such as HtmlAgilityPack on CodePlex.

Josh Einstein
The example in the comment looks well-formed to me. What did I miss?
duffymo
Ooops, it's mine that's off. Didn't check myself closely enough.
duffymo
A: 

The XmlTextReader can parse XML fragments with a special constructor which may help in this situation, but I'm not positive about that.
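As a rough illustration of fragment parsing, the sketch below uses XmlReader.Create with ConformanceLevel.Fragment instead of the XmlTextReader constructor mentioned above. That is a different route to the same idea, not the approach from the linked article, and the class and method names are invented for this example.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;

public static class FragmentScan
{
    // Returns { tag name, attribute value, inner text } for each element
    // that carries the given attribute. Assumes those elements contain
    // only text (ReadElementContentAsString rejects child elements).
    public static List<string[]> FindTagged(string text, string attributeName)
    {
        var results = new List<string[]>();
        // Fragment mode allows text and several elements at the top level.
        var settings = new XmlReaderSettings { ConformanceLevel = ConformanceLevel.Fragment };
        using (XmlReader reader = XmlReader.Create(new StringReader(text), settings))
        {
            while (!reader.EOF)
            {
                if (reader.NodeType == XmlNodeType.Element &&
                    reader.GetAttribute(attributeName) != null)
                {
                    string name = reader.Name;
                    string value = reader.GetAttribute(attributeName);
                    // Reads the text and moves past the end tag, so no extra Read() here.
                    string inner = reader.ReadElementContentAsString();
                    results.Add(new[] { name, value, inner });
                }
                else
                {
                    reader.Read();
                }
            }
        }
        return results;
    }

    public static void Main()
    {
        string text = "This is my <myTag myTagAttrib='colorize'>awesome</myTag> string.";
        foreach (string[] hit in FindTagged(text, "myTagAttrib"))
        {
            Console.WriteLine("{0} [{1}]: {2}", hit[0], hit[1], hit[2]);
        }
    }
}
```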

There's an in-depth article here:

http://geekswithblogs.net/kobush/archive/2006/04/20/75717.aspx

HVS
+1  A: 

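You can extract the XML with a regular expression, load the extracted XML string into an XElement and go from there:

// Needs: using System.Text.RegularExpressions; using System.Xml.Linq;
string text = @"This is my <myTag myTagAttrib='colorize'>awesome</myTag> text.";
// Regex matching is case-sensitive by default, so the pattern must use "myTag", not "MyTag".
// The lazy .*? stops at the first closing tag instead of the last one.
Match match = Regex.Match(text, @"(<myTag.*?</myTag>)");
string xml = match.Captures[0].Value;
XElement element = XElement.Parse(xml);
XAttribute attribute = element.Attribute("myTagAttrib");
// attribute is null when the tag has no myTagAttrib attribute.
if (attribute != null && attribute.Value == "colorize") DoSomethingWith(element.Value); // Value = "awesome"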

This code will throw an exception if no myTag element is found, but that can be remedied by wrapping the lines after the match in a check:

if (match.Success)
{...}

It gets even more interesting if the string could hold more than just the MyTag Tag...
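For the multi-tag case hinted at above, one possible approach is to capture the tag name and require the same name in the closing tag through a backreference. This is only a sketch: the class name, the second tag, and the "bold" attribute value are made up, and the pattern does not handle self-closing tags or a tag nested inside another tag of the same name.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;
using System.Xml.Linq;

public static class TagExtractor
{
    // (?<tag>\w+) captures the tag name; \k<tag> demands the same name
    // in the closing tag. Singleline lets the content span line breaks.
    private static readonly Regex TagPattern =
        new Regex(@"<(?<tag>\w+)[^>]*>.*?</\k<tag>>", RegexOptions.Singleline);

    // Parses every matched tag into an XElement.
    // XElement.Parse throws XmlException if a match is not well-formed.
    public static List<XElement> Extract(string text)
    {
        var elements = new List<XElement>();
        foreach (Match match in TagPattern.Matches(text))
        {
            elements.Add(XElement.Parse(match.Value));
        }
        return elements;
    }

    public static void Main()
    {
        string text = "This is my <myTag myTagAttrib='colorize'>awesome</myTag> and " +
                      "<myTag2 myTagAttrib='bold'>great</myTag2> string.";
        foreach (XElement element in Extract(text))
        {
            XAttribute attribute = element.Attribute("myTagAttrib");
            Console.WriteLine("{0}: {1} -> {2}",
                element.Name.LocalName,
                attribute == null ? "(none)" : attribute.Value,
                element.Value);
        }
    }
}
```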

Dabblernl
I cheated, I removed the "This is my" part of your string to make the XML well formed. Hope that it still helps you
Dabblernl
Well, I could put my string in a temporary tag like this, right? "<tempTag>This is my <myTag myTagAttrib="colorize">awesome</myTag> string.</tempTag>" Then I have well-formed XML and I think I'm good to go from there...
adeena
And related... how do I know (because I don't when I'm setting up my string and my element) if the tag "myTag" exists in the element at all? It might not, or there could be "myTag2". (I have several possible tags at that level.)
adeena
That's why I suggested looking at HtmlAgilityPack. It's really not XML that you're working with here. It's loose markup.
Josh Einstein
@Adeena, you need a regular expression to find and extract XML tags as suggested by others. I edited the code.
Dabblernl
+1  A: 

Another solution:

// Needs: using System.Linq; using System.Xml; using System.Xml.Linq; using System.Xml.XPath;
var myString = "This is my <myTag myTagAttrib='colorize'>awesome</myTag> string.";
try
{
 var document = XDocument.Parse("<root>" + myString + "</root>");
 // Evaluate relative to <root>: myTag and myTag2 are its children, not the document's.
 var matches = ((System.Collections.IEnumerable)document.Root.XPathEvaluate("myTag|myTag2")).Cast<XElement>();
 foreach (var element in matches)
 {
  switch (element.Name.ToString())
  {
   case "myTag":
    //do something with myTag like lookup attribute values and call other methods
    break;
   case "myTag2":
    //do something else with myTag2
    break;
  }
 }
}
catch (XmlException)
{
 //string was not well-formed XML
}

I also took into account your comment to Dabblernl, where you say you want to parse multiple attributes on multiple elements.
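To make the "look up attribute values and call other methods" step concrete, here is one way it might be wired up. It is a sketch under assumptions: the handler table, the "shout" attribute value, and the class name are hypothetical, and a real colorize function would do something more useful than prefix the word.

```csharp
using System;
using System.Collections.Generic;
using System.Xml;
using System.Xml.Linq;
using System.Xml.XPath;

public static class MarkupProcessor
{
    // Hypothetical handlers, keyed by the value of myTagAttrib.
    private static readonly Dictionary<string, Func<string, string>> Handlers =
        new Dictionary<string, Func<string, string>>
        {
            { "colorize", word => "colorized:" + word },
            { "shout", word => word.ToUpperInvariant() }
        };

    // Wraps the loose markup in a root element, selects the known tags,
    // and runs the handler named by each tag's myTagAttrib attribute.
    public static List<string> Process(string markup)
    {
        var results = new List<string>();
        XDocument document;
        try
        {
            document = XDocument.Parse("<root>" + markup + "</root>");
        }
        catch (XmlException)
        {
            return results; // not well-formed: nothing to process
        }

        foreach (XElement element in document.Root.XPathSelectElements("myTag|myTag2"))
        {
            XAttribute attribute = element.Attribute("myTagAttrib");
            Func<string, string> handler;
            if (attribute != null && Handlers.TryGetValue(attribute.Value, out handler))
            {
                results.Add(handler(element.Value));
            }
        }
        return results;
    }

    public static void Main()
    {
        string markup = "This is my <myTag myTagAttrib='colorize'>awesome</myTag> " +
                        "<myTag2 myTagAttrib='shout'>string</myTag2>.";
        foreach (string result in Process(markup))
        {
            Console.WriteLine(result);
        }
    }
}
```

Tags without the attribute, or with a value that has no handler, are simply skipped.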

Martijn Laarman
This looks closer to what I want to do. But instead of working with an "XDocument", can I do this with an XElement? Basically, I'm able to set my string as an XElement with this: "XElement element = XElement.Parse( @myXMLstring);" where I do have <root></root> in the string. Next, I can test to see if it *has* child elements with element.HasElements. If it does have child elements, how do I return them? All the children are optional... do I have to do a test to see if each one is there?
adeena
Sure, I updated the example to the best of my understanding of what you want to do. It basically selects the elements you specify (separated by a |) into matches. Then it loops over the found elements (if any) and processes them differently. You don't have to test whether each one is there, though; if they're not there, they're simply not selected into matches.
Martijn Laarman
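For the XElement-only variant asked about above, a small sketch (class and method names are invented): Elements() returns whichever children exist, so there is no need to probe for each optional tag, and Element(name) returns null for a tag that is absent.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

public static class OptionalChildren
{
    // Lists the names of the child elements that are actually present.
    public static string Describe(string xml)
    {
        XElement root = XElement.Parse(xml);
        // Elements() yields only existing children; it is empty when there are none.
        string[] names = root.Elements().Select(e => e.Name.LocalName).ToArray();
        return names.Length == 0 ? "(no child elements)" : string.Join(", ", names);
    }

    public static void Main()
    {
        string xml = "<root>This is my <myTag myTagAttrib='colorize'>awesome</myTag> string.</root>";
        Console.WriteLine(Describe(xml));

        // Element(name) returns null when that child is missing.
        XElement missing = XElement.Parse(xml).Element("myTag2");
        Console.WriteLine(missing == null ? "myTag2 is absent" : "myTag2 is present");
    }
}
```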
Cool. This does look exactly like what I want to do. One problem with "IEnumerable"... I'm getting the following error: "Using the generic type 'System.Collections.Generic.IEnumerable<T>' requires '1' type arguments" ??
adeena
Yeah, it uses IEnumerable from System.Collections instead of System.Collections.Generic. I've fully qualified it now, so it should work properly alongside IEnumerable<T>.
Martijn Laarman
awesome. perfect. thanks bunches! :)
adeena
Great. If the string holds nested elements, just prefix the tag names with //; this will find them at any depth in the XML document. XPath is really powerful :)
Martijn Laarman
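A short sketch of the // prefix in action, evaluated against the XDocument so that the search starts at the document root. The wrapping <b> tag and the class name are hypothetical, added only to create some nesting.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;
using System.Xml.XPath;

public static class NestedTagFinder
{
    // Finds myTag and myTag2 at any depth and reports "name=text" for each.
    public static List<string> FindAnyDepth(string markup)
    {
        XDocument document = XDocument.Parse("<root>" + markup + "</root>");
        return document.XPathSelectElements("//myTag|//myTag2")
                       .Select(e => e.Name.LocalName + "=" + e.Value)
                       .ToList();
    }

    public static void Main()
    {
        string markup = "This is <b>my <myTag myTagAttrib='colorize'>awesome</myTag></b> " +
                        "<myTag2>string</myTag2>.";
        foreach (string hit in FindAnyDepth(markup))
        {
            Console.WriteLine(hit);
        }
    }
}
```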
A: 

Hi, this is my solution for matching any type of XML using Regex: "C# Better way to detect XML?"

Rashmi Pandit