I have an xml document like the following:

<menuitem navigateurl="/PressCentre/" text="&#1087;&#1088;&#1077;&#1089; &#1094;&#1077;&#1085;&#1090;&#1098;&#1088;">
    <menuitem navigateurl="/PressCentre/RegisterForPressAlerts/" text="&#1088;&#1077;&#1075;&#1080;&#1089;&#1090;&#1098;&#1088; &#1079;&#1072; &#1087;&#1088;&#1077;&#1089; &#1089;&#1098;&#1086;&#1073;&#1097;&#1077;&#1085;&#1080;&#1103;" />
    <menuitem navigateurl="/PressCentre/PressReleases/" text="&#1087;&#1088;&#1077;&#1089; &#1089;&#1098;&#1086;&#1073;&#1097;&#1077;&#1085;&#1080;&#1103;">
        <menuitem navigateurl="/PressCentre/PressReleases/PressReleasesArchive/" text="&#1072;&#1088;&#1093;&#1080;&#1074; &#1087;&#1088;&#1077;&#1089; &#1089;&#1098;&#1086;&#1073;&#1097;&#1077;&#1085;&#1080;&#1103;" />
    </menuitem>
    <menuitem navigateurl="/PressCentre/PressKit/" text="&#1087;&#1088;&#1077;&#1089; &#1082;&#1086;&#1084;&#1087;&#1083;&#1077;&#1082;&#1090;">
        <menuitem navigateurl="/PressCentre/PressKit/FactSheets/" text="&#1089;&#1087;&#1080;&#1089;&#1098;&#1082; &#1092;&#1072;&#1082;&#1090;&#1080;" />
        <menuitem navigateurl="/PressCentre/PressKit/ExpertComments/" text="&#1082;&#1086;&#1084;&#1077;&#1085;&#1090;&#1072;&#1088;&#1080; &#1085;&#1072; &#1077;&#1082;&#1089;&#1087;&#1077;&#1088;&#1090;&#1080;" />
        <menuitem navigateurl="/PressCentre/PressKit/Testimonials/" text="&#1087;&#1088;&#1077;&#1087;&#1086;&#1088;&#1098;&#1082;&#1080;" />
        <menuitem navigateurl="/PressCentre/PressKit/MediaFiles/" text="&#1084;&#1077;&#1076;&#1080;&#1103; &#1092;&#1072;&#1081;&#1083;&#1086;&#1074;&#1077;" />
        <menuitem navigateurl="/PressCentre/PressKit/Photography/" text="&#1089;&#1085;&#1080;&#1084;&#1082;&#1080;" />
    </menuitem>
    <menuitem navigateurl="/PressCentre/PressContacts/" text="&#1087;&#1088;&#1077;&#1089; &#1082;&#1086;&#1085;&#1090;&#1072;&#1082;&#1090;&#1080;" />
</menuitem>

I need to get the value of each navigateurl attribute (e.g. "/PressCentre/"). Is there a well-known regex for this?

Thanks

+4  A: 

A basic recursion (not tested but I think it's ok):

// Requires: using System.Xml.XPath;
private void Caller(String filepath)
{
    XPathDocument oDoc = new XPathDocument(filepath);
    ReadNodes(oDoc.CreateNavigator());
}

private void ReadNodes(XPathNavigator nav)
{
    // Relative path: selects only the menuitem children of the current node
    XPathNodeIterator nodes = nav.Select("menuitem");
    while (nodes.MoveNext())
    {
        // A - read the attribute
        string url = nodes.Current.GetAttribute("navigateurl", string.Empty);

        // B - do something with the data

        // C - recurse into this node's own children
        ReadNodes(nodes.Current);
    }
}

This works because an XPathNodeIterator's Current property is itself an XPathNavigator, so it can be passed straight back to ReadNodes. You would need to extend this to push the data into a dictionary, keep track of depth, or whatever else you need.
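
As a rough sketch of that kind of extension, the version below records each URL together with its nesting depth. The class name MenuWalker, the CollectUrls and Walk helpers, and the trimmed sample XML are made up for illustration and are not part of the original answer.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml.XPath;

public static class MenuWalker
{
    // Hypothetical helper: returns (depth, url) pairs in document order.
    public static List<KeyValuePair<int, string>> CollectUrls(XPathNavigator nav)
    {
        var results = new List<KeyValuePair<int, string>>();
        Walk(nav, 0, results);
        return results;
    }

    private static void Walk(XPathNavigator nav, int depth,
                             List<KeyValuePair<int, string>> results)
    {
        // "menuitem" is a relative path, so only direct children are selected.
        XPathNodeIterator nodes = nav.Select("menuitem");
        while (nodes.MoveNext())
        {
            string url = nodes.Current.GetAttribute("navigateurl", string.Empty);
            results.Add(new KeyValuePair<int, string>(depth, url));

            // Recurse one level deeper for this node's own children.
            Walk(nodes.Current, depth + 1, results);
        }
    }

    public static void Main()
    {
        // Trimmed-down sample in the shape of the question's document (assumed).
        const string xml = @"<menuitem navigateurl=""/PressCentre/"">
  <menuitem navigateurl=""/PressCentre/PressReleases/"">
    <menuitem navigateurl=""/PressCentre/PressReleases/PressReleasesArchive/"" />
  </menuitem>
  <menuitem navigateurl=""/PressCentre/PressContacts/"" />
</menuitem>";

        var doc = new XPathDocument(new StringReader(xml));
        foreach (var pair in CollectUrls(doc.CreateNavigator()))
        {
            // Indent each URL by two spaces per level of nesting.
            Console.WriteLine(new string(' ', pair.Key * 2) + pair.Value);
        }
    }
}
```

Passing the depth down as a parameter keeps the recursion stateless, which is the sort of extra logic a single flat query does not give you directly.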

annakata
Doh! You beat me to it, and with an example, too.
ZombieSheep
heh - but now I'm utterly baffled as to what the question *is* anymore :)
annakata
Thanks! I'll give that a try. There's some new classes for me to learn :)
dotnetdev
Also of note: you can write XPath expressions that handle the recursion themselves, so the function doesn't need to recurse. For a quick example: http://msdn.microsoft.com/en-us/library/ms256086.aspx
ee
i.e. nav.Select("//menuitem") should get all menu items recursively
ee
Yeah, I really should have mentioned that - really good call, and likely an even simpler, better answer in practice - but the reason I've provided this as is is to demonstrate recursion. Also worth noting that "//menuitem" won't give you the option of additional logic i.e. around depth checks.
annakata
+1  A: 

Why use Regex for this when XPath is (to me, at least) the natural choice? That's basically what XSLT should implement...

ZombieSheep
Simply because I don't have much experience with regex.
dotnetdev
A: 

Any particular reason you're using a regex? Have you tried using XPath for this? Here are some examples of how to use XPath: http://www.w3schools.com/XPath/xpath_examples.asp

Conrad
A: 

Use XPath: //menuitem[@navigateurl]/@navigateurl

This XPath selects every menuitem that has a navigateurl attribute and returns a node-set (XPath 1.0) or sequence (XPath 2.0) of the navigateurl attribute nodes. The navigateurl predicate is meant to ensure that only the leaf menu items are fetched.
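
For anyone wanting to try the expression from .NET, here is a minimal sketch. The NavigateUrlQuery class, the SelectUrls helper, and the shortened sample XML are illustrative assumptions. It also runs the leaf-only expression suggested in the comment below, so the two results can be compared.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml.XPath;

public static class NavigateUrlQuery
{
    // Hypothetical helper: evaluates an XPath expression and returns the
    // string value of every node it selects, in document order.
    public static List<string> SelectUrls(XPathNavigator nav, string xpath)
    {
        var urls = new List<string>();
        XPathNodeIterator it = nav.Select(xpath);
        while (it.MoveNext())
        {
            // For an attribute node, Value is the attribute's text.
            urls.Add(it.Current.Value);
        }
        return urls;
    }

    public static void Main()
    {
        // Shortened sample in the shape of the question's document (assumed).
        const string xml = @"<menuitem navigateurl=""/PressCentre/"">
  <menuitem navigateurl=""/PressCentre/PressReleases/"">
    <menuitem navigateurl=""/PressCentre/PressReleases/PressReleasesArchive/"" />
  </menuitem>
  <menuitem navigateurl=""/PressCentre/PressContacts/"" />
</menuitem>";

        XPathNavigator nav = new XPathDocument(new StringReader(xml)).CreateNavigator();

        // Every menuitem in this sample carries the attribute, so all four match.
        List<string> all = SelectUrls(nav, "//menuitem[@navigateurl]/@navigateurl");

        // Only menuitems with no menuitem children: two in this sample.
        List<string> leaves = SelectUrls(nav, "//menuitem[not(menuitem)]/@navigateurl");

        Console.WriteLine("all: " + string.Join(", ", all.ToArray()));
        Console.WriteLine("leaves: " + string.Join(", ", leaves.ToArray()));
    }
}
```

On this sample the first query returns four URLs and the second returns two, which suggests the attribute predicate on its own does not restrict the result to leaves.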

JavaRocky
Your xpath is wrong for leaves actually - "//menuitem[not(menuitem)]" would capture all the leaves only (not that this is what the OP requests). I still like this as a solution generally, but the OP's position on recursion has not been clarified.
annakata