I have an XML file with the following (simplified) contents:

<CollectionItems>
    <CollectionItem>
        <Element1>Value1</Element1>
        <Element2>
            <SubElement1>SubValue1</SubElement1>
            <SubElement2>SubValue2</SubElement2>
            <SubElement3>SubValue3</SubElement3>
        </Element2>
        <Element3>Value3</Element3>
    </CollectionItem>
    <CollectionItem>
        <Element1>Value1</Element1>
        <Element2>
            <SubElement1>SubValue1</SubElement1>
            <SubElement2 />
            <SubElement3>SubValue3</SubElement3>
        </Element2>
        <Element3>Value3</Element3>
    </CollectionItem>
    <CollectionItem>
        <Element1>Value1</Element1>
        <Element2>
            <SubElement1>SubValue1</SubElement1>
            <SubElement2>SubValue2</SubElement2>
            <SubElement3>SubValue3</SubElement3>
        </Element2>
        <Element3>Value3</Element3>
    </CollectionItem>
</CollectionItems>

I am attempting to write a regex in .NET that matches any CollectionItem where SubElement2 is empty (the middle CollectionItem in this example).

I have the following regex so far (with RegexOptions.Singleline enabled):

<CollectionItem>.+?<SubElement2 />.+?</CollectionItem>

The problem is that it is matching the opening of the first CollectionItem through the close of the second CollectionItem. I understand why it's doing this, but I don't know how to modify the regex to make it match only the center CollectionItem.

Edit: As to why regex as opposed to something else:

  1. I was attempting to modify the file in a text editor for simplicity.
  2. After I couldn't figure out how to do it in regex, I wanted to know if it could be done (and how) for the sake of learning.

Thanks!

+5  A: 

Why are you trying to use a regular expression? You've got a perfectly good domain model (XML) - why not search that instead? So for example in LINQ to XML:

var collectionsWithEmptySubElement2 =
       document.Descendants("SubElement2")
               .Where(x => x.IsEmpty)
               .Select(x => x.Ancestors("CollectionItem").FirstOrDefault());

or

var collectionsWithEmptySubElement2 =
       document.Descendants("CollectionItem")
               .Where(x => x.Descendants("SubElement2").Any(sub => sub.IsEmpty));
Jon Skeet
I had considered using LINQPad to accomplish this (I'm trying to fix an XML data file with some invalid values), but then I became curious as to how you would actually do it with a regex if you wanted to.
Dan Rigby
+3  A: 

This is XML - why are you trying to do this with Regex? Wouldn't XPath make more sense?

David M
`/CollectionItems/CollectionItem[./*/SubElement2='']`
Greg
+2  A: 

You could use

<CollectionItem>((?!<CollectionItem>).)+?<SubElement2 />.+?</CollectionItem>

This ensures that no further <CollectionItem> comes between the starting tag and the <SubElement2 /> tag.
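
To try this against the sample, here is a minimal sketch of how the pattern could be run in C#. The class name `EmptySubElementDemo` and the method `FindMatches` are made up for illustration, and the input is a shortened version of the file from the question. `RegexOptions.Singleline` is what lets `.` cross line breaks.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical demo class, not part of the original question.
public static class EmptySubElementDemo
{
    // The negative lookahead (?!<CollectionItem>) stops the first lazy run
    // from crossing into the next item's opening tag.
    private static readonly Regex ItemWithEmptySub = new Regex(
        @"<CollectionItem>((?!<CollectionItem>).)+?<SubElement2 />.+?</CollectionItem>",
        RegexOptions.Singleline);

    // Returns the full text of every CollectionItem the pattern matches.
    public static List<string> FindMatches(string xml)
    {
        var results = new List<string>();
        foreach (Match m in ItemWithEmptySub.Matches(xml))
        {
            results.Add(m.Value);
        }
        return results;
    }

    public static void Main()
    {
        // Shortened sample: only the second item has an empty SubElement2.
        string xml =
            "<CollectionItems>\n" +
            "  <CollectionItem>\n" +
            "    <Element2><SubElement2>SubValue2</SubElement2></Element2>\n" +
            "  </CollectionItem>\n" +
            "  <CollectionItem>\n" +
            "    <Element2><SubElement2 /></Element2>\n" +
            "  </CollectionItem>\n" +
            "  <CollectionItem>\n" +
            "    <Element2><SubElement2>SubValue2</SubElement2></Element2>\n" +
            "  </CollectionItem>\n" +
            "</CollectionItems>";

        foreach (string match in FindMatches(xml))
        {
            Console.WriteLine(match);
        }
    }
}
```

Note that the pattern only finds the literal text `<SubElement2 />`. An empty element written as `<SubElement2/>` or `<SubElement2></SubElement2>` would not match, which is one reason the XML-aware approaches in the other answers are more robust.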

Tim Pietzcker
That works! Thank you.
Dan Rigby