
(First post, please be gentle!)

I am just learning about LINQ to XML in all its glory and frailty, trying to hack it to do what I want to do:

Given an XML file like this:

<list>
<!-- random data, keys, values, etc.-->

  <key>FIRST_WANTED_KEY</key>
  <value>FIRST_WANTED_VALUE</value>

  <key>SECOND_WANTED_KEY</key>
  <value>SECOND_WANTED_VALUE</value> <!-- wanted because it's first -->

  <key>SECOND_WANTED_KEY</key>
  <value>UNWANTED_VALUE</value>  <!-- not wanted because it's second -->

  <!-- nonexistent <key>THIRD_WANTED_KEY</key> -->
  <!-- nonexistent <value>THIRD_WANTED_VALUE</value> -->

<!-- more stuff-->
</list>

I want to extract the values of a known set of "wanted keys" robustly: if SECOND_WANTED_KEY appears twice, I only want SECOND_WANTED_VALUE, not UNWANTED_VALUE. THIRD_WANTED_KEY may or may not appear, so the query has to handle a missing key as well. I can assume that FIRST_WANTED_KEY appears before the other keys, but I can't assume anything about the order of the rest. If a key appears more than once, only its first value matters; the later ones can be ignored. The result can be an anonymous type whose properties are strings.
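A minimal sketch of the requirements above, assuming every `<key>` is a direct child of `<list>` and is followed by its `<value>` sibling, as in the sample. It walks the `<key>` elements in document order, keeps only the first value seen for each wanted key, and simply leaves out keys that never appear (THIRD_WANTED_KEY here). `FirstValues` and `wanted` are hypothetical names, not from the question.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

// Sample input mirroring the question (XML comments omitted).
XElement sample = XElement.Parse(@"<list>
  <key>FIRST_WANTED_KEY</key>
  <value>FIRST_WANTED_VALUE</value>
  <key>SECOND_WANTED_KEY</key>
  <value>SECOND_WANTED_VALUE</value>
  <key>SECOND_WANTED_KEY</key>
  <value>UNWANTED_VALUE</value>
</list>");

var wanted = new HashSet<string> { "FIRST_WANTED_KEY", "SECOND_WANTED_KEY", "THIRD_WANTED_KEY" };
Dictionary<string, string> result = FirstValues(sample, wanted);

foreach (var pair in result)
    Console.WriteLine($"{pair.Key} = {pair.Value}");

// Hypothetical helper: for each wanted <key> among the direct children of `list`,
// record the first <value> that follows it. Later duplicates of a key are skipped.
static Dictionary<string, string> FirstValues(XElement list, ISet<string> wantedKeys)
{
    var values = new Dictionary<string, string>();
    foreach (XElement key in list.Elements("key"))      // document order
    {
        string name = key.Value;
        if (!wantedKeys.Contains(name) || values.ContainsKey(name))
            continue;                                      // not wanted, or already seen
        // The (string) conversion yields null for a null XElement instead of throwing.
        values[name] = (string)key.ElementsAfterSelf("value").FirstOrDefault();
    }
    return values;
}
```

A dictionary is used here instead of an anonymous type only because the set of wanted keys is data; a key that never appeared is simply absent, which `TryGetValue` can check.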

My attempt has centered around something along these lines:

var z = from y in x.Descendants()
        where y.Value == "FIRST_WANTED_KEY"
        select new
        {
          first_wanted_value = ((XElement)y.NextNode).Value,
         //...
        };

My question is: what should go in place of that `//...`? For instance, I've tried this (ugly, I know):

second_wanted_value = ((XElement)y.ElementsAfterSelf()
                      .Where(w => w.Value=="SECOND_WANTED_KEY")
                      .FirstOrDefault().NextNode).Value

which I hoped would allow the key to be anywhere, or missing entirely. It doesn't work for a missing key, though: FirstOrDefault() then returns null, and calling .NextNode on null throws a NullReferenceException.
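A sketch of one null-safe rewrite, assuming C# 6 or later for the null-conditional operator `?.`. It relies on the explicit `(string)` conversion of `XElement`, which yields `null` for a `null` element instead of throwing. It also searches only `key` elements and asks for the next `value` element rather than casting `NextNode`, so an XML comment sitting between a key and its value would not break a cast. Like the question, it assumes FIRST_WANTED_KEY comes before the other keys.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

XElement list = XElement.Parse(@"<list>
  <key>FIRST_WANTED_KEY</key>
  <value>FIRST_WANTED_VALUE</value>
  <key>SECOND_WANTED_KEY</key>
  <value>SECOND_WANTED_VALUE</value>
</list>");

// `y` plays the role of the FIRST_WANTED_KEY element from the question's query.
XElement y = list.Elements("key").First(k => k.Value == "FIRST_WANTED_KEY");

// If no matching <key> follows, `?.` short-circuits the rest of the chain to null,
// and the (string) conversion turns that null XElement into a null string.
string second = (string)y.ElementsAfterSelf("key")
                         .FirstOrDefault(w => w.Value == "SECOND_WANTED_KEY")
                         ?.ElementsAfterSelf("value")
                         .FirstOrDefault();
string third = (string)y.ElementsAfterSelf("key")
                        .FirstOrDefault(w => w.Value == "THIRD_WANTED_KEY")
                        ?.ElementsAfterSelf("value")
                        .FirstOrDefault();

Console.WriteLine(second ?? "(missing)");
Console.WriteLine(third ?? "(missing)");
```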

I've also tried adding a

.Select(t => { 
    if (t==null) 
        return new XElement("SECOND_WANTED_KEY",""); 
    else return t;
})

clause after the Where, but that hasn't worked either.
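A likely reason the null check has no effect: `Where` only yields elements that actually exist, so `t` is never `null`; the `null` shows up later, when `FirstOrDefault()` runs on an empty sequence. LINQ's `Enumerable.DefaultIfEmpty(defaultValue)` is meant for exactly that case. A minimal sketch, assuming an empty string is an acceptable result for a missing key; the fallback element and the `ValueOrEmpty` helper are hypothetical.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

XElement list = XElement.Parse(@"<list>
  <key>FIRST_WANTED_KEY</key>
  <value>FIRST_WANTED_VALUE</value>
</list>");

// Fallback used when nothing matches; it is not attached to any document.
var missing = new XElement("value", "");

string ValueOrEmpty(string keyName) =>
    list.Elements("key")
        .Where(k => k.Value == keyName)
        .SelectMany(k => k.ElementsAfterSelf("value").Take(1)) // first <value> after each match
        .DefaultIfEmpty(missing)                              // empty sequence becomes { missing }
        .First()
        .Value;

string first = ValueOrEmpty("FIRST_WANTED_KEY");
string third = ValueOrEmpty("THIRD_WANTED_KEY");

Console.WriteLine($"[{first}] [{third}]");
```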

I'm open to suggestions, (constructive) criticism, links, references, search terms to try, etc. I've done my fair share of Googling and searching around SO, so any help would be appreciated.

Thanks!

EDIT: Let me add a layer of complexity that I should have included in the first place. Say the XML document looks like this:

<lists>
    <list>
      <!-- as above -->
    </list>
    <list>
      <!-- as above -->
    </list>
</lists>

and I want to extract one set of these key-value pairs per <list>. Caution: if SECOND_WANTED_KEY doesn't appear in the first <list> element but does appear in the second, I don't want to accidentally pick up the second list's value for it.
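One way to keep each `<list>` isolated, sketched under the assumption that every `<list>` is a direct child of `<lists>`: loop over the `<list>` elements and search only within each one. `Elements()` looks only at direct children, and `ElementsAfterSelf()` only at later siblings under the same parent, so a key that is missing from one list cannot pick up a value from the next. The `Lookup` helper and the `First`/`Second`/`Third` property names are hypothetical.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

XElement lists = XElement.Parse(@"<lists>
  <list>
    <key>FIRST_WANTED_KEY</key><value>A1</value>
    <key>THIRD_WANTED_KEY</key><value>A3</value>
  </list>
  <list>
    <key>FIRST_WANTED_KEY</key><value>B1</value>
    <key>SECOND_WANTED_KEY</key><value>B2</value>
    <key>SECOND_WANTED_KEY</key><value>B2_DUPLICATE</value>
  </list>
</lists>");

// First <value> after the first matching <key> inside this <list> only; null if absent.
static string Lookup(XElement list, string keyName) =>
    (string)list.Elements("key")
                .FirstOrDefault(k => k.Value == keyName)
                ?.ElementsAfterSelf("value")
                .FirstOrDefault();

var results = lists.Elements("list")
                   .Select(list => new
                   {
                       First = Lookup(list, "FIRST_WANTED_KEY"),
                       Second = Lookup(list, "SECOND_WANTED_KEY"),
                       Third = Lookup(list, "THIRD_WANTED_KEY"),
                   })
                   .ToList();

foreach (var r in results)
    Console.WriteLine($"{r.First} | {r.Second ?? "(none)"} | {r.Third ?? "(none)"}");
```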

EDIT #2:

As another idea, I've tried creating a HashSet of the keys that I'm looking for and doing this:

HashSet<string> wantedKeys = new HashSet<string>();
wantedKeys.Add("FIRST_WANTED_KEY");
//...add more keys here
var kvp = from a in x.Descendants().Where(e => wantedKeys.Contains(e.Value))
          select new KeyValuePair<string,string>(a.Value,
             ((XElement)a.NextNode).Value);

This gets me all of the key-value pairs, but I'm not sure whether it guarantees that each pair is properly "associated" with its parent `<list>` element. Any thoughts on, or comparisons between, these two approaches would be helpful.
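For comparison, a sketch of the hash-set approach adapted to keep each pair tied to its parent `<list>` by grouping the matched keys on `Parent`. `Descendants()` returns elements in document order, and `GroupBy` preserves source order inside each group, so taking `First()` per key keeps the earliest occurrence. This is an adaptation, not something from the thread: it matches only `<key>` elements (so a `<value>` whose text happens to equal a wanted key is ignored) and uses `ElementsAfterSelf("value")` instead of casting `NextNode`.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

XElement lists = XElement.Parse(@"<lists>
  <list>
    <key>FIRST_WANTED_KEY</key><value>A1</value>
    <key>SECOND_WANTED_KEY</key><value>A2</value>
    <key>SECOND_WANTED_KEY</key><value>A2_DUPLICATE</value>
  </list>
  <list>
    <key>FIRST_WANTED_KEY</key><value>B1</value>
    <key>UNLISTED_KEY</key><value>B_UNWANTED</value>
  </list>
</lists>");

var wantedKeys = new HashSet<string> { "FIRST_WANTED_KEY", "SECOND_WANTED_KEY", "THIRD_WANTED_KEY" };

// Outer key: the parent <list> element (XElement compares by reference).
// Inner dictionary: wanted key -> first value found in that list.
Dictionary<XElement, Dictionary<string, string>> byList =
    lists.Descendants("key")
         .Where(k => wantedKeys.Contains(k.Value))
         .GroupBy(k => k.Parent)                  // associate each key with its own <list>
         .ToDictionary(
             g => g.Key,
             g => g.GroupBy(k => k.Value)         // repeated keys within one list
                   .ToDictionary(
                       dup => dup.Key,
                       // document order, so First() is the earliest occurrence
                       dup => (string)dup.First()
                                         .ElementsAfterSelf("value")
                                         .FirstOrDefault()));

foreach (var entry in byList)
    Console.WriteLine(string.Join(", ", entry.Value.Select(p => $"{p.Key}={p.Value}")));
```

One caveat of grouping by parent: a `<list>` that contains none of the wanted keys gets no entry at all, which is why the result is keyed by the `<list>` element itself rather than by position.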

Status Update 4/9/10

For now, I still think the hash set approach is preferable. As far as I can tell, LINQ to XML returns elements in document order (e.g. from Descendants()), and so far all of my test cases have worked.

I'd offer a bounty and/or upvote answers, but don't have enough rep points for that. I'll decide on an answer today, so get 'em in! Thanks.

A: 

This gets the value of the first <value> element after the first <key> element containing "SECOND_WANTED_KEY":

XDocument doc;

string result = (string)doc.Root
                           .Elements("key")
                           .First(node => (string)node == "SECOND_WANTED_KEY")
                           .ElementsAfterSelf("value")
                           .First();

Add null checks as desired.

dtb
Thanks for the response - makes sense - I can just add multiple statements like this for each key. Any thoughts on my addition/clarification above?
awshepard
I think this is closest to what I'm looking for, though if there are ever more suggestions from anyone, I'm all ears. Thanks.
awshepard
A: 
XDocument doc = ...

var wantedKeyValuePairs =
    from keyElement in doc.Root.Elements("key")
    let valueElement = keyElement.ElementsAfterSelf("value").First()
    select new { Key = keyElement.Value, Value = valueElement.Value } into kvp
    group kvp by kvp.Key into g
    select g.First();

Explanation: this query takes each <key> element together with the <value> element that follows it and makes a key-value pair from them. It then groups the pairs by key and takes only the first pair for each key.

Thomas Levesque
Thanks - does this support "filtering", in the sense that there may be some key-value pairs I don't want? For instance, if <key>UNWANTED_KEY</key> <value>UNWANTED_VALUE</value> appeared, could this skip it?
awshepard
You just need to add an extra `where` clause right after the first `select`: `where kvp.Key != "UNWANTED_KEY"`
Thomas Levesque
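A sketch combining the answer's query with the suggested `where` clause. As an assumption beyond the thread, it whitelists wanted keys with a `HashSet<string>` instead of excluding known unwanted ones, and uses `FirstOrDefault()` plus the `(string)` conversion so that a `<key>` with no following `<value>` produces `null` rather than throwing.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

XDocument doc = XDocument.Parse(@"<list>
  <key>FIRST_WANTED_KEY</key><value>FIRST_WANTED_VALUE</value>
  <key>UNWANTED_KEY</key><value>UNWANTED_VALUE</value>
  <key>SECOND_WANTED_KEY</key><value>SECOND_WANTED_VALUE</value>
  <key>SECOND_WANTED_KEY</key><value>UNWANTED_VALUE</value>
</list>");

var wantedKeys = new HashSet<string> { "FIRST_WANTED_KEY", "SECOND_WANTED_KEY", "THIRD_WANTED_KEY" };

var wantedKeyValuePairs =
    (from keyElement in doc.Root.Elements("key")
     let valueElement = keyElement.ElementsAfterSelf("value").FirstOrDefault()
     select new { Key = keyElement.Value, Value = (string)valueElement } into kvp
     where wantedKeys.Contains(kvp.Key)   // the extra filter, right after the first select
     group kvp by kvp.Key into g          // groups come out in order of first appearance
     select g.First())                    // first pair per key
    .ToList();

foreach (var pair in wantedKeyValuePairs)
    Console.WriteLine($"{pair.Key} = {pair.Value}");
```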