(First post, please be gentle!)
I am just learning about LINQ to XML in all its glory and frailty, trying to hack it to do what I want to do:
Given an XML file like this -
<list>
<!-- random data, keys, values, etc.-->
<key>FIRST_WANTED_KEY</key>
<value>FIRST_WANTED_VALUE</value>
<key>SECOND_WANTED_KEY</key>
<value>SECOND_WANTED_VALUE</value> <!-- wanted because it's first -->
<key>SECOND_WANTED_KEY</key>
<value>UNWANTED_VALUE</value> <!-- not wanted because it's second -->
<!-- nonexistent <key>THIRD_WANTED_KEY</key> -->
<!-- nonexistent <value>THIRD_WANTED_VALUE</value> -->
<!-- more stuff-->
</list>
I want to extract the values of a set of known "wanted keys" in a robust fashion, i.e. if SECOND_WANTED_KEY appears twice, I only want SECOND_WANTED_VALUE, not UNWANTED_VALUE. Additionally, THIRD_WANTED_KEY may or may not appear, so the query should be able to handle that as well. I can assume that FIRST_WANTED_KEY will appear before the other keys, but I can't assume anything about the order of the other keys. If a key appears twice, its later values aren't important; I only want the first one. An anonymous data type consisting of strings is fine.
My attempt has centered around something along these lines:
var z = from y in x.Descendants()
        where y.Value == "FIRST_WANTED_KEY"
        select new
        {
            first_wanted_value = ((XElement)y.NextNode).Value,
            //...
        };
My question is: what should that ... be? I've tried, for instance (ugly, I know):
second_wanted_value = ((XElement)y.ElementsAfterSelf()
.Where(w => w.Value=="SECOND_WANTED_KEY")
.FirstOrDefault().NextNode).Value
which should hopefully allow the key to be anywhere, or non-existent, but that hasn't worked out: when the key is missing, FirstOrDefault() returns null, and accessing .NextNode on a null XElement throws a NullReferenceException.
I've also tried adding a
.Select(t => {
if (t==null)
return new XElement("SECOND_WANTED_KEY","");
else return t;
})
clause after the Where, but that hasn't worked either.
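For comparison, the null case can be sidestepped by checking the result of FirstOrDefault() before touching its siblings. The following is only a minimal sketch, assuming the keys and values are direct children of <list> and that each <key> is immediately followed by its <value>. KeyLookup and FirstValueFor are hypothetical names made up for illustration, not part of LINQ to XML.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

static class KeyLookup
{
    // Hypothetical helper: returns the text of the <value> element that
    // immediately follows the first <key> whose text equals wantedKey,
    // or null when the key (or its value) is missing.
    public static string FirstValueFor(XElement list, string wantedKey)
    {
        // FirstOrDefault yields null when no <key> matches, so check it
        // before asking for siblings.
        XElement key = list.Elements("key")
                           .FirstOrDefault(k => k.Value == wantedKey);
        if (key == null) return null;

        // ElementsAfterSelf skips comments and text nodes, unlike NextNode.
        XElement next = key.ElementsAfterSelf().FirstOrDefault();
        return next != null && next.Name == "value" ? next.Value : null;
    }
}
```

With this, something like second_wanted_value = KeyLookup.FirstValueFor(list, "SECOND_WANTED_KEY") could go inside the anonymous type, and a missing key simply comes back as null.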
I'm open to suggestions, (constructive) criticism, links, references, or suggestions of phrases to Google for, etc. I've done a fair share of Googling and checking around S.O., so any help would be appreciated.
Thanks!
EDIT: Let me add a layer of complexity to this - I should have included this in the first place. Let's say the XML document looks like this:
<lists>
<list>
<!-- as above -->
</list>
<list>
<!-- as above -->
</list>
</lists>
and I want to extract multiple sets of these key-value pairs. Question/Caution: if SECOND_WANTED_KEY doesn't appear in the first <list> element but appears in the second, I don't want to accidentally pick up the second list element's SECOND_WANTED_KEY.
EDIT #2:
As another idea, I've tried creating a HashSet of the keys that I'm looking for and doing this:
HashSet<string> wantedKeys = new HashSet<string>();
wantedKeys.Add("FIRST_WANTED_KEY");
//...add more keys here
var kvp = from a in x.Descendants().Where(a => wantedKeys.Contains(a.Value))
          select new KeyValuePair<string,string>(a.Value,
                                                 ((XElement)a.NextNode).Value);
This gets me all of the key-value pairs, but I'm not sure whether it guarantees that the pairs are properly "associated" with their parent <list> element. Any thoughts or comparisons between these two approaches would be helpful.
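To make the "association" concern concrete, one possible variation on the hash set idea is to run the lookup once per <list> element so that keys from different lists can never mix. This is a rough sketch rather than a tested solution: it assumes keys and values are direct children of each <list>, and ListExtractor and Extract are hypothetical names introduced only for this example.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

static class ListExtractor
{
    // Hypothetical helper: builds one dictionary per <list>, holding the
    // first value seen for each wanted key that is present in that list.
    public static List<Dictionary<string, string>> Extract(
        XContainer root, HashSet<string> wantedKeys)
    {
        return root.Descendants("list")
            .Select(list => list.Elements("key")
                .Where(k => wantedKeys.Contains(k.Value))
                // GroupBy keeps elements in source order inside each group,
                // so First() is the earliest occurrence of that key.
                .GroupBy(k => k.Value)
                .ToDictionary(
                    g => g.Key,
                    // The (string) cast on a null XElement yields null
                    // instead of throwing.
                    g => (string)g.First()
                                  .ElementsAfterSelf("value")
                                  .FirstOrDefault()))
            .ToList();
    }
}
```

A key that is absent from a given list is simply missing from that list's dictionary, so TryGetValue can be used to tell "not present" apart from an empty value.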
Status Update 4/9/10
As of right now I'm still leaning towards the hash set approach. It seems like most of the XML processing done by .NET is done in document order, and so far all of my test cases have been working out.
I'd offer a bounty and/or upvote answers, but don't have enough rep points for that. I'll decide on an answer today, so get 'em in! Thanks.