tags:

views:

312

answers:

2

Somehow I can't read an XML file with very simple structure in my PowerShell script. The XML file is generated by "Microsoft Log Parser Toolkit" and I want to load its contents into a database.

The script is very simple:

$datalist=[xml] (gc ".\users.xml");
foreach ($ROW in $datalist.ROOT.ROW) {
        Write-host $ROW.CompName;
}

The expected output is

User1
User2

but nothing gets printed out. I have seen similar examples working (for instance here where they use PowerShell to parse outputs of stsadm -o enumsites.

What am I missing here?

Contents of the users.xml is simple, too:

<?xml version="1.0" encoding="ISO-10646-UCS-2" standalone="yes" ?>
<!DOCTYPE ROOT[
 <!ATTLIST ROOT DATE_CREATED CDATA #REQUIRED>
 <!ATTLIST ROOT CREATED_BY CDATA #REQUIRED>
 <!ELEMENT CompName (#PCDATA)>
 <!ELEMENT SoftwareName (#PCDATA)>
 <!ELEMENT ROW (CompName, SoftwareName)>
 <!ELEMENT ROOT (ROW*)>
]>
<ROOT DATE_CREATED="2009-12-30 10:44:23" CREATED_BY="Microsoft Log Parser V2.2">
 <ROW>
  <CompName>User1</CompName>
  <SoftwareName>Adobe Reader 9.0</SoftwareName>
 </ROW>
 <ROW>
  <CompName>User2</CompName>
  <SoftwareName>CorelDraw Graphics Suite X4</SoftwareName>
 </ROW>
</ROOT>
+3  A: 

This should solve it:

Select-Xml -Xml $datalist -XPath '//ROW' | % { $_.Node.CompName }

Edit: according to MediaAndMicrocode Where-Object should be used to filter out invalid values, but in this case it works without it.


Where lies the problem?

What is interesting:

poshdev >[62]: $x.ROOT | select -exp row
Select-Object : Property "row" cannot be found.
At line:1 char:17
+ $x.ROOT | select <<<<  -exp row
    + CategoryInfo          : InvalidArgument: (:PSObject) [Select-Object], PSArgumentException
    + FullyQualifiedErrorId : ExpandPropertyNotFound,Microsoft.PowerShell.Commands.SelectObjectCommand


CompName                                                                 SoftwareName
--------                                                                 ------------
User1                                                                    Adobe Reader 9.0
User2                                                                    Corel Photoshop 12

So if property can not be found, let's check its members:

$datalist.ROOT | gm

it shows that is a collection of string and XmlElement. Based on the knowledge this works as expected:

$datalist.root[1].row

It means, that cast to [xml] created 2 items: string (probably from the DTD part) and xml. Other workaround is simply remove the DTD part from the xml.

stej
Thank you for the answer! Indeed, if I remove the DTD part, it works as expected. I'll just ask people who generate the XML files to remove the data type definition.
naivists
I'm glad I could help :)
stej
Great explanation.
Richard Berg
+1  A: 

You don't have to remove the DTD part with Select-Xml, you can simply pipe it into a Where-Object. When you ask for all nodes with Select-Xml (Select-Xml //*), Select-Xml will automatically collect all of the namespaces:

Select-Xml -Xml $datalist -XPath '//*' |
    Where-Object { $_.Node.CompName } |
    Foreach-Object { $_.Node.CompName } 

You should also know that piping a file into Select-Xml is considerably faster that using Get-Content to read it. Get-Content reads line by line, and this can get very slow on large files. dir | Select-Xml will avoid reading the files line by line.

Hope this helps

Start-Automating
thank you, I didn't know the part about speed differences. In my case it wouldn't matter since the import will happen once a day and the file contains just about 1000 entities. Anyway, good to know.
naivists
What does it mean that Select-Xml automatically collects the namespaces?
stej