views:

26

answers:

1

I set out today with the intent to parse an SSRS RDL file (XML) using Python in order to gather the DataSet and Query data. A recent project has me back tracking on a variety of reports and data sources with the intention of consolidating and cleaning up what we have published.

I was able to use this script to create a CSV file with the following columns: system path|report file name|command type|command text|

It's not very elegant, but it works.

What I'm hoping to be able to do with this post is solicit for any of you experts out there who have either tried this already or are experienced in XML parsing with Python to take a shot at cleaning it up and provided the ability to:

  • Include Headers, which would be XML tags
  • Include DataSet name in the column
  • Deliver results into single file

Here is the full code in my "rdlparser.py" file:

import sys, os

from xml.dom import minidom
xmldoc = minidom.parse(sys.argv[1])

content = ""
TargetFile = sys.argv[1].split(".", 1)[0] + ".csv"
numberOfQueryNodes = 0

queryNodes = xmldoc.getElementsByTagName('Query')
numberOfQueryNodes = queryNodes.length -1


while (numberOfQueryNodes > -1):
    content = content + os.path.abspath(sys.argv[1])+ '|'+ sys.argv[1].split(".", 1)[0]+ '|' 
    outputNode = queryNodes.__getitem__(numberOfQueryNodes)
    children = [child for child in outputNode.childNodes if child.nodeType==1]
    numberOfQueryNodes = numberOfQueryNodes - 1
    for node in children:
        if node.firstChild.nodeValue != '\n          ':
            if node.firstChild.nodeValue != 'true':
                content = content + node.firstChild.nodeValue + '|'
    content = content + '\n'

fp = open(TargetFile, 'wb')
fp.write(content)
fp.close()
A: 

I know you asked for Python; but I figured Powershell's built in xml handling capabilities would make this fairly simple. While I'm sure it is not guru level, I think it came out pretty nicely (the lines starting with # are comments):

# The directory to search 
$searchpath = "C:\"

# List all rdl files    from the given search path recusrivley searching sub folders, store results into a variable
$files = gci $searchpath -recurse -filter "*.rdl" | SELECT FullName, DirectoryName, Name 

# for each of the found files pass the folder and file name  and the xml content
$files | % {$Directory = $_.DirectoryName; $Name = $_.Name; [xml](gc $_.FullName)}
            # in the xml content navigate to the the DataSets Element
            | % {$_.Report.DataSets} 
                    # for each query retrieve the Report directory , File Name, DataSource Name, Command Type, Command Text output thwese to a csv file
                    | % {$_.DataSet.Query} | SELECT  @{N="Path";E={$Directory}}, @{N="File";E={$Name}}, DataSourceName, CommandType, CommandText | Export-Csv Test.csv -notype   
JasonHorner
Worked beautifully with a little change to get it into a .ps1 file. I dumped it right in and tried to run it and powershell didn't like the "empty pipe element", so I put the last $files command on the same line.I have not had much exposure with powershell, but this will be just the thing to get me into it. Thanks for the help JasonHorner!
Vinnie