I set out today with the intent to parse an SSRS RDL file (XML) using Python in order to gather the DataSet and Query data. A recent project has me back tracking on a variety of reports and data sources with the intention of consolidating and cleaning up what we have published.
I was able to use this script to create a CSV file with the following columns: system path|report file name|command type|command text|
It's not very elegant, but it works.
What I'm hoping to be able to do with this post is solicit for any of you experts out there who have either tried this already or are experienced in XML parsing with Python to take a shot at cleaning it up and provided the ability to:
- Include Headers, which would be XML tags
- Include DataSet name in the column
- Deliver results into single file
Here is the full code in my "rdlparser.py" file:
import sys, os
from xml.dom import minidom
xmldoc = minidom.parse(sys.argv[1])
content = ""
TargetFile = sys.argv[1].split(".", 1)[0] + ".csv"
numberOfQueryNodes = 0
queryNodes = xmldoc.getElementsByTagName('Query')
numberOfQueryNodes = queryNodes.length -1
while (numberOfQueryNodes > -1):
content = content + os.path.abspath(sys.argv[1])+ '|'+ sys.argv[1].split(".", 1)[0]+ '|'
outputNode = queryNodes.__getitem__(numberOfQueryNodes)
children = [child for child in outputNode.childNodes if child.nodeType==1]
numberOfQueryNodes = numberOfQueryNodes - 1
for node in children:
if node.firstChild.nodeValue != '\n ':
if node.firstChild.nodeValue != 'true':
content = content + node.firstChild.nodeValue + '|'
content = content + '\n'
fp = open(TargetFile, 'wb')
fp.write(content)
fp.close()