One Possible (working) Solution:

Private Sub ReadXMLAttributes(ByVal oXML As String)
    ReadXMLAttributes(oXML, "mso-infoPathSolution")
End Sub
Private Sub ReadXMLAttributes(ByVal oXML As String, ByVal oTagName As String)
    Try
        Dim XmlDoc As New Xml.XmlDocument
        XmlDoc.LoadXml(oXML)
        oFileInfo = New InfoPathDocument
        Dim XmlNodes As Xml.XmlNodeList = XmlDoc.GetElementsByTagName(oTagName)
        For Each xNode As Xml.XmlNode In XmlNodes
            With xNode
                oFileInfo.SolutionVersion = .Attributes(InfoPathSolution.solutionVersion).Value
                oFileInfo.ProductVersion = .Attributes(InfoPathSolution.productVersion).Value
                oFileInfo.PIVersion = .Attributes(InfoPathSolution.PIVersion).Value
                oFileInfo.href = .Attributes(InfoPathSolution.href).Value
                oFileInfo.name = .Attributes(InfoPathSolution.name).Value
            End With
        Next
    Catch ex As Exception
        MsgBox(ex.Message, MsgBoxStyle.OkOnly, "ReadXMLAttributes")
    End Try
End Sub

This works, but it still suffers from the problem described below if the attributes are reordered. The only way I can think of to avoid that is to hard-code the attribute names into my program and have it loop through the parsed tag, searching for the designated attributes.
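
A minimal sketch of the name-based lookup described above. It assumes the InfoPathSolution members used in the code are positional indexes (their definition is not shown). ReadAttribute is a hypothetical helper, not part of the original program. It asks the attribute collection for an attribute by name, so the order of the attributes in the tag does not matter.

```vbnet
Imports System
Imports System.Xml

Module AttributeLookupSketch
    ' Hypothetical helper: returns the named attribute's value,
    ' or an empty string when the attribute is missing.
    Function ReadAttribute(ByVal node As XmlNode, ByVal attributeName As String) As String
        Dim attr As XmlAttribute = node.Attributes(attributeName)
        If attr Is Nothing Then Return String.Empty
        Return attr.Value
    End Function

    Sub Main()
        ' The same two attributes, written in opposite orders.
        Dim first As New XmlDocument()
        first.LoadXml("<mso-infoPathSolution solutionVersion=""1.0.0.2"" productVersion=""12.0.0"" />")
        Dim second As New XmlDocument()
        second.LoadXml("<mso-infoPathSolution productVersion=""12.0.0"" solutionVersion=""1.0.0.2"" />")

        ' Both lines print 1.0.0.2 because the lookup is by name, not by position.
        Console.WriteLine(ReadAttribute(first.DocumentElement, "solutionVersion"))
        Console.WriteLine(ReadAttribute(second.DocumentElement, "solutionVersion"))
    End Sub
End Module
```

Returning an empty string for a missing attribute is a design choice for this sketch; a real program might prefer to raise an error when a required attribute is absent.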

NOTE: InfoPathDocument is a custom class I made; it is nothing complicated:

Public Class InfoPathDocument
    Private _sVersion As String
    Private _pVersion As String
    Private _piVersion As String
    Private _href As String
    Private _name As String
    Public Property SolutionVersion() As String
        Get
         Return _sVersion
        End Get
        Set(ByVal value As String)
         _sVersion = value
        End Set
    End Property
    Public Property ProductVersion() As String
        Get
         Return _pVersion
        End Get
        Set(ByVal value As String)
         _pVersion = value
        End Set
    End Property
    Public Property PIVersion() As String
        Get
         Return _piVersion
        End Get
        Set(ByVal value As String)
         _piVersion = value
        End Set
    End Property
    Public Property href() As String
        Get
         Return _href
        End Get
        Set(ByVal value As String)
         If value.ToLower.StartsWith("file:///") Then
          value = value.Substring(8)
         End If
         _href = Form1.PathToUNC(URLDecode(value))
        End Set
    End Property
    Public Property name() As String
        Get
         Return _name
        End Get
        Set(ByVal value As String)
         _name = value
        End Set
    End Property
    Sub New()

    End Sub
    Sub New(ByVal oSolutionVersion As String, ByVal oProductVersion As String, ByVal oPIVersion As String, ByVal oHref As String, ByVal oName As String)
        SolutionVersion = oSolutionVersion
        ProductVersion = oProductVersion
        PIVersion = oPIVersion
        href = oHref
        name = oName
    End Sub
    Public Function URLDecode(ByVal StringToDecode As String) As String
        Dim CurChr As Integer = 1
        Dim oRet As String = String.Empty
        Try
            ' Walk the string one character at a time (Mid is 1-based).
            ' Comparing with ">" ends the loop even when a trailing "%"
            ' escape pushes CurChr past the end of the string.
            Do Until CurChr > Len(StringToDecode)
                Select Case Mid(StringToDecode, CurChr, 1)
                    Case "+"
                        oRet &= " "
                    Case "%"
                        ' Decode a %XX hex escape, then skip its two hex digits.
                        oRet &= Chr(Val("&h" & Mid(StringToDecode, CurChr + 1, 2)))
                        CurChr = CurChr + 2
                    Case Else
                        oRet &= Mid(StringToDecode, CurChr, 1)
                End Select
                CurChr += 1
            Loop
        Catch ex As Exception
            MsgBox(ex.Message, MsgBoxStyle.OkOnly, "URLDecode")
        End Try
        Return oRet
    End Function
End Class

Original Question

I am working on a project that requires reading an XML document, specifically a saved form from Microsoft InfoPath.

Here is a simple example of what I will be working with along with some background information that might be helpful:

<?xml version="1.0" encoding="UTF-8"?>
<?mso-infoPathSolution solutionVersion="1.0.0.2" productVersion="12.0.0" PIVersion="1.0.0.0" href="file:///C:\Users\darren\Desktop\simple_form.xsn" name="urn:schemas-microsoft-com:office:infopath:simple-form:-myXSD-2009-05-15T14-16-37" ?>
<?mso-application progid="InfoPath.Document" versionProgid="InfoPath.Document.2"?>
<my:myFields xmlns:my="http://schemas.microsoft.com/office/infopath/2003/myXSD/2009-05-15T14:16:37" xml:lang="en-us">
    <my:first_name>John</my:first_name>
    <my:last_name>Doe</my:last_name>
</my:myFields>

My goal right now is to extract the versionID and location of the form. Easy enough with regex:

Dim _doc As New XmlDocument
_doc.Load(_thefile)
Dim oRegex As String = "^solutionVersion=""(?<sVersion>[0-9.]*)"" productVersion=""(?<pVersion>[0-9.]*)"" PIVersion=""(?<piVersion>[0-9.]*)"" href=""(?<href>.*)"" name=""(?<name>.*)""$"
Dim rx As New Regex(oRegex), m As Match = Nothing
For Each section As XmlNode In _doc.ChildNodes
    m = rx.Match(section.InnerText.Trim)
    If m.Success Then
        Dim temp As String = m.Groups("name").Value.Substring(m.Groups("name").Value.ToLower.IndexOf("infopath") + ("infopath").Length + 1)
        fileName = temp.Substring(0, temp.LastIndexOf(":"))
        fileVersion = m.Groups("sVersion").Value
    End If
Next

The only problem with this working solution is that it breaks if the layout of the InfoPath document header changes, for instance if the solutionVersion and productVersion attributes swap positions (Microsoft LOVES doing things like this, it seems).

So I have opted to try to use the XML parsing ability of VB.NET to help me achieve the above results, sans-regex.

One of the ChildNodes of the _doc object contains the information I need; however, it does not have any ChildNodes of its own:

_doc.ChildNodes(1).HasChildNodes = False

Can anyone help me out with this? Thanks in advance!

A: 

The problem is that the tags you want to parse are not elements of the XML document. They are processing instructions in the XML prolog, so they won't be available in the XmlDocument as elements.

My only idea (apart from looking through the documentation for a way to access these nodes) would be to move just the mso-infoPathSolution instruction into an XmlDocument of its own, after stripping the <? ?> away and replacing them with < />. Then you could access the attributes regardless of their ordering.
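
The suggestion above could be sketched roughly as follows. This is only an illustration, not the answerer's code: ProcessingInstructionToElement is a hypothetical helper, and it assumes the raw text of the instruction has already been obtained as a string. It also assumes the content between <? and ?> happens to be well-formed attribute syntax, which holds for the sample header in the question but is not guaranteed for processing instructions in general.

```vbnet
Imports System
Imports System.Xml

Module PrologToElementSketch
    ' Hypothetical helper: rewrites "<?name attrs ?>" as "<name attrs />".
    Function ProcessingInstructionToElement(ByVal piText As String) As String
        Dim trimmed As String = piText.Trim()
        If Not (trimmed.StartsWith("<?") AndAlso trimmed.EndsWith("?>")) Then
            Throw New ArgumentException("Expected text of the form <?name ... ?>")
        End If
        ' Drop the leading "<?" and trailing "?>", then re-wrap as an empty element.
        Dim inner As String = trimmed.Substring(2, trimmed.Length - 4).Trim()
        Return "<" & inner & " />"
    End Function

    Sub Main()
        Dim sample As String = "<?mso-infoPathSolution solutionVersion=""1.0.0.2"" productVersion=""12.0.0"" ?>"
        Dim doc As New XmlDocument()
        doc.LoadXml(ProcessingInstructionToElement(sample))
        ' Attributes are now reachable by name, whatever their order.
        Console.WriteLine(doc.DocumentElement.GetAttribute("productVersion"))
    End Sub
End Module
```

The resulting element is named mso-infoPathSolution, so a string produced this way can be searched with GetElementsByTagName("mso-infoPathSolution").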

Leonidas
Any ideas how to insert this particular node into a new XmlDocument? I am relatively new to Xml parsing and manipulation. Currently I am trying to modify the newNode's OuterXml, but it is ReadOnly so the quest continues!
Anders
A: 

The processing instructions are part of the XML document, but their attributes don't get parsed. Try this code:

// Load the original xml...
var xml = new XmlDocument();
xml.Load( _thefile );

// Select out the processing instruction...
var infopathProcessingInstruction = xml.SelectSingleNode( "/processing-instruction()[local-name(.) = \"mso-infoPathSolution\"]" );

// Since the processing instruction does not expose its attributes, create a new XML document...
var xmlInfoPath = new XmlDocument();
xmlInfoPath.LoadXml("<data " + infopathProcessingInstruction.InnerText + " />");

// Get the data...
var solutionVersion = xmlInfoPath.DocumentElement.GetAttribute("solutionVersion");
var productVersion  = xmlInfoPath.DocumentElement.GetAttribute("productVersion");
David
Awesome, thank you for this!
Anders
One thing though, could you explain the line that sets the infopathProcessingInstruction variable? Like I mentioned in the question I am new to XML manipulation, so I don't exactly see how that line accomplishes what it does :P
Anders
The XPath on the "var infopathProcessingInstruction" line says "give me the top-level processing instructions that have a name of 'mso-infoPathSolution'", and SelectSingleNode returns the first match. This URL might help: http://aspalliance.com/515
David
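
For readers more comfortable with loops than with XPath, the selection explained in the last comment can be approximated in VB.NET by walking the document's top-level nodes. This is a sketch under assumptions, not code from the thread: FindProcessingInstruction is a hypothetical helper, and the inline XML is a shortened stand-in for the InfoPath header.

```vbnet
Imports System
Imports System.Xml

Module FindProcessingInstructionSketch
    ' Hypothetical helper: returns the first top-level processing instruction
    ' whose target matches, or Nothing if there is none.
    Function FindProcessingInstruction(ByVal doc As XmlDocument, ByVal target As String) As XmlProcessingInstruction
        For Each node As XmlNode In doc.ChildNodes
            If node.NodeType = XmlNodeType.ProcessingInstruction AndAlso node.Name = target Then
                Return DirectCast(node, XmlProcessingInstruction)
            End If
        Next
        Return Nothing
    End Function

    Sub Main()
        Dim doc As New XmlDocument()
        doc.LoadXml("<?mso-infoPathSolution solutionVersion=""1.0.0.2"" ?><?mso-application progid=""InfoPath.Document""?><root />")
        Dim found As XmlProcessingInstruction = FindProcessingInstruction(doc, "mso-infoPathSolution")
        ' Target is the instruction's name; Data is the unparsed text after it.
        Console.WriteLine(found.Target)
        Console.WriteLine(found.Data)
    End Sub
End Module
```

The Data string is what the answer wraps in a dummy element to get at the attributes. The XPath node test processing-instruction('mso-infoPathSolution') should select the same node as the local-name() predicate in a shorter form.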