I've got some simple XML-handling code that's supposed to locate a child node of a passed-in node based on an attribute value:

function GetNodeByAttributeValue(
  const AParentNode: IXMLNode;
  const AttributeName: string; AttributeValue: Variant): IXMLNode;
var
  i: integer;
  value: Variant;
begin
  result := nil;
  if (not Assigned(AParentNode)) or (AttributeName = '') then
    exit;
  for i := 0 to AParentNode.ChildrenCount-1 do
  begin
    result := AParentNode.Children[i];
    value := result.GetAttributeValue(AttributeName, UnAssigned);
    if not VarIsEmpty(value) then
      exit;
  end;
  result := nil;
end;

Pretty straightforward, right? But when I try to run this, under certain circumstances it crashes with an Access Violation. Here's what's going on:

The IXML* implementation is provided by the RemObjects SDK Library. result.GetAttributeValue calls uROMSXMLImpl.TROMSXMLNode.GetAttributeValue, which calls TROMSXMLNode.GetAttributeByName, which says

  node := fNode.attributes.getNamedItem(anAttributeName);

And this crashes because fNode.attributes returns nil. As I understand it, that shouldn't ever happen.

The strange thing is, going back to the for loop in the original function, AParentNode.ChildrenCount returns 3. But the node in the original XML document only has one child node. It matches the criteria I'm looking for.

<ParentNode>
  <namespace:ChildNode name="right-name">

But AParentNode.ChildrenCount returns 3. I open them in the debugger and get this:

AParentNode.Children[0].name: '#text'
AParentNode.Children[1].name: 'namespace:ChildNode'
AParentNode.Children[2].name: '#text'

What in the world are these "#text" nodes? They're not in the XML document and I didn't write any code to insert them. Why are they there, and why are they buggy, and is there anything I can do to keep them from screwing up my attribute search?

+1  A: 

The #text nodes are the bits of whitespace before and after <namespace:ChildNode>. Since #text nodes are just bits of text, they have no attributes. If you want to get rid of those nodes, try using xsl:strip-space in an XSL transform, or just check whether each node consists entirely of whitespace and skip it if so.

Michael Williamson
+6  A: 

The text nodes are the whitespace being returned by the parser,
i.e. the line break and indentation before <namespace:ChildNode name="right-name"> and the whitespace after it.

These whitespace text nodes are treated as children of <ParentNode>.
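
This is standard DOM behaviour and not specific to one parser, so it can be reproduced outside Delphi. Below is a minimal sketch using Python's xml.dom.minidom as a stand-in for the RemObjects implementation (which it does not model). The namespace URI "urn:example" and the helper name find_child_by_attribute are assumptions made up for the illustration.

```python
from xml.dom import minidom

# The namespace URI is a placeholder; the original document's URI is unknown.
XML = """<ParentNode xmlns:namespace="urn:example">
  <namespace:ChildNode name="right-name"/>
</ParentNode>"""

doc = minidom.parseString(XML)
parent = doc.documentElement

# The whitespace around the child element shows up as text nodes.
names = [child.nodeName for child in parent.childNodes]
print(names)

# In minidom, a text node has no attribute map at all.
print(parent.childNodes[0].attributes)


def find_child_by_attribute(parent, attr_name, attr_value):
    """Return the first child element whose attribute equals attr_value."""
    for child in parent.childNodes:
        # Skip text nodes, comments, processing instructions, and so on.
        if child.nodeType != child.ELEMENT_NODE:
            continue
        if child.hasAttribute(attr_name) and child.getAttribute(attr_name) == attr_value:
            return child
    return None


match = find_child_by_attribute(parent, "name", "right-name")
print(match.nodeName)
```

Here the child list has three entries, of which only the middle one is an element, which mirrors what the debugger shows in the question.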

crowne
+2  A: 

You have two choices. You can set an option in the parser to strip whitespace (that is, disable the option that preserves whitespace), or, better, you can check whether the node you're examining is actually an element, because only elements can have attributes. The second approach is also more robust: if the XML contains a processing instruction such as <?some wired stuff?>, stripping whitespace doesn't help, because looking up attributes on a processing instruction also causes an AV in this parser. So I added a NodeType check to your code. I also compare the attribute's value against AttributeValue, which the original loop never did:

function GetNodeByAttributeValue(
  const AParentNode: IXMLNode;
  const AttributeName: string; AttributeValue: Variant): IXMLNode;
var
  i: integer;
  value: Variant;
begin
  result := nil;
  if (not Assigned(AParentNode)) or (AttributeName = '') then
    exit;
  for i := 0 to AParentNode.ChildrenCount-1 do
  begin
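    result := AParentNode.Children[i];
    // Only elements carry attributes; skip #text nodes, processing instructions, etc.
    if result.NodeType = ntElement then
    begin
      value := result.GetAttributeValue(AttributeName, UnAssigned);
      // Match only when the attribute exists and equals the requested value
      if (not VarIsEmpty(value)) and (value = AttributeValue) then
        exit;
    end;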
  end;
  result := nil;
end;

The filtering you're doing can also be done easily in XSLT and/or XPath, but I don't know whether this parser supports XPath, or whether XSLT would actually be handy for you.
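
To give a rough idea of what the XPath route looks like, here is a sketch in Python using the limited XPath subset of xml.etree.ElementTree. It is only an illustration of the technique, not the RemObjects API. The namespace URI "urn:example" and the extra "wrong-name" child are assumptions added for the example.

```python
import xml.etree.ElementTree as ET

# Placeholder namespace URI; a second child is added to show the filter at work.
XML = """<ParentNode xmlns:namespace="urn:example">
  <?some wired stuff?>
  <namespace:ChildNode name="wrong-name"/>
  <namespace:ChildNode name="right-name"/>
</ParentNode>"""

parent = ET.fromstring(XML)

# "*" selects direct child elements; the predicate filters on attribute value.
match = parent.find("*[@name='right-name']")
print(match.get("name"))

# No child element matches, so find() returns None instead of raising.
missing = parent.find("*[@name='no-such-name']")
print(missing)
```

Because the path expression selects only elements, whitespace and the processing instruction never reach the attribute test.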