I'm using libxml2 to parse HTML. The HTML might look like this:

<div>
    Some very very long text here.
</div>

I want to insert a child node, e.g. a header, before the text, like this:

<div>
    <h3>
        Some header here
    </h3>
    Some very very long text here.
</div>

Unfortunately, libxml2 always adds my header after the text, like this:

<div>
    Some very very long text here.
    <h3>
        Some header here
    </h3>
</div>

How can I solve this problem?

+1  A: 

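The text content is itself a child node of the div, so you can get a pointer to that text node and call xmlAddPrevSibling to insert the new element in front of it. (xmlAddChild, by contrast, appends to the end of the parent's child list, which would produce the ordering you describe.) Here is an example, without error handling or proper cleanup.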

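// Headers required by this snippet; the statements below belong inside a
// function such as main().
#include <string>
#include <libxml/parser.h>
#include <libxml/tree.h>
#include <libxml/xpath.h>

xmlInitParser();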

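// Parse the markup into a document tree. This sample is well-formed, so the
// XML parser accepts it; for real-world HTML, htmlReadMemory from
// <libxml/HTMLparser.h> is the more tolerant choice.
std::string content( "<html><head/><body><div>Some long text here</div></body></html>" );
xmlDocPtr doc = xmlReadMemory( content.c_str(), static_cast<int>( content.size() ), "noname.xml", 0, 0 );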

// Query the document with XPath. The XPath text() function could return the
// text node directly, but for the sake of the example we get the parent
// 'div' node and iterate over its child nodes instead.
std::string xpathExpr( "/html/body/div" );
xmlXPathContextPtr xpathCtx = xmlXPathNewContext( doc );
xmlXPathObjectPtr xpathObj = xmlXPathEvalExpression( BAD_CAST xpathExpr.c_str(), xpathCtx );

// Get the div node
xmlNodeSetPtr nodes = xpathObj->nodesetval;
xmlNodePtr divNode = nodes->nodeTab[ 0 ];

// Iterate the div child nodes, though in this example we know
// there'll only be one node, the text node.
xmlNodePtr divChildNode = divNode->xmlChildrenNode;
while( divChildNode != 0 )
    {
    if( xmlNodeIsText( divChildNode ) )
        {
        // Create a new h3 element and give it a text node as its child
        xmlNodePtr headingNode = xmlNewNode( 0, BAD_CAST "h3" );
        xmlNodePtr headingChildNode = xmlNewText( BAD_CAST "Some heading here" );
        xmlAddChild( headingNode, headingChildNode );

        // Add the new element to the existing tree before the text content
        xmlAddPrevSibling( divChildNode, headingNode );
        break;
        }
    divChildNode = divChildNode->next;
    }

// Display the result
xmlDocDump( stdout, doc );

xmlCleanupParser();
Ishmael