I've encountered what I think is strange behavior when using the SAX parser, and I wanted to know if it's normal.
I'm sending this XML through the SAX parser:
<site url="http://example.com/?a=b&amp;b=c" />
The "&amp;" in the url attribute value arrives as "&#38;" when the startElement callback is called. Is it supposed to do that? If so, I would like to understand why.
I've pasted an example demonstrating the issue here:
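#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <libxml/parser.h>

/* Print each attribute name and value exactly as the callback receives them. */
static void start_element(void *ctx, const xmlChar *name, const xmlChar **atts)
{
    int i;

    (void)ctx;
    (void)name;
    if (atts == NULL)   /* NULL when the element has no attributes */
        return;
    for (i = 0; atts[i] != NULL; i++)
        printf("%s\n", (const char *)atts[i]);
}

int main(void)
{
    const char *xml = "<site url=\"http://example.com/?a=b&amp;b=c\" />";
    xmlSAXHandlerPtr handler = calloc(1, sizeof(xmlSAXHandler));

    if (handler == NULL)
        return 1;
    handler->startElement = start_element;
    xmlSAXUserParseMemory(handler, NULL, xml, (int)strlen(xml));
    free(handler);
    return 0;
}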
Thank you!
PS: This message is taken from the libxml2 mailing list. I am not its original author: I noticed the problem while using Nokogiri, and Aaron (the maintainer of Nokogiri) posted the original message himself.