I'm interested in unescaping HTML-escaped text in C, for example '&#92;' -> '\'. Does anyone know of a good library?

By HTML escape I mean all the entity references.

+6  A: 

I had some free time today and wrote a decoder from scratch: entities.c, entities.h.

The only function with external linkage is

size_t decode_html_entities_utf8(char * dest, const char * src);

If src is a null pointer, the string will be taken from dest, i.e. the entities will be decoded in-place. Otherwise, the decoded string will be written to dest, which should point to a buffer big enough to hold strlen(src) + 1 characters, and src will be left unchanged.

The function will return the length of the decoded string.

Please note that I haven't done any extensive testing, so there's a high probability of bugs...
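To illustrate the calling convention described above, here is a minimal sketch of a decoder with the same contract. It is not the code from entities.c: the name sketch_decode_entities and both helpers are hypothetical. It assumes that only the five XML-predefined named entities plus numeric references need handling, whereas the question asks for all of them. Unrecognized or invalid references are copied through unchanged.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: encode one code point as UTF-8.
 * Returns the number of bytes written, or 0 (writing nothing) if the
 * code point is NUL, a surrogate, or beyond U+10FFFF. */
static size_t put_utf8(char *out, unsigned long cp)
{
    if (cp == 0 || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
        return 0;
    if (cp < 0x80) {
        out[0] = (char)cp;
        return 1;
    }
    if (cp < 0x800) {
        out[0] = (char)(0xC0 | (cp >> 6));
        out[1] = (char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000) {
        out[0] = (char)(0xE0 | (cp >> 12));
        out[1] = (char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (char)(0x80 | (cp & 0x3F));
        return 3;
    }
    out[0] = (char)(0xF0 | (cp >> 18));
    out[1] = (char)(0x80 | ((cp >> 12) & 0x3F));
    out[2] = (char)(0x80 | ((cp >> 6) & 0x3F));
    out[3] = (char)(0x80 | (cp & 0x3F));
    return 4;
}

/* Hypothetical helper: parse "&#123;" or "&#x7B;" at s.
 * Returns the number of input bytes consumed, or 0 if s does not
 * start with a well-formed numeric reference. */
static size_t parse_numeric(const char *s, unsigned long *cp)
{
    unsigned long v = 0;
    size_t i, start;
    int hex;

    if (s[0] != '&' || s[1] != '#')
        return 0;
    hex = (s[2] == 'x' || s[2] == 'X');
    start = i = hex ? 3 : 2;
    for (; v <= 0x10FFFF; i++) {   /* stop early on out-of-range values */
        char c = s[i];
        unsigned d;
        if (c >= '0' && c <= '9') d = (unsigned)(c - '0');
        else if (hex && c >= 'a' && c <= 'f') d = (unsigned)(c - 'a') + 10;
        else if (hex && c >= 'A' && c <= 'F') d = (unsigned)(c - 'A') + 10;
        else break;
        v = v * (hex ? 16u : 10u) + d;
    }
    if (i == start || s[i] != ';')
        return 0;
    *cp = v;
    return i + 1;
}

/* Same contract as described above: decode src into dest, or decode
 * dest in place when src is NULL. Returns the decoded length. */
size_t sketch_decode_entities(char *dest, const char *src)
{
    static const struct { const char *name; char ch; } named[] = {
        { "&amp;", '&' }, { "&lt;", '<' }, { "&gt;", '>' },
        { "&quot;", '"' }, { "&apos;", '\'' }
    };
    char *out = dest;
    size_t i;

    assert(dest != NULL);
    if (src == NULL)
        src = dest;   /* in-place: output never outruns input */

    while (*src != '\0') {
        unsigned long cp = 0;
        size_t used = parse_numeric(src, &cp);
        size_t wrote = used ? put_utf8(out, cp) : 0;

        for (i = 0; wrote == 0 && i < sizeof named / sizeof named[0]; i++) {
            size_t n = strlen(named[i].name);
            if (strncmp(src, named[i].name, n) == 0) {
                *out = named[i].ch;
                wrote = 1;
                used = n;
            }
        }
        if (wrote == 0) {   /* not an entity we know: copy one byte */
            *out = *src;
            wrote = 1;
            used = 1;
        }
        out += wrote;
        src += used;
    }
    *out = '\0';
    return (size_t)(out - dest);
}
```

Decoding in place is safe in this sketch because every recognized reference is at least as long as the bytes it decodes to, so the write position never overtakes the read position. For the same reason a destination buffer of strlen(src) + 1 bytes is always sufficient. Calling sketch_decode_entities(buf, "&#92;") leaves a single backslash in buf and returns 1.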

Christoph
No, I mean all the entity references; I've updated the question with a link to them.
felipec
@christoph, pinch them from the source I linked.
Aiden Bell
and +1 for being a nice guy!
Aiden Bell
Hmm, I was looking for a library. Your code looks good, but the string handling makes it a bit complicated.
felipec
A: 

I wrote my own unescape code; very simplified, but does the job: pn_util.c

felipec