
This question is very similar to an existing one about Python, but I need to do the same thing in C. Here are some examples of what the function should do:

input    output

&lt;     <
&gt;     >
&auml;   ä
&#x00DF; ß

The function should have the signature char *html2str(char *html) or similar. I'm not reading byte by byte from a stream.
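For illustration only, here is a minimal sketch of such a function in plain C. It assumes the output should be UTF-8, takes a const char * and returns a newly malloc'd string that the caller must free, and recognizes only a small hypothetical table of named entities (the real HTML list is far larger) plus decimal (&#NNN;) and hex (&#xHHHH;) numeric references. Anything it does not recognize is copied through unchanged. The helper names put_utf8 and decode_entity are made up for this sketch.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, deliberately tiny entity table; HTML5 defines over 2,000 names. */
struct entity { const char *name; unsigned long cp; };

static const struct entity entities[] = {
    {"lt", 0x3C},   {"gt", 0x3E},   {"amp", 0x26},  {"quot", 0x22},
    {"apos", 0x27}, {"auml", 0xE4}, {"ouml", 0xF6}, {"uuml", 0xFC},
    {"szlig", 0xDF},
};

/* Encode code point cp as UTF-8 into out; return the number of bytes written. */
static size_t put_utf8(unsigned long cp, char *out)
{
    static const unsigned char lead[] = {0x00, 0x00, 0xC0, 0xE0, 0xF0};
    size_t n = cp < 0x80 ? 1 : cp < 0x800 ? 2 : cp < 0x10000 ? 3 : 4;
    assert(cp <= 0x10FFFF);
    for (size_t i = n - 1; i > 0; i--) {   /* continuation bytes, 6 bits each */
        out[i] = (char)(0x80 | (cp & 0x3F));
        cp >>= 6;
    }
    out[0] = (char)(lead[n] | cp);         /* length marker + remaining bits */
    return n;
}

/* s points at '&'. On success, store the code point in *cp and return the
   entity's length including '&' and ';'. Return 0 if s does not start a
   well-formed numeric reference or a name from the table above. */
static size_t decode_entity(const char *s, unsigned long *cp)
{
    const char *p = s + 1;
    if (*p == '#') {                       /* &#NNN; or &#xHHHH; */
        unsigned long v = 0;
        int base = 10;
        const char *start;
        if (p[1] == 'x' || p[1] == 'X') {
            base = 16;
            p++;
        }
        start = ++p;                       /* first digit */
        for (;; p++) {
            int d;
            if (*p >= '0' && *p <= '9') d = *p - '0';
            else if (base == 16 && *p >= 'a' && *p <= 'f') d = *p - 'a' + 10;
            else if (base == 16 && *p >= 'A' && *p <= 'F') d = *p - 'A' + 10;
            else break;
            v = v * (unsigned long)base + (unsigned long)d;
            if (v > 0x10FFFF) return 0;    /* beyond Unicode */
        }
        /* Need at least one digit, a closing ';', and a valid scalar value. */
        if (p == start || *p != ';' || v == 0 || (v >= 0xD800 && v <= 0xDFFF))
            return 0;
        *cp = v;
        return (size_t)(p - s) + 1;
    }
    for (size_t i = 0; i < sizeof entities / sizeof entities[0]; i++) {
        size_t n = strlen(entities[i].name);
        if (strncmp(p, entities[i].name, n) == 0 && p[n] == ';') {
            *cp = entities[i].cp;
            return n + 2;                  /* '&' + name + ';' */
        }
    }
    return 0;
}

/* Decode entities in html into a new malloc'd string (caller frees).
   Unknown or malformed entities are copied through unchanged. */
char *html2str(const char *html)
{
    /* Every supported entity is at least as long as its UTF-8 encoding,
       so the result never outgrows the input. */
    char *out = malloc(strlen(html) + 1);
    char *o = out;
    if (out == NULL)
        return NULL;
    while (*html != '\0') {
        unsigned long cp = 0;
        size_t len = 0;
        if (*html == '&' && (len = decode_entity(html, &cp)) > 0) {
            o += put_utf8(cp, o);
            html += len;
        } else {
            *o++ = *html++;
        }
    }
    *o = '\0';
    return out;
}
```

For example, html2str("&#x00DF;") returns a two-byte string holding ß in UTF-8 (0xC3 0x9F). A single allocation of strlen(html) + 1 bytes is enough because a numeric reference needs more digits exactly when its code point needs more UTF-8 bytes, and every name in the table is longer than its encoding.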

Is there a library function I can use?

A: 

This sounds like a job for flex. Granted, flex is usually stream-based, but you can make it scan an in-memory string with yy_scan_string (or its relatives yy_scan_bytes and yy_scan_buffer). For details, see The flex Manual: Scanning Strings.

Flex's Unicode support is poor (it matches bytes, not characters), but if you don't mind hand-coding the UTF-8 byte sequences, it could work as a workaround. Other tools can probably do what you want as well.
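To make the "bytes by hand" part concrete: because flex works on bytes, each rule's action has to write out the UTF-8 bytes of the decoded character itself, for example a rule such as "&auml;" { fputs("\xC3\xA4", yyout); }. The helper below is a minimal sketch, assuming UTF-8 output; utf8_escape is a hypothetical name, not part of flex or any library. It prints the escape sequence for a given code point so it can be pasted into such actions.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper (not part of flex): write the UTF-8 encoding of code
   point cp into buf as a run of \xHH escapes, ready to paste into a C string
   literal such as the one in a flex action. buf must hold at least 17 chars
   (4 bytes x 4 chars + NUL). Returns the number of UTF-8 bytes, or 0 (with
   buf set to "") if cp is a surrogate or above U+10FFFF. */
int utf8_escape(unsigned long cp, char *buf)
{
    static const unsigned char lead[] = {0x00, 0x00, 0xC0, 0xE0, 0xF0};
    unsigned char bytes[4];
    int n;

    assert(buf != NULL);
    buf[0] = '\0';
    if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
        return 0;
    n = cp < 0x80 ? 1 : cp < 0x800 ? 2 : cp < 0x10000 ? 3 : 4;

    /* Continuation bytes (10xxxxxx) carry 6 bits each, filled from the end. */
    for (int i = n - 1; i > 0; i--) {
        bytes[i] = (unsigned char)(0x80 | (cp & 0x3F));
        cp >>= 6;
    }
    /* The remaining high bits go into the lead byte after its length marker. */
    bytes[0] = (unsigned char)(lead[n] | cp);

    /* Append one "\xHH" escape per byte. */
    for (int i = 0; i < n; i++)
        sprintf(buf + strlen(buf), "\\x%02X", (unsigned)bytes[i]);
    return n;
}
```

For U+00E4 (ä) it writes \xC3\xA4 and returns 2. When pasting, keep in mind that a C hex escape absorbs every hex digit that follows it, so a literal like "\xC3\xA4b" has to be split as "\xC3\xA4" "b".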

JXG
A: 
Jonathan Leffler