I'm new to HTML, and I know that HTML reserves some characters for its own use and can also represent characters by their character codes. For example:

&#140;  is  Œ
&#169;  is  ©
&#174;  is  ®

I have the HTML source in a std::string. How can I decode these references into the actual characters and replace them in the std::string? Is there a library with source available, or can it be done with preprocessor macros?

+2  A: 

I would recommend using an HTML/XML parser that does the conversion for you automatically. Parsing HTML correctly by hand is extremely difficult. If you insist on doing it yourself, the Boost String Algorithms library provides useful replacement functions such as replace_all.

Tronic
A: 

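One method for the numeric entities would be to use a regular expression like &#([0-9]+);, grab the numeric value, and convert it to the corresponding character. Note that the value is a Unicode code point rather than an ASCII code, so anything above 127 has to be encoded (for example as UTF-8) before it goes back into a std::string; a function like std::stoul can parse the digits.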
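The regex approach could be sketched roughly like this. It is only a minimal sketch under some assumptions: the names append_utf8 and decode_numeric_entities are made up for illustration, the output is assumed to be UTF-8, only decimal references are handled (not hexadecimal ones like &#xA9;), and it does not apply the Windows-1252 remapping that browsers use for values 128 to 159.

```cpp
#include <cstddef>
#include <cstdint>
#include <regex>
#include <string>

// Hypothetical helper: append one Unicode code point to `out` as UTF-8.
void append_utf8(std::string& out, std::uint32_t cp) {
    if (cp < 0x80) {
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
}

// Hypothetical helper: replace decimal references such as "&#169;"
// with the UTF-8 encoding of the referenced code point.
std::string decode_numeric_entities(const std::string& in) {
    static const std::regex ref("&#([0-9]+);");
    std::string out;
    std::size_t last = 0;  // end of the previous match in `in`
    for (std::sregex_iterator it(in.begin(), in.end(), ref), end; it != end; ++it) {
        const std::smatch& m = *it;
        const std::size_t pos = static_cast<std::size_t>(m.position());
        out.append(in, last, pos - last);  // copy the text between matches
        last = pos + static_cast<std::size_t>(m.length());

        const std::string digits = m[1].str();
        // More than 7 digits cannot be a valid code point; this also
        // keeps std::stoul from throwing std::out_of_range.
        const unsigned long cp = digits.size() <= 7 ? std::stoul(digits) : 0x110000UL;
        const bool surrogate = cp >= 0xD800 && cp <= 0xDFFF;
        if (cp > 0x10FFFF || surrogate) {
            out += m.str();  // leave invalid references untouched
        } else {
            append_utf8(out, static_cast<std::uint32_t>(cp));
        }
    }
    out.append(in, last, std::string::npos);  // copy the tail
    return out;
}
```

Calling decode_numeric_entities("&#169; 2010") should give the two UTF-8 bytes for © followed by " 2010". Anything that is not a valid code point is left in the string as it was, which seemed safer than dropping it.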

For the named entities you would need to build a mapping from names to code points. You could probably do a simple string replace to convert the names to numeric references, then use the method above. W3C has a table here: http://www.w3.org/TR/WD-html40-970708/sgml/entities.html
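A mapping like that might look something like the sketch below. The names kEntityCodes and named_to_numeric are hypothetical, and the table is only a small illustrative subset; a real one would have to be filled in from the W3C list (roughly 250 names in HTML 4). Unknown names are left alone.

```cpp
#include <cstddef>
#include <regex>
#include <string>
#include <unordered_map>

// Illustrative subset only (an assumption for this sketch); extend it
// from the W3C entity table as needed.
const std::unordered_map<std::string, unsigned> kEntityCodes = {
    {"amp", 38},   {"lt", 60},   {"gt", 62},    {"nbsp", 160},
    {"copy", 169}, {"reg", 174}, {"OElig", 338},
};

// Hypothetical helper: rewrite known named references ("&copy;") as
// decimal numeric references ("&#169;"). Entity names are case-sensitive.
std::string named_to_numeric(const std::string& in) {
    static const std::regex ref("&([A-Za-z][A-Za-z0-9]*);");
    std::string out;
    std::size_t last = 0;  // end of the previous match in `in`
    for (std::sregex_iterator it(in.begin(), in.end(), ref), end; it != end; ++it) {
        const std::smatch& m = *it;
        const std::size_t pos = static_cast<std::size_t>(m.position());
        out.append(in, last, pos - last);  // copy the text between matches
        last = pos + static_cast<std::size_t>(m.length());

        const auto found = kEntityCodes.find(m[1].str());
        if (found != kEntityCodes.end()) {
            out += "&#" + std::to_string(found->second) + ";";
        } else {
            out += m.str();  // unknown name: keep the text unchanged
        }
    }
    out.append(in, last, std::string::npos);  // copy the tail
    return out;
}
```

The result can then be fed to whatever handles the numeric references. Doing the named pass first and the numeric pass second means each reference is decoded only once, so something like "&amp;#169;" ends up as the literal text "&#169;" rather than ©.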

But if you're trying to read or parse a bunch of HTML in a string, you should use an HTML parser. There are many questions about that on SO.

DisgruntledGoat
A: 
Ms2ger
I picked it from here http://www.web-source.net/symbols.htm
Dave18
Blame Microsoft for making Windows-1252.
KennyTM