Hello all, I'm writing an HTML text feature extractor in C++. The program needs to be REALLY fast: I need to extract these features in a few milliseconds per HTML page, memory usage needs to be low, and Unicode support would be nice.
I know how difficult it is to get all of these things at once, but I'd like a parser that at least comes close.
Does anybody have a suggestion?