For a translation program, I am trying to extract text from an HTML file with about 95% accuracy, so that the sentences and links can be translated.

For example:

<div><a href="stack">Overflow</a> <span>Texts <b>go</b> here</span></div>

This should give me two results to translate:

Overflow

Texts <b>go</b> here

Are there any suggestions or commercial packages available for this problem?

A: 

I'm not exactly sure what you're asking, but take a look at simplehtmldom, specifically the "Extract Contents from HTML" tab under the quick start section on its front page (can't link directly, sigh). With that you can extract the text of a page without all those pesky tags.
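
For what it's worth, here is a rough sketch of the kind of segmentation the question describes. It is not simplehtmldom: it uses Python's standard-library html.parser instead, and the names SegmentExtractor, extract_segments and INLINE_TAGS are made up for illustration. The big assumption is the rule for where a segment ends: a small set of formatting tags (b, i, em, strong, u) stays inside the segment, and every other tag is treated as a boundary. Real pages will need that list tuned, and script/style contents are not filtered out here.

```python
from html import escape
from html.parser import HTMLParser

# Assumption: these tags are pure formatting and stay inside a segment.
# Every other tag (div, a, span, p, ...) ends the current segment.
INLINE_TAGS = {"b", "i", "em", "strong", "u"}


class SegmentExtractor(HTMLParser):
    """Collects translatable segments, keeping inline formatting tags."""

    def __init__(self):
        super().__init__()
        self.segments = []      # finished segments, in document order
        self._parts = []        # pieces of the segment being built
        self._has_text = False  # True once the segment has visible text

    def _flush(self):
        # Close the current segment; keep it only if it has real text.
        segment = "".join(self._parts).strip()
        if self._has_text and segment:
            self.segments.append(segment)
        self._parts = []
        self._has_text = False

    def handle_starttag(self, tag, attrs):
        if tag in INLINE_TAGS:
            # Keep the tag exactly as written, attributes included.
            self._parts.append(self.get_starttag_text())
        else:
            self._flush()

    def handle_endtag(self, tag):
        if tag in INLINE_TAGS:
            self._parts.append(f"</{tag}>")
        else:
            self._flush()

    def handle_data(self, data):
        if data.strip():
            self._has_text = True
        # The parser hands over unescaped text; escape it again so the
        # segment is still valid HTML.
        self._parts.append(escape(data, quote=False))


def extract_segments(html_text):
    parser = SegmentExtractor()
    parser.feed(html_text)
    parser.close()
    parser._flush()  # pick up any text after the last tag
    return parser.segments


sample = '<div><a href="stack">Overflow</a> <span>Texts <b>go</b> here</span></div>'
for segment in extract_segments(sample):
    print(segment)
```

On the example from the question this prints the two segments, Overflow and Texts <b>go</b> here. Link targets (the href values) are not collected; if those need translating too, handle_starttag would have to inspect attrs as well.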

nstory