I want to scrape some information from a football (soccer) web page using simple Python regular expressions. The problem is that players such as the first one, ÄÄRITALO, come out with their special characters still escaped, for example as &#196;&#196;RITALO.
That is, the HTML uses escaped markup for the special characters, such as &#196; for Ä.
Is there a simple way of reading the HTML into a correctly decoded Python string? If it were XML/XHTML it would be easy, since the parser would do it.
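
To make the desired behaviour concrete, here is a minimal sketch, assuming Python 3 and its standard-library `html.unescape`. The sample markup, the `player` class name, and the regular expression are all made up for illustration; the real page will differ.

```python
import html
import re

# Hypothetical fragment of the scraped page; the real markup may differ.
raw = '<td class="player">&#196;&#196;RITALO</td>'

# Hypothetical pattern, used only to pull the escaped name out of the fragment.
match = re.search(r'<td class="player">(.*?)</td>', raw)
escaped_name = match.group(1)

# html.unescape converts named and numeric character references
# (such as &Auml; or &#196;) into the characters they stand for.
name = html.unescape(escaped_name)
print(name)
```

Unescaping after the regular expression has matched, rather than before, keeps the pattern working against the raw markup exactly as it is served.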