I have an app that parses an HTML page and extracts some text containing foreign characters, for example 'Felvidék Ma'. I want to store this in my database, not in this form but in its original form. Can I convert it to UTF-8 before writing it to a SQL Server database, or even to a text file? Here is the original term: 'Felvidék Ma'. I use regular expressions to parse the HTML, so I'm not sure whether there is an option that helps with this. Here is my code:
If Not String.IsNullOrEmpty(_html) Then
'get the contents of every <TotalFound> element in the html page
Dim regex As Regex = New Regex( _
"<TotalFound>(?<link>.*?)</TotalFound>", _
RegexOptions.IgnoreCase _
Or RegexOptions.CultureInvariant _
Or RegexOptions.IgnorePatternWhitespace _
Or RegexOptions.Compiled _
)
Dim ms As MatchCollection = regex.Matches(_html)
Dim url As String = String.Empty
For Each m As Match In ms
url = m.Groups("link").Value
If Not String.IsNullOrEmpty(url) Then
I found the source of my problem: it was in the code that fetches the HTML page and reads the response stream. I changed the encoding from the default to UTF-8 and all is well now. Thanks again.
Dim reader As StreamReader = New StreamReader(responseStream, Encoding.Default)
returnContent = reader.ReadToEnd()
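For anyone hitting the same thing, here is a minimal sketch of the fix described above. It assumes the page really is served as UTF-8. The module name, the ReadAll helper, and the in-memory stream standing in for responseStream are all made up for illustration. Encoding.Default depends on the machine (on .NET Framework it is the system ANSI code page), so the sketch uses ISO-8859-1 as a stand-in for a single-byte code page to show the garbling.

```vb
Imports System
Imports System.IO
Imports System.Text

Module EncodingDemo
    ' Hypothetical helper: reads the whole stream as text using the given encoding.
    Function ReadAll(ByVal source As Stream, ByVal enc As Encoding) As String
        Using reader As New StreamReader(source, enc)
            Return reader.ReadToEnd()
        End Using
    End Function

    Sub Main()
        ' Stand-in for the HTTP response body: the sample text encoded as UTF-8.
        ' ChrW(&HE9) is "é"; it is written this way so the source file's own
        ' encoding does not matter.
        Dim sample As String = "<TotalFound>Felvid" & ChrW(&HE9) & "k Ma</TotalFound>"
        Dim bytes As Byte() = Encoding.UTF8.GetBytes(sample)

        ' Decoding with a single-byte code page turns the two UTF-8 bytes of "é"
        ' into two separate characters, which is the garbling in the question.
        Dim garbled As String = ReadAll(New MemoryStream(bytes), Encoding.GetEncoding("iso-8859-1"))

        ' Decoding as UTF-8 recovers the original text.
        Dim decoded As String = ReadAll(New MemoryStream(bytes), Encoding.UTF8)

        Console.OutputEncoding = Encoding.UTF8
        Console.WriteLine(garbled)
        Console.WriteLine(decoded)
        Console.WriteLine(decoded = sample)
    End Sub
End Module
```

In the real code the only change is passing Encoding.UTF8 instead of Encoding.Default to the StreamReader constructor. Once the string has been decoded correctly there is nothing left to convert. .NET strings are Unicode internally, so the text only needs an explicit encoding again when it is written out, for example with File.WriteAllText(path, text, Encoding.UTF8).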