You could use LINQ to XML to parse the code easily:
XElement doc = XElement.Parse(...)
Then correct the wrong attributes by running a best-match algorithm against an in-memory dictionary of valid attributes.
Edit: I wrote and tested this simplified best-match algorithm (sorry, it's VB):
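As a rough sketch of the parse-and-correct step, something like the following could work. The `corrections` lookup and the `FixAttributes` helper are made up for illustration; the lookup merely stands in for whatever matching logic you end up using. Note that `XElement.Parse` throws an `XmlException` if the input is not well-formed XML, and that `XAttribute.Name` is read-only, so "renaming" an attribute really means rebuilding the attribute list.

```vb
Imports System
Imports System.Collections.Generic
Imports System.Xml.Linq

Module AttributeFixSketch
    ' Hypothetical lookup standing in for the best-match step.
    Private ReadOnly corrections As New Dictionary(Of String, String) From {
        {"widt", "width"},
        {"heigth", "height"}
    }

    ' Rebuilds the attribute list, swapping in the corrected name where one is known.
    ' Assumes no correction collides with a name already on the element
    ' (a duplicate attribute name would make ReplaceAttributes throw).
    Sub FixAttributes(ByVal element As XElement)
        Dim fixedAttributes As New List(Of XAttribute)
        For Each attr As XAttribute In element.Attributes()
            Dim corrected As String = Nothing
            If corrections.TryGetValue(attr.Name.LocalName, corrected) Then
                fixedAttributes.Add(New XAttribute(corrected, attr.Value))
            Else
                ' Copy unchanged attributes so the new list owns its own instances.
                fixedAttributes.Add(New XAttribute(attr))
            End If
        Next
        element.ReplaceAttributes(fixedAttributes)
    End Sub

    Sub Main()
        Dim doc As XElement = XElement.Parse("<img widt=""100"" heigth=""50"" />")
        FixAttributes(doc)
        Console.WriteLine(doc)
    End Sub
End Module
```

After `FixAttributes` runs, the element should carry `width` and `height` in place of the misspelled names, with the values untouched.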
Dim validTags() As String =
{
    "width",
    "height",
    "img"
}
(Simplified; you should create a more structured dictionary with the tags and the possible attributes for each tag.)
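One possible shape for that more structured dictionary is a map from tag name to the attribute names allowed on it. This is only a sketch: `validAttributes`, `CandidatesFor`, and the attribute lists themselves are illustrative and deliberately incomplete.

```vb
Imports System
Imports System.Collections.Generic

Module ValidAttributesSketch
    ' Hypothetical, incomplete table: tag name -> attribute names allowed on that tag.
    Private ReadOnly validAttributes As New Dictionary(Of String, String()) From {
        {"img", New String() {"src", "alt", "width", "height"}},
        {"a", New String() {"href", "title", "target"}}
    }

    ' Returns the candidate attribute names for a tag, or an empty array for an unknown tag.
    Function CandidatesFor(ByVal tagName As String) As String()
        Dim attributes As String() = Nothing
        If validAttributes.TryGetValue(tagName, attributes) Then Return attributes
        Return New String() {}
    End Function

    Sub Main()
        ' List the names a misspelled attribute on <img> would be matched against.
        For Each candidate As String In CandidatesFor("img")
            Console.WriteLine(candidate)
        Next
    End Sub
End Module
```

With a table like this, a misspelled attribute only has to be compared against the attributes that are valid for its own tag, which keeps the candidate list short and avoids matching an attribute name against a tag name.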
' source is the (possibly misspelled) name to check against the valid tags
Dim maxMatch As Integer = 0
Dim matchedTag As String = Nothing
For Each Tag As String In validTags
    Dim match As Integer = checkMatch(Tag, source)
    If match > maxMatch Then
        maxMatch = match
        matchedTag = Tag
    End If
Next
Debug.WriteLine("matched tag {0} matched % {1}", matchedTag, maxMatch)
The code above calls the following method to determine, as a percentage, how closely the source string matches each valid tag.
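' Returns how closely source matches tag, as a percentage from 0 to 100.
Private Function checkMatch(ByVal tag As String, ByVal source As String) As Integer
    If tag = source Then Return 100
    ' Guard: an empty string has no characters to compare (source(0) would throw).
    If String.IsNullOrEmpty(tag) OrElse String.IsNullOrEmpty(source) Then Return 0
    Dim maxPercentage As Integer = 0
    ' Try every starting offset in tag; source is always read from its first character.
    For index As Integer = 0 To tag.Length - 1
        Dim tIndex As Integer = index
        Dim sIndex As Integer = 0
        Dim matchCounter As Integer = 0
        ' Count the positions where both strings hold the same character.
        While tIndex < tag.Length AndAlso sIndex < source.Length
            If tag(tIndex) = source(sIndex) Then
                matchCounter += 1
            End If
            tIndex += 1
            sIndex += 1
        End While
        Dim percentage As Integer = CInt(matchCounter * 100 / Math.Max(tag.Length, source.Length))
        If percentage > maxPercentage Then maxPercentage = percentage
    Next
    Return maxPercentage
End Function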
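The method above, given a source string and a tag, computes the best match percentage by comparing the two strings character by character, starting from each offset into the tag.
Given "widt" as input, it finds "width" as the best match, with an 80% match value (4 matching characters out of a maximum length of 5).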