This is a follow-up to another question of mine. The solution I found there worked for every test case I threw at it, until a case showed up that eluded me the first time around.
My goal is to reformat improperly formatted tag attributes using regex (I know, regex is probably not a foolproof way to do this, as I'm finding out, but bear with me).
My functions:
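    Imports System.Text.RegularExpressions

    ' Finds every start or end tag, including its attributes.
    Public Function ConvertMarkupAttributeQuoteType(ByVal html As String) As String
        Dim findTags As String = "</?\w+((\s+\w+(\s*=\s*(?:"".*?""|'.*?'|[^'"">\s]+))?)+\s*|\s*)/?>"
        Return Regex.Replace(html, findTags, AddressOf EvaluateTag)
    End Function

    ' Rewrites each attribute value inside one matched tag.
    ' $2 refers to the named group g1 (.NET numbers named groups after unnamed group 1).
    Private Function EvaluateTag(ByVal match As Match) As String
        Dim attributes As String = "\s*=\s*(?:(['""])(?<g1>(?:(?!\1).)*)\1|(?<g1>\S+))"
        Return Regex.Replace(match.Value, attributes, "='$2'")
    End Function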
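The replacement string "='$2'" in EvaluateTag depends on how .NET numbers capture groups: named groups are numbered after all unnamed ones, so the quote capture (['"]) is group 1 and both (?<g1>...) alternatives share group 2. The small sketch below (the module name GroupNumbering is made up for illustration, and it assumes a plain .NET console project) checks this with Regex.GroupNumberFromName and shows that "$2" and "${g1}" give the same result on the tag that already works.

```vb
Imports System
Imports System.Text.RegularExpressions

' Hypothetical module name, used only for this illustration.
Module GroupNumbering
    ' The attribute pattern from EvaluateTag, unchanged.
    Public ReadOnly AttributePattern As New Regex(
        "\s*=\s*(?:(['""])(?<g1>(?:(?!\1).)*)\1|(?<g1>\S+))")

    Sub Main()
        ' Named groups are numbered after the unnamed group (['"]),
        ' so g1 should report group number 2.
        Console.WriteLine(AttributePattern.GroupNumberFromName("g1"))

        ' Because g1 is group 2, these two replacements are equivalent.
        Dim tag As String = "<table border=2 cellpadding='2' cellspacing=""1"">"
        Console.WriteLine(AttributePattern.Replace(tag, "='$2'"))
        Console.WriteLine(AttributePattern.Replace(tag, "='${g1}'"))
    End Sub
End Module
```

Writing ${g1} in the replacement makes that dependency explicit, so adding another unnamed group to the pattern later would not silently shift which text gets quoted.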
The regex in the EvaluateTag function correctly transforms HTML like
<table border=2 cellpadding='2' cellspacing="1">
into
<table border='2' cellpadding='2' cellspacing='1'>
You'll notice I'm forcing attribute values into single quotes; don't worry about that. It breaks when the last attribute value is unquoted and sits directly against the closing bracket of the tag. For example,
<table width=100 border=0>
comes out of the regex replace as
<table width='100' border='0>'
with the closing single quote landing outside the tag. As I've admitted before, I'm not good at regex; I just haven't taken the time to learn everything it can do. So I'm asking for help adjusting the EvaluateTag regex so that it handles this last case.
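One possible adjustment, sketched here on the assumption that the overall two-pass approach stays the same: the unquoted alternative (?<g1>\S+) also matches ">", because ">" is not whitespace, so on border=0> it captures "0>" and the quotes end up wrapped around the bracket. Restricting that alternative to [^\s>]+ (the same idea the findTags pattern already uses with [^'"">\s]+) makes it stop at the closing bracket. The module name MarkupQuoteFix is hypothetical, and the patterns are hoisted into shared Regex instances only so they are not rebuilt on every call.

```vb
Imports System
Imports System.Text.RegularExpressions

' Hypothetical module name; a sketch, not a tested drop-in.
Module MarkupQuoteFix
    ' Same tag-finding pattern as in the question.
    Private ReadOnly FindTags As New Regex(
        "</?\w+((\s+\w+(\s*=\s*(?:"".*?""|'.*?'|[^'"">\s]+))?)+\s*|\s*)/?>")

    ' Assumed fix: the unquoted alternative uses [^\s>]+ instead of \S+,
    ' so it can no longer run past the tag's closing ">".
    Private ReadOnly AttributeValues As New Regex(
        "\s*=\s*(?:(['""])(?<g1>(?:(?!\1).)*)\1|(?<g1>[^\s>]+))")

    Public Function ConvertMarkupAttributeQuoteType(html As String) As String
        Return FindTags.Replace(html, AddressOf EvaluateTag)
    End Function

    Private Function EvaluateTag(m As Match) As String
        ' ${g1} names the value group explicitly; it is the same group as $2.
        Return AttributeValues.Replace(m.Value, "='${g1}'")
    End Function

    Sub Main()
        ' Previously failing case: the last value is unquoted and touches ">".
        Console.WriteLine(ConvertMarkupAttributeQuoteType("<table width=100 border=0>"))
        ' Mixed quoting from the question, which should still work.
        Console.WriteLine(ConvertMarkupAttributeQuoteType("<table border=2 cellpadding='2' cellspacing=""1"">"))
    End Sub
End Module
```

This still leaves a gap for self-closing tags: an unquoted value written directly before "/>", as in <br clear=all/>, keeps the "/" inside the quotes. Excluding "/" from the character class too would close that gap but would mishandle unquoted values that contain slashes, such as href=/images/logo.gif.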
Thank you!