Parsing HTML with regex is like the 10th circle of hell. I kid you not. You'd be better off tidying it up (not sure if .NET has tidy) and then running it through an XML parser. That way you can pull out specific attributes like class and lang and then add a new attribute called onDblClick to your span node.
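To make the parser route concrete, here is a rough sketch in Python rather than .NET, since that is what I can vouch for. It assumes the markup has already been tidied into well-formed XML, and the function name add_dblclick and the sample markup are made up for illustration. The handler text mirrors the one used in the regex version of this answer.

```python
import xml.etree.ElementTree as ET


def add_dblclick(xhtml: str) -> str:
    """Add an onDblClick attribute to every <span> that carries a lang attribute.

    Assumes the input is already well-formed XML (e.g. tidied HTML).
    """
    root = ET.fromstring(xhtml)
    # iter("span") walks the whole tree, including the root if it is a span.
    for span in root.iter("span"):
        lang = span.get("lang")
        if lang is not None:
            # The argument is passed unquoted, as in the regex version;
            # quote it if MyFunction expects a string.
            span.set("onDblClick", f"window.external.MyFunction({lang})")
    return ET.tostring(root, encoding="unicode")


# Hypothetical sample input, not taken from the question.
sample = '<div><span class="word" lang="en">hello</span><span>plain</span></div>'
print(add_dblclick(sample))
```

The point of going this way is that the parser decides where a tag starts and ends, so attribute order, extra whitespace, and spans without a lang attribute stop being your problem.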
Otherwise, a naive approach (not sure what the syntax is in .NET, but this is in Perl):
$str =~ s/<span(.*?)lang=\\"(.*?)\\">/<span$1lang=\\"$2\\" onDblClick=\\"window.external.MyFunction($2)\\">/
The important thing here is the pattern to match (including captures):
<span(.*?)lang=\\"(.*?)\\">
This matches <span followed by anything, followed by lang=\" with anything between the \"s, followed by \">.
The replacement pattern is:
<span$1lang=\\"$2\\" onDblClick=\\"window.external.MyFunction($2)\\">
This creates <span followed by everything it matched up to the lang ($1), and then lang=\" followed by the language name it captured ($2), followed by the onDblClick stuff.
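Here is the same match-and-replace as a small Python sketch, in case seeing it run helps with the conversion. It is not .NET, and Python writes back-references as \g<1> and \g<2> instead of $1 and $2. It assumes, as the pattern above does, that the quotes in the input are backslash-escaped (lang=\"en\"). The sample string is made up.

```python
import re

# <span, anything (lazy), lang=\", the language (lazy), then \">
# In the regex, \\" matches a literal backslash followed by a double quote.
PATTERN = re.compile(r'<span(.*?)lang=\\"(.*?)\\">')

# \g<1> is everything between <span and lang, \g<2> is the language name.
# In a replacement template, \\ produces one literal backslash.
REPLACEMENT = (
    r'<span\g<1>lang=\\"\g<2>\\" '
    r'onDblClick=\\"window.external.MyFunction(\g<2>)\\">'
)

# Hypothetical input with backslash-escaped quotes.
sample = r'<span class=\"word\" lang=\"en\">hello</span>'
result = PATTERN.sub(REPLACEMENT, sample)
print(result)
```

Keep in mind why this is the naive approach: the first (.*?) is free to run past the end of the tag, so a span with no lang followed by some other tag that has one would be rewritten incorrectly.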
I am not familiar with .NET, so you will have to convert this, but it shouldn't be too different. Capture groups should be written with plain ( and ) rather than \( and \), which is the sed-style syntax. Also, I'm not sure how .NET handles back-references, but it should be $1 and $2 (like in Java).
Note: I have NOT tested this!