Hi All,

I need a regex to replace this string:

<span class=\"Translation\" lang=\"ThisLanguage\">

with this one:

<span class=\"Translation\" lang=\"ThisLanguage\" onDblClick=\"window.external.MyFunction(ThisLanguage)\">

There are many of these spans in the string, and each one contains a different "ThisLanguage" value.

Does anyone know how this can be done?

I'm working with C# and .NET.

Thanks!

A: 

Parsing HTML with regex is like the 10th circle of hell. I kid you not. You'd be better off tidying it up (not sure if .NET has tidy) and then running it through an XML parser. That way you can pull out specific attributes like class and lang and then add a new attribute called onDblClick to your span node.
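A minimal sketch of that parser route in C#, assuming the markup has already been tidied into well-formed XML without a default namespace, so that System.Xml.Linq can load it. The class and method names (SpanDomRewriter, AddDoubleClickHandler) are made up for illustration:

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

public static class SpanDomRewriter
{
    // Assumption: the input is well-formed XML with a single root element
    // and no default namespace. XElement.Parse throws XmlException otherwise.
    public static string AddDoubleClickHandler(string xhtml)
    {
        XElement root = XElement.Parse(xhtml, LoadOptions.PreserveWhitespace);

        // Snapshot the matching spans before modifying them.
        var spans = root.DescendantsAndSelf("span")
            .Where(s => (string)s.Attribute("class") == "Translation"
                        && s.Attribute("lang") != null)
            .ToList();

        foreach (XElement span in spans)
        {
            string lang = span.Attribute("lang").Value;
            // Adds the attribute, or overwrites it if it already exists.
            span.SetAttributeValue(
                "onDblClick",
                "window.external.MyFunction(" + lang + ")");
        }

        return root.ToString(SaveOptions.DisableFormatting);
    }

    public static void Main()
    {
        string input =
            "<div>" +
            "<span class=\"Translation\" lang=\"English\">Hello</span>" +
            "<span class=\"Translation\" lang=\"French\">Bonjour</span>" +
            "</div>";
        Console.WriteLine(AddDoubleClickHandler(input));
    }
}
```

Because the attributes are read through the parser, their order and any extra attributes on the span no longer matter, which is the main advantage over a pattern match.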

Otherwise, a naive approach (not sure what the syntax is in .NET but this is in Perl):

$str =~ s/<span(.*?)lang=\\"(.*?)\\">/<span$1lang=\\"$2\\" onDblClick=\\"window.external.MyFunction($2)\\">/g;

The important thing here is the pattern to match (including captures):

<span(.*?)lang=\\"(.*?)\\">

This matches <span followed by as little as possible (captured as $1), followed by lang=\", the language name (captured as $2), and a closing \">.

The replacement pattern is:

<span$1lang=\\"$2\\" onDblClick=\\"window.external.MyFunction($2)\\">

This creates <span followed by everything it matched up to the lang ($1) and then lang=\" followed by the language name it captured ($2), followed by the onDblClick stuff. The trailing /g applies the substitution to every span in the string, not just the first.

I am not familiar with .NET, so you will have to convert this, but it shouldn't be too different. Note that some regex dialects (sed's basic syntax, for example) write capture groups as \( and \); Perl uses plain ( and ), and .NET should too. As far as I know, .NET also refers to captured groups as $1 and $2 in the replacement string (like Java).

Note: I have NOT tested this!
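A rough C# conversion of the Perl substitution above might look like the sketch below. It assumes the backslashes shown in the question are only C# string-literal escapes, so the actual text contains plain double quotes; if the backslashes are really in the data, each quote in the pattern would need a \\ in front of it. The names SpanRewriter and AddDoubleClickHandler are invented for the example:

```csharp
using System;
using System.Text.RegularExpressions;

public static class SpanRewriter
{
    // Group 1: everything between "<span" and "lang=".
    // Group 2: the language name inside the lang attribute.
    // In a verbatim string (@"..."), a double quote is written as "".
    private static readonly Regex SpanWithLang =
        new Regex(@"<span(.*?)lang=""(.*?)"">");

    public static string AddDoubleClickHandler(string html)
    {
        // Regex.Replace substitutes every match; $1 and $2 refer to the groups.
        return SpanWithLang.Replace(
            html,
            "<span$1lang=\"$2\" onDblClick=\"window.external.MyFunction($2)\">");
    }

    public static void Main()
    {
        string input =
            "<span class=\"Translation\" lang=\"English\">Hello</span>" +
            "<span class=\"Translation\" lang=\"French\">Bonjour</span>";
        Console.WriteLine(AddDoubleClickHandler(input));
    }
}
```

Like the Perl version, this is naive: it only works when lang is the last attribute of the tag and every span in the input has one.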

Vivin Paliath
+1  A: 

It's generally not advisable to parse HTML with regexps, since HTML is not a regular language and there are enough edge cases to trip up all but the most trivial scenarios. For anything beyond those, I would rather parse the HTML with an HTML parser and manipulate it via a suitable API (e.g. a DOM).

Brian Agnew
Dis-recommending regex for HTML parsing jobs is like beating a flour sack. You can beat on it all you like, it will never stop raising dust.
Tomalak
+1  A: 

A bit verbose but Expresso saves a lot of time!

//  using System.Text.RegularExpressions;

/// <summary>
///  Regular expression built for C# on: Thu, Mar 11, 2010, 04:37:21 PM
///  Using Expresso Version: 3.0.2766, http://www.ultrapico.com
///  
///  A description of the regular expression:
///  
///  <span.*?class="
///      <span
///      Any character, any number of repetitions, as few as possible
///      class="
///  [1]: A numbered capture group. [.*?]
///      Any character, any number of repetitions, as few as possible
///  ".*?lang="
///      "
///      Any character, any number of repetitions, as few as possible
///      lang="
///  [2]: A numbered capture group. [.*?]
///      Any character, any number of repetitions, as few as possible
///  ">
///      ">
///  
///
/// </summary>
public static Regex regex = new Regex(
      "<span.*?class=\"(.*?)\".*?lang=\"(.*?)\">",
    RegexOptions.IgnoreCase
    | RegexOptions.CultureInvariant
    | RegexOptions.IgnorePatternWhitespace
    | RegexOptions.Compiled
    );


// This is the replacement string; $2 is the captured lang value,
// so each span passes its own language to MyFunction
public static string regexReplace = 
      "<span class=\"$1\" lang=\"$2\" onDblClick=\"window.external."+
      "MyFunction($2)\">";


//// Replace the matched text in the InputText using the replacement pattern
// string result = regex.Replace(InputText,regexReplace);

//// Split the InputText wherever the regex matches
// string[] results = regex.Split(InputText);

//// Capture the first Match, if any, in the InputText
// Match m = regex.Match(InputText);

//// Capture all Matches in the InputText
// MatchCollection ms = regex.Matches(InputText);

//// Test to see if there is a match in the InputText
// bool IsMatch = regex.IsMatch(InputText);

//// Get the names of all the named and numbered capture groups
// string[] GroupNames = regex.GetGroupNames();

//// Get the numbers of all the named and numbered capture groups
// int[] GroupNumbers = regex.GetGroupNumbers();
Lazarus
Don't you need to capture what's between `span` and `class`? I mean, if there *is* something? :). From the example it only looks like a space.
Vivin Paliath
thanks!!! thanks!!! :)
Lai
@Vivin, that's a fair comment (+1), although I think this was a fairly tight requirement. Replacing the search regex with "<span(.*?)class=\"(.*?)\"(.*?)lang=\"(.*?)\"(.*?)>" and the replacement string with "<span$1class=\"$2\"$3lang=\"$4\"$5 onDblClick=\"window.external.MyFunction($4)\">" should do it.
Lazarus
A: 

I wouldn't use a regex. I would use jQuery.

// for each Translation span, read its own lang value
$('span.Translation').each(function () {
    var lang = $(this).attr('lang');
    // add the onDblClick attribute using that value
    $(this).attr('onDblClick', 'window.external.MyFunction(' + lang + ')');
});

Or, if you are simply generating the strings yourself and pushing them out (you may not be, but if you are, this could work), why not do this?

string spanTag = String.Format("<span class=\"Translation\" lang=\"{0}\" onDblClick=\"window.external.MyFunction({0})\">", "ThisLanguage");
gmcalab
There is a jQuery port for C#? Nice! ;-)
Tomalak