     1 <span class='Txt9Gray'>Decisions ( </span>

I'm trying to grab the '1' from this string. Before the '1' there is another span, but I can't use it as a marker because it changes from page to page. Is there a regex that can grab just the '1'?

The word 'Decisions' will always be present; it's my main way to find this line. Here's what I have been trying, to no avail:

    strRegex.Append("(?<strDecisionWins>[^<]+)[\s]*?<span class='[\s\w\W]*'>\bDecisions\b \([\s\w\W]*?</span>")

This keeps grabbing the spans before the actual '1'. The full line containing the above is:

    <span class='Txt9Gray'>(T)KOs ( </span> 66.67 <span class='Txt9Gray'>%) </span> <br /> 1 <span class='Txt9Gray'>Decisions ( </span> 33.33 <span class='Txt9Gray'>%) </span> <br />

The problem is that the match starts near the very beginning of the line instead of picking out that one number.
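Why the attempt fails can be reproduced with a minimal Python sketch. Assumptions: it uses Python's `re` rather than .NET, so the named group is spelled `(?P<name>...)` instead of `(?<name>...)`, and the line break in the snippet is treated as a display artifact (the two halves are joined with no space). `[^<]+` may start at any run of non-'<' text, and the greedy `[\s\w\W]*` lets the first `<span class='` stretch all the way to the 'Decisions' span, so the earliest successful start is inside the first `</span>` tag.

```python
import re

# Sample line copied from the question.
LINE = ("<span class='Txt9Gray'>(T)KOs ( </span> 66.67 <span class='Txt9Gray'>%) </span> <br /> "
        "1 <span class='Txt9Gray'>Decisions ( </span> 33.33 <span class='Txt9Gray'>%) </span> <br />")

# The question's pattern, rewritten with Python's (?P<name>...) group syntax.
QUESTION_PATTERN = (r"(?P<strDecisionWins>[^<]+)[\s]*?"
                    r"<span class='[\s\w\W]*'>\bDecisions\b \([\s\w\W]*?</span>")

m = re.search(QUESTION_PATTERN, LINE)
# The group captures text from the first closing tag, not the '1'.
print(repr(m.group("strDecisionWins")))  # '/span> 66.67 '
```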

A:

How about:

    \d+(?=\s*\<[^\>]+\>[^\<]*\bDecisions\b)
    \d+(?=\s*<[^>]+>[^<]*\bDecisions\b)

That would select only the 1 (and nothing else).

The second form is for regex engines that do not require < and > to be escaped.

The lookahead (?=...) is zero-width: it checks what follows without consuming it. Here it requires the number \d+ to be followed by an opening tag (\s*<[^>]+>) and then by text containing no '<' ([^<]*) that includes the word Decisions.
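Here is a minimal Python sketch of both forms applied to the sample line from the question. It assumes Python's `re`, which accepts `\<` and `\>` as literal characters, so either form works there; the helper name `decision_count` is hypothetical.

```python
import re
from typing import Optional

# Sample line copied from the question.
LINE = ("<span class='Txt9Gray'>(T)KOs ( </span> 66.67 <span class='Txt9Gray'>%) </span> <br /> "
        "1 <span class='Txt9Gray'>Decisions ( </span> 33.33 <span class='Txt9Gray'>%) </span> <br />")

# The answer's two forms: with and without escaped < and >.
ESCAPED = r"\d+(?=\s*\<[^\>]+\>[^\<]*\bDecisions\b)"
UNESCAPED = r"\d+(?=\s*<[^>]+>[^<]*\bDecisions\b)"


def decision_count(line: str, pattern: str = UNESCAPED) -> Optional[str]:
    """Hypothetical helper: return the digits before the 'Decisions' span, or None."""
    m = re.search(pattern, line)
    # The lookahead is zero-width, so the match holds only the digits.
    return m.group() if m else None


print(decision_count(LINE))           # 1
print(decision_count(LINE, ESCAPED))  # 1
print(re.findall(UNESCAPED, LINE))    # ['1']
```

The other numbers on the line (66.67, 33.33, the 9 in Txt9Gray) fail the lookahead because the text after their following tag does not contain Decisions.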

The lookahead technique can also be combined with other criteria, for example:

    \s\d(?=\s*\<[^\>]+class\s*=\s*'Txt9Gray'[^\>]*\>)
    \s\d(?=\s*<[^>]+class\s*=\s*'Txt9Gray'[^>]*>)

would grab a single digit preceded by a whitespace character (which is part of the match) and followed by an element with the attribute class='Txt9Gray'.
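A minimal Python sketch of the class-based variant, assuming Python's `re` and the unescaped form, run against the sample line from the question:

```python
import re

# Sample line copied from the question.
LINE = ("<span class='Txt9Gray'>(T)KOs ( </span> 66.67 <span class='Txt9Gray'>%) </span> <br /> "
        "1 <span class='Txt9Gray'>Decisions ( </span> 33.33 <span class='Txt9Gray'>%) </span> <br />")

# Unescaped form of the class-based pattern.
CLASS_PATTERN = r"\s\d(?=\s*<[^>]+class\s*=\s*'Txt9Gray'[^>]*>)"

matches = re.findall(CLASS_PATTERN, LINE)
print(matches)  # [' 1']

# Only the lookahead is zero-width; the leading \s is consumed, so strip it.
digits = [m.strip() for m in matches]
print(digits)  # ['1']
```

Replacing \d with \d+ would also accept multi-digit counts.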

VonC
@Lieven: good point; I now propose both forms.