ansaurus

Question

RegEx a RegEx match

Answer 1

+3 A:

You cannot do this in one regex, you'll need two:

First take all matches that are between single quotes:

'[\d#]+'

Then over all those matches, do this:

#\d+

So you'll end up with something like (in C#):

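// Requires: using System.Text.RegularExpressions;
// Must sit inside an iterator method (e.g. one returning IEnumerable<string>).
// Match is typed explicitly: on older .NET versions MatchCollection only yields object.
foreach (Match m in Regex.Matches(inputString, @"'[\d#]+'"))
{
    // Second pass: pull each #number out of the quoted run.
    foreach (Match m2 in Regex.Matches(m.Value, @"#\d+"))
    {
        yield return m2.Value;
    }
}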
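
For readers outside .NET, the same two passes can be sketched in Python. This is only a minimal sketch and not part of the original answer: the function name extract_codes and the sample string are assumptions made for illustration, and only the standard-library re module is used.

```python
import re

def extract_codes(text):
    """Two-pass extraction: quoted runs first, then #number tokens inside them."""
    codes = []
    # Pass 1: find runs of digits and '#' enclosed in single quotes.
    for quoted in re.findall(r"'[\d#]+'", text):
        # Pass 2: pull every #<digits> token out of that quoted run.
        codes.extend(re.findall(r"#\d+", quoted))
    return codes

# Hypothetical sample input, chosen for illustration.
sample = "x '#39' y ('#39#226') z #7478"
print(extract_codes(sample))  # ['#39', '#39', '#226']
```

The unquoted #7478 is skipped because the first pass never hands it to the second.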

Jan Jongboom 2010-09-14 13:05:38

Too bad that it isn't possible in one RegEx; guess this'll have to do. Thanks for typing it out for me as well ;)

Willy 2010-09-14 13:17:25

Gnarf posted an answer that does it in one RegEx. Thanks though!

Willy 2010-09-14 13:22:23

Answer 2

+1 A:

Assuming you can use lookbehinds/lookaheads and that your regex engine supports variable-length lookbehinds (JGSoft / .NET only):

(?<='[#0-9]*)#\d+(?=[#0-9]*')

Should work... I tested it with an online regex tester.

Breaking it down is pretty simple:

(?<=        # Start positive lookbehind group - assure that the text before the cursor
            # matches the following pattern: 
  '         # Match the literal '
  [#0-9]*   # Matches #, 0-9, zero or more times
)           # End lookbehind...
#\d+        # Match literal #, followed by one or more digits
(?=         # Start lookahead -- Ensures text after cursor matches (without advancing)
  [#0-9]*   # Allow #, 0-9, zero or more times
  '         # Match a literal '
)

So, this pattern will match #\d+ if the text before it is '[#0-9]* and the text after is [#0-9]*'
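
Whether a given engine accepts the variable-length lookbehind can be checked directly. As a hedged illustration (Python is not used anywhere in this thread, and the helper name engine_accepts is hypothetical), the sketch below asks Python's standard-library re module to compile the pattern; that module only allows fixed-width lookbehinds, so it rejects the pattern.

```python
import re

# The single-regex pattern from this answer.
VARIABLE_LOOKBEHIND = r"(?<='[#0-9]*)#\d+(?=[#0-9]*')"

def engine_accepts(pattern):
    """Return True if Python's stdlib `re` can compile the pattern."""
    try:
        re.compile(pattern)
        return True
    except re.error:
        # Raised for unsupported constructs such as variable-width lookbehind.
        return False

print(engine_accepts(VARIABLE_LOOKBEHIND))   # False
print(engine_accepts(r"#\d+(?=[#0-9]*')"))   # True: the lookahead alone is fine
```

On an engine that rejects the pattern, the two-regex approach or a fixed-width lookbehind is the fallback.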

gnarf 2010-09-14 13:18:05

Wow, that works perfectly! Exactly what I was looking for. Could you explain what this does exactly? Thanks a lot :)

Willy 2010-09-14 13:20:53

You sir, are KING!

Willy 2010-09-14 13:42:03

@Willy - Honestly though -- I voted for @Jan's answer. It is WAY easier to understand what you are doing there...

gnarf 2010-09-14 13:43:56

You are correct, sir. It IS a lot easier to understand, but I wanted to do it in one RegEx if possible, which is what your method does :). Which method would be faster and better performance-wise?

Willy 2010-09-14 13:56:16

@Willy - It's hard to say which method will perform better (especially since I don't have a .NET compiler). You should set up some sort of profiling test to see...

gnarf 2010-09-14 18:56:02

Answer 3

+2 A:

As you don't specify a language, here is a solution in Perl:

#!/usr/bin/perl
use strict;
use warnings;
use Data::Dumper;

my $s = qq!Blaa lablalbl balbla balb lbal '#39'blaaaaaaaa'#39' ('#39#226#8218#172#39') blaaaaaaaa #7478347878347834 blaaaa blaaaa!;

my @n = $s =~ /(?<=['#\d])(#\d+)(?=[#'\d])/g;

print Dumper(\@n);

Output:

$VAR1 = [
          '#39',
          '#39',
          '#39',
          '#226',
          '#8218',
          '#172',
          '#39'
        ];
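
Because both lookarounds here are exactly one character wide, the pattern also ports to engines that reject variable-length lookbehind. A minimal sketch in Python, assuming the standard-library re module (Python is not part of the original answer), using the same test string:

```python
import re

# Same pattern as the Perl script; both lookarounds are one character wide.
PATTERN = re.compile(r"(?<=['#\d])(#\d+)(?=[#'\d])")

s = ("Blaa lablalbl balbla balb lbal '#39'blaaaaaaaa'#39' "
     "('#39#226#8218#172#39') blaaaaaaaa #7478347878347834 blaaaa blaaaa")

# findall returns the contents of the single capture group for each match.
matches = PATTERN.findall(s)
print(matches)
# ['#39', '#39', '#39', '#226', '#8218', '#172', '#39']
```

The unquoted #7478347878347834 is skipped because the character before it is a space, which the lookbehind does not allow.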

M42 2010-09-14 13:20:55

I had no idea that RegEx was language-specific. The RegEx bit works universally, right? This does the trick as well: #\d+(?=#|'). Thanks! Your RegEx is a lot shorter than the one Gnarf posted. What are the differences?

Willy 2010-09-14 13:24:00

His only tests that the character following the match is a `#` or `'` -- and not all regular expressions can handle lookahead, lookbehind, etc. If you put a `#` after `#7478347878347834` in your test string, it would then match that as well...
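
The false positive described above is easy to reproduce. A small sketch in Python, assuming the standard-library re module; the input string is a hypothetical one built to trigger the case, and the second pattern is the fixed-width lookbehind version from the answer, shown for comparison.

```python
import re

LOOKAHEAD_ONLY = r"#\d+(?=#|')"
WITH_LOOKBEHIND = r"(?<=['#\d])(#\d+)(?=[#'\d])"

# Hypothetical input: an unquoted number followed by a stray '#'.
unquoted = "blaaaaaaaa #7478347878347834# blaaaa"

# The lookahead-only pattern accepts it, since only the next character is checked.
print(re.findall(LOOKAHEAD_ONLY, unquoted))   # ['#7478347878347834']

# With a lookbehind, the space before the '#' rules the match out.
print(re.findall(WITH_LOOKBEHIND, unquoted))  # []
```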

gnarf 2010-09-14 13:30:16

Tested, you are right :)

Willy 2010-09-14 13:42:35

@gnarf: Yes, you're right. I've updated the regex, adding a fixed-length lookbehind, because variable-length lookbehind isn't allowed in Perl and in some other languages.

M42 2010-09-14 13:58:08
