ansaurus

Question

BEGINNER: REGEX Match numeric sequence except where the word "CODE" exists on a line.

Answer 1

+2 A:

(?<!(?:[Pp]asscode|[Cc]ode).*)[0-9]{7,10}(?:-[0-9]{2})?

Commented version:

(?<!                 # Begin zero-width negative lookbehind. (Makes sure the following pattern can't match before this position)
(?:                  # Begin non-capturing group
[Pp]asscode          # Either Passcode or passcode
|                    # OR
[Cc]ode              # Either Code or code
)                    # End non-capturing group
.*                   # Any characters
)                    # End lookbehind
[0-9]{7,10}          # 7 to 10 digits
(?:                  # Begin non-capturing group
-[0-9]{2}            # dash followed by 2 digits
)                    # End non-capturing group
?                    # Make last group optional

Edit: final version after comment discussion -

/^(?!\D*(?:[Pp]asscode|[Cc]ode))\D*([0-9]{7,10}(?:-[0-9]{2})?)/

(result in first capture buffer)

Amber 2009-12-23 20:53:01
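
For readers who want to try the final pattern, here is a minimal sketch using Python's `re` module. The function name `extract_number` and the sample lines are made up for illustration; only the pattern itself comes from the answer.

```python
import re

# Final pattern from the answer; the number ends up in capture group 1.
FINAL = re.compile(r"^(?!\D*(?:[Pp]asscode|[Cc]ode))\D*([0-9]{7,10}(?:-[0-9]{2})?)")

def extract_number(line):
    """Return the captured number, or None if the line does not match."""
    m = FINAL.match(line)
    return m.group(1) if m else None

# Hypothetical sample lines, not taken from the question.
for sample in ["Account 1234567-12", "Passcode: 1234567"]:
    print(sample, "->", extract_number(sample))
```

Reading `group(1)` rather than the whole match is what keeps the leading non-digit text out of the result.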

Nicely done! Only thing I would add is `:?` after `(?:[Pp]asscode|[Cc]ode)`.

Matthew 2009-12-23 20:58:53

Nice on the commented version. The `//x` modifier is *always* your friend (though I would condense it down a little - the "begin/end non-matching group"s seem a little excessive).

Anon. 2009-12-23 21:04:46
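
Perl's `/x` modifier has a rough counterpart in Python's `re.VERBOSE` flag. The sketch below writes the final lookahead pattern from the answer in that commented style; the layout, the comments, and the name `FINAL_VERBOSE` are assumptions for illustration.

```python
import re

# Whitespace and "#" comments inside the pattern are ignored under re.VERBOSE.
FINAL_VERBOSE = re.compile(
    r"""
    ^                                  # start of the line
    (?! \D* (?:[Pp]asscode|[Cc]ode) )  # fail if "passcode" or "code" precedes the first digit
    \D*                                # skip leading non-digits
    (                                  # capture group 1: the number
        [0-9]{7,10}                    #   7 to 10 digits
        (?: -[0-9]{2} )?               #   optional dash plus 2 digits
    )
    """,
    re.VERBOSE,
)

m = FINAL_VERBOSE.match("Account 1234567-12")
print(m.group(1) if m else None)
```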

@Dav: When I use your regex in Perl as `if (m{(?<!(?:[Pp]asscode|[Cc]ode).*)[0-9]{7,10}(?:-[0-9]{2})?})`, I get: `Variable length lookbehind not implemented in regex`. Am I missing something?

codaddict 2009-12-23 21:07:25
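
The same restriction can be seen outside Perl. As a small sketch, assuming Python's standard `re` module (which, at the time of writing, also requires fixed-width lookbehind), the helper `compiles` below is a hypothetical name that just reports whether a pattern is accepted.

```python
import re

# Pattern from the answer: the ".*" inside the lookbehind makes it variable-length.
LOOKBEHIND = r"(?<!(?:[Pp]asscode|[Cc]ode).*)[0-9]{7,10}(?:-[0-9]{2})?"
# Lookahead form: lookaheads have no fixed-width requirement.
LOOKAHEAD = r"^(?!\D*(?:[Pp]asscode|[Cc]ode))\D*([0-9]{7,10}(?:-[0-9]{2})?)"

def compiles(pattern):
    """Return True if the regex engine accepts the pattern, else False."""
    try:
        re.compile(pattern)
        return True
    except re.error:
        return False

print(compiles(LOOKBEHIND), compiles(LOOKAHEAD))
```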

The "excessive commenting" is mostly just due to posting on SO. Not the kind of commenting I'd use in my own code. :) But I figure for SO, more information is better than less, since there's no assumption on what any particular reader might know.

Amber 2009-12-23 21:08:06

Oh, bzabhi - you might need to modify the `.*` in the lookbehind; your definition of "comes before" for the passphrase bit was a bit vague.

Amber 2009-12-23 21:08:51

Dude, your commenting is perfect for us wobbly regex wannabes! I love the delimiting breakdown of what each part does! Sadly though, this solution isn't working for me. :-(

Murdoch Ripper 2009-12-23 21:13:22

Oh right, lookbehind with variable quantifiers is tricky. You want to re-write it with a lookahead instead: `/(?!\D*(?:[Pp]asscode|[Cc]ode))\D*[0-9]{7,10}(?:-[0-9]{2})?/`

Anon. 2009-12-23 21:13:32

Erm, anchor that pattern at the start.

Anon. 2009-12-23 21:14:12

The look ahead assertion works better, however, now it will match the numbers and everything BEFORE it also. I just need to match the numbers. How can you place boundaries for the numbers?

Murdoch Ripper 2009-12-23 21:28:49

Look ahead with word boundaries: `(?!\D*(?:[Pp]asscode|[Cc]ode))\b\D?[0-9]{7,10}(?:-[0-9]{2})?\b`

Murdoch Ripper 2009-12-23 21:31:13

The question says you just need to match all lines that meet the requirements. If you need to extract the number itself with the same regex, use a capturing group around the parts you want: `/^(?!\D*(?:[Pp]asscode|[Cc]ode))\D*([0-9]{7,10}(?:-[0-9]{2})?)/` Then the numbers themselves will be in the first capture buffer.

Anon. 2009-12-23 21:31:47

just put a capture group around the numbers by adding parentheses around the part you want to match, and then look at the captured group text instead of the entire match text.

Amber 2009-12-23 21:32:03

Beautimus! This works for me!!

Murdoch Ripper 2009-12-23 21:37:05

Although, if the tool you're looking at doesn't allow you to look at capture buffers, you might have an issue there. How well-defined is the location of your passphrase text?

Amber 2009-12-23 21:37:10

Great! Glad to hear things worked out.

Amber 2009-12-23 21:38:05

@Murdoch Ripper: Can you tell us which solution works? The best I could find on this thread fails for 'level1: 01234567' (it doesn't match, but it should).

Mark Byers 2009-12-23 21:43:42
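
The case mentioned here is easy to reproduce. A minimal sketch in Python, using the anchored lookahead pattern from the thread; the second sample line is a made-up contrast case.

```python
import re

# Anchored lookahead pattern from the thread.
FINAL = re.compile(r"^(?!\D*(?:[Pp]asscode|[Cc]ode))\D*([0-9]{7,10}(?:-[0-9]{2})?)")

# "\D*" cannot step over the stray digit in "level1", so the number is never reached.
print(FINAL.match("level1: 01234567"))

# Without a digit in the prefix, the same number is captured.
print(FINAL.match("level: 01234567").group(1))
```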

The regex I posted will fail if there are any digits not part of the number preceding it. You could probably adjust it a little to use word boundaries and `.` instead of `\D`, which would solve this.

Anon. 2009-12-23 21:49:15
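
One possible reading of that adjustment, sketched in Python. This exact pattern is an assumption, not something posted in the thread: the lookahead uses `.*` so it scans the whole line, and `\b` on both sides keeps the digits from being part of a longer word or number.

```python
import re

# Hypothetical variant: "." instead of "\D", plus word boundaries around the number.
ADJUSTED = re.compile(
    r"^(?!.*(?:[Pp]asscode|[Cc]ode)).*?\b([0-9]{7,10}(?:-[0-9]{2})?)\b"
)

def extract_number_anywhere(line):
    """Return the first qualifying number, or None if the line is rejected."""
    m = ADJUSTED.match(line)
    return m.group(1) if m else None

print(extract_number_anywhere("level1: 01234567"))
print(extract_number_anywhere("Passcode: 1234567"))
```

Note the trade-off: because the lookahead now covers the entire line, a "code" that appears after the number also rejects the line. Whether that is wanted depends on how "comes before" is defined.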

@Anon: I still don't think that it would work in all cases. I think using a variable width lookahead is not a suitable approach. You risk looking ahead too far and giving a false negative in the special case I mentioned in the comments. I've already provided a solution that works below.

Mark Byers 2009-12-23 22:04:32

Answer 2

+1 A:

You can get by with a nasty regex you have to get help with ...

... or you can use two simple regexes. One that matches what you want, and one that filters what you don't want. Simpler and more readable.

Which one would you like to read?

$foo =~ /(?<!(?:[Pp]asscode|[Cc]ode).*)[0-9]{7,10}(?:-[0-9]{2})?/

or

$foo =~ /\d{7,10}(-\d{2})?/ and $foo !~ /(access |pass)code/i;

Edit: case-insensitivity.

Alex Brasetvik 2009-12-23 21:02:21
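
For anyone outside Perl, the two-regex idea might look like this in Python. This is only a sketch: the names `WANTED`, `UNWANTED`, and `line_qualifies` are made up, and the two patterns are copied unchanged from the answer.

```python
import re

# One pattern for what we want, one for what we filter out.
WANTED = re.compile(r"\d{7,10}(-\d{2})?")
UNWANTED = re.compile(r"(access |pass)code", re.IGNORECASE)

def line_qualifies(line):
    """True if the line contains a number and none of the filtered phrases."""
    return bool(WANTED.search(line)) and not UNWANTED.search(line)

for sample in ["Account 1234567", "Access Code 1234567"]:
    print(sample, "->", line_qualifies(sample))
```

As written, the filter only rejects "access code" and "passcode"; a line containing a bare "code" still qualifies.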

Thanks for the comment. I'm stuck with the nasty solution I suppose. Although you're right - having two is better in this case - it won't appease those higher beings called "shareholders". This is due to having a software solution which does not accept the "filter" regex. Do you have an example? I could give it a shot, but in testing much simpler cases thus far, it hasn't worked well if at all.

Murdoch Ripper 2009-12-23 21:10:50

Your example is what I was asking for. The term "it" = using two simple regexes.

Murdoch Ripper 2009-12-23 21:17:41

The first version isn't PCRE and the second version doesn't do what he wants.

Mark Byers 2009-12-23 22:46:52
