ansaurus

Question

What kind of regex would I use to match this?

Answer 1

Why do you want to use groups or lookbehinds at all? What is wrong with re.search(r'TAG\[.*@(\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})\]', line)?

Frank 2010-06-30 18:22:23

When called with r.group(), this regex returns the whole section: TAG[...@xx.xx.xx.xx]. I need r.group() to return only the IP address.

anon-user 2010-06-30 18:24:59

Sorry, forgot the opening parenthesis before the first \d. I edited it, and it should be correct now.

Frank 2010-06-30 18:27:38

Shouldn't those be `{1,3}`, not `{1-3}`?

JAB 2010-06-30 18:33:35

If I am not mistaken, this will still return the whole TAG[...@xx.xx.xx.xx] string.

anon-user 2010-06-30 18:35:04

Yes, I corrected it. Thank you for finding this.

Frank 2010-06-30 18:35:40
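
The disagreement in these comments comes down to which group you ask for: r.group() (group 0) is always the entire match, while r.group(1) is only the parenthesized part. A minimal sketch, using a made-up input line because the real strings in the question were obfuscated:

```python
import re

# Hypothetical input line; the actual strings from the question are not available.
line = "some_text TAG[alice@11.22.33.44] some_text"

m = re.search(r'TAG\[.*@(\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})\]', line)
if m:
    print(m.group())   # group 0, the whole match: TAG[alice@11.22.33.44]
    print(m.group(1))  # the first capturing group only: 11.22.33.44
```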

Answer 2

I don't think it's possible to do that: r.group() always returns the whole text that matched, so you're forced to use a lookbehind, which, as you say, must be fixed width.

Instead, I'd suggest modifying the script that you're writing. I'm guessing that you have a whole load of regexps that it matches, and you don't want to have to specify for each one "this one uses r.group(0)", "this one uses r.group(3)" etc.

In that case, you could use Python's named groups facility: you can name a group in a regular expression like this:

(?P<name>CONTENTS)

then retrieve what matched with r.group("name").
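
A short illustration of the named-group syntax, again on a hypothetical input line (the group name `ip` is just an example):

```python
import re

# Hypothetical input line.
line = "some_text TAG[alice@11.22.33.44] some_text"

# (?P<ip>...) names the group, so it can be fetched by name instead of by number.
m = re.search(r'TAG\[.*@(?P<ip>\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})\]', line)
print(m.group("ip"))  # 11.22.33.44
print(m.groupdict())  # {'ip': '11.22.33.44'}, a dict of all named groups
```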

What I suggest doing in your script is: match the regular expression, then test whether a group named usethis is present and matched. If so, use that; if not, use r.group() as before. (Calling r.group("usethis") on a pattern that has no such group raises IndexError, so checking r.groupdict().get("usethis") is the safer test.)

That way you can cope with awkward situations like this by putting a group named usethis in the regexp, while your other regexps don't have to know or care.
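
A minimal sketch of that fallback logic. The helper name `extract` and the sample patterns are hypothetical, made up for illustration; only the group name usethis comes from the answer:

```python
import re

def extract(pattern, text):
    """Hypothetical helper: return the 'usethis' group if the pattern defines it
    and it matched, otherwise the whole match; None if nothing matches."""
    m = re.search(pattern, text)
    if m is None:
        return None
    # groupdict() only holds named groups, so this is safe even when the
    # pattern has no 'usethis' group (m.group('usethis') would raise IndexError).
    value = m.groupdict().get("usethis")
    return value if value is not None else m.group()

line = "some_text TAG[alice@11.22.33.44] some_text"
patterns = [
    r'TAG\[.*@(?P<usethis>\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})\]',  # the awkward case
    r'some_\w+',  # an ordinary pattern that knows nothing about 'usethis'
]
for p in patterns:
    print(extract(p, line))  # 11.22.33.44, then some_text
```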

psmears 2010-06-30 18:27:59

The problem is exactly as you describe. I do not want to specify that this 'tag' uses r.group(0) and that other 'tag' uses r.group(3). I have thought about using Python's named-group facility, which, judging from the responses, seems to be the best option.

anon-user 2010-06-30 18:37:16

Answer 3

Try re.search(r'(?<=@)\d\d\.\d\d\.\d\d\.\d\d(?=\])', line).

In fact, re.search(r'\d\d\.\d\d\.\d\d\.\d\d', line) may get you what you need if the only occurrence of the xx.xx.xx.xx format in the strings being checked is in those IP address sections.
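
A sketch of why the lookaround version works: the lookbehind and lookahead must match but are not included in the match, so r.group() is just the address. It also shows the fixed-width restriction mentioned earlier. The input line is hypothetical, and note that `\d\d` only accepts two-digit octets:

```python
import re

# Hypothetical input line with two-digit octets, to suit the \d\d pattern.
line = "some_text TAG[alice@11.22.33.44] some_text"

# (?<=@) and (?=\]) are zero-width: they constrain the match but are not part of it.
m = re.search(r'(?<=@)\d\d\.\d\d\.\d\d\.\d\d(?=\])', line)
print(m.group())  # 11.22.33.44

# Python's re requires fixed-width lookbehinds, so this pattern is rejected.
try:
    re.compile(r'(?<=TAG\[.*@)\d+')
    variable_width_ok = True
except re.error as exc:
    variable_width_ok = False
    print("error:", exc)
```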

EDIT: As stated in my comment, to find all occurrences of the wanted pattern in a string, you just do re.findall(pattern_to_match, line). So in this case, re.findall(r'\d\d\.\d\d\.\d\d\.\d\d', line) (or, more generally, re.findall(r'\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}', line)).
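
A quick check of the findall version on a hypothetical two-tag line. Because the pattern has no capturing groups, findall returns the full text of every non-overlapping match:

```python
import re

# Hypothetical input line with two tagged addresses.
line = "some_text TAG1[alice@11.22.33.44] some_text TAG2[bob@55.66.77.88] some_text"

ips = re.findall(r'\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}', line)
print(ips)  # ['11.22.33.44', '55.66.77.88']
```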

EDIT 2: From your comment, this should work (with tagname being the tag of the IP address you currently want).

r = re.search(tagname + r'\[.+?@(?P<ip>\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})', line)

And then you'd just refer to it with r.group("ip") like psmears said.
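
Wrapped in a function, the per-tag lookup might look like the sketch below. The helper name `ip_for_tag` and the sample line are hypothetical; `re.escape` is an extra safeguard (not in the answer) in case a tag name contains regex metacharacters:

```python
import re

def ip_for_tag(tagname, text):
    """Hypothetical helper: return the IP inside tagname[...@ip], or None."""
    # re.escape makes the tag name match literally.
    m = re.search(re.escape(tagname) + r'\[.+?@(?P<ip>\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})', text)
    return m.group("ip") if m else None

line = "some_text TAG1[alice@11.22.33.44] some_text TAG2[bob@55.66.77.88] some_text"
print(ip_for_tag("TAG2", line))  # 55.66.77.88
```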

...In fact, there's an easy way to make the regex a bit more concise (though slightly looser: because the dot is optional, it also matches runs of digits without dots).

r = re.search(tagname + r'\[.+?@(?P<ip>(?:\d{1,3}\.?){4})', line)
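
A small comparison of the concise and explicit forms. They agree on a well-formed address, but the concise one also accepts a bare run of digits, which may or may not matter for your input:

```python
import re

concise = r'(?:\d{1,3}\.?){4}'
strict = r'\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}'

# Both accept a normal dotted address...
print(re.fullmatch(concise, "55.66.77.88") is not None)  # True
print(re.fullmatch(strict, "55.66.77.88") is not None)   # True
# ...but the optional dot lets the concise form match "1234" as four 1-digit octets.
print(re.fullmatch(concise, "1234") is not None)  # True
print(re.fullmatch(strict, "1234") is not None)   # False
```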

In fact, you could even do this:

r = re.findall(r'(?P<tag>\S+)\[.+?@(?P<ip>(?:\d{1,3}\.?){4})', line)

This returns a list of (tag, IP address) tuples, so once you have the matches you wouldn't have to rescan the string if you later wanted the IP address of a different tag from the same string.
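
On a hypothetical two-tag line, the findall call gives one tuple per tag, because the pattern has two groups:

```python
import re

# Hypothetical input line.
line = "some_text TAG1[alice@11.22.33.44] some_text TAG2[bob@55.66.77.88] some_text"

pairs = re.findall(r'(?P<tag>\S+)\[.+?@(?P<ip>(?:\d{1,3}\.?){4})', line)
print(pairs)  # [('TAG1', '11.22.33.44'), ('TAG2', '55.66.77.88')]
```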

...In fact, going two steps further (farther?), you could do the following:

r = dict((m.group("tag"), m.group("ip")) for m in re.finditer(r'(?P<tag>\S+)\[.+?@(?P<ip>(?:\d{1,3}\.?){4})', line))

Or, with a dict comprehension (Python 2.7 and Python 3):

r = {m.group("tag"): m.group("ip") for m in re.finditer(r'(?P<tag>\S+)\[.+?@(?P<ip>(?:\d{1,3}\.?){4})', line)}

And then r would be a dict with the tags as keys and the IP addresses as the respective values.
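
A sketch of the dict version on a hypothetical line. Since findall already yields (tag, ip) pairs here, passing its result to dict() builds the same mapping:

```python
import re

# Hypothetical input line.
line = "some_text TAG1[alice@11.22.33.44] some_text TAG2[bob@55.66.77.88] some_text"
pattern = r'(?P<tag>\S+)\[.+?@(?P<ip>(?:\d{1,3}\.?){4})'

# Key is the tag, value is its IP address.
by_tag = {m.group("tag"): m.group("ip") for m in re.finditer(pattern, line)}

# Equivalent shortcut: dict() accepts the (tag, ip) tuples from findall.
assert by_tag == dict(re.findall(pattern, line))

print(by_tag["TAG2"])  # 55.66.77.88
```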

JAB 2010-06-30 18:30:55

The problem is that there are multiple occurrences of @xx.xx.xx.xx in the string.

anon-user 2010-06-30 18:33:03

In that case you just use `re.findall(pattern, line)`.

JAB 2010-06-30 18:35:10

My apologies, I was not clear enough in the question. The string will look something like this: some_text TAG1[...@xx.xx.xx.xx] some_text TAG2[...@yy.yy.yy.yy] some_text. I need it to find, say, just yy.yy.yy.yy.

anon-user 2010-06-30 18:39:19

Ah, I see. I've updated my answer again, then.

JAB 2010-06-30 18:53:47

Answer 4

Almost, but I think you need to change the .* at the start to .*?, since you may have multiple TAGs on a single line (as in the example, I believe):

re.search(r'TAG(\d+)\[.*?@(\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})\]', line)

The tag ID will be in the first capturing group (r.group(1)) and the IP address in the second (r.group(2)).
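
A sketch of why the lazy quantifier matters, on a hypothetical two-tag line: with greedy `.*`, the match runs on to the last `@` in the line, pairing TAG1's ID with TAG2's address.

```python
import re

# Hypothetical input line.
line = "some_text TAG1[alice@11.22.33.44] some_text TAG2[bob@55.66.77.88] some_text"
ip = r'(\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})'

lazy = re.search(r'TAG(\d+)\[.*?@' + ip + r'\]', line)
greedy = re.search(r'TAG(\d+)\[.*@' + ip + r'\]', line)

print(lazy.group(1), lazy.group(2))      # 1 11.22.33.44
print(greedy.group(1), greedy.group(2))  # 1 55.66.77.88 (wrong pairing)
```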

Jonathan Stanton 2010-07-01 17:24:55
