ansaurus

Question

Regex to ensure group match doesn't end with a specific character

Answer 1

+2 A:

So the only real restriction on the last group is that it doesn’t contain a dot? Easy:

^(.*?)(\.[^.]+)$

The first group matches anything, non-greedily. The important part is the second group, which starts with a dot and then matches one or more non-dot characters up to the end of the string.

This works with all your test cases.
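A quick way to check this is a short Python sketch. The sample names below are assumed for illustration (they mirror the kind of file names discussed in this thread); note that the second group keeps its leading dot.

```python
import re

# Pattern from the answer: lazy prefix, then a final ".suffix" with no dots in it.
pattern = re.compile(r'^(.*?)(\.[^.]+)$')

# Sample names assumed for illustration.
samples = [
    '30.Rock.S01E01',
    'The.Office.0101',
    'Lost.01x01',
    'How.I.Met.Your.Mother.101',
]

results = {name: pattern.match(name).groups() for name in samples}
for name, groups in results.items():
    print(name, '->', groups)
# e.g. 30.Rock.S01E01 -> ('30.Rock', '.S01E01')
```

A string with no dot at all does not match, so `pattern.match(...)` returns `None` in that case.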

Konrad Rudolph 2010-05-19 15:14:35

Thanks, that looks good, nice and concise.

AJ 2010-05-19 18:15:58

Answer 2

+2 A:

I think this will do:

>>> import re
>>> regex = re.compile(r'^([0-9a-z.]+)\.(S[0-9]{2}E[0-9]{2}|[0-9]{3,4}|[0-9]{2}x[0-9]{2})$', re.I)
>>> regex.match('Name.Of.Show.01x01').groups()
('Name.Of.Show', '01x01')
>>> regex.match('Name.Of.Show.101').groups()
('Name.Of.Show', '101')

ETA: Of course, if you're just trying to extract different bits from trusted strings you could just use string methods:

>>> 'Name.Of.Show.101'.rpartition('.')
('Name.Of.Show', '.', '101')

SilentGhost 2010-05-19 15:15:11

Thanks, it never even crossed my mind to put the . outside both of the groups. I didn't show the entire string; there are usually other items after the episode numbers, like "The.Name.Of.Show.S01E01.something.else", so rpartition wouldn't work.

AJ 2010-05-19 18:11:24

@AJ: then you should be careful not to include `$` in the regex

SilentGhost 2010-05-19 18:13:09
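To illustrate the point about `$`, here is a hedged sketch using the pattern from this answer. The `bounded` variant and the name 'Show.S01E01.1080p.x264' are assumptions added for illustration, not something proposed in the thread: once `$` is gone, `[0-9]{3,4}` can latch onto a later digit run, and a lookahead for a dot or the end of the string is one possible way to guard against that.

```python
import re

EPISODE = r'(S[0-9]{2}E[0-9]{2}|[0-9]{3,4}|[0-9]{2}x[0-9]{2})'

# Original pattern: the episode token must be the last thing in the string.
anchored = re.compile(r'^([0-9a-z.]+)\.' + EPISODE + r'$', re.I)
# Same pattern without "$": trailing items are allowed.
unanchored = re.compile(r'^([0-9a-z.]+)\.' + EPISODE, re.I)
# Assumed hardening: the token must be followed by a dot or the end of the string.
bounded = re.compile(r'^([0-9a-z.]+)\.' + EPISODE + r'(?=\.|$)', re.I)

name = 'The.Name.Of.Show.S01E01.something.else'
print(anchored.match(name))             # None
print(unanchored.match(name).groups())  # ('The.Name.Of.Show', 'S01E01')

# Hypothetical name with a later digit run.
tricky = 'Show.S01E01.1080p.x264'
print(unanchored.match(tricky).groups())  # ('Show.S01E01', '1080')
print(bounded.match(tricky).groups())     # ('Show', 'S01E01')
```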

Answer 3

A:

If the last part never contains a dot: ^(.*)\.([^\.]+)$
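As a rough Python check of this pattern: it splits on the last dot only, so it behaves much like `str.rpartition('.')`, and it picks up the wrong piece when extra items follow the episode number.

```python
import re

# Greedy prefix, a literal dot outside both groups, then a dot-free last part.
pattern = re.compile(r'^(.*)\.([^\.]+)$')

short = pattern.match('Name.Of.Show.01x01').groups()
print(short)  # ('Name.Of.Show', '01x01')

# With trailing items, the "last part" is no longer the episode number.
longer = pattern.match('The.Name.Of.Show.S01E01.something.else').groups()
print(longer)  # ('The.Name.Of.Show.S01E01.something', 'else')
```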

Jan Willem B 2010-05-19 15:16:07

Answer 4

+1 A:

I believe this will do what you want:

^([0-9a-z\.]+)\.(?:S[0-9]{2}E[0-9]{2}|[0-9]{3,4}|[0-9]{2}(?:x[0-9]+)?)$

I tested this against the following list of shows:

30.Rock.S01E01
The.Office.0101
Lost.01x01
How.I.Met.Your.Mother.101

If those 4 cases are representative of the types of files you have, then that regex should place the show title in its own capture group and toss away the rest. Note that the character class only lists lowercase letters, so the match has to be case-insensitive for titles like these. This filter is, perhaps, a bit more restrictive than some others, but I'm a big fan of matching exactly what you need.
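A minimal Python sketch of that test follows. The use of `re.IGNORECASE` is an assumption: the pattern spells its letters in one case only, so it needs some form of case-insensitive matching to accept these titles.

```python
import re

PATTERN = r'^([0-9a-z\.]+)\.(?:S[0-9]{2}E[0-9]{2}|[0-9]{3,4}|[0-9]{2}(?:x[0-9]+)?)$'

# Assumption: case-insensitive matching, since the class only lists a-z.
title_re = re.compile(PATTERN, re.IGNORECASE)

shows = [
    '30.Rock.S01E01',
    'The.Office.0101',
    'Lost.01x01',
    'How.I.Met.Your.Mother.101',
]

# Only the title is captured; the episode part is in a non-capturing group.
titles = [title_re.match(show).group(1) for show in shows]
print(titles)  # ['30.Rock', 'The.Office', 'Lost', 'How.I.Met.Your.Mother']

# Without the flag, the uppercase letters in the titles prevent a match.
strict_re = re.compile(PATTERN)
print(strict_re.match('30.Rock.S01E01'))  # None
```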

ABach 2010-05-19 15:18:37

Answer 5

+1 A:

It seems like the problem is that you haven't specified that the period before the last group is required, so something like ^([0-9a-zA-Z\.]+)\.(S[0-9]{2}E[0-9]{2}|[0-9]{4}|[0-9]{2}x[0-9]{2}|[0-9]{3}) might work.
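The effect of the required period can be sketched in Python. The `without_dot` pattern is a hypothetical reconstruction for contrast, not the asker's actual regex: without the literal `\.`, the first group is free to swallow the separator and ends with a dot.

```python
import re

EPISODE = r'(S[0-9]{2}E[0-9]{2}|[0-9]{4}|[0-9]{2}x[0-9]{2}|[0-9]{3})'

# Pattern from the answer: a literal dot is required before the last group.
with_dot = re.compile(r'^([0-9a-zA-Z\.]+)\.' + EPISODE)
# Hypothetical variant without the required dot, for comparison only.
without_dot = re.compile(r'^([0-9a-zA-Z\.]+)' + EPISODE)

for name in ['Name.Of.Show.101', 'Name.Of.Show.01x01']:
    print(with_dot.match(name).groups(), without_dot.match(name).groups())
# ('Name.Of.Show', '101') ('Name.Of.Show.', '101')
# ('Name.Of.Show', '01x01') ('Name.Of.Show.', '01x01')
```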

Mark 2010-05-19 15:26:37
