Guys,

Could you please advise how to verify in Python whether a given string matches a given pattern, and return the result?

For example, the pattern is the following:

<<[prefix]-[id]> separated by ','>|<log>

where prefix is any number of alphabetic characters, id consists only of digits (at most 5), and log is any sequence of characters.

examples:

  1. proj-123|log message
  2. proj-234, proj-345|log message

I suppose the easiest way is to use a regular expression, but I haven't used them in Python before.

Thanks.

A: 

Python has a great regular expression library, the `re` module, in the standard library (see its documentation). Simply use `re.match`; that should be all you need.
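A minimal sketch of what "anchoring" means for the three `re` entry points, using a simplified tag-only pattern (`TAG` is an illustrative name, not from the question): `re.search` looks anywhere in the string, `re.match` only at the start, and `re.fullmatch` (Python 3.4+) requires the whole string to match.

```python
import re

# Illustrative, simplified pattern: a single prefix-id tag only
TAG = r'[a-z]+-\d{1,5}'

text = 'junk proj-123 trailing'
found_anywhere = re.search(TAG, text) is not None                  # True: search scans the string
found_at_start = re.match(TAG, text) is not None                   # False: match only tries position 0
prefix_only = re.match(TAG, 'proj-123 trailing') is not None       # True: text after the match is ignored
whole_string = re.fullmatch(TAG, 'proj-123 trailing') is not None  # False: must cover the whole string
print(found_anywhere, found_at_start, prefix_only, whole_string)
```

So for a yes/no check of an entire line, either use `re.fullmatch` or make sure the pattern ends with something that consumes the rest of the line, such as a trailing `.*`.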

gruszczy
A:
(?:[a-z]+-\d{1,5})(?:, [a-z]+-\d{1,5})*\|.*

It's not clear what you want to capture, which is why I use non-capturing groups. If you only need a boolean:

>>> import re
>>> regex = r'[a-z]+-\d{1,5}(?:, [a-z]+-\d{1,5})*\|.*'
>>> re.match(regex, 'proj-234, proj-345|log message') is not None
True
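The same regex can be written with `re.VERBOSE` so that each piece lines up with the question's definition. This is a sketch; the name `PATTERN` is illustrative, and like the one-liner it only accepts lowercase prefixes (`[a-z]`) and exactly `", "` between tags.

```python
import re

# Same pattern as the one-liner, spelled out with re.VERBOSE
# (unescaped whitespace and # comments inside the pattern are ignored)
PATTERN = re.compile(r"""
    [a-z]+ - \d{1,5}              # first tag: prefix, hyphen, id of 1-5 digits
    (?: ,\ [a-z]+ - \d{1,5} )*    # zero or more further tags, each after ", "
    \|                            # literal pipe between the tags and the log
    .*                            # log: any characters (except newline)
""", re.VERBOSE)

for s in ['proj-123|log message', 'proj-234, proj-345|log message', 'proj-123456|log message']:
    print(s, PATTERN.match(s) is not None)
```

Note that `\d{1,5}` enforces the 5-digit limit only because the next token must be `", "` or `|`: a sixth digit matches neither, so the whole match fails.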

Of course, a similar check can be done with a sequence of simple string methods:

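subj = 'proj-234, proj-345|log message'
prefs, _, log = subj.partition('|')  # split the tags from the log message
for group in prefs.split(', '):
    pref, _, id5 = group.partition('-')
    # prefix must be alphabetic, id must be 1 to 5 digits
    if id5.isdigit() and len(id5) <= 5 and pref.isalpha():
        print(pref, id5)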
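The loop above prints the valid groups rather than answering yes/no. A minimal sketch of a boolean version (the helper name `matches_pattern` is hypothetical), assuming tags are separated by exactly `", "` as in the question's examples:

```python
def matches_pattern(subj):
    """Return True if subj looks like 'prefix-id[, prefix-id...]|log'.

    Hypothetical helper; assumes ', ' as the tag separator.
    """
    prefs, sep, _log = subj.partition('|')
    if not sep:  # no '|' at all, so there is no log part
        return False
    for group in prefs.split(', '):
        pref, dash, id5 = group.partition('-')
        # isdecimal() is closer to the regex \d than isdigit(),
        # which also accepts characters such as superscript digits
        if not (dash and pref.isalpha() and id5.isdecimal() and len(id5) <= 5):
            return False
    return True


print(matches_pattern('proj-234, proj-345|log message'))
print(matches_pattern('proj-123456|log message'))
```

Any `|` characters after the first one are treated as part of the log, which mirrors the trailing `.*` in the regex.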
SilentGhost
...to be used with `re.match` (and not `re.search`)...
Tim Pietzcker
@Tim: seems to be working with `re.search` just fine
SilentGhost
That regex will actually match all strings. You need to escape the `|`, use `re.match` only, and change the `?` to `*` to allow more than two prefix-id groups. I'd also use `\s*` instead of a space.
interjay
thanks, interjay
SilentGhost
@SilentGhost: I would just like to get a boolean telling whether the provided string matches the pattern or not. If I understand correctly, I need to pass your pattern to re.match, right?
yart
A: 

Extending SilentGhost's excellent regex:

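The following will match one or more comma-separated tags (with an optional space after each comma), and it captures the tags in one group and the log message in another:

import re

line = 'proj-234,proj-345,proj-543|log message'
# Raw string so that \d and \| reach the regex engine unchanged;
# ',\s*' allows an optional space after a comma, '*' also accepts a single tag
match = re.match(r'((?:[a-zA-Z]+-\d{1,5})(?:,\s*[a-zA-Z]+-\d{1,5})*)\|(.*)', line)
if match:
    tags = re.split(r',\s*', match.group(1))
    log_msg = match.group(2)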

I wasn't able to figure out if it was possible to capture the tags following the first tag without capturing the comma, so I decided to capture them in one group and split them after the fact.
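This limitation comes from Python's `re` module: when a capturing group is repeated, only its last repetition is kept. A common workaround, sketched here as an alternative rather than liwp's method, is to validate the whole line first and then collect the prefix/id pairs with `re.findall`:

```python
import re

line = 'proj-234, proj-345,proj-543|log message'

# Step 1: validate the overall shape (tags, then '|', then the log)
shape = re.fullmatch(r'([a-zA-Z]+-\d{1,5}(?:,\s*[a-zA-Z]+-\d{1,5})*)\|(.*)', line)
if shape is None:
    raise ValueError('line does not match the expected pattern')

# Step 2: pull out each prefix/id pair; with two groups, findall returns tuples
tags = re.findall(r'([a-zA-Z]+)-(\d{1,5})', shape.group(1))
log_msg = shape.group(2)
print(tags)     # [('proj', '234'), ('proj', '345'), ('proj', '543')]
print(log_msg)  # log message
```

The validation step matters: `re.findall` on its own would happily skip over junk between tags.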

liwp