I'm new to Python and regular expressions. I'm searching a file line by line for occurrences of ##random_string##, and I want to capture the random_string in between the ##s.

I've tried both of the patterns below, but no luck =/

import re

pattern1=r'[##]()[##]'
pattern2=r'\#{2}()\#{2}'

prog=re.compile(pattern1)
result=prog.search(line)
if result:
  print(result.group(0))

Thanks for any help =]

A: 

Your group is empty.

'##(.+?)##'
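
To put that pattern in the context of the question (scanning a file line by line), here is a minimal sketch. The helper name find_marked and the sample data are made up for illustration, and io.StringIO only stands in for a real open file. Note that the captured text is in group(1), not group(0).

```python
import io
import re

# Non-greedy, non-empty capture between two ## markers.
MARKER = re.compile(r'##(.+?)##')

def find_marked(lines):
    """Yield the text between ## markers for each line that contains a pair."""
    for line in lines:
        match = MARKER.search(line)
        if match:
            # group(1) is the captured text; group(0) would include the ##s.
            yield match.group(1)

# io.StringIO stands in for an open file here (illustrative data only).
sample = io.StringIO("no markers here\nfoo ##random_string## bar\n####\n")
print(list(find_marked(sample)))  # ['random_string']
```

With a real file you would pass the file object itself, since iterating over it yields one line at a time.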
Ignacio Vazquez-Abrams
+5  A: 

Try using:

'##(.*?)##'

The problem with your regex is that the empty group () can only match an empty string between the ##s. Use .*? to match anything, or .+? to match any non-empty string.

Your first regex, [##]()[##], has an additional bug. A character class matches a single character; for example, [ab] matches an a or a b, but not both.
So [##] does not match ##. In fact, duplicate characters in a character class are redundant, so [##] is the same as [#], which is the same as #.

Your second regex, '\#{2}()\#{2}', is almost correct apart from the empty group. Also note that # is not a metacharacter (like ., +, *), so you need not escape it, unless you compile with re.VERBOSE, where an unescaped # starts a comment. You can drop the \ in \#, but keeping it is not an error.
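
A quick way to see both problems is to run the original patterns against a sample line. This is only a sketch, and the sample string is made up:

```python
import re

line = "foo ##random_string## bar"

# pattern2: the empty group () forces the two ## pairs to be adjacent,
# so it matches "####" but not "##random_string##".
no_match = re.search(r'\#{2}()\#{2}', line)
adjacent = re.search(r'\#{2}()\#{2}', "####")
print(no_match)                 # None
print(repr(adjacent.group(1)))  # ''

# pattern1: [##] is a character class holding only '#', so the whole
# pattern is just '#', an empty group, then '#'.
first = re.search(r'[##]()[##]', line)
print(first.group(0))           # ##

# With something inside the group, group(1) holds the captured text.
fixed = re.search(r'##(.*?)##', line).group(1)
print(fixed)                    # random_string
```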

codaddict
very helpful. thanks a lot for the mini lesson =]
nubme
`##(.*?)##` equals `##(.*)##`
guilin 桂林
@ guilin: No, it does not. `.*?` is a non-greedy match, whereas `.*` is a greedy match.
Amber
For instance, if the string were `'##a## ##b##'`, the former would match twice (once for `a`, the other for `b`), whereas the latter would match only once (and consider the inner part to be `a## ##b`).
Amber
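
The difference described in the comments above is easy to check with re.findall, using the same example string:

```python
import re

text = '##a## ##b##'

# Non-greedy: stops at the first closing ## it can find.
lazy = re.findall(r'##(.*?)##', text)
# Greedy: runs to the last ## on the line, then backtracks just enough.
greedy = re.findall(r'##(.*)##', text)

print(lazy)    # ['a', 'b']
print(greedy)  # ['a## ##b']
```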
A: 

Or:

'##([^#]*)##'

(not tested)
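
A short check of that pattern, as a sketch with made-up inputs. It behaves like the non-greedy version when the captured text contains no #, but it cannot capture text that itself contains a single #:

```python
import re

negated = re.compile(r'##([^#]*)##')
lazy = re.compile(r'##(.*?)##')

# Same result as the non-greedy pattern for ordinary text.
print(negated.findall('##a## ##b##'))  # ['a', 'b']

# A lone # inside the markers defeats the negated class,
# while the non-greedy pattern still captures it.
print(negated.findall('##a#b##'))      # []
print(lazy.findall('##a#b##'))         # ['a#b']
```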

rubik
A: 

If your line has multiple ##...## pairs, what should your output be? For example, if the patterns may overlap (each ## closes one match and opens the next) and you want to get those overlaps too:

>>> line="blah ## i want 1 ## blah blah ##  i want 2 ## blah"
>>> line.split("##")[1:-1]
[' i want 1 ', ' blah blah ', '  i want 2 ']

>>> line="blah ## i want 1 ## blah"
>>> line.split("##")[1:-1]
[' i want 1 ']

>>> line="blah ## i want 1 ## blah ## "
>>> line.split("##")[1:-1]
[' i want 1 ', ' blah ']
>>>

If you don't want overlapping matches, keep every other segment:

>>> line="blah ## i want 1 ## blah ## i want ## "
>>> [i for n,i in enumerate(line.split("##")[1:]) if n%2==0]
[' i want 1 ', ' i want ']

>>> line="blah ## i want 1 ## blah "
>>> [i for n,i in enumerate(line.split("##")[1:]) if n%2==0]
[' i want 1 ']

>>> line="blah ## i want 1 ## blah ## iwant2 ## junk ## i want 3 ## ..."
>>> [i for n,i in enumerate(line.split("##")[1:]) if n%2==0]
[' i want 1 ', ' iwant2 ', ' i want 3 ']
>>>
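
The non-overlapping version can also be written with a slice step, since taking the even positions of split(...)[1:] is the same as taking parts[1::2]. The sketch below wraps it in a function (the name between_markers is made up) and adds one assumption that the sessions above do not make: if the line has an odd number of ## markers, the text after the last, unclosed marker is dropped.

```python
import re

def between_markers(line, marker="##"):
    """Return the segments enclosed by non-overlapping marker pairs."""
    parts = line.split(marker)
    # An even number of parts means an odd number of markers, so the
    # final segment follows an unclosed marker; drop it (an assumption).
    if len(parts) % 2 == 0:
        parts = parts[:-1]
    # Odd indices are the segments that sit inside a pair.
    return parts[1::2]

line = "blah ## i want 1 ## blah ## iwant2 ## junk ## i want 3 ## ..."
print(between_markers(line))  # [' i want 1 ', ' iwant2 ', ' i want 3 ']

# For this line the result agrees with the non-greedy regex.
print(re.findall(r'##(.*?)##', line) == between_markers(line))  # True
```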
ghostdog74