I'm searching a file line by line for occurrences of ##random_string##. It works except for the case of multiple #...

import re

pattern='##(.*?)##'
prog=re.compile(pattern)

string='lala ###hey## there'
result=prog.search(string)

print re.sub(result.group(1), 'FOUND', string)

Desired Output:

"lala #FOUND there"

Instead I get the following because it's grabbing the whole ###hey##:

"lala FOUND there"

So how would I ignore any number of # at the beginning or end, and only capture "##string##"?

+1  A: 

'^#{2,}([^#]*)#{2,}' -- at least two # on each end. Note that the leading ^ anchors the match to the start of the string.

Be careful with lazy quantifiers like (.*?): the dot also matches #, so on 'lala ###hey## there' the pattern '##(.*?)##' captures '#hey'. Lazy quantifiers can also be slower than a negated character class such as [^#]*.
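
To make the difference concrete, here is a minimal sketch using the question's sample string. The variable names are only illustrative, and print() is used so it also runs under Python 3. It compares the anchored pattern, the same pattern without ^, and the lazy pattern from the question.

```python
import re

text = 'lala ###hey## there'

# The leading ^ only lets the pattern match at the start of the string.
anchored = re.search(r'^#{2,}([^#]*)#{2,}', text)
# Without the anchor, the hashes can be found anywhere in the line.
unanchored = re.search(r'#{2,}([^#]*)#{2,}', text)
# The lazy dot is allowed to match '#', so the extra hash ends up in the group.
lazy = re.search(r'##(.*?)##', text)

print(anchored)             # None
print(unanchored.group(1))  # hey
print(lazy.group(1))        # #hey
```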

glebm
I think he wants at least 2 at beginning *and* end.
Matthew Flaschen
Edited, thank you.
glebm
+3  A: 

To match at least two hashes at either end:

pattern='##+(.*?)##+'
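
As a rough check against the question's sample string (a sketch only; the names prog and match are illustrative), the captured group is hey, but the whole match includes every surrounding hash. Substituting on the whole match therefore gives "lala FOUND there" rather than the desired "lala #FOUND there".

```python
import re

text = 'lala ###hey## there'
prog = re.compile(r'##+(.*?)##+')

match = prog.search(text)
print(match.group(1))           # hey
print(match.group(0))           # ###hey##
# sub() replaces the whole match, so all three leading hashes disappear.
replaced = prog.sub('FOUND', text)
print(replaced)                 # lala FOUND there
```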
Marcelo Cantos
I'm sorry, I stated my question badly. I want to match EXACTLY ##<string>## and ignore the other ### at the beginning or end (I'm calling re.sub and it will mess with my results).
nubme
@nubme: I'm not sure what you mean. That's exactly what my answer does. I just tested it, and confirmed that it outputs `hey` and will only match if the string has at least two `#` characters at each end.
Marcelo Cantos
@marcelo: Sorry, I edited my question. See if it makes more sense now.
nubme
@nubme: I'm sorry, but you've confused me even more, now. The code has an error in it, and when I replace `line` with `string` (is that the right fix?) to make it work, I don't get either of the outputs you indicate.
Marcelo Cantos
@marcelo: Yeah, I mean string. Thanks for your help. I think I figured it out, but it's kinda messy: #{2}([^#].?*)#{2}
nubme
`#{2}` == `##`, and `[^#].?*` isn't what you want. The intent reads as "one non-hash character followed by zero or more non-line-end characters," but the `*` applies to the previous token, `.?`, which is already quantified, and Python's `re` rejects that with a "multiple repeat" error. Generally, you shouldn't stack `*`, `+`, or `?`: `*?` and `+?` are the lazy forms of `*` and `+`, not synonyms for `*`.
Mike DeSimone
A: 

Try the "block comment trick": /##((?:[^#]|#[^#])+?)##/
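
A small sketch of what this pattern does in Python's re (without the /.../ delimiters; the sample strings and variable names are assumptions for illustration). It allows a single # inside the delimiters, which also means that on the question's string the extra leading hash is still captured.

```python
import re

prog = re.compile(r'##((?:[^#]|#[^#])+?)##')

# A lone '#' inside the delimiters is accepted as part of the text.
inner_hash = prog.search('a ##foo#bar## b').group(1)
# On the question's string, the third '#' is treated as inner text too.
question_case = prog.search('lala ###hey## there').group(1)

print(inner_hash)     # foo#bar
print(question_case)  # #hey
```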

SHiNKiROU
A: 

Add + to the regex, which means "match one or more of the preceding character".

import re

pattern='#+(.*?)#+'
prog=re.compile(pattern)

string='###HEY##'
result=prog.search(string)
print result.group(1)

Output:

HEY
Tg
A: 

Have you considered doing it the non-regex way?

>>> string='lala ####hey## there'
>>> string.split("####")[1].split("#")[0]
'hey'
ghostdog74
+2  A: 

Your problem is with your inner match. You use ., which matches any character that isn't a line end, and that means it matches # as well. So when it gets ###hey##, it matches (.*?) to #hey.

The easy solution is to exclude the # character from the matchable set:

prog = re.compile(r'##([^#]*)##')

Protip: Use raw strings (e.g. r'') for regular expressions so you don't have to go crazy with backslash escapes.

Trying to allow # inside the hashes will make things much more complicated.

EDIT: If you do not want to allow blank inner text (i.e. "####" shouldn't match with an inner text of ""), then change it to:

prog = re.compile(r'##([^#]+)##')

+ means "one or more."
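
A quick sketch of the difference between the two variants (the variable names and the second sample string are mine, chosen for illustration): with * the input "####" matches with an empty group, with + it does not match at all, and findall picks up every delimited string on a line.

```python
import re

allow_empty = re.compile(r'##([^#]*)##')
require_text = re.compile(r'##([^#]+)##')

# '*' accepts zero inner characters, so '####' matches with an empty group.
empty_group = allow_empty.search('####').group(1)
# '+' needs at least one non-hash character, so '####' does not match.
no_match = require_text.search('####')
# Several delimited strings on one line are all found.
found = require_text.findall('##a## and ###b##')

print(repr(empty_group))  # ''
print(no_match)           # None
print(found)              # ['a', 'b']
```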

Mike DeSimone
A: 
>>> import re
>>> text= 'lala ###hey## there'
>>> matcher= re.compile(r"##[^#]+##")
>>> print matcher.sub("FOUND", text)
lala #FOUND there
>>>
ΤΖΩΤΖΙΟΥ