ansaurus

Question

Python Regular Expression Matching: ## ##

Answer 1

+1 A:

'^#{2,}([^#]*)#{2,}' -- any number of # >= 2 on either end

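Be careful with lazy quantifiers like (.*?): depending on the surrounding pattern, they can match '##abc#####' and capture 'abc###'. Lazy quantifiers can also be slow, since the engine retries the rest of the pattern after every character it adds.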
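A minimal sketch of how the pattern above behaves, assuming it is used with `re.search`. The sample strings and the helper name `capture` are made up for illustration and are not from the question.

```python
import re

# Pattern from the answer: two or more '#' on each side, no '#' inside.
pattern = re.compile(r'^#{2,}([^#]*)#{2,}')

def capture(s):
    """Return the captured inner text, or None when the pattern does not match."""
    m = pattern.search(s)
    return m.group(1) if m else None

print(capture('##abc##'))      # abc
print(capture('###abc#####'))  # abc
print(capture('#abc##'))       # None (only one leading '#')
print(capture('##a#b##'))      # None (a '#' inside the text)
```

Note that the `^` anchor means the hashes must start the string, so this would not find `##abc##` in the middle of a longer line.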

glebm 2010-10-23 01:17:25

I think he wants at least 2 at beginning *and* end.

Matthew Flaschen 2010-10-23 01:21:51

Edited, thank you.

glebm 2010-10-23 01:22:57

Answer 2

+3 A:

To match at least two hashes at either end:

pattern='##+(.*?)##+'
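A quick sketch of what this pattern does with `re.search` and `re.sub`. The sample text and the `<\1>` replacement are only illustrative assumptions, not taken from the question.

```python
import re

pattern = re.compile(r'##+(.*?)##+')
text = 'lala ###hey## there'

# The greedy '##+' consumes all three leading hashes, so the group is just 'hey'.
inner = pattern.search(text).group(1)
print(inner)  # hey

# With re.sub the whole match is replaced, including every surrounding hash.
replaced = pattern.sub(r'<\1>', text)
print(replaced)  # lala <hey> there
```

Because the surplus hashes are part of the match, `re.sub` removes them as well, which may or may not be what you want.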

Marcelo Cantos 2010-10-23 01:17:59

I'm sorry, I stated my question badly. I want to match EXACTLY ##<string>## and ignore any other # characters at the beginning or end (I'm calling re.sub, and they will mess with my results).

nubme 2010-10-23 01:38:52

@nubme: I'm not sure what you mean. That's exactly what my answer does. I just tested it, and confirmed that it outputs `hey` and will only match if the string has at least two `#` characters at each end.

Marcelo Cantos 2010-10-23 01:43:32

@marcelo: Sorry, I edited my question; see if it makes more sense now.

nubme 2010-10-23 01:49:30

@nubme: I'm sorry, but you've confused me even more, now. The code has an error in it, and when I replace `line` with `string` (is that the right fix?) to make it work, I don't get either of the outputs you indicate.

Marcelo Cantos 2010-10-23 03:18:20

@marcelo: Yes, I mean string. Thanks for your help. I think I figured it out, but it's kind of messy: #{2}([^#].?*)#{2}

nubme 2010-10-23 05:04:33

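`#{2}` == `##`, and `[^#].?*` isn't what you want. The `*` applies only to the previous token, `.?`, which is already quantified, and Python's `re` rejects that with a "multiple repeat" error. If you meant `[^#].*?`, it means "one non-hash character followed by as few non-line-end characters as possible." Generally, don't stack `*`, `+`, and `?` unless you mean it: a trailing `?` (as in `*?` or `+?`) makes the quantifier lazy; it does not leave it equivalent to `*`.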

Mike DeSimone 2010-10-23 13:07:11
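To see how Python's `re` module treats these quantifier combinations, here is a small sketch. The sample text and variable names are invented for illustration.

```python
import re

text = '##a## and ##b##'

# Greedy: '.*' grabs as much as it can, then backs off to the last '##'.
greedy = re.search(r'##(.*)##', text).group(1)
print(greedy)  # a## and ##b

# Lazy: '.*?' grows one character at a time and stops at the first '##'.
lazy = re.search(r'##(.*?)##', text).group(1)
print(lazy)  # a

# A quantifier stacked on an already quantified token ('.?' followed by '*')
# is rejected when the pattern is compiled.
try:
    re.compile(r'#{2}([^#].?*)#{2}')
    stacked_compiles = True
except re.error:
    stacked_compiles = False
print(stacked_compiles)  # False
```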

Answer 3

A:

Try the "block comment trick": /##((?:[^#]|#[^#])+?)##/
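A rough Python translation of that pattern, dropping the `/.../` delimiters, which Python does not use. The sample strings and the helper name `inner` are assumptions for the sake of the sketch.

```python
import re

# Inner text is built from either a non-hash character, or a single hash
# followed by a non-hash character, so lone '#' characters are allowed inside.
pattern = re.compile(r'##((?:[^#]|#[^#])+?)##')

def inner(s):
    """Return the captured inner text, or None when nothing matches."""
    m = pattern.search(s)
    return m.group(1) if m else None

print(inner('lala ##hey## there'))  # hey
print(inner('lala ##a#b## there'))  # a#b
print(inner('###hey##'))            # #hey
```

As the last line suggests, a surplus leading hash still ends up in the captured text, so this addresses hashes inside the string rather than extra hashes around it.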

SHiNKiROU 2010-10-23 01:19:33

Answer 4

A:

Add + to the regex; it means "match one or more of the preceding character."

import re

pattern = r'#+(.*?)#+'
prog = re.compile(pattern)

string = '###HEY##'
result = prog.search(string)
print(result.group(1))

Output:

HEY

Tg 2010-10-23 01:21:35

Answer 5

A:

Have you considered doing it the non-regex way?

>>> string='lala ####hey## there'
>>> string.split("####")[1].split("#")[0]
'hey'

ghostdog74 2010-10-23 01:45:00

Answer 6

+2 A:

Your problem is with your inner match. You use ., which matches any character that isn't a line end, and that means it matches # as well. So when it gets ###hey##, it matches (.*?) to #hey.

The easy solution is to exclude the # character from the matchable set:

prog = re.compile(r'##([^#]*)##')

Protip: Use raw strings (e.g. r'') for regular expressions so you don't have to go crazy with backslash escapes.

Trying to allow # inside the hashes will make things much more complicated.

EDIT: If you do not want to allow blank inner text (i.e. "####" shouldn't match with an inner text of ""), then change it to:

prog = re.compile(r'##([^#]+)##')

+ means "one or more."
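A short sketch of both variants in use, including with `re.sub`, since that is what the question calls. The sample text and the `[\1]` replacement are illustrative assumptions.

```python
import re

allow_empty = re.compile(r'##([^#]*)##')
non_empty = re.compile(r'##([^#]+)##')

text = 'lala ###hey## there'

# Only the two hashes adjacent to 'hey' take part in the match,
# so the extra leading '#' is left in place.
replaced = non_empty.sub(r'[\1]', text)
print(replaced)  # lala #[hey] there

# '####' matches with empty inner text under '*', but not under '+'.
empty_match = allow_empty.search('####')
print(repr(empty_match.group(1)))  # ''
print(non_empty.search('####'))    # None
```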

Mike DeSimone 2010-10-23 02:56:40

Answer 7

A:

>>> import re
>>> text= 'lala ###hey## there'
>>> matcher= re.compile(r"##[^#]+##")
>>> print(matcher.sub("FOUND", text))
lala #FOUND there
>>>

ΤΖΩΤΖΙΟΥ 2010-10-24 13:13:17
