Consider this string:

prison break: proof of innocence (2006) {abduction (#1.10)}

I just want to know whether the string contains something of the form (#<floating point value>)} or not.

I tried a few regular expressions like

re.search('\(\#+\f+\)\}',xyz) 

and

re.search('\(\#+(\d\.\d)+\)\}',xyz)

Nothing worked, though. Can someone suggest something here?

+3  A: 

Try r'\(#\d+\.\d+\)\}'

The (, ), ., and } are all regex metacharacters; that's why they're preceded by \, so that they're matched literally instead.

You also need to apply the + repetition to the right element. Here it's attached to the \d (the shorthand for the digit character class) to mean that only the digits can appear one-or-more times.

The use of r'raw string literals' makes it easier to work with regex patterns because you don't have to escape backslashes excessively.
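As a quick check, here is a minimal sketch that applies the suggested pattern to the string from the question. The names `title` and `episode_pattern` are only illustrative; they don't come from the question.

```python
import re

# The sample string from the question.
title = 'prison break: proof of innocence (2006) {abduction (#1.10)}'

# Literal "(#", one-or-more digits, a literal dot,
# one-or-more digits, then a literal ")}".
episode_pattern = re.compile(r'\(#\d+\.\d+\)\}')

match = episode_pattern.search(title)

# search() returns None when nothing matches, so test for that first.
has_episode = match is not None
matched_text = match.group() if match else None

print(has_episode, matched_text)
```

Since the question only asks whether the part is present, testing the result of search() against None is enough; group() is only needed if you also want the matched text.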

Variations

For instructional purposes, let's consider a few variations. This will show a few basic features of regex. Let's first consider one of the attempted patterns:

\(\#+(\d\.\d)+\)\}

Let's space out the parts for readability:

\( \#+ ( \d \. \d )+ \) \}
       \__________/
         this is one group, repeated with +

So this pattern matches:

  • A literal (, followed by one-or-more #
  • Followed by one-or-more of:
    • A digit, a literal dot, and a digit
  • Followed by a literal )}

Thus, the pattern will match e.g. (###1.23.45.6)} (as seen on rubular.com). Obviously this is not the pattern we want.
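The same behaviour can be reproduced in Python. This is just a sketch; `attempted`, `loose_match` and `strict_miss` are made-up names for illustration.

```python
import re

# One of the patterns attempted in the question.
attempted = re.compile(r'\(\#+(\d\.\d)+\)\}')

# "1.2", "3.4" and "5.6" are three repetitions of the group,
# and \#+ happily accepts the three hashes.
loose_match = attempted.search('(###1.23.45.6)}')

# In "(#1.10)}" the group consumes "1.1", which leaves "0"
# where the pattern requires ")", so there is no match.
strict_miss = attempted.search(
    'prison break: proof of innocence (2006) {abduction (#1.10)}'
)

print(loose_match.group() if loose_match else None)
print(strict_miss)
```

The second search also shows why this attempt returned nothing for the original string: the group only allows a single digit on each side of the dot.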

Now let's try to modify the solution pattern and say that perhaps we also want to allow just a sequence of digits, without the subsequent period and following digits. We can do this by grouping that part (…), and making it optional with ?.

BEFORE
\(#\d+\.\d+\)\}
      \___/
      let's make this optional! (…)?

AFTER
\(#\d+(\.\d+)?\)\}

Now the pattern matches e.g. (#1.23)} as well as e.g. (#666)} (as seen on rubular.com).
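Here is a small sketch of the optional-group variation in Python. The two extra inputs, (#1.)} and (#.5)}, are my own additions to show what the pattern still rejects; `optional_fraction`, `samples` and `results` are illustrative names.

```python
import re

# The fractional part (a dot plus one-or-more digits) is now optional.
optional_fraction = re.compile(r'\(#\d+(\.\d+)?\)\}')

samples = ['(#1.23)}', '(#666)}', '(#1.)}', '(#.5)}']

# Map each sample to whether the pattern is found in it.
results = {
    text: optional_fraction.search(text) is not None
    for text in samples
}

for text, found in results.items():
    print(text, found)
```

A trailing dot without digits and a missing integer part are both rejected, because the dot is only allowed together with at least one following digit, and the leading \d+ is still mandatory.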

polygenelubricants
Nice info, thanks for the links.
bofh
+3  A: 

"Escape everything" and use raw-literal syntax for safety:

>>> import re
>>> s='prison break: proof of innocence (2006) {abduction (#1.10)}'
>>> re.search(r'\(\#\d+\.\d+\)\}', s)
<_sre.SRE_Match object at 0xec950>
>>> _.group()
'(#1.10)}'

This assumes that by "floating point value" you mean "one or more digits, a dot, one or more digits", and is not tolerant of other floating point syntax variations, multiple hashes (which you appear from your RE patterns to want to support but don't mention in your Q's text), arbitrary whitespace among the relevant parts (again, unclear from your Q whether you need it), ... -- some issues can be adjusted pretty easily, others "not so much" (it's particularly hard to guess what gamut of FP syntax variations you want to support, for example).
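As an example of one of the "easy" adjustments, here is a sketch that tolerates multiple hashes and optional whitespace between the parts. Whether the question actually needs either is an assumption on my part, and `tolerant` and `found` are hypothetical names.

```python
import re

# Assumed requirements (not stated in the question):
#   - one or more '#' characters
#   - optional whitespace between any of the parts
tolerant = re.compile(r'\(\s*\#+\s*\d+\.\d+\s*\)\s*\}')

candidates = ['(#1.10)}', '( ## 1.10 ) }', '(#abc)}']

# Record whether each candidate contains a match.
found = {
    text: tolerant.search(text) is not None
    for text in candidates
}

for text, ok in found.items():
    print(repr(text), ok)
```

The number part is still strictly "digits, dot, digits"; supporting signs, exponents or a missing integer part would need a wider pattern, and which of those to allow depends on what you count as a floating point value.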

Alex Martelli
You don't have to escape the `#`
NullUserException
yep that worked :)
bofh
@Null, if you escape every non-alphameric character that you want to match literally, you'll never go wrong, won't force your code's reader to guess which special characters have special meanings and which don't, and will survive if a future release of REs adds some functionality; there are no compensating advantages for NOT escaping it all. Similarly for the `r'...'` syntax: you **need** it only when some '\'-sequence you're using is otherwise a string escape, but using it **always** is a better idea - saves you from pondering deeply in every case, helps the reader, etc - no downside.
Alex Martelli
@Alex I find that the excess of backslashes makes the regex harder to read. And I don't think I've ever seen a regex flavor where `#` needs escaping. Have you, polygenelubricants?
NullUserException
It's a fair point though, I guess.
NullUserException
@Null: `#` becomes a metacharacter in `/x` mode (a.k.a. COMMENTS, IgnorePatternWhitespace, VERBOSE, PCRE_EXTENDED... why does *every* flavor have to invent its own name for this mode?). Anyway, the idea is to escape them all *because* you don't know which ones have special meanings. Not to worry, though; for every old hand advising n00bs to escape everything, there's always a dozen more complaining about all the useless backslashes, so it never lasts long. ;)
Alan Moore
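To make the point about `/x` mode concrete, here is a short sketch using Python's re.VERBOSE flag with the two patterns from this thread. The variable names are illustrative only.

```python
import re

s = 'prison break: proof of innocence (2006) {abduction (#1.10)}'

# With re.VERBOSE, an unescaped '#' starts a comment, so everything
# after it is ignored and the pattern is effectively just r'\('.
unescaped = re.search(r'\(#\d+\.\d+\)\}', s, re.VERBOSE)

# Escaping the '#' keeps it a literal character in verbose mode too.
escaped = re.search(r'\(\#\d+\.\d+\)\}', s, re.VERBOSE)

print(unescaped.group())
print(escaped.group())
```

The unescaped version silently matches the first opening parenthesis in the string (the one in "(2006)") instead of failing, which is the kind of surprise the "escape everything" habit guards against.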