I am close, but I am not sure what to do with the resulting match object. If I do

p = re.search('[/@.* /]', str)

I'll get any words that start with @ and end with a space. This is what I want. However, this returns a Match object that I don't know what to do with. What's the most computationally efficient way of finding and returning a string which is prefixed with a @?

For example,

"Hi there @guy"

After doing the proper processing, I should get back

guy
A: 

p.group(0) should return guy. If you want to find out which methods an object has, you can call dir(p). This returns a list of the attributes and methods available on that object instance.
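
As a rough illustration of what a match object offers, here is a minimal sketch. It assumes the pattern r'@(\w+)' suggested in other answers rather than the question's original pattern, and the variable names are only illustrative. Note that re.search returns None when nothing matches, so the result is checked before use.

```python
import re

text = "Hi there @guy"
match = re.search(r'@(\w+)', text)  # assumed pattern, not the one from the question

if match is not None:
    whole = match.group(0)  # the entire matched text, including the @
    name = match.group(1)   # only the first parenthesised group
    span = match.span(1)    # (start, end) indices of that group in text
else:
    whole = name = span = None

print(whole, name, span)
```

Here group(0) is the whole match and group(1) is the captured name without the @.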

Marc
Doesn't seem to:

>>> str = "Jo there @guy"
>>> p = re.search('[/@.* /]', str)
>>> p.group(0)
' '

(the output is ' ')
tipu
I will add that you can find the documentation of the match object at http://docs.python.org/library/re.html#match-objects
Gautier Hayoun
+1  A: 

That regex does not do what you think it does.

import re

s = "Hi there @guy"
p = re.search(r'@([^ ]+)', s)  # this is the regex you described
print(p.group(1))  # first thing matched inside of ( .. )

But as usual with regex, there are tons of inputs that break this. For example, if the text is s = "Hi there @guy, what's with the comma?" the result would be guy, (including the trailing comma).

So you really need to think about every possible thing you want and don't want to match. r'@([a-zA-Z]+)' might be a good starting point, it literally only matches letters (a .. z, no unicode etc).
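
To make the difference concrete, this small sketch runs both patterns from this answer against the comma example. The variable names are only for illustration.

```python
import re

s = "Hi there @guy, what's with the comma?"

# "anything but a space" also swallows the punctuation
non_space = re.search(r'@([^ ]+)', s).group(1)

# ASCII letters only stops at the comma
letters = re.search(r'@([a-zA-Z]+)', s).group(1)

print(repr(non_space), repr(letters))
```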

THC4k
You can use `\b`.
KennyTM
+3  A: 

The following regular expression does what you need:

import re
s = "Hi there @guy"
p = re.search(r'@(\w+)', s)
print(p.group(1))

It will also work for the following string formats:

  • s = "Hi there @guy " # notice the trailing space
  • s = "Hi there @guy," # notice the trailing comma
  • s = "Hi there @guy and" # notice the next word
  • s = "Hi there @guy22" # notice the trailing numbers
  • s = "Hi there @22guy" # notice the leading numbers
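
The cases listed above can be checked in one go. This is just a quick sketch; the list and variable names are made up for the example.

```python
import re

samples = [
    "Hi there @guy ",     # trailing space
    "Hi there @guy,",     # trailing comma
    "Hi there @guy and",  # next word
    "Hi there @guy22",    # trailing numbers
    "Hi there @22guy",    # leading numbers
]

pattern = re.compile(r'@(\w+)')  # compile once, reuse for every string

results = []
for s in samples:
    m = pattern.search(s)
    results.append(m.group(1) if m else None)  # None if there is no @word

print(results)
```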
Tendayi Mawushe
Depends on whether `'let's meet @11pm'` should get matched
THC4k
If numbers were significant (as well as words) how would that regex be modified?
tipu
Actually, the \w pattern matches any alphanumeric character and the underscore; this is equivalent to the set [a-zA-Z0-9_], so @11pm, for example, would be matched correctly.
Tendayi Mawushe
A: 
(?<=@)\w+

will match a word if it's preceded by a @ (without adding the @ to the match, a so-called positive lookbehind). This will match "words" that are composed of letters, numbers, and/or underscores; if you want letters only (no digits or underscores), use (?<=@)[^\W\d_]+

In Python:

>>> strg = "Hi there @guy!"
>>> p = re.search(r'(?<=@)\w+', strg)
>>> p.group()
'guy'
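
For comparison, here is a small sketch of both lookbehind patterns from this answer applied with re.findall, which returns every match instead of only the first. The sample text is invented for the example.

```python
import re

text = "ping @guy22 and @22guy, also @_bot!"

# letters, digits and underscore after the @
with_digits = re.findall(r'(?<=@)\w+', text)

# letters only: the match must start right after the @,
# so a name that begins with a digit or underscore yields nothing
letters_only = re.findall(r'(?<=@)[^\W\d_]+', text)

print(with_digits)
print(letters_only)
```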
Tim Pietzcker
A: 

As is evident from the answers so far, regex is the most efficient solution for your problem. The answers differ slightly in which characters they allow to follow the @:

[^ ] anything but space
\w   in Python 2.x is equivalent to [A-Za-z0-9_] by default; in Python 3 it matches Unicode word characters for str patterns unless re.ASCII is set

If you have a better idea of which characters might appear in the user name, you can adjust your regex to reflect that. For example, only lowercase ASCII letters would be:

[a-z]

NB: I skipped quantifiers for simplicity.
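
As a hedged sketch of how \w behaves on Python 3 str patterns, the snippet below matches the same text with and without the re.ASCII flag. The sample name is invented; \u00e9 is the letter e with an acute accent.

```python
import re

text = "Hi there @jos\u00e9"

# default for str patterns in Python 3: \w covers Unicode word characters
unicode_match = re.search(r'@(\w+)', text).group(1)

# with re.ASCII, \w is restricted to [a-zA-Z0-9_]
ascii_match = re.search(r'@(\w+)', text, re.ASCII).group(1)

print(unicode_match, ascii_match)
```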

SilentGhost
A: 

You say: """If I do p = re.search('[/@.* /]', str) I'll get any words that start with @ and end up with a space.""" But this is incorrect -- that pattern is a character class which will match ONE character in the set @/.* and space. Note: there's a redundant second / in the pattern. For example:

>>> re.findall('[/@.* /]', 'xxx@foo x/x.x*x xxxx')
['@', ' ', '/', '.', '*', ' ']
>>>

You say that you want "guy" returned from "Hi there @guy" but that conflicts with "and end up with a space".
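
To illustrate the conflict, here is a sketch under the assumption that the name consists of \w characters. Requiring a literal trailing space finds nothing in the example string, whereas accepting either a space or the end of the string does.

```python
import re

text = "Hi there @guy"

# demands a space after the name: no match, because the string ends at "guy"
needs_space = re.search(r'@(\w+) ', text)

# accepts a space OR the end of the string (lookahead, so neither is consumed)
space_or_end = re.search(r'@(\w+)(?= |$)', text)

print(needs_space)
print(space_or_end.group(1))
```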

Please edit your question to include what you really want/need to match.

John Machin