I'm new to regular expressions in PHP.

I have a long string of HTML. I want to find all occurrences of:

@any_username_after_an_at_sign

Could somebody help me recover all of the usernames on a page? I think you use preg_match but I don't know the regular expression to use.

Thanks!!

A: 

Try this:

@\S+

and use preg_match_all
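
As a rough sketch of how that suggestion might look in PHP (the sample string and variable names are made up for illustration, and the `/` delimiters are an assumption, since PHP's preg functions require some delimiter around the pattern):

```php
<?php
// Hypothetical sample input; the real page HTML would go here.
$html = '<p>Thanks @alice and @bob_99!</p>';

// preg_match_all() collects every match instead of stopping at the first.
// PHP patterns need delimiters; "/" is used here.
preg_match_all('/@\S+/', $html, $matches);

// $matches[0] holds the full matches.
$mentions = $matches[0];
print_r($mentions);
```

With this input the second match comes back as `@bob_99!</p>`, because `\S+` consumes every non-whitespace character, including trailing punctuation and markup. A narrower character class such as `\w+` avoids that.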

AntonioCS
+1  A: 

You could try:

/@\w+/

But this might pick up some false matches, such as parts of email addresses. Can you tell us something about the context?
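
To illustrate the email false match, here is a small sketch. The sample text is invented, and the negative lookbehind is only one possible refinement, not something proposed in the original answer:

```php
<?php
// Hypothetical sample text containing a mention and an email address.
$text = 'Ping @alice or mail bob@example.com';

// Plain pattern from the answer: also matches "@example" inside the email.
preg_match_all('/@\w+/', $text, $plain);

// Assumed refinement: a negative lookbehind rejects an "@" that directly
// follows a word character or a dot, which is typical of email addresses.
preg_match_all('/(?<![\w.])@\w+/', $text, $strict);

print_r($plain[0]);   // both "@alice" and "@example"
print_r($strict[0]);  // only "@alice"
```

The lookbehind is a heuristic; it will not help with mentions that are glued to preceding text on purpose.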

It might also be relevant to consider using an HTML parser, although without more information it is hard to be sure.

Mark Byers
The context is actually a Twitter-like microblogging profile page with status updates, so it's like searching through www.twitter.com/ev/
chris
@chris: Then it definitely sounds like you ought to be using a parser for this and not regex. Chances are that there is some markup telling you what the username is and what the message is. If you can parse that markup then you can get the username more reliably than with a regex.
Mark Byers
@chris, you should add that to the question; it's a very important piece of information.
gnibbler
@mark: Good point. I will look into this.
chris
@chris: According to this thread (http://stackoverflow.com/questions/292926/robust-mature-html-parser-for-php), you can use DOMDocument::loadHTML (http://docs.php.net/manual/en/domdocument.loadhtml.php).
Mark Byers
@mark: thanks again
chris
+1  A: 

Simple:

preg_match_all('~@(\w+)\b~', '@me @you', $usernames);
print_r($usernames);

Result:

Array (
  [0] => Array(
    [0] => @me
    [1] => @you
  )
  [1] => Array (
    [0] => me
    [1] => you
  )
)

Once you get this, simply match these against your users' DB table to weed out false positives. You might also want to strip_tags() before you do this to avoid getting text from inside attributes.
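
Those two steps could be combined roughly as follows. This is a sketch under assumptions: `$knownUsers` is a hypothetical stand-in for a query against the users table, and the sample HTML is invented:

```php
<?php
// Hypothetical input: the title attribute holds text that looks like a mention.
$html = '<a href="/ev" title="@not_a_user">@ev</a> said hi to @biz and @ghost';

// Hypothetical stand-in for a lookup against the users table.
$knownUsers = ['ev', 'biz'];

// Drop the markup first so attribute values are not scanned.
$text = strip_tags($html);

// Capture group 1 holds the name without the "@".
preg_match_all('~@(\w+)\b~', $text, $m);

// Keep only names that exist; array_values() reindexes the result.
$usernames = array_values(array_intersect(array_unique($m[1]), $knownUsers));
print_r($usernames);
```

Here `@not_a_user` disappears with the tag, and `@ghost` is dropped because it is not in the known-user list.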

Max Shawabkeh
A: 

Given the context of the twitter page, something like this may work.

'~@<a class="tweet-url username"[^>]*>([^<]*)</a>~'

That said, a proper parser will always work better than a regex for this type of problem.
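
For comparison, a parser-based version might look like the sketch below, using DOMDocument::loadHTML as mentioned in the comments on another answer. The sample markup and the assumption that usernames sit in `<a>` elements whose class list contains `username` are taken from the pattern above, not from any real page:

```php
<?php
// Hypothetical markup modelled on the class names in the pattern above.
$html = '<div><span class="status">Hello @<a class="tweet-url username" href="/ev">ev</a>'
      . ' and @<a class="tweet-url username" href="/biz">biz</a></span></div>';

$doc = new DOMDocument();
// Real-world pages are rarely valid; collect libxml warnings instead of printing them.
libxml_use_internal_errors(true);
$doc->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($doc);
// Select anchors whose class list contains the token "username".
$nodes = $xpath->query(
    '//a[contains(concat(" ", normalize-space(@class), " "), " username ")]'
);

$usernames = [];
foreach ($nodes as $node) {
    $usernames[] = trim($node->textContent);
}
print_r($usernames);
```

Unlike the regex, this does not depend on attribute order or on the exact spelling of the class attribute, only on the `username` token being present.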

gnibbler