Hi all. I've got some trouble with my preg_match. Here's the code:

$text = "tel: 012 213 123. mobil: 0303 11234 \n address: street 14";
$regex_string = '/(tel|Tel|TEL)[\s|:]+(.+)[\.|\n]/';

preg_match($regex_string , $text, $match);

And I get this result in $match[2]

"012 213 123. mobil: 023 123 123"

First question: I want the regex to stop at the . (dot), but it doesn't. Can someone explain why it isn't stopping there?

Second question: preg_match uses () to capture its matches. Is it possible to skip capturing the parentheses around the different "Tel" variants and still get the same functionality?

Thanks all, Stack Overflow is great :D

+1  A: 

This should do:

/tel(?:\s|:)+([^.]+)(?:\.|$)/i

+ is a greedy quantifier, which means it'll match as many characters as possible.

To your second question: in this particular case you just need a case-insensitive match (the i flag). More generally, you can use the (?:...) syntax for a non-capturing group, an example of which you can see at the end of the pattern. Square brackets are used for character classes.
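
For anyone who wants to try it, here is a minimal sketch of the pattern above run against the sample text from the question. The variable names are just for illustration.

```php
<?php
// Sample text from the question (double quotes so \n is a real newline).
$text = "tel: 012 213 123. mobil: 0303 11234 \n address: street 14";

// The i flag covers tel, Tel and TEL; (?:...) groups without capturing,
// so the number ends up in the first (and only) captured group.
$pattern = '/tel(?:\s|:)+([^.]+)(?:\.|$)/i';

if (preg_match($pattern, $text, $match)) {
    echo $match[1], "\n"; // 012 213 123
}
```

Because [^.]+ cannot cross a dot, the capture ends right before the first period no matter how greedy the quantifier is.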

SilentGhost
If this is a greedy quantifier problem, what's `[\.|\n]` matching?
LeguRi
OK... didn't know about the (?:...) syntax... very helpful, thanks. I guess I don't really grasp character classes yet.
Yo-L
@Richard: the end-of-line character?
SilentGhost
@Yo-L: they're just sets of characters. Any one of the characters in the set can match, as many times as the quantifier allows.
SilentGhost
Seems so easy when you all explain it. Harder to get it right just reading a table of instructions. Thanks again.
Yo-L
+1  A: 

If you're simply trying to extract a phone number out of that line, and it's guaranteed to be 11 digits, you could simply use this:

$text = 'tel: 012 213 123. mobil: 0303 11234';
$phone_number = substr(preg_replace('/[^\d]/', '', $text), 0, 11);

With your example, $phone_number would be 01221312303.

How this works is that every non-digit is replaced with an empty string, removing it, and then the first 11 digits are returned.

ryeguy
Helpful, but it's not going to be a fixed number of digits.
Yo-L
@Yo-L: So is it going to be the same format but just a varying amount of numbers? eg, a phone number without an area code?
ryeguy
+1  A: 

No idea - your regex works for me (I get "012 213 123" in $match[2] with your code). The fact that the mobile phone differs between the two might indicate that it's not really the output of your code; check again.

Some other things - if you happen to have more dots in the line ("tel: xxx. phone: xxx. fax: xxx" for example), you will get bad results - use non-greedy operators ("get least chunk that matches" .*? instead of "get biggest chunk that matches" .*) or limit the repeated characters ("any number of non-periods" [^.]*). Also, you could spare yourself the trouble by making the regex case-insensitive (unless you really hate people typing "tEl").

Your other question: (?:stuff) will match "stuff" just like (stuff), but will not capture it.
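
A rough sketch of the three options side by side, in case it helps. The sample line and the variable names are made up for illustration, modeled on the "tel: xxx. phone: xxx. fax: xxx" example.

```php
<?php
// Hypothetical line with several dots.
$line = 'tel: 111. phone: 222. fax: 333.';

// Greedy: .+ grabs as much as it can, so it runs up to the last dot.
preg_match('/tel[\s:]+(.+)\./i', $line, $greedy);

// Lazy: .+? grabs as little as it can, so it stops at the first dot.
preg_match('/tel[\s:]+(.+?)\./i', $line, $lazy);

// Negated class: [^.]+ can never cross a dot in the first place.
preg_match('/tel[\s:]+([^.]+)/i', $line, $negated);

echo $greedy[1], "\n";  // 111. phone: 222. fax: 333
echo $lazy[1], "\n";    // 111
echo $negated[1], "\n"; // 111
```

The lazy and negated-class versions give the same result here; the negated class is usually the more explicit of the two about what is allowed inside the number.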

Useful link: http://www.regular-expressions.info/

Amadan
I got it a bit wrong when posting the question, but SilentGhost got it right anyway :S Got that page up in another tab :) Still a bit hard to grasp.
Yo-L
+1  A: 

Why do you have pipes in your character classes [\.|\n] and [\s|:]? Character classes (stuff in square brackets []) are by definition like an OR relationship, so you don't need the pipe... unless you really are trying to match pipe |.
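
A quick way to check this yourself (a small sketch, variable names are just for illustration):

```php
<?php
// Inside [...] the pipe is an ordinary character, not "or".
$with_pipe    = preg_match('/^[\s|:]$/', '|'); // 1: the class accepts a literal pipe
$without_pipe = preg_match('/^[\s:]$/', '|');  // 0: without the pipe it does not
$colon        = preg_match('/^[\s:]$/', ':');  // 1: the colon still matches

var_dump($with_pipe, $without_pipe, $colon);
```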

As for question #1, I'm not sure what's causing your problem, but usually this has to do with greedy quantifiers. The + in (.+) is greedy, so it matches as much as it can while still letting the entire pattern match; it does not stop at the first point where the rest of the pattern could match. Since the . metacharacter matches any character other than a newline, it can match a literal period, and so it does. To make a quantifier non-greedy, put a question mark ? after it, as in (.+?).

For your second question: regex uses parentheses both to group things and to store (capture) them. If you want to group (tel|Tel|TEL) but not store it in $match, you can put ?: right after the opening parenthesis:

(?:tel|Tel|TEL)
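
To see the difference in $match, here is a small sketch. The sample text and the $with / $without names are assumptions for illustration, and the character classes are written without the pipes.

```php
<?php
$text = 'Tel: 012 213 123.';

// Capturing group: the prefix lands in $with[1], the number in $with[2].
preg_match('/(tel|Tel|TEL)[\s:]+([^.]+)/', $text, $with);

// Non-capturing group: the prefix is grouped but not stored,
// so the number moves up to $without[1].
preg_match('/(?:tel|Tel|TEL)[\s:]+([^.]+)/', $text, $without);

print_r($with);    // [0] => Tel: 012 213 123, [1] => Tel, [2] => 012 213 123
print_r($without); // [0] => Tel: 012 213 123, [1] => 012 213 123
```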
LeguRi
Yeah I got the pipe messed up in there.
Yo-L
+1  A: 

Do you mean you want to match only the number, so you don't have to strip off the tel: and the dot? Try this:

/tel[:\s]+\K[^.]+/i

The i makes it case-insensitive.

[:\s] matches a colon or whitespace (the | doesn't mean "or" in a character class, it just matches a |).

[^.]+ matches one or more non-dots; it stops matching when it sees a dot or the end of the line, so you don't have to match the dot if you don't want it in the result.

Finally, \K means "forget about whatever you've matched so far and pretend the match really started here"--a little gem of a feature that's only available in Perl and PHP (that I know of).
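
A minimal sketch of how that looks with the text from the question (variable names are just for illustration):

```php
<?php
$text = "tel: 012 213 123. mobil: 0303 11234 \n address: street 14";

// \K resets the start of the reported match, so $match[0] holds only
// the number and no capture group is needed.
if (preg_match('/tel[:\s]+\K[^.]+/i', $text, $match)) {
    echo $match[0], "\n"; // 012 213 123
}
```

Note that the result is in $match[0] here, not $match[1] or $match[2], since the pattern has no groups at all.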

Alan Moore
That's great info... ty man. Never heard of \K.
Yo-L