Hello. I need to find and replace a substring that contains a dot. It's important to keep the search strict to word boundaries (\b). Here's an example script that reproduces the problem (I need to match "test."):

<?php
# 1.php
$string = 'test. lorem ipsum';
if(!preg_match('~\btest\.\b~i', $string)) echo 'no match 1' . PHP_EOL;
if(!preg_match('~\btest\b\.~i', $string)) echo 'no match 2' . PHP_EOL;

And here's the output:

x:\>php 1.php
no match 1

x:\>php -v
PHP 5.2.8 (cli) (built: Dec  8 2008 19:31:23)
Copyright (c) 1997-2008 The PHP Group

BTW, I also don't get any match if there are square brackets in the search pattern. I do escape them, of course, but it still has no effect.

+2  A: 

Regexes can't read; they don't really know what a "word" is. To them, a word boundary is simply a position that is either preceded by a word character and not followed by one, or followed by a word character and not preceded by one:

(?<=\w)(?!\w)|(?=\w)(?<!\w)
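
To check that equivalence on the string from the question, a minimal sketch can collect the offsets at which each pattern matches. The helper name boundary_offsets is made up for illustration, and array() is used instead of the short syntax so the snippet should also run on older PHP versions.

```php
<?php
// Hypothetical helper: offsets of every (empty) match of $pattern in $subject.
function boundary_offsets($pattern, $subject)
{
    $offsets = array();
    preg_match_all($pattern, $subject, $matches, PREG_OFFSET_CAPTURE);
    foreach ($matches[0] as $match) {
        $offsets[] = $match[1]; // [0] is the matched text, [1] is its byte offset
    }
    return $offsets;
}

$string = 'test. lorem ipsum';

$builtin    = boundary_offsets('~\b~', $string);
$lookaround = boundary_offsets('~(?<=\w)(?!\w)|(?=\w)(?<!\w)~', $string);

echo implode(',', $builtin) . PHP_EOL;    // 0,4,6,11,12,17
echo implode(',', $lookaround) . PHP_EOL; // same offsets
```

Offset 4 (between "t" and ".") is in the list, but offset 5 (between "." and the space) is not, because neither neighbour of that position is a word character.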

So the position after the . in your first test would only be a word boundary if it were followed by a word character ([A-Za-z0-9_]; in some regex flavors the definition is based on a broader range of characters, including accented letters and letters from other scripts, but in PHP it's by default only ASCII letters, digits, and the underscore). In your string the dot is followed by a space, so that \b fails.
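
A quick way to see this, as a sketch using the first pattern from the question: the trailing \b only succeeds when a word character comes directly after the dot. The second input string is a made-up variant of the original.

```php
<?php
$pattern = '~\btest\.\b~i';

// Dot followed by a space: no word character on either side of that
// position, so the trailing \b fails.
$followedBySpace = preg_match($pattern, 'test. lorem ipsum');

// Dot followed by a letter: non-word character before, word character
// after, so the trailing \b succeeds.
$followedByLetter = preg_match($pattern, 'test.lorem ipsum');

var_dump($followedBySpace);  // int(0)
var_dump($followedByLetter); // int(1)
```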

I suspect what you want to do is make sure the . is either followed by whitespace, or it's at the end of the string. You can express that directly as a positive lookahead:

'~\btest\.(?=\s|$)~i'

...or more succinctly, as a negative lookahead:

'~\btest\.(?!\S)~i'

...in other words, if there's a next character, it's not a non-whitespace character.
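
As a sketch of how the two lookahead patterns behave, the following runs both against a few inputs and then does a replacement, since the question mentions find-and-replace. Every sample string except the one from the question, and the replacement text 'exam.', are made up for illustration.

```php
<?php
$positive = '~\btest\.(?=\s|$)~i';
$negative = '~\btest\.(?!\S)~i';

$samples = array(
    'test. lorem ipsum', // dot followed by whitespace
    'lorem test.',       // dot at the end of the string
    'test.com',          // dot followed by a non-whitespace character
    'contest. lorem',    // "test." does not start at a word boundary
);

$results = array();
foreach ($samples as $sample) {
    $results[$sample] = array(
        preg_match($positive, $sample),
        preg_match($negative, $sample),
    );
    echo $sample . ' => ' . implode(' ', $results[$sample]) . PHP_EOL;
}

// The lookahead consumes nothing, so the whitespace after the dot is kept.
$replaced = preg_replace($negative, 'exam.', 'test. lorem ipsum');
echo $replaced . PHP_EOL; // exam. lorem ipsum
```

Both patterns agree on these inputs: the first two match, the last two do not.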

Alan Moore
Thanks for correcting my answer ;-)
zerkms
Thanks for your reply, now I get it. As for my original task, I need to search for exact pieces of text, and the dot comes as part of an abbreviation. For example: "sm. AEL", "some name [info]", etc.
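
For literal search strings like these, one possible sketch is to escape the text with preg_quote() and use whitespace-based lookarounds on both sides instead of \b, so the match has to be delimited by whitespace or the edges of the string. This is an assumption about the intended behaviour, not something stated in the thread; the helper name exact_phrase_pattern and the sample sentence are hypothetical.

```php
<?php
// Hypothetical helper: builds a pattern that matches $needle literally,
// delimited by whitespace or the start/end of the string.
function exact_phrase_pattern($needle)
{
    // The second argument makes preg_quote() escape the "~" delimiter too.
    return '~(?<!\S)' . preg_quote($needle, '~') . '(?!\S)~i';
}

$text = 'see sm. AEL and some name [info] here';

$abbrev   = preg_match(exact_phrase_pattern('sm. AEL'), $text);
$brackets = preg_match(exact_phrase_pattern('some name [info]'), $text);
$partial  = preg_match(exact_phrase_pattern('name [in'), $text);

var_dump($abbrev, $brackets, $partial); // int(1) int(1) int(0)
```

Note that this variant is stricter than \b: a phrase directly followed by punctuation such as a comma would not match, so the lookarounds may need adjusting for real data.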