I need a case-insensitive match for "abcd" followed by an optional trademark symbol.

Regex: "/abcd(™)?/gi"

See example:

<?php
preg_match("/abcd(™)?/gi","AbCd™  U9+",$matches);
print_r($matches);?>

When I run this, $matches isn't populated with anything; it isn't even created as an empty array. Any ideas?

Thanks.

+1  A: 

I suspect it has something to do with the literal trademark symbol.

You'll probably want to check out how to use Unicode with your regular expressions, and then embed the escape sequence for the trademark symbol.

theraccoonbear
+4  A: 

How is your file encoded? PHP has issues when it comes to Unicode. In your case, try using the escape sequence "\x99" (the Windows-1252 byte for ™) instead of directly embedding the TM symbol.

Douglas Mayle
+1. I tried this pattern on regexpal.com: /abcd\u2122?/
ojrac
+2  A: 

Note: I'm not a PHP guru. However, this looks like a character encoding issue. For example, your PHP file could be encoded as Windows-1252 (where ™ is encoded as \x99), while the data you are trying to match is encoded as UTF-8 (where ™ is encoded as '\xe2\x84\xa2'), or vice versa (i.e. your file is UTF-8 and your data is Windows-1252). Try looking in this direction, and give us more information about what you are doing.
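To make the mismatch concrete, here is a minimal sketch. It assumes the subject string is UTF-8 and writes every byte as an escape so the result doesn't depend on how the script file itself is saved. The variable names are only for illustration.

```php
<?php
// The same character, U+2122 (™), as raw bytes in two encodings.
$tmUtf8   = "\xE2\x84\xA2"; // UTF-8: three bytes
$tmCp1252 = "\x99";         // Windows-1252: one byte

echo bin2hex($tmUtf8), "\n";   // e284a2
echo bin2hex($tmCp1252), "\n"; // 99

// Assumed for this sketch: the data being searched is UTF-8.
$subject = "AbCd" . $tmUtf8;

// A pattern holding the Windows-1252 byte finds nothing in UTF-8 data...
$wrongEncoding = preg_match('/abcd\x99/i', $subject);

// ...while a pattern holding the UTF-8 bytes matches.
$rightEncoding = preg_match('/abcd\xE2\x84\xA2/i', $subject);

var_dump($wrongEncoding, $rightEncoding); // int(0) int(1)
```

If your data turns out to be Windows-1252 instead, the two results swap.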

ΤΖΩΤΖΙΟΥ
I don't mind if I lose reputation or the "chosen" answer status, really. I just would like to remind people that "up/downvoting" means "that was helpful/not helpful in relation to the question", not "I like/don't like your answer" or similar. And when you downvote, leave a comment why. It's proper.
ΤΖΩΤΖΙΟΥ
+1  A: 

It was a combination of things... this was the regex that finally worked:

/abcd(\xe2\x84\xa2)?/i

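I had to remove the g modifier (PHP's preg functions don't support it, so preg_match was failing with an "Unknown modifier 'g'" warning) and change the ™ symbol to its UTF-8 byte sequence, \xe2\x84\xa2.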
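For anyone landing here later, a rough sketch of the working version plus two alternatives, assuming the subject is UTF-8. The u modifier and the \x{2122} escape are standard PCRE features in PHP, but they weren't part of what I originally tried; the second test string for preg_match_all is made up for the example.

```php
<?php
// Assumed for this sketch: the subject is UTF-8 (™ written as escaped bytes).
$subject = "AbCd\xE2\x84\xA2  U9+";

// 1. The byte-sequence pattern from above: no u modifier needed.
$okBytes = preg_match('/abcd(\xe2\x84\xa2)?/i', $subject, $byteMatches);

// 2. Alternative: the u modifier makes PCRE treat pattern and subject as UTF-8,
//    so the code point can be written directly as \x{2122}.
$okUnicode = preg_match('/abcd(\x{2122})?/iu', $subject, $matches);
print_r($matches); // [0] => AbCd™, [1] => ™

// 3. There is no g modifier; to collect every match, use preg_match_all.
$count = preg_match_all('/abcd(\x{2122})?/iu', "abcd ABCD\xE2\x84\xA2", $all);
print_r($all[0]);  // [0] => abcd, [1] => ABCD™
```

Version 2 is probably the more readable choice when everything is known to be UTF-8, since the pattern names the character rather than its bytes.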

Thanks!

Ken Sykora