So I have an input string which is a directory address:

Example: ProgramFiles/Micro/Telephone

And I want to match it against a list of words very strictly:

Example: Tel|Tele|Telephone

I want to match Telephone and not Tel. Right now my regex looks like this:

my( $output ) = ( $input =~ m/($list)/o );

The regex above will match against Tel. What can I do to fix it?

+2  A: 

If you want a whole word match:

\b(Tel|Tele|Telephone)\b

\b is a zero-width word boundary: it matches at the transition between a word character and a non-word character, or at the start or end of the string next to a word character. A word character (\w) is [0-9a-zA-Z_] for ASCII text.
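A small Perl sketch of the difference, using the path and word list from the question. The variable names `$loose` and `$strict` are illustrative only and not part of the original code:

```perl
use strict;
use warnings;

my $input = 'ProgramFiles/Micro/Telephone';
my $list  = 'Tel|Tele|Telephone';

# Without boundaries, the first alternative that fits wins: "Tel".
my ($loose) = $input =~ /($list)/;

# With \b on both sides, "Tel" and "Tele" are rejected because the next
# character ("e" or "p") is a word character, so there is no boundary there.
# Only "Telephone", which runs to the end of the string, satisfies both.
my ($strict) = $input =~ /\b($list)\b/;

print "loose:  $loose\n";     # loose:  Tel
print "strict: $strict\n";    # strict: Telephone
```

The regex engine backtracks through the alternatives until the trailing \b succeeds, which is why the order of the list stops mattering here.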

If you simply want the longest alternative in a partial-word match, put the longest first. For example:

\b(Telephone|Tele|Tel)

or

(Telephone|Tele|Tel)
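If the list does not arrive longest-first, one way to build such a pattern is to sort the words by descending length before joining them. This is a minimal sketch; the `@words` array is an assumed starting point, and `quotemeta` is added only as a precaution in case a word contains regex metacharacters:

```perl
use strict;
use warnings;

my @words = ('Tel', 'Tele', 'Telephone');

# Longest first, so a short word can never shadow a longer one.
# quotemeta escapes any regex metacharacters in the words.
my $list = join '|',
    map  { quotemeta($_) }
    sort { length($b) <=> length($a) } @words;

my $input = 'ProgramFiles/Micro/Telephone';
my ($output) = $input =~ /($list)/;

print "$list\n";      # Telephone|Tele|Tel
print "$output\n";    # Telephone
```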
cletus
My list cannot be guaranteed to have the longest word first.
syker
The word boundary works, though. But I cannot follow the reasoning as to why it worked.
syker
@syker if you want partial matches, the easiest way to construct the expression is to sort the list of words, reverse the order, and then join the words with `|` in between. That puts each longer word ahead of the shorter words that are prefixes of it.
cletus
Yea, I was thinking of the sort-and-reverse-order approach. I like the zero-width boundary approach a lot though. Are there any drawbacks to the zero-width boundary approach?
syker
Using the word boundaries (\b) would mean that, for example, `\b(Tel|Tele)\b` would not match anything in your original string.
Dean Harding
A: 

Change the order: Tel|Tele|Telephone to Telephone|Tele|Tel. In the regexp algorithm, alternatives are tried from left to right, and the first one that matches is taken; there is no greedy search for the longest alternative. For example, /a|ab|abc/ applied to "abc" matches "a" rather than the longer "abc".

Or use nested optional groups, which are greedy:

Tel(?:e(?:phone)?)?
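Both points can be checked with a short Perl sketch. The variable names `$first` and `$word` are illustrative, and the input path is the one from the question:

```perl
use strict;
use warnings;

# Alternation tries branches left to right and keeps the first that succeeds.
my ($first) = 'abc' =~ /(a|ab|abc)/;
print "$first\n";    # a

# Nested optional groups are greedy, so the longest available form is taken.
my $input = 'ProgramFiles/Micro/Telephone';
my ($word) = $input =~ /(Tel(?:e(?:phone)?)?)/;
print "$word\n";     # Telephone
```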
SHiNKiROU