
This is my first time working with regular expressions and I've been trying to get a regular expression working that would match the following:

  • apple
  • apple inc.
  • apple co.
  • apple corp.

but would not match:

  • inc. apple
  • co. apple
  • apple co. inc.
  • apple corp. inc.
  • apple inc. corp.
  • and so on...

This is what I have so far: (apple)\s(inc|corp|co).$

Do you think you could help? :)

EDIT: It needs to work in Java. Does Java have its own syntax for regular expressions?

+3  A: 

You're almost there:

^apple(?:\s(?:inc|co|corp)\.)?$

Note that if you want your regex to be case-insensitive, you have to either pass the Pattern.CASE_INSENSITIVE flag to Pattern.compile() or add (?i) to the start of the pattern.
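A minimal Java sketch of the pattern above, assuming you are testing whole strings (so Matcher.matches() is used). The class and method names are hypothetical. Note that every backslash in the regex has to be doubled inside a Java string literal; apart from that, the syntax is the same. Both ways of getting case-insensitivity are shown.

```java
import java.util.regex.Pattern;

public class CompanyNameMatcher {
    // The pattern from the answer; each regex backslash is doubled in the Java literal.
    private static final Pattern STRICT =
            Pattern.compile("^apple(?:\\s(?:inc|co|corp)\\.)?$");

    // Option 1: pass the CASE_INSENSITIVE flag to Pattern.compile.
    private static final Pattern IGNORE_CASE_FLAG =
            Pattern.compile("^apple(?:\\s(?:inc|co|corp)\\.)?$", Pattern.CASE_INSENSITIVE);

    // Option 2: embed (?i) at the start of the pattern itself.
    private static final Pattern IGNORE_CASE_INLINE =
            Pattern.compile("(?i)^apple(?:\\s(?:inc|co|corp)\\.)?$");

    static boolean isCompanyName(String s) {
        return STRICT.matcher(s).matches();
    }

    static boolean isCompanyNameIgnoreCase(String s) {
        return IGNORE_CASE_FLAG.matcher(s).matches();
    }

    static boolean isCompanyNameIgnoreCaseInline(String s) {
        return IGNORE_CASE_INLINE.matcher(s).matches();
    }

    public static void main(String[] args) {
        // The examples from the question: the first four should match, the rest should not.
        String[] inputs = {
            "apple", "apple inc.", "apple co.", "apple corp.",
            "inc. apple", "co. apple", "apple co. inc.", "apple corp. inc.", "apple inc. corp."
        };
        for (String s : inputs) {
            System.out.println(s + " -> " + isCompanyName(s));
        }
        System.out.println("Apple Inc. -> " + isCompanyNameIgnoreCase("Apple Inc."));
    }
}
```

Running main should print true for the first four inputs and false for the rest, and true for "Apple Inc." with the case-insensitive pattern.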

markusk
An alternative, and silly, way to get case-insensitivity is to write [Aa][pP][pP][lL][eE], etc.
Chas. Owens
Agreed: it's possible, and it's silly. :-) Still, it's a nice hack if you're in a context where you can't pass the case-insensitive option.
markusk
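For completeness, a sketch of the character-class hack applied to the whole pattern, so that no flag is needed. The class name is hypothetical; the point is only that each letter becomes a two-character class.

```java
import java.util.regex.Pattern;

public class CharClassCaseDemo {
    // Case-insensitivity spelled out by hand: every letter is a [Xx] class,
    // so no CASE_INSENSITIVE flag or (?i) is required.
    private static final Pattern HAND_ROLLED = Pattern.compile(
            "^[Aa][Pp][Pp][Ll][Ee](?:\\s(?:[Ii][Nn][Cc]|[Cc][Oo]|[Cc][Oo][Rr][Pp])\\.)?$");

    static boolean matches(String s) {
        return HAND_ROLLED.matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(matches("Apple Inc."));     // true
        System.out.println(matches("APPLE corp."));    // true
        System.out.println(matches("apple co. inc.")); // false
    }
}
```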
+1  A: 

Try something like this:

^apple\s?(inc|corp|co)?\.?$

Be careful with periods (.): they are wildcards, so put a backslash in front of them (\.) to match a literal period.
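A small Java illustration of the difference, using the static Pattern.matches helper. The class and method names are hypothetical. In a Java string literal the backslash itself must be escaped, so the regex \. is written "\\.".

```java
import java.util.regex.Pattern;

public class DotEscapeDemo {
    // Unescaped: '.' matches any single character (except line terminators by default).
    static boolean unescaped(String s) {
        return Pattern.matches("apple inc.", s);
    }

    // Escaped: "\\." in the Java literal becomes \. in the regex, i.e. a literal period.
    static boolean escaped(String s) {
        return Pattern.matches("apple inc\\.", s);
    }

    public static void main(String[] args) {
        System.out.println(unescaped("apple inc!")); // true: '.' accepted '!'
        System.out.println(escaped("apple inc!"));   // false
        System.out.println(escaped("apple inc."));   // true
    }
}
```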

? makes the preceding item optional (zero or one occurrence)

^ means beginning of the line

$ means end of the line
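A Java sketch that runs this answer's pattern against some of the question's examples. The class name is hypothetical, and find() is used so that the ^ and $ anchors do the work described above.

```java
import java.util.regex.Pattern;

public class LooseApplePattern {
    // ^ and $ anchor the pattern; each ? makes the preceding space,
    // suffix group, or period optional independently of the others.
    private static final Pattern LOOSE = Pattern.compile("^apple\\s?(inc|corp|co)?\\.?$");

    static boolean accepts(String s) {
        // find() searches for a match anywhere, so only the anchors force a whole-string match.
        return LOOSE.matcher(s).find();
    }

    public static void main(String[] args) {
        String[] inputs = {"apple", "apple inc.", "apple corp.", "co. apple",
                           "apple co. inc.", "appleinc", "apple."};
        for (String s : inputs) {
            System.out.println(s + " -> " + accepts(s));
        }
    }
}
```

Because the three optional parts are independent, this pattern is looser than ^apple(?:\s(?:inc|co|corp)\.)?$: it also accepts inputs such as "appleinc" or "apple.", which may or may not matter for your data.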

Look here for a more complete explanation: http://www.anaesthetist.com/mnm/perl/Findex.htm

Lathan
+1  A: 

Try this:

(?<!(?:inc|co|corp)\.\s)apple(?:\s(?:inc|co|corp)\.)?

It uses a negative lookbehind (?<! ) to reject an "apple" that is preceded by one of the prefixes, plus non-capturing groups (?: ) to avoid creating unnecessary capture groups (backreferences).
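A minimal Java sketch of using this pattern to search inside longer text, assuming you want every occurrence. The class and method names are hypothetical. Java accepts this lookbehind because each alternative inside it has a bounded length.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LookbehindSearch {
    // (?<! ... ) rejects an "apple" directly preceded by "inc. ", "co. " or "corp. ".
    private static final Pattern SEARCH = Pattern.compile(
            "(?<!(?:inc|co|corp)\\.\\s)apple(?:\\s(?:inc|co|corp)\\.)?");

    // Collects every non-overlapping match found in the text.
    static List<String> findAll(String text) {
        List<String> hits = new ArrayList<>();
        Matcher m = SEARCH.matcher(text);
        while (m.find()) {
            hits.add(m.group());
        }
        return hits;
    }

    public static void main(String[] args) {
        System.out.println(findAll("I bought shares in apple inc. today")); // [apple inc.]
        System.out.println(findAll("co. apple"));                           // []
    }
}
```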

The . has been escaped to \. since it is otherwise a regex symbol meaning "any character".

The $ you used means end of line, so unless you only want this to match at the end of your string, you don't want it.
However, if you are checking that the whole string is exactly "apple inc." (etc.), then you can keep the $ and replace the negative lookbehind with ^ to simplify the expression to:

^apple(?:\s(?:inc|co|corp)\.)?$
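To see why the anchors matter, here is a sketch (class and method names hypothetical) that runs both versions with find() on "apple co. inc.". The unanchored search succeeds on the "apple co." prefix, while the anchored version rejects the string because " inc." is left over before $.

```java
import java.util.regex.Pattern;

public class AnchoredVsSearch {
    // Unanchored search pattern: can succeed on any part of the input.
    private static final Pattern SEARCH = Pattern.compile(
            "(?<!(?:inc|co|corp)\\.\\s)apple(?:\\s(?:inc|co|corp)\\.)?");

    // Anchored pattern: ^ and $ force the entire input to be the company name.
    private static final Pattern WHOLE = Pattern.compile("^apple(?:\\s(?:inc|co|corp)\\.)?$");

    static boolean searchFinds(String s) {
        return SEARCH.matcher(s).find();
    }

    static boolean wholeMatches(String s) {
        // find() is used on purpose, so that only ^ and $ enforce the whole-string match.
        return WHOLE.matcher(s).find();
    }

    public static void main(String[] args) {
        String s = "apple co. inc.";
        System.out.println(searchFinds(s));  // true: finds the "apple co." prefix
        System.out.println(wholeMatches(s)); // false
    }
}
```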
Peter Boughton