Simple example: we have the string "Some sample string Of Text", and I want to filter out all stop words (i.e. "some" and "of") without changing the letter case of the other words, which should be retained.

If letter case were unimportant, I would do this:

str.toLowerCase().replaceAll("a|the|of|some|any", "");

Is there an "ignore case" solution with regular expressions in Java?

+5  A: 

You can use the inline case-insensitive modifier:

str.replaceAll("(?i)a|the|of|some|any", "");
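
One caveat with a bare alternation like this: it also matches inside longer words (the `a` in "sample", for instance). Here is a minimal sketch, assuming whole-word matching is what is wanted. The class name `StopWords`, the helper `strip`, and the `\b` word-boundary anchors are illustrative additions, not part of the original suggestion:

```java
class StopWords {
    // Hypothetical helper: removes whole-word stop words regardless of case.
    static String strip(String str) {
        // (?i) switches on case-insensitive matching for the rest of the pattern;
        // \b restricts each alternative to whole words only.
        return str.replaceAll("(?i)\\b(?:a|the|of|some|any)\\b", "");
    }

    public static void main(String[] args) {
        // prints " sample string  Text" (the spaces around removed words stay)
        System.out.println(strip("Some sample string Of Text"));
    }
}
```

The words that are kept retain their original case; only the stop words themselves are matched case-insensitively.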
Tomalak
I'll try. If I understand correctly, "inline case-insensitive modifier" is `(?i)`, am I right?
Roman
@Roman: Exactly. There are others, too: http://www.regular-expressions.info/java.html (Scroll down to "Using The Pattern Class").
Tomalak
You can also do partial case insensitivity. IIRC, `"(?i:a)bc"` is equivalent (in my locale) to `"[aA]bc"`.
KitsuneYMG
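A quick way to check the scoped form is a small sketch like the following. The class name `ScopedFlagDemo` and the helper `matchesScoped` are made up for illustration; only `Pattern.compile` and `Matcher.matches` come from the standard library:

```java
import java.util.regex.Pattern;

class ScopedFlagDemo {
    // (?i:a) makes only the "a" case-insensitive; "bc" must still match exactly.
    private static final Pattern SCOPED = Pattern.compile("(?i:a)bc");

    // Hypothetical helper: true if the whole input matches the scoped pattern.
    static boolean matchesScoped(String input) {
        return SCOPED.matcher(input).matches();
    }

    public static void main(String[] args) {
        System.out.println(matchesScoped("abc")); // true
        System.out.println(matchesScoped("Abc")); // true
        System.out.println(matchesScoped("aBC")); // false
    }
}
```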
@kts: thanks, I think only a few people know about this possibility.
Roman
@Tomalak: Why not just link directly to the Pattern javadoc? :) http://java.sun.com/j2se/1.5.0/docs/api/java/util/regex/Pattern.html
Esko
@Esko: Because I knew *you* would do it, so I waited. ;-)
Tomalak
+4  A: 

Something like this should do the trick as well:

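        import java.util.regex.Matcher;
        import java.util.regex.Pattern;

        // Compile with the case-insensitive flag, then remove every match.
        Pattern pat = Pattern.compile("a|the|of|some|any", Pattern.CASE_INSENSITIVE);
        Matcher matcher = pat.matcher(str);
        String result = matcher.replaceAll("");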
npinti
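If the same filter runs many times, the compiled `Pattern` can be kept in a constant and reused. This is a rough sketch under a few assumptions: the class name `StopWordFilter` and the method `filter` are hypothetical, the `\b` anchors are added so that only whole words are removed, and `Pattern.UNICODE_CASE` is included only in case the stop-word list later gains non-ASCII entries:

```java
import java.util.regex.Pattern;

class StopWordFilter {
    // Compiled once and reused; Pattern instances are immutable and thread-safe.
    // CASE_INSENSITIVE alone only folds US-ASCII letters, so UNICODE_CASE is
    // combined with it here as a precaution (an assumption, not a requirement).
    private static final Pattern STOP_WORDS = Pattern.compile(
            "\\b(?:a|the|of|some|any)\\b",
            Pattern.CASE_INSENSITIVE | Pattern.UNICODE_CASE);

    // Hypothetical helper: removes every whole-word stop word from the input.
    static String filter(String str) {
        return STOP_WORDS.matcher(str).replaceAll("");
    }

    public static void main(String[] args) {
        System.out.println(filter("Some sample string Of Text"));
    }
}
```

Passing the flag to `Pattern.compile` has the same effect as the inline `(?i)` modifier; which one to use is mostly a matter of whether the pattern is compiled explicitly or passed straight to `String.replaceAll`.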