views:

102

answers:

4

I have a bunch of strings which may or may not have random symbols and numbers in them. Some examples are:

contains(reserved[j])){

close();

i++){

letters[20]=word

I want to find any character that is NOT a letter, and replace it with a white space, so the above examples look like:

contains reserved j

close

i

letters word

What is the best way to do this?

+2  A: 
yourInputString = yourInputString.replaceAll("[^\\p{Alpha}]", " ");

^ denotes "all characters except"

\p{Alpha} denotes all alphabetic characters

See http://java.sun.com/j2se/1.4.2/docs/api/java/util/regex/Pattern.html for details
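
For illustration, here is a minimal sketch that runs this replacement over the examples from the question. The class and method names (LetterFilter, lettersOnly) are made up for this example and are not part of any API.

```java
class LetterFilter {
    // Replace every character that is not an ASCII letter with one space.
    static String lettersOnly(String s) {
        return s.replaceAll("[^\\p{Alpha}]", " ");
    }

    public static void main(String[] args) {
        String[] samples = {"contains(reserved[j])){", "close();", "i++){", "letters[20]=word"};
        for (String sample : samples) {
            // Brackets make the trailing spaces visible in the console.
            System.out.println("[" + lettersOnly(sample) + "]");
        }
    }
}
```

Note that each non-letter becomes its own space, so letters[20]=word turns into letters, five spaces, then word. Runs of symbols are not collapsed by this pattern.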

aioobe
Actually, he wants to get rid of numbers/digits as well, and `\p{Alnum}` would match them too.
BalusC
Thank you. I updated the reply.
aioobe
+3  A: 

It depends what you mean by "not a letter", but assuming you mean that letters are a-z or A-Z then try this:

s = s.replaceAll("[^a-zA-Z]", " ");

If you want to collapse multiple symbols into a single space then add a plus at the end of the regular expression.

s = s.replaceAll("[^a-zA-Z]+", " ");
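
As a rough sketch, the collapsing version can be combined with trim() to reproduce the exact output asked for in the question. The trim() call is my addition (it drops the space left behind by trailing symbols), and the names CollapseDemo and collapse are hypothetical.

```java
class CollapseDemo {
    // Collapse each run of non-letters into a single space,
    // then strip the leading/trailing space that a run at either end leaves behind.
    static String collapse(String s) {
        return s.replaceAll("[^a-zA-Z]+", " ").trim();
    }

    public static void main(String[] args) {
        System.out.println(collapse("contains(reserved[j])){")); // contains reserved j
        System.out.println(collapse("close();"));                // close
        System.out.println(collapse("i++){"));                   // i
        System.out.println(collapse("letters[20]=word"));        // letters word
    }
}
```

Leave out trim() if the trailing space matters to you, for example when concatenating the results afterwards.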
Mark Byers
+1 for the collapse recommendation.
BalusC
+1  A: 

I want to find any character that is NOT a letter

That will be [^\p{Alpha}]+. The [] denote a character class. \p{Alpha} matches any alphabetic character, both uppercase and lowercase; it does basically the same as [\p{Upper}\p{Lower}] and [a-zA-Z]. The ^ at the start of the class negates it, so the class matches everything except those characters. The + matches one or more occurrences in sequence.

and replace it with a white space

That will be " ".

Summarized:

string = string.replaceAll("[^\\p{Alpha}]+", " ");

Also see the java.util.regex.Pattern javadoc for a concise overview of available patterns. You can learn more about regexes at the great site http://regular-expression.info.
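
One caveat worth sketching: by default \p{Alpha} in Java covers only ASCII letters, so accented characters are treated as non-letters. If "letter" should mean any Unicode letter, \p{L} is an alternative. The example below is only a sketch under that assumption; the class and method names are made up, and the accented characters are written as Unicode escapes to avoid source-encoding problems.

```java
class UnicodeLetters {
    // ASCII-only: anything outside a-z / A-Z is replaced.
    static String asciiOnly(String s) {
        return s.replaceAll("[^\\p{Alpha}]+", " ");
    }

    // Unicode-aware: \p{L} matches a letter from any script.
    static String unicodeAware(String s) {
        return s.replaceAll("[^\\p{L}]+", " ");
    }

    public static void main(String[] args) {
        String input = "na\u00efve caf\u00e9 2"; // "naïve café 2"
        System.out.println("[" + asciiOnly(input) + "]");    // accented letters are lost
        System.out.println("[" + unicodeAware(input) + "]"); // accented letters are kept
    }
}
```

Which one is right depends on what you mean by "letter" in your data.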

BalusC
A: 

Use the regexp /[^a-zA-Z]/, which matches any character that is not in the ranges a-z or A-Z.

In ruby I would do:

"contains(reserved[j]))".gsub(/[^a-zA-Z]/, " ")
 => "contains reserved j   "

In Java it should be something like:

import java.util.regex.*;
...

String inputStr = "contains(reserved[j])){";
String patternStr = "[^a-zA-Z]";
String replacementStr = " ";

// Compile regular expression
Pattern pattern = Pattern.compile(patternStr);

// Replace all occurrences of pattern in input
Matcher matcher = pattern.matcher(inputStr);
String output = matcher.replaceAll(replacementStr);
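
Since the question mentions a bunch of strings, one possible variation is to compile the pattern once and reuse it for every input, rather than recompiling per string. This is only a sketch: BatchCleaner, NON_LETTERS and cleanAll are hypothetical names, and I used the collapsing form of the pattern with a trailing +.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

class BatchCleaner {
    // Compiled once and shared; Pattern instances are immutable and thread-safe.
    private static final Pattern NON_LETTERS = Pattern.compile("[^a-zA-Z]+");

    // Return a new list with each run of non-letters replaced by one space.
    static List<String> cleanAll(List<String> inputs) {
        List<String> cleaned = new ArrayList<>();
        for (String input : inputs) {
            cleaned.add(NON_LETTERS.matcher(input).replaceAll(" "));
        }
        return cleaned;
    }

    public static void main(String[] args) {
        List<String> inputs = Arrays.asList("close();", "i++){", "letters[20]=word");
        for (String line : cleanAll(inputs)) {
            System.out.println("[" + line + "]");
        }
    }
}
```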
duncan