I'm cleaning an incoming text in my Java code. The text includes a lot of "\n", but not as in a new line, but literally "\n". I was using replaceAll() from the String class, but haven't been able to delete the "\n". This doesn't seem to work:

String string;
string = string.replaceAll("\\n", "");

Neither does this:

String string;
string = string.replaceAll("\n", "");

I guess this last one is identified as an actual new line, so all the new lines from the text would be removed.

Also, what would be an effective way to remove different patterns of wrong text from a String? I'm using regular expressions to detect them (stuff like HTML reserved characters, etc.) together with replaceAll, but every time I use replaceAll, the whole String is read, right?

UPDATE: Thanks for your great answers. I've extended this question here:
Text replacement efficiency
I'm asking specifically about efficiency :D

+8  A: 

I think you need to add a couple more slashies...

String string;
string = string.replaceAll("\\\\n", "");

Explanation: The number of slashies has to do with the fact that "\n" by itself is a control character in Java.

So to get the real characters "\n" into a string we need to write "\\n", which, if printed out, will give us: "\n"

You're looking to replace all "\n" in your file, but you're not looking to replace the control character "\n". So you tried "\\n", which is converted into the characters "\n". Great, but maybe not so much. My guess is that the replaceAll method then builds a Regular Expression from those "\n" characters, and the regex engine reads them as the control character "\n" again.

Whew, almost done.

Using replaceAll("\\\\n", "") will first convert "\\\\n" -> "\\n", which is what the Regular Expression receives. Inside the Regular Expression, "\\n" represents your literal text "\n", which is what you're looking to replace.
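To make the first level of escaping concrete, here is a minimal sketch that prints what each string literal actually contains once the compiler is done with it. The class and method names (LiteralContents, describe) are made up for illustration.

```java
class LiteralContents {
    // Describes a string as its length followed by its characters,
    // showing a real line break as <newline> so it stays visible.
    static String describe(String s) {
        StringBuilder sb = new StringBuilder();
        sb.append(s.length()).append(" char(s):");
        for (char c : s.toCharArray()) {
            sb.append(' ').append(c == '\n' ? "<newline>" : String.valueOf(c));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(describe("\n"));     // 1 char(s): <newline>
        System.out.println(describe("\\n"));    // 2 char(s): \ n
        System.out.println(describe("\\\\n"));  // 3 char(s): \ \ n
    }
}
```

Only the last literal hands the regex engine two backslashes followed by n, which is the pattern for a literal backslash and the letter n.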

hooknc
Thanks for your answer. Is there an explanation for so many slashies?
Fernando
Edited to add an explanation.
hooknc
+14  A: 

Hooknc is right. I'd just like to post a little explanation:

"\\n" translates to "\n" after the compiler is done (since you escape the backslash). So the regex engine sees "\n" and thinks new line, and would remove those (and not the literal "\n" you have).

"\n" is translated to a real new line by the compiler, so the new line character is sent to the regex engine.

"\\\\n" is ugly, but right. The compiler removes the escape sequences, so the regex engine sees "\\n". The regex engine sees the two backslashes and knows that the first one escapes the second, so that translates to checking for the literal characters '\' and 'n', giving you the desired result.

Java is nice (it's the language I work in) but having to think to basically double-escape regexes can be a real challenge. For extra fun, it seems StackOverflow likes to try to translate backslashes too.
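The three cases described in this answer can be checked directly. This is a small sketch, assuming a sample string (TEXT, a hypothetical name) that contains both a literal backslash-n and a real line break:

```java
class ReplaceAllEscapes {
    // Literal backslash + 'n' between a and b, then a real line break before c.
    static final String TEXT = "a\\nb\nc";

    public static void main(String[] args) {
        // The regex engine receives a real newline character: only the line break is removed.
        System.out.println(TEXT.replaceAll("\n", "").equals("a\\nbc"));    // true

        // The regex engine receives \n, its own escape for newline: same result.
        System.out.println(TEXT.replaceAll("\\n", "").equals("a\\nbc"));   // true

        // The regex engine receives \\n: a literal backslash followed by 'n' is removed.
        System.out.println(TEXT.replaceAll("\\\\n", "").equals("ab\nc"));  // true
    }
}
```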

MBCook
Good explanation. I'd also like to add that many people forget that the first argument in String.replaceAll() is a regular expression, not a literal string.
Marc Novakowski
+8  A: 

Instead of String.replaceAll(), which uses regular expressions, you might be better off using String.replace(), which does simple string substitution (if you are using at least Java 1.5).

String replacement = string.replace("\\n", "");

should do what you want.

Avi
Probably quicker too.
Michael Myers
Good idea. Just avoid the whole regex parsing and escaping since you're not using it.
MBCook
Great, thanks. I'm using this for the \n, but replaceAll for other patterns such as HTML tags and reserved characters. If you have any tips on doing this more efficiently than repeating replaceAll for each pattern, it would be greatly appreciated.
Fernando
If you are using the same regex more than once, you should probably compile a Pattern: Pattern pattern = Pattern.compile("some regex"); pattern.matcher(string).replaceAll("replacement"); - that way you avoid repeating the regex compilation, which is the expensive part.
Avi
Re efficiency: you could pre-compile patterns that you'll be reusing. I'd give an example but it's hard to do in a comment box.
Michael Myers
And Avi beat me to it anyway.
Michael Myers
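The pre-compiled Pattern suggestion from the comments above might look like the following sketch. The class name TextCleaner, the method clean, and both patterns are assumptions for illustration; in particular the HTML tag regex is deliberately crude and not a real HTML parser.

```java
import java.util.regex.Pattern;

class TextCleaner {
    // Compiled once when the class loads, then reused for every input.
    private static final Pattern LITERAL_BACKSLASH_N = Pattern.compile("\\\\n");
    // Hypothetical, simplistic pattern for something that looks like an HTML tag.
    private static final Pattern HTML_TAG = Pattern.compile("<[^>]+>");

    static String clean(String input) {
        String result = LITERAL_BACKSLASH_N.matcher(input).replaceAll("");
        return HTML_TAG.matcher(result).replaceAll("");
    }

    public static void main(String[] args) {
        System.out.println(clean("<b>Hi!</b>\\n How are you?"));  // Hi! How are you?
    }
}
```

Each replaceAll call still scans the whole string once; pre-compiling only saves the cost of rebuilding the pattern on every call.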
+1  A: 

None of these methods work for me at all ;|

simon
+1  A: 

The other answers have sufficiently covered how to do this with replaceAll, and how you need to escape backslashes as necessary.

Since Java 1.5, there is also String.replace(CharSequence, CharSequence), which performs literal string replacement. This can greatly simplify many string replacement problems, because there is no need to escape any regular expression metacharacters like ., *, |, and yes, \ itself.

Thus, given a string that can contain the substring "\n" (not '\n'), we can delete them as follows:

String before = "Hi!\\n How are you?\\n I'm \n   good!";
System.out.println(before);
// Hi!\n How are you?\n I'm 
//   good!

String after = before.replace("\\n", "");

System.out.println(after);
// Hi! How are you? I'm 
//   good!

Note that if you insist on using replaceAll, you can prevent the ugliness by using Pattern.quote:

System.out.println(
    before.replaceAll(Pattern.quote("\\n"), "")
);
// Hi! How are you? I'm 
//   good!

You should also use Pattern.quote when you're given an arbitrary string that must be matched literally instead of as a regular expression pattern.
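As a rough sketch of that last point, a helper that deletes an arbitrary caller-supplied string could wrap it in Pattern.quote before handing it to replaceAll. The class and method names here (LiteralRemover, removeLiteral) are hypothetical.

```java
import java.util.regex.Pattern;

class LiteralRemover {
    // Removes every occurrence of 'literal' from 'text', treating
    // 'literal' as plain text even if it contains regex metacharacters.
    static String removeLiteral(String text, String literal) {
        return text.replaceAll(Pattern.quote(literal), "");
    }

    public static void main(String[] args) {
        // Unquoted, "." would match every character and wipe the whole string.
        System.out.println(removeLiteral("1.5 + 2.5 (approx.)", "."));  // 15 + 25 (approx)
        // Unquoted, "|" is an alternation of two empty patterns and would remove nothing.
        System.out.println(removeLiteral("a|b|c", "|"));                // abc
    }
}
```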

polygenelubricants
A: 

I use

raw = raw.replaceAll("\t", "");

raw = raw.replaceAll("\n", "");

raw = raw.replaceAll("\r", "");

Amit