I am looking to replace \n with \\n, but so far my regex attempts are not working. (Really it is any \ by itself; \n just happens to be the use case I have in the data.)

What I need is something along the lines of:

any-non-\ followed by \ followed by any-non-\

Ultimately I'll be passing the regex to java.lang.String.replaceAll so a regex formatted for that would be great, but I can probably translate another style regex into what I need.

For example, I want this program to print out "true"...

public class Main
{
    public static void main(String[] args)
    {
        final String original;
        final String altered;
        final String expected;

        original = "hello\nworld";
        expected = "hello\\nworld";
        altered  = original.replaceAll("([^\\\\])\\\\([^\\\\])", "$1\\\\$2");
        System.out.println(altered.equals(expected));
    }
}

Using this does work:

    altered  = original.replaceAll("\\n", "\\\\n");
+2  A: 

The string should be

"[^\\\\]\\\\[^\\\\]"

You have to quadruple backslashes in a String constant that's meant for a regex; if you only doubled them, they would be escaped for the String but not for the regex.

So the actual code would be

myString = myString.replaceAll("([^\\\\])\\\\([^\\\\])", "$1\\\\\\\\$2");

Note that the replacement string is parsed by the regex engine too, since it has to check for backreferences, so a backslash is an escape character there as well. A quadruple backslash in the replacement literal therefore produces only one backslash in the output; emitting two takes eight.
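
To make the replacement-string escaping concrete, here is a small sketch that runs the same pattern with four and with eight backslashes in the replacement literal. The class and method names are hypothetical, not from the thread. Since a backslash in the replacement is itself an escape character, the four-backslash version writes back a single backslash and leaves the input unchanged.

```java
public class ReplacementEscaping {
    // Same regex in both methods: a lone backslash between two non-backslash characters.
    private static final String LONE_BACKSLASH = "([^\\\\])\\\\([^\\\\])";

    // Four backslashes in the literal: the replacement emits ONE backslash.
    static String withFour(String s) {
        return s.replaceAll(LONE_BACKSLASH, "$1\\\\$2");
    }

    // Eight backslashes in the literal: the replacement emits TWO backslashes.
    static String withEight(String s) {
        return s.replaceAll(LONE_BACKSLASH, "$1\\\\\\\\$2");
    }

    public static void main(String[] args) {
        String input = "a\\b";                  // actual characters: a\b
        System.out.println(withFour(input));    // prints a\b  (unchanged)
        System.out.println(withEight(input));   // prints a\\b
    }
}
```

If counting backslashes gets error-prone, java.util.regex.Matcher.quoteReplacement can build the escaped replacement from the literal text you want inserted.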

Edit: The above was assuming that there was a literal \n in the input string, which is represented in a string literal as "\\n". Since it apparently has a newline instead (represented as "\n"), the correct substitution would be

myString = myString.replaceAll("\\n", "\\\\n");

This must be repeated for any other special characters (\t, \r, \0, \\, etc.). As above, the replacement string looks exactly like the regex string but isn't.
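
As a rough sketch of what repeating this for several characters could look like, the helper below uses String.replace, which does literal (non-regex) replacement and so avoids the extra layer of escaping. The class name, the method name, and the choice of which characters to handle are assumptions for illustration. The backslash is handled first so that the backslashes added by the later steps are not doubled again.

```java
public class EscapeControlChars {
    // Hypothetical helper: turns a few control characters into their
    // two-character escaped form. Only backslash, newline, carriage
    // return and tab are covered here.
    static String escape(String s) {
        return s.replace("\\", "\\\\")   // backslash first, see note above
                .replace("\n", "\\n")    // newline -> backslash + n
                .replace("\r", "\\r")    // carriage return -> backslash + r
                .replace("\t", "\\t");   // tab -> backslash + t
    }

    public static void main(String[] args) {
        // The input contains a real newline; the output contains backslash + n.
        System.out.println(escape("hello\nworld")); // prints hello\nworld
    }
}
```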

Michael Myers
Didn't do it... I've updated the question with a simple bit of code.
TofuBeer
Yes, but I don't want it to work only for \n, as there could be \(anything) and I want that to be \\(anything). The only example I have so far in the data is \n, but I'd rather not limit the code to \n.
TofuBeer
@TofuBeer: My point is, those characters do not actually have a backslash--the backslashes are only in the Java representation. They have nothing else in common. For example, `\n` is equivalent to `\u000A` and `\t` is equivalent to `\u0009` (the complete list is at http://java.sun.com/docs/books/jls/third_edition/html/lexical.html#3.10.6).
Michael Myers
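
A quick way to check the point about escape sequences, as a sketch with hypothetical names: the literal "\n" compiles to a single character (U+000A) with no backslash in it, while "\\n" is two characters, a backslash followed by n.

```java
public class EscapeSequenceDemo {
    // True if the string contains an actual backslash character.
    static boolean hasBackslash(String s) {
        return s.indexOf('\\') >= 0;
    }

    public static void main(String[] args) {
        System.out.println("\n".length());                 // prints 1
        System.out.println((int) "\n".charAt(0));          // prints 10
        System.out.println("\\n".length());                // prints 2
        System.out.println(hasBackslash("hello\nworld"));  // prints false
        System.out.println(hasBackslash("hello\\nworld")); // prints true
    }
}
```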
Well, the issue is a bit different too... the \ can appear anywhere in the JSON data, not just in the Java-specific escapes. Looking further, I did verify that \ in JSON is to be encoded as \\. For now I'll just handle them as they come up and work with the data provider to properly encode the JSON objects.
TofuBeer
@TofuBeer: I guess I still don't understand the requirements, then. Are there backslashes or are there not?
Michael Myers
Here is an example of an erroneous JSON string (it comes from an HTTP server): "highlight":"Shaun White Captures Olympic Gold with Double McTwist 1260:King of the mountain.\nIn a stunning display of twists... http://bit.ly/9uGdVo". The \n is supposed to be \\n but the server sends the wrong thing. So on my end I want to replace all of the \n with \\n before I process it.
TofuBeer
@TofuBeer: How are you viewing the contents of the string? Are you printing it out or using a debugger?
Michael Myers
Hmmm... trying to get a more realistic reproducible case isn't working out. The \n came from the JSON file saved to disk. At this point I'm just going to punt, go with what I have, and try and get the data source fixed since that is the real problem. I'd rather not have to massage the data at all.
TofuBeer
A: 

I don't know exactly what you need it for, but you could have a look at StringEscapeUtils from Commons Lang. They have plenty of methods doing things like that, and if you don't find exactly what you're searching for, you could have a look at the source to find inspiration :)

Valentin Rocher
+1  A: 

So whenever there is 1 backslash, you want 2, but if there are 2, 3 or 4... in a row, leave them alone?

You want to replace

(?<=[^\\])\\(?!\\+)([^\\])

with

\\$1

That changes the string

hello\nworld and hello\\nworld and hello\\\nworld

into

hello\\nworld and hello\\nworld and hello\\\nworld
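
For use with java.lang.String.replaceAll, the pattern and replacement above need their backslashes escaped once more for the Java string literal, and the replacement needs eight backslashes to emit two. The following is a sketch of one possible translation; the class and method names are made up for illustration.

```java
public class DoubleLoneBackslash {
    // Regex source:      (?<=[^\\])\\(?!\\+)([^\\])
    // As a Java literal every backslash is doubled again.
    private static final String PATTERN = "(?<=[^\\\\])\\\\(?!\\\\+)([^\\\\])";

    // Replacement source: \\\\$1  -> two literal backslashes, then group 1.
    private static final String REPLACEMENT = "\\\\\\\\$1";

    static String fix(String s) {
        return s.replaceAll(PATTERN, REPLACEMENT);
    }

    public static void main(String[] args) {
        System.out.println(fix("hello\\nworld"));     // prints hello\\nworld
        System.out.println(fix("hello\\\\nworld"));   // prints hello\\nworld
        System.out.println(fix("hello\\\\\\nworld")); // prints hello\\\nworld
    }
}
```

Because of the lookbehind and the captured trailing character, a backslash at the very start or very end of the string is not matched by this pattern.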
Chad
Yes, 1 backslash becomes 2; 2 or more stay the same. I'll give it a shot.
TofuBeer
@Tofu, I'm no Java guy, but I believe you will need to escape the backslashes in the strings you use for creating your regex, since, unlike C#, Java doesn't have verbatim strings (http://www.javacamp.org/javavscsharp/string.html).
Chad
A: 

What's wrong with using altered = original.replaceAll("\\n", "\\\\n"); ? That's exactly what I would have done.

kukudas