I have a string containing several parameters, e.g.

PARAM1="someValue", PARAM2="someOtherValue"...

For log output I want to "hide" some of the parameters' values, i.e. replace them with ***.

I use the following regex to match the parameter value, which works fine for most cases:

(PARAMNAME=")[\w\s]*"

However, this regex only matches word and whitespace characters. I want to extend it to match all characters between the two quotation marks. The problem is that the value itself can contain (escaped) quotation marks, e.g.:

PARAM="the name of this param is \"param\""

How can I match (and replace) that correctly?

My Java-method looks like this:

/**
 * @param input input string
 * @param params list of parameters to hide
 * @return string with the value of the parameter being replaced by ***
 */
public static String hideParamValue(String input, final String... params)
{
    for (String param : params)
    {
        input = input.replaceAll("(" + param + "=)\\\"[\\w\\s]*\\\"", "$1***");
    }
    return input;
}
+1  A: 

Try this regular expression:

PARAM="(?:[^"\\]|\\")*"

This only allows a sequence of either any character except " and \, or a \". If you want to allow escape sequences other than just \", you can extend it with \\["rnt…], for example, to also allow \r, \n, \t, etc.
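In Java, the extended form could look roughly like the sketch below. The escape list is limited to ", r, n and t purely for illustration, and the mask helper, the class name and the use of Pattern.quote for the parameter name are my own assumptions, not part of the expression above.

```java
import java.util.regex.Pattern;

public class EscapeListDemo {
    // Hypothetical helper: masks the quoted value of a single parameter.
    static String mask(String input, String param) {
        // Regex after string-literal unescaping: (\QPARAM\E=)"(?:[^"\\]|\\["rnt])*"
        // Pattern.quote is an added precaution in case the name contains regex metacharacters.
        String regex = "(" + Pattern.quote(param) + "=)\"(?:[^\"\\\\]|\\\\[\"rnt])*\"";
        return input.replaceAll(regex, "$1***");
    }

    public static void main(String[] args) {
        // Raw text: PARAM="the name of this param is \"param\"", OTHER="x"
        String s = "PARAM=\"the name of this param is \\\"param\\\"\", OTHER=\"x\"";
        System.out.println(mask(s, "PARAM")); // PARAM=***, OTHER="x"
    }
}
```

Note that a value containing an escape sequence outside the list (say \\) would not be matched by this variant at all, so the list has to cover everything your input can contain.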

Gumbo
+1  A: 

You have to add the escaped double quote to your matching character class:

[\w\s\\"] instead of [\w\s], which, once escaped in your Java String, becomes [\\w\\s\\\\\"] instead of [\\w\\s]

Thus, the final code will result as

/**
 * @param input input string
 * @param params list of parameters to hide
 * @return string with the value of the parameter being replaced by ***
 */
public static String hideParamValue(String input, final String... params) {
    for (String param : params)
    {
        input = input.replaceAll("(" + param + "=)\\\"[\\w\\s\\\\\"]*\\\"", "$1***");
    }
    return input;
}
Tomas Narros
+1  A: 

A negative lookbehind may be useful in this case:

(PARAMNAME=").*?(?<!\\)"

that is

s.replaceAll("(" + param + "=)\".*?(?<!\\\\)\"", "$1***");

(?<!\\)" means " not preceded by \, so that .*?(?<!\\)" means the shortest possible (due to reluctant *?) sequence of any characters terminated by " where " is not preceded by \.
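A small sketch of how this behaves on a string with two parameters; the class and method names are made up for the demo, only the regex is the one shown above. The reluctant quantifier is what keeps the match from running on into the second parameter.

```java
public class LookbehindDemo {
    // Hypothetical helper wrapping the replaceAll call from this answer.
    static String hide(String s, String param) {
        // Regex after string-literal unescaping: (PARAM=)".*?(?<!\\)"
        return s.replaceAll("(" + param + "=)\".*?(?<!\\\\)\"", "$1***");
    }

    public static void main(String[] args) {
        // Raw text: P1="a \"b\"", P2="c"
        String s = "P1=\"a \\\"b\\\"\", P2=\"c\"";
        System.out.println(hide(s, "P1")); // P1=***, P2="c"
    }
}
```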

axtavt
+3  A: 

Escaped quotes are a real PITA in Java, but this should do the trick:

public class Test
{
  public static String hideParamValue(String input, final String... params)
  {
    for (String param : params)
    {
      input = input.replaceAll(
        "(" + param + "=)\"(?:[^\"\\\\]|\\\\.)*\"",
        "$1***");
    }
    return input;
  }

  public static void main(String[] args)
  {
    String s = "PARAM1=\"a b c\", PARAM2=\"d \\\"e\\\" f\", PARAM3=\"g h i\"";
    System.out.println(s);
    System.out.println(hideParamValue(s, "PARAM2", "PARAM3"));
  }
}

output:

PARAM1="a b c", PARAM2="d \"e\" f", PARAM3="g h i"
PARAM1="a b c", PARAM2=***, PARAM3=***

[^\"\\\\] matches any one character other than a quotation mark or a backslash. The backslash has to be escaped with another backslash for the regex, then each of those has to be escaped for the string literal. But the quotation mark has no special meaning in a regex, so it only needs one backslash.

(?:[^\"\\\\]|\\\\.) matches anything except a quotation mark or a backslash, OR a backslash followed by anything. That takes care of your escaped quotation marks, and also allows for escaped backslashes and other escape sequences, at no extra cost.

The negative-lookbehind approach suggested by @axtavt only handles escaped quotes, and it treats \\" as a backslash followed by an escaped quote, when it was probably intended as an escaped backslash followed by a quote.
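To make that difference concrete, here is a small comparison on a value that ends in an escaped backslash. The class and method names are just for this demo; the two regexes are the ones discussed above.

```java
public class EscapedBackslashDemo {
    // Lookbehind variant: a quote closes the value unless the previous char is a backslash.
    static String withLookbehind(String s, String param) {
        return s.replaceAll("(" + param + "=)\".*?(?<!\\\\)\"", "$1***");
    }

    // Alternation variant: consume either a plain char or a backslash plus any one char.
    static String withAlternation(String s, String param) {
        return s.replaceAll("(" + param + "=)\"(?:[^\"\\\\]|\\\\.)*\"", "$1***");
    }

    public static void main(String[] args) {
        // Raw text: A="x\\", B="y"   (the value of A ends in an escaped backslash)
        String s = "A=\"x\\\\\", B=\"y\"";
        System.out.println(withLookbehind(s, "A"));  // A=***y"
        System.out.println(withAlternation(s, "A")); // A=***, B="y"
    }
}
```

The lookbehind version skips the real closing quote because it is preceded by a backslash, and only stops at the opening quote of B, swallowing part of the next parameter.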

Alan Moore