ansaurus

Question

Regex to replace part of the string with spaces

Answer 1

+1 A:

What do you mean by nondigit vs. anything?

[^a-zA-Z0-9]
matches any character that is not an ASCII letter or digit.

You would replace every character matched by the above regex with a space.

Is this what you were talking about?
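
As a minimal Java sketch of that replacement (the class and method names are made up for illustration), String.replaceAll can swap every character matched by the class for a space:

```java
class NonAlphanumericToSpace {
    // Replaces each character that is not an ASCII letter or digit with one space.
    static String blankNonAlphanumerics(String input) {
        return input.replaceAll("[^a-zA-Z0-9]", " ");
    }

    public static void main(String[] args) {
        // Quotes make the resulting spaces visible in the console.
        System.out.println("'" + blankNonAlphanumerics("AA12345d4 %") + "'");
    }
}
```

Because every match is exactly one character, the output keeps the length of the input. Letters and digits such as "d4" are left alone, so this only covers the case where the part to blank out contains no alphanumerics.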

Robert Greiner 2009-07-08 22:17:28

Don't you mean /[^a-zA-Z0-9]/ /g ?

BryanH 2009-07-08 22:19:31

That would delete the "anything" matches. I just wanted to post the regex that actually matches "anything". I will take the slashes out to clear things up. Thanks.

Robert Greiner 2009-07-08 22:21:28

'Anything' means anything, i.e. letters, digits, whitespace. I want to replace each occurrence with a space. For instance, 'AA12345d4 %' would be replaced with 'AA12345    ' (four spaces at the end).

vvs 2009-07-09 14:24:45

Answer 2

+1 A:

You want to use positive look behind to match the N's and D's then use a normal match for the A's.

Not sure of the positive look behind grammar in Java, but some article on Java regex with look behind

Simeon Pilgrim 2009-07-08 22:39:16

I was just about to post that ... honest! Don't know if you are allowed to have a variable length look behind pattern though eg (?<=\D+)

Amal Sirisena 2009-07-08 22:44:21

Not sure about the Java regex. I've read some articles on the positive/negative look ahead/behind restrictions in the three major variants of regex engines, and my main takeaway was that the .NET regex could do the good stuff, but just because it can doesn't mean you should.

Simeon Pilgrim 2009-07-08 23:21:11

Here's a nice description of various engines' support for look behind: http://www.regular-expressions.info/lookaround.html#limitbehind

laz 2009-07-08 23:56:33

No, in general they do not allow variable-width look behind. "(?<=\D+)" is allowed because it is equivalent to the fixed-width look behind "(?<=\D)"

newacct 2009-07-09 02:16:41

And in any case, even if a look behind worked, it would not solve the OP's problem, which is to replace every character in the matched group with a space. There is no replacement string that will allow you to perform "replace this with a string of spaces of the same length".
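
A plain replacement string indeed cannot express "as many spaces as the match is long". As a hedged aside, Java 9 and later (newer than this thread) add Matcher.replaceAll(Function<MatchResult, String>), which computes the replacement per match. The sketch below assumes Java 11 for String.repeat; the class name, method name and the bounded {1,9} lookbehind are illustrative assumptions, not something the asker used:

```java
import java.util.regex.Pattern;

class SameLengthReplacement {
    // Assumed pattern: a bounded lookbehind for the non-digit/digit prefix,
    // then the tail, which starts at the first non-digit after the digits.
    private static final Pattern TAIL =
            Pattern.compile("(?<=\\D{1,9}\\d{1,9})\\D.+");

    static String blankTail(String input) {
        // The lambda is called once per match and returns one space
        // for every character of the matched text.
        return TAIL.matcher(input)
                   .replaceAll(mr -> " ".repeat(mr.group().length()));
    }

    public static void main(String[] args) {
        System.out.println("'" + blankTail("AA12345d4 %") + "'");
    }
}
```

This is a callback rather than a replacement string, so the point about replacement strings still stands.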

newacct 2009-07-09 02:18:53

@newacct: good point.

Simeon Pilgrim 2009-07-09 02:59:30

Answer 3

+3 A:

Given your description, I'm assuming that after the NNDDDDD portion, the first A will actually be a N rather than an A, since otherwise there's no solid boundary between the DDDDD and AAAA portions.

So, your string actually looks like NNDDDDDNAAA, and you want to replace the NAAA portion with spaces. Given this, the regex can be rewritten as such: (\\D+\\d+)(\\D.+)

Positive lookbehind in Java needs a pattern with a bounded maximum length, so you can't use the unbounded + or * quantifiers. You can instead use curly braces and specify a maximum length. For instance, you can use {1,9} in place of each +, and it will match between 1 and 9 characters: (?<=\\D{1,9}\\d{1,9})(\\D.+)

The only problem here is you're matching the NAAA sequence as a single match, so using "NNNDDDDNAAA".replaceAll("(?<=\\D{1,9}\\d{1,9})(\\D.+)", " ") will result in replacing the entire NAAA sequence with a single space, rather than multiple spaces.
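
To make that concrete, here is a small sketch with a real sample value instead of the schematic "NNNDDDDNAAA" (which, taken literally, contains no digits). The class and method names are invented for the example:

```java
class LookbehindSingleSpace {
    // Uses the bounded-lookbehind regex from the text. The whole tail is
    // one match, so it is replaced by one single space.
    static String collapseTail(String input) {
        return input.replaceAll("(?<=\\D{1,9}\\d{1,9})(\\D.+)", " ");
    }

    public static void main(String[] args) {
        // Quotes make the trailing space visible in the console.
        System.out.println("'" + collapseTail("AA12345d4 %") + "'");
    }
}
```

For the input 'AA12345d4 %' the four-character tail 'd4 %' collapses to one space, so the result is shorter than the input.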

You could take the beginning delimiter of the match, and the string length, and use that to append the correct number of spaces, but I don't see the point. I think you're better off with your original solution. It's simple and easy to follow.
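
For completeness, the "start of the match plus string length" idea could look roughly like this. It is only a sketch: the class and method names are hypothetical, and returning the input unchanged when nothing matches is an assumption:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PadFromMatchStart {
    // Same regex as above: group 1 is the prefix, group 2 the tail.
    private static final Pattern P = Pattern.compile("(\\D+\\d+)(\\D.+)");

    static String blankTail(String input) {
        Matcher m = P.matcher(input);
        if (!m.matches()) {
            return input; // assumption: leave non-matching input untouched
        }
        int tailStart = m.start(2); // index where the tail begins
        StringBuilder sb = new StringBuilder(input.length());
        sb.append(input, 0, tailStart); // copy the prefix as-is
        for (int i = tailStart; i < input.length(); i++) {
            sb.append(' '); // one space per tail character
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println("'" + blankTail("AA12345d4 %") + "'");
    }
}
```

The regex is only used to locate the boundary; the padding itself is plain index arithmetic.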

If you're looking for a little extra speed, you could compile your Pattern outside the function, and use StringBuilder or StringBuffer to create your output. If you're building a large String out of all these NNDDDDDAAAAA elements, work entirely in StringBuilder until you're done appending.

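import java.util.regex.Matcher;
import java.util.regex.Pattern;

class Test {

    // Compiled once and reused: group 1 is the NNDDDDD prefix, group 2 is the tail to blank out.
    public static final Pattern p = Pattern.compile("(\\D+\\d+)(\\D.+)");

    public static StringBuffer replace( String input ) {
        StringBuffer output = new StringBuffer();
        Matcher m = Test.p.matcher(input);
        if( m.matches() )
            // Keep the prefix and replace every character of the tail with a space.
            output.append( m.group(1) ).append( m.group(2).replaceAll("."," ") );
        else
            // No match: return the input unchanged rather than an empty buffer.
            output.append( input );

        return output;
    }

    public static void main( String[] args ) {
        String input = args[0];
        long startTime;

        StringBuffer tests = new StringBuffer();
        startTime = System.currentTimeMillis();
        for( int i = 0; i < 50; i++ )
        {
            tests.append( "Input -> Output: '" );
            tests.append( input );
            tests.append( "' -> '" );
            tests.append( Test.replace( input ) );
            tests.append( "'\n" );
        }
        System.out.println( tests.toString() );
        // Elapsed time in milliseconds for the 50 iterations.
        System.out.println( "\n" + (System.currentTimeMillis()-startTime));
    }

}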

Update: I wrote a quick iterative solution, and ran some random data through both. The iterative solution is around 4-5x faster.

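public static StringBuffer replace( String input )
{
    StringBuffer output = new StringBuffer();
    // second: the digit run has started; third: the first non-digit after it has been seen.
    boolean second = false, third = false;
    for( int i = 0; i < input.length(); i++ )
    {
        char c = input.charAt(i);

        if( !second && Character.isDigit(c) )
            second = true;

        // Any non-digit ends the digit run, mirroring \D in the regex version
        // (isLetter would let punctuation such as '%' slip through unchanged).
        if( second && !third && !Character.isDigit(c) )
            third = true;

        if( second && third )
            output.append( ' ' );
        else
            output.append( c );
    }

    return output;
}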

Curtis Tasker 2009-07-09 01:21:08

Answer 4

+1 A:

I know you asked for a regex, but why do you even need a regex for this? How about:

StringBuilder sb = new StringBuilder(inputString);
for (int i = sb.length() - 1; i >= 0; i--) {
    if (Character.isDigit(sb.charAt(i)))
        break;
    sb.setCharAt(i, ' ');
}
String output = sb.toString();

You might find this post interesting. Of course, the above code assumes there will be at least one digit in the string - all characters following the last digit are converted to spaces. If there are no digits, every character is converted to a space.

Vinay Sajip 2009-07-09 05:35:17

I think you are right. I was refactoring some old code which has multiple loops and indexOf()/substring() and I thought it could be done with a simple regex. Didn't even think about cleaning up the old logic. I think your approach would be the most efficient for this task. Thanks for thinking outside the box, i.e. my initial requirements.

vvs 2009-07-09 14:30:44

Your code assumes that the AAA portion will be non-digits. This is contrary to the problem description, which says that A will be 'anything', which could include digits.

Curtis Tasker 2009-07-09 20:46:41

Well then, the solution can be slightly adapted to locate the point where a digit is followed by a non-digit. It still ends up being simpler than using regexes where they're not really necessary.
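
One way that adaptation might look, as a sketch only: scan forward past the leading non-digits and the first run of digits, then blank everything from there on. The class and method names are hypothetical, and leaving a digit-free string unchanged is an assumption that differs from the loop above:

```java
class BlankAfterFirstDigitRun {
    static String blankTail(String input) {
        int n = input.length();
        int i = 0;
        // Skip the leading non-digit part.
        while (i < n && !Character.isDigit(input.charAt(i))) {
            i++;
        }
        // Skip the first run of digits.
        while (i < n && Character.isDigit(input.charAt(i))) {
            i++;
        }
        // i now points at the first non-digit after the digits (or at the end).
        StringBuilder sb = new StringBuilder(input);
        for (int j = i; j < n; j++) {
            sb.setCharAt(j, ' '); // digits in the tail are blanked too
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println("'" + blankTail("AA12345d4 %") + "'");
    }
}
```

Scanning from the front rather than the back is what allows the tail to contain digits.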

Vinay Sajip 2009-07-10 00:06:54

Yes, I had to add additional logic to find the point where digits are allowed. Still pretty simple.

vvs 2009-07-10 14:21:28
