Given your description, I'm assuming that after the NNDDDDD portion, the first A will actually be an N rather than an A, since otherwise there's no solid boundary between the DDDDD and AAAA portions.
So, your string actually looks like NNDDDDDNAAA, and you want to replace the NAAA portion with spaces. Given this, the regex can be rewritten as follows (shown as a Java string literal, hence the doubled backslashes): (\\D+\\d+)(\\D.+)
Lookbehind in Java requires a pattern with a bounded maximum length, so you can't use the unbounded + or * quantifiers inside it. You can instead use curly braces and specify a maximum length. For instance, you can use {1,9} in place of each +, and it will match between 1 and 9 characters: (?<=\\D{1,9}\\d{1,9})(\\D.+)
The only problem here is that the NAAA sequence is matched as a single match, so "NNNDDDDNAAA".replaceAll("(?<=\\D{1,9}\\d{1,9})(\\D.+)", " ") replaces the entire NAAA sequence with a single space, rather than one space per character.
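A small sketch of that behaviour, using a made-up concrete input ("AB12345CDEF") in place of the NNDDDDDNAAA placeholder. The class and method names are hypothetical.

```java
class LookbehindDemo {
    // Replaces the whole tail (first non-digit after the digits, to the end) in one go.
    static String collapse(String input) {
        return input.replaceAll("(?<=\\D{1,9}\\d{1,9})(\\D.+)", " ");
    }

    public static void main(String[] args) {
        // The four characters "CDEF" are one match, so they become one space.
        System.out.println("[" + collapse("AB12345CDEF") + "]"); // [AB12345 ]
    }
}
```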
You could take the start index of the match and the string length, and use them to append the correct number of spaces, but I don't see the point. I think you're better off with your original solution; it's simple and easy to follow.
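For completeness, the start-index idea might look roughly like this. It is only a sketch: PadDemo, TAIL and blankTail are names I made up, and the sample input is an assumption about what your data looks like.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PadDemo {
    // Same lookbehind pattern as above, compiled once.
    static final Pattern TAIL = Pattern.compile("(?<=\\D{1,9}\\d{1,9})\\D.+");

    // Keeps everything before the match and writes one space per matched character.
    static String blankTail(String input) {
        Matcher m = TAIL.matcher(input);
        if (!m.find()) {
            return input; // nothing to blank out
        }
        StringBuilder out = new StringBuilder(input.length());
        out.append(input, 0, m.start());
        for (int i = m.start(); i < m.end(); i++) {
            out.append(' ');
        }
        out.append(input, m.end(), input.length());
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println("[" + blankTail("AB12345CDEF") + "]"); // [AB12345    ]
    }
}
```

It works, but it is more code than the two-group version for the same result.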
If you're looking for a little extra speed, you could compile your Pattern outside the function, and use StringBuilder or StringBuffer to create your output. If you're building a large String out of all these NNDDDDDAAAAA elements, work entirely in StringBuilder until you're done appending.
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class Test {
    // Compiled once and reused for every call.
    public static final Pattern p = Pattern.compile("(\\D+\\d+)(\\D.+)");

    public static StringBuffer replace( String input ) {
        StringBuffer output = new StringBuffer();
        Matcher m = Test.p.matcher(input);
        if( m.matches() ) {
            // Keep group 1 as-is; turn every character of group 2 into a space.
            output.append( m.group(1) ).append( m.group(2).replaceAll("."," ") );
        } else {
            // Input doesn't have the expected shape: pass it through unchanged.
            output.append( input );
        }
        return output;
    }

    public static void main( String[] args ) {
        String input = args[0];
        StringBuffer tests = new StringBuffer();
        long startTime = System.currentTimeMillis();
        for( int i = 0; i < 50; i++ ) {
            tests.append( "Input -> Output: '" );
            tests.append( input );
            tests.append( "' -> '" );
            tests.append( Test.replace( input ) );
            tests.append( "'\n" );
        }
        System.out.println( tests.toString() );
        // Elapsed time in milliseconds.
        System.out.println( "\n" + (System.currentTimeMillis()-startTime) );
    }
}
Update:
I wrote a quick iterative solution, and ran some random data through both. The iterative solution is around 4-5x faster.
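public static StringBuffer replace( String input )
{
    StringBuffer output = new StringBuffer();
    // second: we've reached the digit section; third: we've reached the section to blank out.
    boolean second = false, third = false;
    for( int i = 0; i < input.length(); i++ )
    {
        // The first digit starts the second section.
        if( !second && Character.isDigit(input.charAt(i)) )
            second = true;
        // The first letter after the digits starts the third section.
        if( second && !third && Character.isLetter(input.charAt(i)) )
            third = true;
        // Everything in the third section becomes a space; the rest is copied as-is.
        if( second && third )
            output.append( ' ' );
        else
            output.append( input.charAt(i) );
    }
    return output;
}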
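If you want to convince yourself that the two approaches agree before comparing their speed, a quick check along these lines should do. This is a sketch, not the code I timed: CompareDemo, viaRegex and viaLoop are made-up names, the inputs are invented, and viaRegex returns unmatched input unchanged, which is my assumption about the behaviour you want.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class CompareDemo {
    static final Pattern P = Pattern.compile("(\\D+\\d+)(\\D.+)");

    // Regex version: blank out group 2, or return the input if it doesn't match.
    static String viaRegex(String input) {
        Matcher m = P.matcher(input);
        if (!m.matches()) {
            return input;
        }
        return m.group(1) + m.group(2).replaceAll(".", " ");
    }

    // Iterative version: same state flags as the loop above.
    static String viaLoop(String input) {
        StringBuilder out = new StringBuilder(input.length());
        boolean second = false, third = false;
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            if (!second && Character.isDigit(c)) second = true;
            if (second && !third && Character.isLetter(c)) third = true;
            out.append(third ? ' ' : c);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        for (String s : new String[] { "AB12345CDEF", "XY1Z9", "Q7RSTU" }) {
            System.out.println(viaRegex(s).equals(viaLoop(s))); // true for each of these
        }
    }
}
```

The two are not identical on every input. For example, with a one-character tail such as "AB1C" the regex doesn't match (\\D.+ needs at least two characters), while the loop still blanks the C. The loop also only treats a letter as the start of the third section, where the regex accepts any non-digit.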