It is possible to do this in-place efficiently. You'd have to overwrite the characters in the buffer at the proper intervals, and you'd then logically shorten the buffer with setLength
. It's going to be quite complex, but it would be in-place and O(N)
.
The reason why you'd want to overwrite instead of using delete
/insert
is because that would be O(N^2)
instead, because things need to be shifted around unnecessarily.
Doing this out-of-place is quite trivial and O(N)
but would require a secondary buffer, doubling the space requirement.
Proof-of-concept
Here's a simple proof-of-concept. removeIntervals
takes a StringBuffer
and an int[][] intervals
. Each int[]
is assumed to be a pair of { start, end }
values (half-open range, exclusive upper bound). In linear time and in-place, these intervals are removed from the StringBuffer
by a simple overwrite
. This works when intervals are sorted and non-overlapping, and processed left-to-right.
The buffer is then shortened with setLength
, cutting off as many characters that were removed.
static void overwrite(StringBuffer sb, int dst, int srcFrom, int srcTo) {
for (int i = srcFrom; i < srcTo; i++) {
sb.setCharAt(dst++, sb.charAt(i));
}
}
static int safeGet(int[][] arr, int index, int defaultValue) {
return (index < arr.length) ? arr[index][0] : defaultValue;
}
static void removeIntervals(StringBuffer sb, int[][] intervals) {
int dst = safeGet(intervals, 0, 0);
int removed = 0;
for (int i = 0; i < intervals.length; i++) {
int start = intervals[i][0];
int end = intervals[i][1];
int nextStart = safeGet(intervals, i+1, sb.length());
overwrite(sb, dst, end, nextStart);
removed += end - start;
dst += nextStart - end;
}
sb.setLength(sb.length() - removed);
}
Then we can test it as follows:
String text = "01234567890123456789";
int[][][] tests = {
{ { 0, 5, },
}, // simple test, removing prefix
{ { 1, 2, }, { 3, 4, }, { 5, 6, }
}, // multiple infix removals
{ { 3, 7, }, { 18, 20, },
}, // suffix removal
{
}, // no-op
{ { 0, 20 },
}, // remove whole thing
{ { 7, 10 }, { 10, 13 }, {15, 15 },
}, // adjacent intervals, empty intervals
};
for (int[][] test : tests) {
StringBuffer sb = new StringBuffer(text);
System.out.printf("> '%s'%n", sb);
System.out.printf("- %s%n", java.util.Arrays.deepToString(test));
removeIntervals(sb, test);
System.out.printf("= '%s'%n%n", sb);
}
This prints (as seen on ideone.com):
> '01234567890123456789'
- [[0, 5]]
= '567890123456789'
> '01234567890123456789'
- [[1, 2], [3, 4], [5, 6]]
= '02467890123456789'
> '01234567890123456789'
- [[3, 7], [18, 20]]
= '01278901234567'
> '01234567890123456789'
- []
= '01234567890123456789'
> '01234567890123456789'
- [[0, 20]]
= ''
> '01234567890123456789'
- [[7, 10], [10, 13], [15, 15]]
= '01234563456789'
Getting the intervals
In this specific case, the intervals can either be built in a preliminary pass (using indexOf
), or the whole process can be done in one pass if absolutely required. The point is that this can definitely be done in-place in linear time (and if absolutely necessary, in a single-pass).
An out-of-place solution
This is out-of-place using a secondary buffer and regex. It's offered for consideration due to its simplicity. Unless further optimization is provably required (after evidentiary profiling results), this should be good enough:
String text =
"DEL: line1\n" +
"KEP: line2\r\n" +
"DEL: line3\n" +
"KEP: line4\r" +
"DEL: line5\r" +
"DEL: line6\r" +
"KEP: line7\n" +
"DEL: line8";
StringBuffer sb = new StringBuffer(text);
Pattern delLine = Pattern.compile("(?m)^DEL:.*$");
String cleanedUp = delLine.matcher(sb).replaceAll("<deleted!>");
System.out.println(cleanedUp);
This prints (as seen on ideone.com):
<deleted!>
KEP: line2
<deleted!>
KEP: line4
<deleted!>
<deleted!>
KEP: line7
<deleted!>
References