I have a bunch of text files on disk, each one is just a sorted list of nonnegative integers, newline delimited:
2
14
67
134
654
1130
My files are pretty big - they can easily contain hundreds of millions of numbers. I want to write a Java program to read each of these files and write out a new text file containing all of the numbers merged together, in sorted order. I have a useful little gadget that helps me do this:
import java.util.Comparator;
import java.util.Iterator;
import java.util.PriorityQueue;
import java.util.Queue;
public abstract class Merger<Element> implements Comparator<Element> {
public static <Element extends Comparable<? super Element>> Iterable<Element> merge(
final Iterable<? extends Iterable<? extends Element>> iterables) {
return new Merger<Element>() {
@Override public int compare(final Element x, final Element y) { return x.compareTo(y); }
}.mergeIterables(iterables);
}
public final Iterable<Element> mergeIterables(
final Iterable<? extends Iterable<? extends Element>> iterables) {
return new Iterable<Element>() {
@Override public Iterator<Element> iterator() {
return new Iterator<Element>() {
class X implements Comparable<X> {
boolean b; private Element e; private final Iterator<? extends Element> i;
X(final Iterator<? extends Element> i) { getNext(this.i = i); }
@Override public int compareTo(final X h) { return compare(e, h.e); }
private <Y> Y getNext(final Y x) { e = (b = i.hasNext()) ? i.next() : null; return x; }
Element next() { return getNext(e); }
}
private final Queue<X> q = new PriorityQueue<X>(); {
for (final Iterable<? extends Element> i : iterables) {
final Iterator<? extends Element> j = i.iterator();
if (j.hasNext()) q.add(new X(j));
}
}
@Override public boolean hasNext() { return !q.isEmpty(); }
@Override public Element next() {
final X x = q.remove();
final Element e = x.next();
if (x.b) q.add(x);
return e;
}
@Override public void remove() { throw new UnsupportedOperationException(); }
};
}
};
}
}
To use this, I just wrote a bit of Java to give my text files an Iterable<Long> interface, and everything works as expected:
public static Iterable<Long> eachInt(final Reader r) {
return new Iterable<Long>() {
@Override public Iterator<Long> iterator() {
return new Iterator<Long>() {
private final BufferedReader b = r instanceof BufferedReader
? (BufferedReader) r
: new BufferedReader(r);
private Long l; { f(null); }
private <X> X f(final X x) {
try { l = Long.valueOf(b.readLine()); return x; }
catch (final IOException e) { throw new RuntimeException(e); }
}
@Override public boolean hasNext() { return l != null; }
@Override public Long next() { return f(l); }
@Override public void remove() { throw new UnsupportedOperationException(); }
};
}
};
}
My problem is that, although the code is correct, it's not as fast as i'd like. I suspect that IO is the bottleneck.
My question is this: is there any Java NIO-based solution for merging my files together?