I have a bunch of text files on disk; each one is just a sorted list of nonnegative integers, one per line:

2
14
67
134
654
1130

My files are pretty big; they can easily contain hundreds of millions of numbers. I want to write a Java program that reads each of these files and writes out a new text file containing all of the numbers merged together in sorted order. I have a useful little gadget that helps me do this:

import java.util.Comparator;
import java.util.Iterator;
import java.util.PriorityQueue;
import java.util.Queue;

public abstract class Merger<Element> implements Comparator<Element> {

  public static <Element extends Comparable<? super Element>> Iterable<Element> merge(
    final Iterable<? extends Iterable<? extends Element>> iterables) {
    return new Merger<Element>() {
      @Override public int compare(final Element x, final Element y) { return x.compareTo(y); }
    }.mergeIterables(iterables);
  }

  public final Iterable<Element> mergeIterables(
    final Iterable<? extends Iterable<? extends Element>> iterables) {
    return new Iterable<Element>() {
      @Override public Iterator<Element> iterator() {
        return new Iterator<Element>() {

          class X implements Comparable<X> {
            boolean b; private Element e; private final Iterator<? extends Element> i;
            X(final Iterator<? extends Element> i) { getNext(this.i = i); }
            @Override public int compareTo(final X h) { return compare(e, h.e); }
            private <Y> Y getNext(final Y x) { e = (b = i.hasNext()) ? i.next() : null; return x; }
            Element next() { return getNext(e); }
          }

          private final Queue<X> q = new PriorityQueue<X>(); {
            for (final Iterable<? extends Element> i : iterables) {
              final Iterator<? extends Element> j = i.iterator();
              if (j.hasNext()) q.add(new X(j));
            }
          }

          @Override public boolean hasNext() { return !q.isEmpty(); }

          @Override public Element next() {
            final X x = q.remove();
            final Element e = x.next();
            if (x.b) q.add(x);
            return e;
          }

          @Override public void remove() { throw new UnsupportedOperationException(); }

        };
      }
    };
  }

}
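
As a quick sanity check, the gadget can be exercised with small in-memory lists. Here is a minimal sketch (the class name MergerDemo and the values, which just mirror the sample file above, are only for illustration):

import java.util.Arrays;
import java.util.List;

public class MergerDemo {
  public static void main(final String[] args) {
    // Three already-sorted inputs; Merger.merge interleaves them lazily.
    final List<List<Long>> inputs = Arrays.asList(
      Arrays.asList(2L, 67L, 1130L),
      Arrays.asList(14L, 134L),
      Arrays.asList(654L));
    for (final Long n : Merger.merge(inputs)) {
      System.out.println(n); // prints 2, 14, 67, 134, 654, 1130
    }
  }
}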

To use this, I just wrote a bit of Java to give my text files an Iterable<Long> interface, and everything works as expected:

public static Iterable<Long> eachInt(final Reader r) {
    return new Iterable<Long>() {
        @Override public Iterator<Long> iterator() {
            return new Iterator<Long>() {
                private final BufferedReader b = r instanceof BufferedReader
                    ? (BufferedReader) r
                    : new BufferedReader(r);
                private Long l; { f(null); }
                private <X> X f(final X x) {
                    try {
                        final String line = b.readLine();
                        l = line == null ? null : Long.valueOf(line); // a null line marks end of input
                        return x;
                    }
                    catch (final IOException e) { throw new RuntimeException(e); }
                }
                @Override public boolean hasNext() { return l != null; }
                @Override public Long next() { return f(l); }
                @Override public void remove() { throw new UnsupportedOperationException(); }
            };
        }
    };
}
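
The driver that glues these pieces together isn't shown above; a rough sketch of how they might be combined (the class name MergeFiles and the output file name are placeholders, and eachInt is assumed to be in scope) looks like this:

import java.io.BufferedWriter;
import java.io.FileReader;
import java.io.FileWriter;
import java.io.IOException;
import java.io.Reader;
import java.util.ArrayList;
import java.util.List;

public class MergeFiles {
  public static void main(final String[] args) throws IOException {
    // One lazy Iterable<Long> per input file named on the command line.
    final List<Iterable<Long>> inputs = new ArrayList<Iterable<Long>>();
    final List<Reader> readers = new ArrayList<Reader>();
    for (final String path : args) {
      final Reader r = new FileReader(path);
      readers.add(r);
      inputs.add(eachInt(r)); // eachInt from the snippet above (same class or static import)
    }
    // Merge lazily and stream the result straight out to the new file.
    final BufferedWriter out = new BufferedWriter(new FileWriter("merged.txt"));
    try {
      for (final Long n : Merger.merge(inputs)) {
        out.write(n.toString());
        out.newLine();
      }
    } finally {
      out.close();
      for (final Reader r : readers) r.close();
    }
  }
}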

My problem is that, although the code is correct, it's not as fast as I'd like. I suspect that I/O is the bottleneck.

My question is this: is there any Java NIO-based solution for merging my files together?

A:

EJP: How are you doing the I/O now?

betelgeuse_supernova: I have an adapter that takes a Reader and exposes it as an Iterable of Longs.

EJP: Make sure there is a BufferedReader in the input stack and a BufferedWriter in the output stack, and give them both large buffer sizes.
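
To make that suggestion concrete, here is a minimal sketch of an explicitly buffered input/output stack (the class name BufferedStacks and the 1 MB buffer size are only illustrative, not something prescribed in the thread):

import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.FileReader;
import java.io.FileWriter;
import java.io.IOException;

public class BufferedStacks {
  // An illustrative 1 MB buffer; tune to taste.
  private static final int BUFFER_SIZE = 1 << 20;

  // Reader with a large explicit buffer, suitable for passing to eachInt.
  static BufferedReader openInput(final String path) throws IOException {
    return new BufferedReader(new FileReader(path), BUFFER_SIZE);
  }

  // Writer with a large explicit buffer for the merged output file.
  static BufferedWriter openOutput(final String path) throws IOException {
    return new BufferedWriter(new FileWriter(path), BUFFER_SIZE);
  }
}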