I have several log files of events (one event per line). The logs can possibly overlap. The logs are generated on separate client machines from possibly multiple time zones (but I assume I know the time zone). Each event has a timestamp that was normalized into a common time (by instantianting each log parsers calendar instance with the timezone appropriate to the log file and then using getTimeInMillis to get the UTC time). The logs are already sorted by timestamp. Multiple events can occur at the same time, but they are by no means equal events.
These files can be relatively large, as in, 500000 events or more in a single log, so reading the entire contents of the logs into a simple Event[] is not feasible.
What I am trying do is merge the events from each of the logs into a single log. It is kinda like a mergesort task, but each log is already sorted, I just need to bring them together. The second component is that the same event can be witnessed in each of the separate log files, and I want to "remove duplicate events" in the file output log.
Can this be done "in place", as in, sequentially working over some small buffers of each log file? I can't simply read in all the files into an Event[], sort the list, and then remove duplicates, but so far my limited programming capabilities only enable me to see this as the solution. Is there some more sophisticated approach that I can use to do this as I read events from each of the logs concurrently?