views:

408

answers:

8

Hi There,

I am reading log files but not all lines want to be processed straight away. I am using a queue / buffer to store the lines while they wait to be processed.

This queue is regularly scanned for particular lines - when they are found, they are removed from the queue (they can be anywhere in it). When there isn't a particular line to be found, lines are taken out of the start of the queue one by one to be processed.

Therefore, the queue needs the following:

  • Able to be resized (or give that impression)
  • Have elements removed from anywhere
  • Have elements added (will always be at the end of the queue)
  • Be scanned quickly
  • Depending on performance, have a pointer of where it got to on the last scan.

I initially wrote the code when I had little experience of Java or the API, and just used an ArrayList because I knew it would work (not necessarily because it was the best option).

Its performance is now becoming poor with more and more logs needing to be processed - so, what collection would you recommend to be used in this situation? There's always the possibility of writing my own too.

Thanks

+4  A: 

A LinkedList would probably be most appropriate. It has all the requested properties, and allows links to be removed from the middle in constant time, rather than the linear time required for an ArrayList.

If you have some specific strategy for finding the next element to remove, a PriorityQueue or even a sorted set might be more appropriate.

Avi
Wouldn't Linked List be slow for searching for the elements to be removed?
Mario Ortegón
That would be one of the down sides to a LinkedList, potentially slow searching
James Camfield
Searching in a linked list depends on the type of searching. Going through everything is fairly easy, and removals are trivial.
deterb
It depends; yes searching would be slow(ish), but if he's potentially removing several items with each search through, the cheaper removes would be worth the slower searching.If the targets for removal are rare and sparse, on the other hand, it might not be the best solution.
Adam Jaskiewicz
+2  A: 

Scanned quickly generally implies a hash-based implementation of some sort, a ConcurrentSkipListMap might be a good implementation. Log(n) on the containskey, remove and get methods, and is sorted so you can have some sort of priority associated with it.

Spencer K
A: 

Because you need to remove and add elements from the set, and search for specific values, maybe a better structure could be something that implements SortedSet, such as TreeSet. This class guarantees log(n) performance for add, remove and contains.

Mario Ortegón
A: 

I guess some threads are going to write to the queue and another one will read from it.

In this case you should look at the queues in the java.lang.concurrent package.

You may use a PriorityBlockingQueue to let it order the elements for you, or a LinkedBlockingQueue if you want to iterate over it and choose yourself the elements to remove.

pgras
+1  A: 

I don't want to sort the lines being read (they need to be kept in their original order). However, I could potentially block the lines based on a session ID that each logged line has (several logged lines per session).

Thinking about it, I could potentially have a:

HashMap<String,LinkedList<String>>

and provide the session id as the key, and populate the LinkedList with the lines belonging to the session.

The Map would provide a quick way to search for lines to do with session X, and then the linked list would provide the best performance to add / remove lines (the searching performance was to find lines to do with session x, therefore the actual lines to do with session x can be read and removed from start to finish - pushed / popped).

Is there a better collection than the linked list which would resize, have lines added at the end and always taken from the beginning? I believe the Queue collection extends the linked list anyway?

James Camfield
+6  A: 

LinkedHashSet might be of interest. It is effectively a HashSet but it also maintains a LinkedList to allow a predictable iteration order - and therefore can also be used as a FIFO queue, with the nice added benefit that it can't contain duplicate entries.

Because it is a HashSet too, searches (as opposed to scans) can be O(1) if they can match on equals()

Bill Michell
This gives the best of both worlds. Thanks for making me aware of this collection, I'd never have thought of it otherwise :0)
James Camfield
I made this collection myself repeatedly before it was added to the SDK, it's amazingly useful (and to code it yourself out of a HashSet and LinkedList is just a few lines of code).
Bill K
A: 

I agree with AVI and linked list would be your best option. You can easily resize, quickly add to the end of the list, quickly remove from anywhere. Searching will not be fast, but no worse then any other unsorted list.

Jim C
A: 

look http://code.google.com/p/google-collections/

damian