Remove Duplicates from List of HashMap Entries

+2 Q:

I have a List<HashMap<String,Object>> which represents a database where each list record is a database row.

I have 10 columns in my database. There are several rows where the values of 2 particular columns are equal. I need to remove these duplicates from the list after it has been populated with all the rows from the database.

What is an efficient way to do this?

FYI - I am not able to use DISTINCT while querying the database, because GroupName is added to the Map at a later stage, after the data is loaded. And since the Id column is not a primary key, once you add GroupName to the Map you will have duplicates based on the Id + GroupName combination!

I hope my question makes sense. Let me know if you need more clarification.

+2 A:

1. Create a Comparator that compares HashMaps by comparing only the key/value pairs you are interested in.
2. Use Collections.sort(yourlist, yourcomparator);
3. Now all maps that are equal to each other according to your comparator are adjacent in the list.
4. Create a new list.
5. Iterate through your first list, keeping track of what you saw last. If the current value is different from the last, add it to your new list.
6. Your new list should contain no duplicates according to your comparator.

The cost of iterating through the list is O(n). Sorting is O(n log n). So this algorithm is O(n log n).
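
For illustration, the steps above might look roughly like this in Java. This is only a sketch under some assumptions: the keys "Id" and "GroupName" are taken from the question, but comparing the values by their string form is a guess about the data, and the names SortDedup, compareRows, dedupe and row are made up for the example. It uses Java 8 syntax and sorts a copy so the original list is left alone.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class SortDedup {

    // Compares two rows on the two columns of interest only.
    // Assumption: comparing the string form of each value is good enough.
    static int compareRows(Map<String, Object> a, Map<String, Object> b) {
        int c = String.valueOf(a.get("Id")).compareTo(String.valueOf(b.get("Id")));
        if (c != 0) {
            return c;
        }
        return String.valueOf(a.get("GroupName")).compareTo(String.valueOf(b.get("GroupName")));
    }

    static List<HashMap<String, Object>> dedupe(List<HashMap<String, Object>> rows) {
        Comparator<Map<String, Object>> cmp = SortDedup::compareRows;

        // Sort a copy so that equal rows end up next to each other.
        List<HashMap<String, Object>> sorted = new ArrayList<>(rows);
        Collections.sort(sorted, cmp);

        // Keep a row only if it differs from the last row that was kept.
        List<HashMap<String, Object>> result = new ArrayList<>();
        HashMap<String, Object> last = null;
        for (HashMap<String, Object> row : sorted) {
            if (last == null || cmp.compare(last, row) != 0) {
                result.add(row);
                last = row;
            }
        }
        return result;
    }

    // Hypothetical helper that builds a row with just the two columns.
    static HashMap<String, Object> row(Object id, String groupName) {
        HashMap<String, Object> r = new HashMap<>();
        r.put("Id", id);
        r.put("GroupName", groupName);
        return r;
    }

    public static void main(String[] args) {
        List<HashMap<String, Object>> rows = new ArrayList<>();
        rows.add(row(2, "B"));
        rows.add(row(1, "A"));
        rows.add(row(2, "B")); // duplicate on Id + GroupName
        System.out.println(dedupe(rows).size()); // prints 2
    }
}
```

Note that the result comes back in sorted order, not in the order the rows were loaded.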

We could also sort on-the-fly by using a TreeSet with that comparator. Inserts are O(log n). And we have to do this n times. So we get O(n log n).
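
A sketch of the TreeSet variant, under the same assumptions as before ("Id" and "GroupName" as the keys, values compared by their string form; TreeSetDedup, compareRows, dedupe and row are names invented for the example). A TreeSet built with a comparator treats two elements as the same when the comparator returns 0, so a row that matches one already in the set is simply not added.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeSet;

final class TreeSetDedup {

    // Same comparison as in the sorting approach: only the two columns matter.
    static int compareRows(Map<String, Object> a, Map<String, Object> b) {
        int c = String.valueOf(a.get("Id")).compareTo(String.valueOf(b.get("Id")));
        if (c != 0) {
            return c;
        }
        return String.valueOf(a.get("GroupName")).compareTo(String.valueOf(b.get("GroupName")));
    }

    static List<HashMap<String, Object>> dedupe(List<HashMap<String, Object>> rows) {
        // Each add() costs O(log n); rows that compare equal to an existing element are skipped.
        TreeSet<HashMap<String, Object>> unique = new TreeSet<>(TreeSetDedup::compareRows);
        unique.addAll(rows);
        return new ArrayList<>(unique);
    }

    // Hypothetical helper that builds a row with just the two columns.
    static HashMap<String, Object> row(Object id, String groupName) {
        HashMap<String, Object> r = new HashMap<>();
        r.put("Id", id);
        r.put("GroupName", groupName);
        return r;
    }
}
```

If the rows are added to the TreeSet as they are read, the intermediate list with duplicates never has to be built at all.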

z5h 2010-02-03 21:29:16

I hope this solution is also efficient even if my list has over 1 million rows!

HonorGod 2010-02-03 21:46:09

It may be worth noting that with most Collections, HashMap included, you can simply remove() the duplicate object. With HashMap, you pass the key to remove(). So you won't need a duplicate List or Map.
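
One way to remove duplicates in place, without building a second list, might be the following. This is a different technique from the sorting answer and only a sketch: it remembers each Id + GroupName pair in a HashSet and drops any row whose pair was already seen. The keys "Id" and "GroupName" come from the question; InPlaceDedup, dedupeInPlace and row are invented names, and it assumes Java 8 and that the values implement equals() and hashCode() sensibly (so the Integer 1 and the String "1" count as different).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class InPlaceDedup {

    static void dedupeInPlace(List<HashMap<String, Object>> rows) {
        // Composite keys seen so far; a List of the two values has value-based equals/hashCode.
        Set<List<Object>> seen = new HashSet<>();

        // Set.add() returns false if the key was already present, so that row is removed.
        // The first row for each Id + GroupName pair is kept and the list order is preserved.
        rows.removeIf(row -> !seen.add(Arrays.asList(row.get("Id"), row.get("GroupName"))));
    }

    // Hypothetical helper that builds a row with just the two columns.
    static HashMap<String, Object> row(Object id, String groupName) {
        HashMap<String, Object> r = new HashMap<>();
        r.put("Id", id);
        r.put("GroupName", groupName);
        return r;
    }
}
```

With an ArrayList this is a single pass with expected O(n) cost, at the price of the extra memory for the set of keys.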

jonescb 2010-02-03 21:51:02

What are those 1 million rows doing in Java's memory? Why are you practically duplicating the DB in Java's memory? I think the problem needs to be solved somewhere else. Just update straight in DB instead of in Java's memory and make use of constraints to prevent duplicates.

BalusC 2010-02-03 22:10:00

I am new to Comparators in Java. Can someone guide me on the following - create a Comparator that compares HashMaps, and compares them by comparing the key/value pairs that are of interest.

Gauranga 2010-03-31 17:59:32

This is not the right place to post a question, even if it is related to the original question. Use the "Ask Question" button to create a new question. Link it to this discussion if you think that will help clarify what you are asking.

z5h 2010-03-31 18:35:08
