tags:

views:

500

answers:

4

Okay at first I thought this would be pretty straightforward. But I can't think of an efficient way to solve this. I figured a brute force way to solve this but that's not very elegant. I have an ArrayList. Contacts is a VO class that has multiple members - name, regions, id. There are duplicates in ArrayList because different regions appear multiple times. The list is sorted by ID. Here is an example:

Entry 0 - Name: John Smith; Region: N; ID: 1
Entry 1 - Name: John Smith; Region: MW; ID: 1
Entry 2 - Name: John Smith; Region: S; ID: 1
Entry 3 - Name: Jane Doe; Region: NULL; ID: 2
Entry 4 - Name: Jack Black; Region: N; ID: 3
Entry 6 - Name: Jack Black; Region: MW; ID: 3
Entry 7 - Name: Joe Don; Region: NE; ID: 4

I want to transform the list to below by combining duplicate regions together for the same ID. Therefore, the final list should have only 4 distinct elements with the regions combined.

So the output should look like this:-

Entry 0 - Name: John Smith; Region: N,MW,S; ID: 1
Entry 1 - Name: Jane Doe; Region: NULL; ID: 2
Entry 2 - Name: Jack Black; Region: N,MW; ID: 3
Entry 3 - Name: Joe Don; Region: NE; ID: 4

What are your thoughts on the optimal way to solve this? I am not looking for actual code but ideas or tips to go about the best way to get it done.

Thanks for your time!!!

+1  A: 

This is a pseudocode to accomplish what you want. At the abstract level, you have a list of Pair<K,V> (first, second), sorted by K, and no two pair are truly equal (i.e. you can have (k1,v1) and (k1,v2), but you can't have two (k1,v1) in the list.

You want to merge consecutive pairs (k,v1),(k,v2),(k,v3) to one group (k,[v1,v2,v3]).

List<Pair<K,V>> in;
List<Pair<K,List<V>>> out = [ ];

Pair<K,V> lastP = SENTINEL_PAIR; // lastP.first matches nothing
Pair<K,List<V>> lastGroup;

for (Pair<K,V> p : in) {
  if (p.first == lastP.first) {  // same group as last
    lastGroup.second.add(p.second);
  } else {                       // start a new group
    lastGroup = (p.first, [ p.second ]);
    out.add(lastGroup);
  }
  lastP = p;
}

In your case, K is the ID, and V is the region. This is O(N).

polygenelubricants
You can use a jakarta commons multimap to do this more elegantly.
Rahul
Thanks for your answer. Neat.
CoolBeans
+2  A: 

You can iterate them while dumping them (and merging duplicates) into a TreeMap. Then create a list from the sorted view of the TreeMap's values.

In the sample code I'm assuming you have an Entry class with id, name and regions fields this last one being a List of Region instances. This could easily be changed to a Set, and Region to Strings or whatever you're using. The sample copies the entries before inserting them into the map since they will be modified when merged to other entries.

SortedMap<Integer, Entry> mergedEntriesMap = new TreeMap<Integer, Entry>();
for (Entry e : entries) {
  if (mergedEntriesMap.contains(e.id)) {
    Entry m = mergedEntriesMap.get(e);
    m.regions.addAll(e.regions);
  } else {
    Entry m = new Entry();
    // copy the entry to keep the original array clean
    m.id = e.id;
    m.name = e.name;
    m.regions = new ArrayList<Region>(e.regions);
    mergedEntriesMap.put(m.id, m);
  }
}

List<Entry> mergedEntries = new ArrayList<Entry>(mergedEntriesMap.values());
Santi P.
`TreeMap` answers `containsKey` in `O(log N)`. This solution is `O(N log N)`, and therefore is not optimal.
polygenelubricants
optimal is a pretty nebulous concept. The OP could just use a HashMap, but if this is a really big data set, the code above is a pretty good solution. An optimization would be to not use the contains() call at all - just call get() and construct new if get() returns null. The use of SortedMap here doesn't really help, though - any map implementation would work.
Kevin Day
He wanted to have the output sorted, if the input is sorted as well you could solve it in O(N) by iterating it and only expecting merges to occur in consecutive entries. I figured he was already dealing with an O(N log N) when presorting the input list or sorting the output list so my solution tries to solve both the merging and the sorting at once.
Santi P.
This is a very good solution. I used HashMap instead of a TreeMap and a normal Map as opposed to a SortedMap. It worked out pretty well. Thank you very much!
CoolBeans
+1  A: 

Is the initial data stuck in this format? If not, you might want to look into changing the query you're using to retrieve your data by grouping all the ids together and forming a comma separated list column. Here's an example in sql

SELECT      Id, [Name], Regions = replace
            ((SELECT Region AS [data()]
            FROM RegionTable
            WHERE  Id = u.Id
            ORDER BY Region FOR xml path('')), ' ', ', ')
FROM        [User] u
WHERE       Id IS NOT NULL
GROUP BY Id, [Name]
Ben
Aha, I didn't know you could combine multiple rows data in a single row this way using sql. No the data is not stuck in this format. I can modify the sql. It is going against DB2. I am familiar with the REPLACE function, however, I am not sure if I can do FOR after ORDER BY this way in DB2 or not. The data is not in XML format, just plain text data. Thanks!
CoolBeans
A: 

Have you taken a look at google's Multimap? It is pretty much created for this type of data structure in which there is a key that maps to a Collection of items. So in this case a String name would map to a Collection of Region objects.

Multimap<String, Region> names = HashMultimap.create();
for (Entry entry : entries) {
    names.put(entry.getName(), entry.getRegion());
}
// Now u can get the collection of regions by name
Collection<Region> johnsRegions = names.get("John Smith");
Yanamon
Looks like Jakarta provides similar one too. Thanks for the tip.
CoolBeans