views:

724

answers:

4

I have a series of arrays that i draw data from and ultimately lead to a final array that holds the information i want. The final array is 2-dimensional, being comprised of a large number of single dimensional arrays, each holding up to 3 entries.

int[][] realOcc = new int[result.length][3];

The way the array holds data is as follows: The first "cell" holds a name, the second a region ID and the third an occurence number - an int telling me how many times this name came up in this particular region ID.

After sorting the array according to name using a Bubble Sort algorithm i naturally see many entries that i wouldn't want to be there. As an example, imagine a name coming up 3 times in a specific region ID. The way the array entries for this name would look like would be as follows:

Name1 regionID17 1
Name1 regionID17 2
Name1 regionID17 3
...
Name156 regionID1 1
Name168 regionID99 1
...

What i want to do is get rid of all the excess entries as in, the entries that correspond to the same name and regID, and only keep the maximum occurence number for each name in a specific region. So, taking the above example, what i would like to see after manipulating the array would be:

Name1 regionID17 3
...
Name156 regionID1 1
Name168 regionID99 1
...

Any ideas would be greatly appreciated since i'm pretty much stumped.. Keep in mind that since the data i'm pulling is pretty big in number, i also need to keep my code efficient.

+2  A: 

What you really should look into is using an ArrayList class to hold these 'items.'

You should also create a specific class to hold this data.

Your custom class should look something like this:

class Entry implements Comparable<Entry> {
    private String name, region;
    private int occuranceCount;

    public Entry(String nameP, regionP, occurCountP){
        name = nameP;
        region = regionP;
        occuranceCount = occurCountP;
    }

    // Getters

    public int compareTo(Entry other){
        return name.compareTo(other.name);
    }

    // Equals and hashcode
}

Then you can put these objects into an ArrayList<Entry> and use Collections.sort() which will be much faster than a bubble sort.

After they are sorted, you can loop through and remove duplicate entries using ArrayList.remove().

jjnguy
Why use an ArrayList and have a second iteration when you can use a hash data structure of some kind?
dimo414
Well, perhaps he/she wants to have some sort of order to the data.
jjnguy
I know that a set could solve many of the problems, but it would also have no sense of order to the elements.
jjnguy
A TreeMap then would allow you to define a custom ordering. By the sound of the problem though, the sorting is only being done to identify duplicates and remove them.
dimo414
You are probably right. I guess I was making the assumption that he would like to keep his code as similar to what it is now as possible.
jjnguy
+2  A: 

A question comes to mind: Why are you using an array? I would think that it is better to use a Set object to hold your results, and then create a Result object that has three fields: one for name, one for region and one for the count. If the equals and hash methods are overriden to only take into account the region and the name, then you will have no duplicates on your set and you can use it to keep track of these result objects.

Another way to achieve the same is to have a Map, where the key is the name + region and the value is the count. That would also make it simple to implement and ensure that you have no duplicates.

Mario Ortegón
A: 

Sounds like a Hashtable or Map might be useful. You could make a single pass through the raw data, and use the Map to lookup the name, add it if not seen yet, or check to see if it has been exceeded in value by the new entry if it does. This wouldn't require sorting ahead of time; you could sort afterwards, and save a lot of time :-)

rtenhove
Though not technically deprecated, you don't want to use Hashtable if you can help it, HashMap is correct usage.
dimo414
+2  A: 

I agree with Mario, you shouldn't be using an array structure here. The fact that you're using Bubble Sort suggests to me that your in an intro programming course of some kind, so you may not know about ArrayLists, HashSets, the equals() method, or the like, but that's what you really want to do. Create a custom object with a custom equals method - something like:

public class Record{
  String name;
  String region;

  public boolean equals(Object o){
    Record r = (Record)o;
    return name.equals(r.name) && region.equals(r.region);
  }

  public int hashCode(){
    return name.hashCode()+region.hashCode();
  }
}

Then you can use a HashMap to check if a record already exists in the set - if it does, increment count (the map's value) by 1, else add it.

If you want everything sorted in a certain order, you can either define a custom compareTo() method and use a TreeMap or, if you want everything in insertion order, use a LinkedHashSet to preserve that order.

I hope this helps!

dimo414
Don't forget to implement hashCode or else this method will not work since the HashSet won't correctly find the already-inserted object.Also, for something like this I would use a HashMap<Record, Integer>; the Record has the name and region only (and is otherwise as in your answer) while the Integer is the count.
jprete
Thank you yes, hashCode is absolutely important.
dimo414
Thanks a lot for the answer, it pushed me to the right direction.
sp