I am connecting to a sockets API that is very inflexible. It will return rows such as:
NAME, CITY, STATE, JOB, MONTH
The results contain duplicates because the API does not do any aggregation. I need to count the duplicate rows (which would be very easy in SQL, but not, as far as I know, in Java).
Example source data:
NAME, CITY, STATE, JOB, MONTH
John Doe, Denver, CO, INSTALLATION, 090301
John Doe, Denver, CO, INSTALLATION, 090301
John Doe, Denver, CO, INSTALLATION, 090301
Jane Doe, Phoenix, AZ, SUPPORT, 090301
Intended output:
NAME, CITY, STATE, JOB, MONTH, COUNT
John Doe, Denver, CO, INSTALLATION, 090301, 3
Jane Doe, Phoenix, AZ, SUPPORT, 090301, 1
I can easily do this for approximately 100,000 returned rows, but I am dealing with about 60 million rows per month. Any ideas?
Edit: Unfortunately, the rows are not returned sorted, nor is there an option through the API to sort them. I get this giant mess of stuff that needs to be aggregated. Right now I use an ArrayList and call indexOf(new row) to check whether the row already exists, but every lookup scans the whole list, so it gets slower as the number of rows grows.
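To make the current approach concrete, here is a minimal sketch of roughly what I am doing now. The class name DuplicateCounter, the method name countDuplicates, and the parallel list of counts are made up for this post (my real code differs in the details), and each row is treated as a single String for simplicity.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of the ArrayList + indexOf approach described above.
class DuplicateCounter {

    // Returns each distinct row with ", <count>" appended, in first-seen order.
    static List<String> countDuplicates(Iterable<String> rows) {
        List<String> seen = new ArrayList<>();     // distinct rows found so far
        List<Integer> counts = new ArrayList<>();  // counts.get(i) belongs to seen.get(i)

        for (String row : rows) {
            // indexOf walks the list from the start, so each lookup is O(n)
            int i = seen.indexOf(row);
            if (i < 0) {
                seen.add(row);
                counts.add(1);
            } else {
                counts.set(i, counts.get(i) + 1);
            }
        }

        List<String> result = new ArrayList<>();
        for (int i = 0; i < seen.size(); i++) {
            result.add(seen.get(i) + ", " + counts.get(i));
        }
        return result;
    }
}
```

Because every new row triggers a scan over all distinct rows seen so far, the total work grows roughly with (rows × distinct rows), which is fine at 100,000 rows but not at 60 million.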
Edit: For clarification, this would only need to be run once a month, at the end of the month. Thank you for all of the responses.