views:

420

answers:

4

I have a .csv file that is ordered in a certain way. I want to reorder it by another field. Your ideas would be much appreciated.

I only have to do this once, not multiple times, so performance is not too much of an issue.

What I am thinking. If I just create an object (java) to hold each of the fields and then create an ArrayList of these objects. I will then order the ArrayList on the field I want (I can order an ArrayList of objects based on one member of the object - right?), and print this reordered ArrayList to a .csv file.

+3  A: 

Can't you just load the csv into Excel, use the sort function to reorder it, and then save the result as a new csv file?

haha, yeah that would work too.
stimms
Or OpenOffice if you don't have Excel.
Mark Rushakoff
It'd have to be 2007. Earlier versions have a 65536 row limit.
David
No it's much too large to be opened in excel or OpenOffice. Hence all the hassle.
Ankur
Yes, that's true, but we don't know how many records are in the CSV file. I was assuming that 60MB refers to the size and not the number of records.
OK then Access will do the same if you got it. Otherwise, a more exotic technique would be required - like stimms one below.
Then there's the tendency of Excel to change data formats such as dates and number of decimal places, so when you've reordered and saved again, information is lost.
pavium
+9  A: 

Souds like it would work but is also some overkill. If you have a unix box or cygwin you could just do

cat file | sort -t , +<field number>

This will break the fields up by , and sort by the field number

cat file | sort -t , +2

sorts by the second field.

stimms
That might be the way to go
Ankur
+1 for the simplest answer - with no coding required.
Matt H
It's already taken more than one hour - need to consider something else
Ankur
+1  A: 

If you know how to use Vim: http://vim.wikia.com/wiki/Working_with_CSV_files

CSV files (comma-separated values) are often used to save tables of data in plain text. Following are some useful techniques for working with CSV files. You can:

  • Highlight all text in any column.
  • View fields (convert csv text to columns or separate lines).
  • Navigate using the HJKL keys to go left, down, up, right by cell (hjkl work as normal).
  • Search for text in a specific column.
  • Sort lines by column.
  • Delete a column.
  • Specify a delimiter other than comma.
Leftium
+2  A: 

If you have access to a Linux box then use sort as suggested above. But, if it has to be Java then at least use an existing library to parse the CSV file. The format is hellishly complicated to parse if you want to handle all the corner cases correctly. I'd suggest a library like OpenCSV.

This code snippet show how to use the library (with all error handling omitted!)

/**
 * Sorts a CSV file by a fixed column.
 *
 * @param col The zero-based column to sort by.
 * @param in The input CSV file.
 * @param out The output writer to receive the reordered CSV.
 */
public static void sort(final int col, final Reader in, final Writer out)
        throws IOException {
    final List<String[]> csvContent = new ArrayList<String[]>();

    // parse CSV file
    final CSVReader reader = new CSVReader(in);
    String[] line;
    while ((line = reader.readNext()) != null) {
        csvContent.add(line);
    }
    reader.close();

    // sort CSV content
    Collections.sort(csvContent, new Comparator<String[]>() {
        @Override
        public int compare(final String[] o1, final String[] o2) {
            // adjust here for numeric sort, etc.
            return o1[col].compareTo(o2[col]);
        }
    });

    // write sorted content
    final CSVWriter writer = new CSVWriter(out);
    writer.writeAll(csvContent);
    writer.close();
}

You can adjust the code to handle different separator characters, quote chars, numeric sorting, etc.

janko