How can I parse my CSV file without parsing the first line?

This class works, but I don't want to parse the header of my CSV.

import groovy.sql.Sql

class CSVParserService {

    boolean transactional = false

    def sql = Sql.newInstance("jdbc:mysql://localhost/RProject", "xxx", "xxx", "com.mysql.jdbc.Driver")

    def CSVList = sql.dataSet("ModuleSet")

    def CSVParser(String filepath, boolean header) {

      def parse = new File(filepath)

      // split and populate GeneInfo
      parse.splitEachLine(',') {fields ->

        CSVList.add(
                Module : fields[0],
                Function : fields[1],
                Systematic_Name : fields[2],
                Common_Name : fields[3],
              )

         return CSVList
      }

    }
}
+1  A: 

You can read each line of the file except the first into a List using:

List<String> allLinesExceptHeader = new File(filepath).readLines()[1..-1]

Each line of the file (an element of allLinesExceptHeader) can then be parsed using code similar to that shown above:

allLinesExceptHeader.each {line ->    
    // Code to parse each line goes here
}
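
The service method above takes a header flag that it never uses, so here is a minimal sketch of how the skip could be made conditional. The helper name dataLines and the sample file contents are hypothetical, and drop(1) assumes a Groovy version that provides it on lists.

```groovy
// Hypothetical helper: returns the data lines of a CSV file,
// skipping the first line only when hasHeader is true.
List<String> dataLines(File csv, boolean hasHeader) {
    List<String> lines = csv.readLines()
    // drop(1) returns an empty list when there is nothing left after the header
    return hasHeader ? lines.drop(1) : lines
}

// Usage with a throwaway file (made-up contents)
File sample = File.createTempFile('modules', '.csv')
sample.deleteOnExit()
sample.text = 'Module,Function\nM1.1,NA\nM1.2,NA\n'

dataLines(sample, true).each { line ->
    println line
}
```

Like readLines()[1..-1], this still loads the whole file into memory, so it is only a convenience for files of moderate size.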
Don
Using remove(0) on the list of lines might be more efficient than a range on a large file?
leebutts
+1  A: 

I changed my class, so now I have:

import groovy.sql.Sql

class CSVParserService {

    boolean transactional = false

    def sql = Sql.newInstance("jdbc:mysql://localhost/RProject", "xxx", "xxx", "com.mysql.jdbc.Driver")

    def CSVList = sql.dataSet("ModuleSet")

    def CSVParser(String filepath, boolean header) {

    def parse = new File(filepath).readLines()[1..-1]

    parse.each {line ->

      // split and populate GeneInfo
      line.splitEachLine(',') {fields ->

        CSVList.add(
                Module : fields[0],
                Function : fields[1],
                Systematic_Name : fields[2],
                Common_Name : fields[3],
              )

         return CSVList
      }
     }
    }
}

This works fine until it reaches this part of my CSV:
"Homo sapiens interleukin 4 receptor (IL4R), transcript variant 1, mRNA."

When my parser gets to this part, it cuts it into 3 pieces (it should be 1):
- Homo sapiens interleukin 4 receptor (IL4R)
- transcript variant 1
- mRNA.

How can I fix that? Thank you for your help.

-- New comment -- Here is a copy of one line (the 2nd line) of my CSV:
"M6.6",NA,"ILMN_1652185",NA,NA,"IL4RA; CD124",NA,"NM_000418.2","16","16p12.1a","Homo sapiens interleukin 4 receptor (IL4R), transcript variant 1, mRNA.",3566,...

As you can see, my problem is the field "Homo sapiens interleukin 4 receptor (IL4R), transcript variant 1, mRNA."; I don't want to split the text between the opening and closing quotes. My parser should split only on commas outside quotes, not on commas between quotes. For example, given "part1","part2","part3", I want to get part1, part2 and part3, and if part2 contains commas, I don't want to split on them.

To sum up, I just want to ignore commas inside quoted elements.

Fabien Barbier
I don't understand....if the parser is supposed to split a string into fields at each comma then why shouldn't it split this string (which contains two commas) into 3 parts?
Don
I added an explanation of my problem at the end of my post.
Fabien Barbier
A: 

OK, I have my fix!

Here is the code:

import groovy.sql.Sql

class CSVParserService {

    boolean transactional = false

    def sql = Sql.newInstance("jdbc:mysql://localhost/RProject", "xxx", "xxx", "com.mysql.jdbc.Driver")

    def CSVList = sql.dataSet("ModuleSet")

    def CSVParser(String filepath, boolean header) {

        // Read every line, then drop the first one (the header)
        def parse = new File(filepath).readLines()[1..-1]

        // Split on a comma only when an even number of double quotes follows it,
        // i.e. when the comma is outside a quoted field
        def token = ',(?=([^\"]*\"[^\"]*\")*[^\"]*$)'

        parse.each {line ->

            // split and populate GeneInfo
            line.splitEachLine(token) {fields ->

                CSVList.add(
                    Module : fields[0],
                    Function : fields[1],
                    Systematic_Name : fields[2],
                    Common_Name : fields[3],
                )

                return CSVList
            }
        }
    }
}

See this post for more details: http://stackoverflow.com/questions/1757065/java-splitting-a-comma-separated-string-but-ignoring-commas-in-quotes
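
To show what the pattern does on its own, here is a small sketch that applies it to a shortened version of the CSV line from the question. The shortened line and the quote-stripping step are illustrative additions, not part of the service above.

```groovy
// Same pattern as in the service, in a single-quoted string so the quotes need no escaping.
// A comma matches only if an even number of double quotes follows it up to the end of the line.
def token = ',(?=([^"]*"[^"]*")*[^"]*$)'

// Shortened version of the sample line from the question
def line = '"M6.6",NA,"ILMN_1652185","Homo sapiens interleukin 4 receptor (IL4R), transcript variant 1, mRNA.",3566'

// A limit of -1 keeps trailing empty fields; the surrounding quotes are then stripped by hand
def fields = line.split(token, -1).collect { it.replaceAll('^"|"$', '') }

fields.each { println it }
```

The split leaves the surrounding quotes on each field, which is why they are removed separately here. Because the file is read line by line, this approach does not handle quoted fields that span several lines; a dedicated CSV parser covers such cases.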

Fabien Barbier
Did you consider using a CSV parser that does all that for you? Like Ostermiller's: http://ostermiller.org/utils/CSV.html
Philippe
In the end, I chose this CSV parser: http://opencsv.sourceforge.net/. Thanks.
Fabien Barbier