
I am using Scala 2.7.7 and want to parse a CSV file and store the data in an SQLite database.

I ended up using the OpenCSV Java library to parse the CSV file and the sqlitejdbc library to talk to the database.

Using these Java libraries makes my Scala code look almost identical to Java code (minus the semicolons, and with val/var).

As I am dealing with Java objects, I can't use Scala's List, Map, etc. unless I convert them from Java collections or upgrade to Scala 2.8.
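For comparison, on a current Scala the Java-to-Scala conversion is a single call. This minimal sketch assumes Scala 2.13 or 3, where the converters live in `scala.jdk.CollectionConverters` (Scala 2.8 through 2.12 used `scala.collection.JavaConversions` for the same job). The `java.util.List[Array[String]]` is built by hand as a stand-in for what OpenCSV's `readAll()` returns, and the symbols are placeholders.

```scala
import scala.jdk.CollectionConverters._

// Stand-in for CSVReader.readAll(), which returns a java.util.List of String arrays
val javaRows = new java.util.ArrayList[Array[String]]()
javaRows.add(Array("AAA", "2010-03-15"))
javaRows.add(Array("AAA", "2010-03-12"))
javaRows.add(Array("BBB", "2010-03-15"))

// asScala wraps the Java list without copying; toList gives an immutable Scala List
val rows: List[Array[String]] = javaRows.asScala.toList

// From here on the whole Scala collection API is available, e.g. grouping by symbol
val bySymbol: Map[String, List[Array[String]]] = rows.groupBy(_(0))
```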

Is there a way I can simplify my code further using Scala features I don't know about?

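import java.io.FileReader
import au.com.bytecode.opencsv.CSVReader

// conn (java.sql.Connection) and stat (java.sql.Statement) are assumed to be
// opened earlier through sqlitejdbc; that setup is not shown here
val filename = "file.csv"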
val reader = new CSVReader(new FileReader(filename))
var aLine = new Array[String](10)
var lastSymbol = ""
while( (aLine = reader.readNext()) != null ) {
    if( aLine != null ) {
        val symbol = aLine(0)
        if( !symbol.equals(lastSymbol)) { 
            try {
                val rs = stat.executeQuery("select name from sqlite_master where name='" + symbol + "';" )
                if( !rs.next() ) {
                    stat.executeUpdate("drop table if exists '" + symbol + "';")
                    stat.executeUpdate("create table '" + symbol + "' (symbol,date,open,high,low,close,vol);")
                }
            }
            catch {
              case sqle : java.sql.SQLException =>
                 println(sqle)

            }
            lastSymbol = symbol
        }
        val prep = conn.prepareStatement("insert into '" + symbol + "' values (?,?,?,?,?,?,?);")
        prep.setString(1, aLine(0)) //symbol
        prep.setString(2, aLine(1)) //date
        prep.setString(3, aLine(2)) //open
        prep.setString(4, aLine(3)) //high
        prep.setString(5, aLine(4)) //low
        prep.setString(6, aLine(5)) //close
        prep.setString(7, aLine(6)) //vol
        prep.addBatch()
        prep.executeBatch()
     }
}
conn.close()
+4  A: 

If you have a simple CSV file (no quoted fields containing commas), an alternative is to skip the CSV library altogether and parse the file in plain Scala, for example:


case class Stock(line: String) {
  val data = line.split(",")
  val date = data(0)
  val open = data(1).toDouble
  val high = data(2).toDouble
  val low = data(3).toDouble
  val close = data(4).toDouble
  val volume = data(5).toDouble
  val adjClose = data(6).toDouble

  def price: Double = low
}

scala> import scala.io._

scala> Source.fromFile("stock.csv") getLines() map (l => Stock(l))
res0: Iterator[Stock] = non-empty iterator


scala> res0.toSeq  
res1: Seq[Stock] = List(Stock(2010-03-15,37.90,38.04,37.42,37.64,941500,37.64), Stock(2010-03-12,38.00,38.08,37.66,37.89,834800,37.89) //etc...

This has the advantage that you can use the full Scala collection API on the result.
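A somewhat more defensive sketch of the same approach: it splits with a negative limit so empty trailing fields are kept, skips malformed lines instead of throwing, and drops a header line. The header line and its column names are assumptions (remove `drop(1)` if your file has none), and `Source.fromString` stands in for `Source.fromFile("stock.csv")` so the snippet runs on its own; the two data rows are the ones shown above.

```scala
import scala.io.Source

// One parsed row; field names follow the Stock class above
case class Quote(date: String, open: Double, high: Double, low: Double,
                 close: Double, volume: Long, adjClose: Double)

// Returns None for lines that do not have exactly seven fields or whose numbers don't parse
def parse(line: String): Option[Quote] = line.split(",", -1) match {
  case Array(d, o, h, l, c, v, a) =>
    try Some(Quote(d, o.toDouble, h.toDouble, l.toDouble, c.toDouble, v.toLong, a.toDouble))
    catch { case _: NumberFormatException => None }
  case _ => None
}

// Stand-in for Source.fromFile("stock.csv"); the first line is assumed to be a header
val csv = """Date,Open,High,Low,Close,Volume,Adj Close
2010-03-15,37.90,38.04,37.42,37.64,941500,37.64
2010-03-12,38.00,38.08,37.66,37.89,834800,37.89"""

val quotes: List[Quote] = Source.fromString(csv).getLines().drop(1).flatMap(parse).toList

// Ordinary collection operations now apply directly
val highest = quotes.maxBy(_.high)
val avgClose = quotes.map(_.close).sum / quotes.size
```

Returning `Option` from `parse` and flattening means one bad line does not abort the whole import; whether silently skipping such lines is acceptable depends on your data.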

If you prefer to use parser combinators, there's also an example of a CSV parser combinator on GitHub.

Arjan Blokzijl
You might also want to look into Splitter http://guava-libraries.googlecode.com/svn/trunk/javadoc/com/google/common/base/Splitter.html in Guava http://code.google.com/p/guava-libraries/ for a better split function. String.split has some gotchas, such as silently dropping trailing empty fields.
King Cub
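To make the `String.split` gotchas concrete: the argument is a regular expression, trailing empty fields are dropped unless you pass a negative limit, and CSV quoting is not understood at all. A short sketch; the sample rows are made up for illustration.

```scala
// A row whose last two fields are empty
val row = "AAA,2010-03-15,,"

// Default split drops trailing empty strings: only 2 fields survive
val lossy = row.split(",")

// A negative limit keeps every field, including the trailing empty ones: 4 fields
val full = row.split(",", -1)

// split knows nothing about quoting: the comma inside the quotes splits the field
val quoted = "\"Smith, John\",42".split(",")

// The argument is a regex, so a '|' delimiter has to be escaped
val piped = "a|b|c".split("\\|")
```

Quoted fields with embedded commas are exactly the case where a real CSV parser such as OpenCSV earns its keep.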
+2  A: 

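The if statement inside the while is doing real work, because the while condition does not do what it appears to: in Scala an assignment evaluates to Unit, not to the assigned value, so `(aLine = reader.readNext()) != null` is always true. At end of file the inner if skips the null row, but the loop never exits.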
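A small sketch of the assignment-as-Unit behaviour and of a while loop that does terminate. A `java.io.BufferedReader` over a string stands in for the `CSVReader` here, since its `readLine()` also returns `null` at end of input.

```scala
import java.io.{BufferedReader, StringReader}
import scala.collection.mutable.ListBuffer

// BufferedReader.readLine, like CSVReader.readNext, returns null at end of input
val reader = new BufferedReader(new StringReader("first\nsecond\nthird"))

var line: String = null
// In Scala an assignment evaluates to (), so this value is Unit, not a String;
// `(line = reader.readLine()) != null` would therefore always be true
val assigned: Unit = (line = reader.readLine())

// A loop that does terminate: test the variable, and read again at the end of the body
val seen = ListBuffer[String]()
while (line != null) {
  seen += line
  line = reader.readLine()
}
```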

Also, I don't know exactly what aLine contains, but you probably want to do something like

aLine.zipWithIndex.foreach { case (value, i) => prep.setString(i + 1, value) }

instead of counting up by hand from 1 to 7. Or alternatively, you can

for (i <- 0 to 6) { prep.setString(i + 1, aLine(i)) }
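The key detail when binding by position is that array indices start at 0 while JDBC parameter indices start at 1. A sketch of that mapping written against a plain setter function instead of a real `PreparedStatement`, so it runs without a database; `bindColumns` is a hypothetical helper, and `take(count)` guards against rows with extra columns. With a real statement the call would look like `bindColumns(aLine, 7)((i, v) => prep.setString(i, v))`.

```scala
import scala.collection.mutable.ArrayBuffer

// Binds the first `count` columns of a CSV row to parameters 1..count.
// `set` stands in for PreparedStatement.setString(parameterIndex, value)
def bindColumns(row: Array[String], count: Int)(set: (Int, String) => Unit): Unit =
  row.take(count).zipWithIndex.foreach { case (value, i) =>
    set(i + 1, value) // JDBC parameters are 1-based, array indices are 0-based
  }

// Record the calls instead of talking to a database
val calls = ArrayBuffer[(Int, String)]()
val row = Array("AAA", "2010-03-15", "37.90", "38.04", "37.42", "37.64", "941500", "extra")
bindColumns(row, 7)((i, v) => calls += ((i, v)))
```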

If you feel like adopting a more functional style, you could replace the while with

Iterator.continually(reader.readNext()).takeWhile(_ != null).foreach(aLine => {
  // Body of while goes here
})
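A runnable sketch of the `Iterator.continually` pattern, with a `BufferedReader` over a string standing in for the `CSVReader` (both return `null` at end of input); the sample lines are placeholders.

```scala
import java.io.{BufferedReader, StringReader}

// Stand-in for new CSVReader(new FileReader(filename)); readLine returns null at EOF
val reader = new BufferedReader(new StringReader("AAA,1\nAAA,2\nBBB,3"))

// continually re-evaluates its by-name argument on every next(), so each element is a
// fresh readLine(); takeWhile stops at the first null, i.e. at end of input
val lines: Iterator[String] = Iterator.continually(reader.readLine()).takeWhile(_ != null)

// The loop body from the question would go in a foreach; here the fields are just collected
val rows = lines.map(_.split(",")).toList
reader.close()
```

Note that `toList` forces the iterator before the reader is closed; with a lazy iterator the close has to wait until consumption is finished.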

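(and also remove the var aLine). A plain while loop is fine too, as long as the condition tests the variable rather than an assignment, e.g. read the first line before the loop and read the next one at the end of each pass. One could also refactor to avoid lastSymbol (e.g. by using a recursive def), but I'm not really sure that's worth it.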
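For completeness, a sketch of the refactoring that drops the mutable lastSymbol: a tail-recursive helper carries the previous symbol as a parameter instead. `createTable` and `insertRow` are hypothetical stand-ins for the question's JDBC statements; here they only record what they would do, so the sketch runs without a database.

```scala
import scala.annotation.tailrec
import scala.collection.mutable.ListBuffer

val events = ListBuffer[String]()

// Hypothetical stand-ins for the question's "create table" and "insert" statements
def createTable(symbol: String): Unit = events += s"create $symbol"
def insertRow(row: Array[String]): Unit = events += s"insert ${row.mkString(",")}"

// `last` plays the role of the mutable lastSymbol; None means no row seen yet
@tailrec
def load(rows: Iterator[Array[String]], last: Option[String]): Unit =
  if (rows.hasNext) {
    val row = rows.next()
    val symbol = row(0)
    if (!last.contains(symbol)) createTable(symbol) // new symbol: set up its table
    insertRow(row)
    load(rows, Some(symbol))
  }

load(Iterator(Array("AAA", "1"), Array("AAA", "2"), Array("BBB", "3")), None)
```

A `foldLeft` over the rows carrying the last symbol would work equally well; whether either reads better than the var is largely a matter of taste.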

Rex Kerr
should have been `for (i <- 0 to 6) { prep.setString(i + 1, aLine(i)) }`
Jesper