Hi,

I want to write a program for a school Java project that parses CSV files whose layout I do not know in advance. I do know the data type of each column, but I do not know the delimiter.

The part I have no idea how to solve is parsing Date or DateTime columns. They can be in one of many formats.

I found many libraries but have no clue which is the best for my needs: http://opencsv.sourceforge.net/ http://www.csvreader.com/java_csv.php http://supercsv.sourceforge.net/ http://flatpack.sourceforge.net/

The problem is that I am a total Java beginner. I am afraid none of those libraries can do what I need, or that I won't be able to make them do it.

I bet there are a lot of people here who have code samples that could get me started in no time for what I need:

  • automatically split in Columns (delimiter unknown, Columntypes are known)
  • cast to Columntype (should cope with $, %, etc.)
  • convert dates to Java Date or Calendar Objects

It would be nice to get as many code samples as possible by email.

Thanks a lot! AS

+1  A: 

At a minimum you are going to need to know the column delimiter.

Richard West
Not necessarily. If he knows the datatype of the first column, he can just consume the first line while it conforms to the datatype. Then, the first character will be the delimiter.
fishlips
OK - let's say I know the delimiter. Can you provide me with a working example that shows how I can do this? Especially bringing dates into Java and converting numbers that contain things like $, %, etc.?
Andy Schmidt
Just so I understand, are you wanting to store "$9.99" as the value 9.99 in a numeric field?
Richard West
A: 

Basically you will need to read the file line by line.

Then you will need to split each line by the delimiter, say a comma (CSV stands for comma-separated values), with

String[] strArr=line.split(",");

This will turn it into an array of strings which you can then manipulate, for example with

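// requires: import java.util.GregorianCalendar;
String name = strArr[0];
// parseInt returns a primitive int; trim() guards against stray spaces around fields
int yearOfBirth = Integer.parseInt(strArr[1].trim());
int monthOfBirth = Integer.parseInt(strArr[2].trim());
int dayOfBirth = Integer.parseInt(strArr[3].trim());
// GregorianCalendar months are zero-based (0 = January), so subtract 1
GregorianCalendar dob = new GregorianCalendar(yearOfBirth, monthOfBirth - 1, dayOfBirth);
Student student = new Student(name, dob); // let's pretend you are creating instances of Student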

You will need to do this for every line so wrap this code into a while loop. (If you don't know the delimiter just open the file in a text editor.)

Lenni
I was talking about a more elaborate example using one of the libraries listed above. The trivial things like String.split() I thought about myself ;) I was talking about foolproof parsing of VARIOUS date formats and numbers which contain $ and %.
Andy Schmidt
May I ask what the $s and %s stand for in your dates and numbers?
Lenni
Splitting on commas is not safe - CSVs can have strings that contain commas. The opencsv and Apache libraries take care of all of this parsing - best to use them.
Kevin Day
+1  A: 

You might want to have a look at this specification for CSV. Bear in mind that there is no officially recognized specification.

If you do not know the delimiter it will not be possible to do this, so you have to find out somehow. If you can inspect the file manually, you should quickly be able to see what it is and hard-code it in your program. If the delimiter can vary, your only hope is to deduce it from the formatting of the known data. When Excel imports CSV files it lets the user choose the delimiter, and this is a solution you could use as well.
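As a rough illustration of deducing the delimiter from the data, here is a minimal sketch. The class name DelimiterGuesser and the candidate list are made up for this example. It picks the candidate character that appears the same non-zero number of times on every sample line, and it ignores quoting, so a quoted field containing a candidate character can fool it.

```java
import java.util.List;

final class DelimiterGuesser {
    // Assumption: these are the only delimiters worth trying; extend as needed.
    private static final char[] CANDIDATES = {',', ';', '\t', '|'};

    /** Returns the candidate with the highest consistent per-line count. */
    static char guess(List<String> sampleLines) {
        char best = 0;
        int bestCount = 0;
        for (char c : CANDIDATES) {
            int count = consistentCount(sampleLines, c);
            if (count > bestCount) {
                best = c;
                bestCount = count;
            }
        }
        if (bestCount == 0) {
            throw new IllegalArgumentException("no consistent delimiter found");
        }
        return best;
    }

    /** Occurrences of c per line if identical on every line, otherwise 0. */
    private static int consistentCount(List<String> lines, char c) {
        int expected = -1;
        for (String line : lines) {
            int n = 0;
            for (int i = 0; i < line.length(); i++) {
                if (line.charAt(i) == c) {
                    n++;
                }
            }
            if (expected == -1) {
                expected = n;
            } else if (n != expected) {
                return 0;
            }
        }
        return Math.max(expected, 0);
    }
}
```

Feeding it the first handful of lines of the file is usually enough; if it throws, falling back to asking the user (as Excel does) is a reasonable option.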

willcodejavaforfood
A: 

I had to use a CSV parser about 5 years ago. It seems there are at least two CSV standards: http://en.wikipedia.org/wiki/Comma-separated_values and what Microsoft does in Excel.

I found this library which eats both: http://ostermiller.org/utils/CSV.html, but AFAIK it has no way of grokking what data type the columns are.

Ray Tayek
+2  A: 

There is also the Apache Commons CSV library; maybe it does what you need.

As for a foolproof version, I think you'll need to code it yourself. With SimpleDateFormat you can define the formats you expect and try each one in turn; if the value doesn't match any of your predefined formats, it isn't a date.
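A minimal sketch of that "try each format" idea follows. The class name and the pattern list are assumptions for illustration; put the formats you actually expect in the array. Note that a value like 01/05/2010 is inherently ambiguous (January 5 or May 1), so the order of the patterns decides which reading wins.

```java
import java.text.ParsePosition;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;

final class MultiFormatDateParser {
    // Hypothetical list of expected formats; the first match wins.
    private static final String[] PATTERNS = {
        "yyyy-MM-dd HH:mm:ss",
        "yyyy-MM-dd",
        "dd.MM.yyyy",
        "MM/dd/yyyy"
    };

    /** Returns the parsed date, or null if no pattern matches the whole text. */
    static Date parse(String text) {
        String trimmed = text.trim();
        for (String pattern : PATTERNS) {
            SimpleDateFormat format = new SimpleDateFormat(pattern, Locale.ROOT);
            format.setLenient(false); // reject impossible dates such as month 13
            ParsePosition pos = new ParsePosition(0);
            Date date = format.parse(trimmed, pos);
            // Accept only if the pattern consumed the entire string.
            if (date != null && pos.getIndex() == trimmed.length()) {
                return date;
            }
        }
        return null;
    }
}
```

Checking that the whole string was consumed matters because SimpleDateFormat otherwise accepts trailing text, so a date-only pattern could silently swallow a date-time value.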

Valentin Rocher
A: 

I would recommend that you start by pulling your task apart into its component parts.

  1. Read string data from a CSV
  2. Convert string data to appropriate format

Once you do that, it should be fairly trivial to use one of the libraries you link to (which most certainly will handle task #1). Then iterate through the returned values, and cast/convert each String value to the value you want.

If the question is how to convert strings to different objects, it's going to depend on what format you are starting with, and what format you want to wind up with.

DateFormat.parse(), for example, will parse dates from strings. See SimpleDateFormat for quickly constructing a DateFormat for a certain string representation. Integer.parseInt() will parse integers from strings.

For currency, you'll have to decide how you want to capture it. If you just want to capture it as a float, then Float.parseFloat() will do the trick (just use String.replace() to remove all $ and commas before you parse it). Or you can parse into a BigDecimal (so you don't have rounding problems). There may be a better class for currency handling (I don't do much of that, so I am not familiar with that area of the JDK).
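Here is a small sketch of the BigDecimal route. The class name NumberCleaner is made up, and two things are assumptions you should check against your data: that numbers are US-style (comma as thousands separator, dot as decimal point), and that a trailing % should be converted to a fraction (12.5% becomes 0.125).

```java
import java.math.BigDecimal;

final class NumberCleaner {
    /**
     * Strips '$' and thousands commas, then parses the rest exactly.
     * Assumption: a trailing '%' means the value should be divided by 100.
     */
    static BigDecimal parse(String raw) {
        String s = raw.trim();
        boolean percent = s.endsWith("%");
        if (percent) {
            s = s.substring(0, s.length() - 1);
        }
        s = s.replace("$", "").replace(",", "").trim();
        // The BigDecimal constructor throws NumberFormatException on anything non-numeric.
        BigDecimal value = new BigDecimal(s);
        return percent ? value.movePointLeft(2) : value;
    }
}
```

So "$9.99" ends up as the numeric value 9.99, and bad input fails loudly with a NumberFormatException instead of producing a wrong number.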

Kevin Day
A: 

My approach would not be to start by writing your own API. Life's too short, and there are more pressing problems to solve. In this situation, I typically:

  • Find a library that appears to do what I want. If one doesn't exist, then implement it.
  • If a library does exist, but I'm not sure it'll be suitable for my needs, write a thin adapter API around it, so I can control how it's called. The adapter API expresses the API I need, and it maps those calls to the underlying API.
  • If the library doesn't turn out to be suitable, I can swap another one in underneath the adapter API (whether it's another open source one or something I write myself) with a minimum of effort, without affecting the callers.
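To make the adapter idea concrete, here is a rough sketch. CsvSource and SplitCsvSource are hypothetical names, and the String.split implementation is only a stand-in (it does not handle quoted fields); an implementation backed by a real CSV library could be swapped in behind the same interface without touching the callers.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

/** Hypothetical adapter API: the only CSV interface the rest of the program sees. */
interface CsvSource {
    List<String[]> readRows(List<String> lines);
}

/** Stand-in implementation; replace with a library-backed one later if needed. */
final class SplitCsvSource implements CsvSource {
    private final String delimiterRegex;

    SplitCsvSource(String delimiter) {
        // Quote the delimiter so characters like '|' are not treated as regex syntax.
        this.delimiterRegex = Pattern.quote(delimiter);
    }

    @Override
    public List<String[]> readRows(List<String> lines) {
        List<String[]> rows = new ArrayList<>();
        for (String line : lines) {
            // Limit -1 keeps trailing empty fields instead of dropping them.
            rows.add(line.split(delimiterRegex, -1));
        }
        return rows;
    }
}
```

Callers depend only on CsvSource, so the choice of parsing library stays a detail of one class.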

Start with something someone has already written. Odds are, it'll do what you want. You can always write your own later, if necessary. OpenCSV is as good a starting point as any.

Brian Clapper