Hi all, I am working with CSV files in Java and have run into a problem, or rather, I don't know how to do this :)

I have a CSV file that looks like this:

a,4,5,3,2
b,6,4,6,7
c,5,3,7,2
2d,1,4,5,9
4e,4,2,5,7
m4,7,5,3,6
.
.
.
xyz,1,6,4,8

I want to get all the rows from the CSV whose first column contains one of the following labels. I have all of these first-column labels in an ArrayList:

a
c
2d
m4
xyz

The result should be:

a,4,5,3,2
c,5,3,7,2
2d,1,4,5,9
m4,7,5,3,6
xyz,1,6,4,8

Thanks a lot!

P.S.: My CSV contains thousands of rows and columns.

A: 

As CSV is a flat text format with no index, and the lines are not ordered, the only way to solve the problem is to read every line of the file and decide whether to keep it.

First, read a line from the file (using an InputStream wrapped in a BufferedReader, for example), then split it at the commas with StringTokenizer and check whether the first part is one of the labels you want to select.
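The steps described above could be sketched roughly as follows. This is only an illustration, not a definitive implementation: the class name `CsvLineFilter` and the method `filter` are made up for this example, and the sample data is fed in through a `StringReader` so the sketch runs without a real file. It also assumes simple CSV with no quoted fields or embedded commas.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.List;
import java.util.StringTokenizer;

public class CsvLineFilter {

    // Hypothetical helper: reads the input line by line and keeps the lines
    // whose first comma-separated token is one of the wanted labels.
    public static List<String> filter(Reader input, Collection<String> wanted) throws IOException {
        List<String> kept = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(input)) {
            String line;
            while ((line = reader.readLine()) != null) {
                StringTokenizer tokens = new StringTokenizer(line, ",");
                // Blank lines have no tokens at all, so they are skipped.
                // Note: StringTokenizer ignores empty fields, so a line that
                // starts with a comma would be matched on its second field.
                if (tokens.hasMoreTokens() && wanted.contains(tokens.nextToken())) {
                    kept.add(line);
                }
            }
        }
        return kept;
    }

    public static void main(String[] args) throws IOException {
        // In-memory stand-in for the real file; a FileReader can be passed the same way.
        String csv = "a,4,5,3,2\nb,6,4,6,7\nc,5,3,7,2\n";
        List<String> wanted = Arrays.asList("a", "c");
        for (String line : filter(new StringReader(csv), wanted)) {
            System.out.println(line);
        }
    }
}
```

Only one line is held in memory at a time besides the kept rows, so the approach should scale to large files.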

Zoltán Ujhelyi
Thanks for your answer. Isn't that a slow solution? I have BIG CSV files, and it will kill my processing speed, because for each value I have to read the whole CSV to find the matching rows, then do it again for the second value, and so on.
You cannot avoid reading the entire content for filtering unless you know something specific about the data (e.g. that it is ordered). You don't have to read the CSV several times, though: you can check each line's first value against all of the wanted labels in a single pass, as shown in the code from krmby.
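To illustrate the single-pass idea, here is a minimal sketch. It assumes the labels are first copied from the ArrayList into a `HashSet`, which is a choice made for this example and not something stated in the thread; the class name `SinglePassFilter` and the method `select` are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class SinglePassFilter {

    // Scans the lines once. Each line costs one hash lookup,
    // no matter how many labels are wanted.
    public static List<String> select(Iterable<String> lines, List<String> labels) {
        Set<String> wanted = new HashSet<>(labels); // built once, before the scan
        List<String> kept = new ArrayList<>();
        for (String line : lines) {
            // Limit of 2: split off the first field only, leave the rest unparsed.
            String first = line.split(",", 2)[0];
            if (wanted.contains(first)) {
                kept.add(line);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("a,4,5,3,2", "b,6,4,6,7", "xyz,1,6,4,8");
        List<String> labels = new ArrayList<>(Arrays.asList("a", "c", "2d", "m4", "xyz"));
        System.out.println(select(lines, labels)); // prints [a,4,5,3,2, xyz,1,6,4,8]
    }
}
```

With a plain list, `contains` compares against every label in turn, so the set mostly pays off when the label list is long.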
Zoltán Ujhelyi
A: 

This is what you need.

import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class Main {

    // Sample data standing in for the lines of the CSV file.
    private static final List<String> lines = Arrays.asList(
            "a,4,5,3,2",
            "b,6,4,6,7",
            "c,5,3,7,2",
            "2d,1,4,5,9",
            "4e,4,2,5,7",
            "m4,7,5,3,6",
            "xyz,1,6,4,8");
    // Labels to look for in the first column.
    private static final List<String> labelsInFirstColumn = Arrays.asList(
            "a",
            "c",
            "2d",
            "m4",
            "xyz");

    public static void main(String[] args) {

        // Rows whose first column matches one of the labels.
        List<String[]> result = new ArrayList<String[]>();

        for (String line : lines) {
            String[] columns = line.split(",");
            if (labelsInFirstColumn.contains(columns[0])) {
                result.add(columns);
            }
        }

        // Print each selected row on its own line.
        for (String[] selectedLine : result) {
            for (String column : selectedLine) {
                System.out.print(column + " | ");
            }
            System.out.println();
        }
    }
}

This is the output:

run:
a | 4 | 5 | 3 | 2 | 
c | 5 | 3 | 7 | 2 | 
2d | 1 | 4 | 5 | 9 | 
m4 | 7 | 5 | 3 | 6 | 
xyz | 1 | 6 | 4 | 8 | 
BUILD SUCCESSFUL (total time: 1 second)
krmby
Thanks a lot!!! It works great.
A: 

Read each line in the stream, figure out whether it is a line of interest, and if so pass it on to the next step.

It looks like you only need the leading characters before the first comma as your comparison value, so full parsing may not be needed: just take the substring from the start of the line up to the first comma, located with indexOf.

You then pass that substring to a lookup on the collection holding your desired values. If it is found, send the line as output to whatever process you need after that.

If you need to match on more than just the first position, you will need to tokenize the string, as noted elsewhere.
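The substring approach described above might look like the following sketch. The names `FirstFieldMatcher` and `firstField` are invented for this example, and the treatment of a line without any comma (the whole line is used as the label) is an assumption rather than something specified in the question.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

public class FirstFieldMatcher {

    // Returns the text before the first comma, or the whole line
    // if it contains no comma (an assumed convention for this sketch).
    public static String firstField(String line) {
        int comma = line.indexOf(',');
        return comma < 0 ? line : line.substring(0, comma);
    }

    public static void main(String[] args) {
        Set<String> wanted = new HashSet<>(Arrays.asList("a", "c", "2d", "m4", "xyz"));
        String[] lines = {"a,4,5,3,2", "b,6,4,6,7", "2d,1,4,5,9"};
        for (String line : lines) {
            // Only the first field is extracted; the rest of the line is never parsed.
            if (wanted.contains(firstField(line))) {
                System.out.println(line);
            }
        }
    }
}
```

Because the rest of the line is left untouched, this avoids allocating an array of all the columns for rows that are going to be discarded anyway.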

Rawheiser