Hello everyone,

I am reading in a csv file in Java and, depending on the format of the string on a given line, I have to do something different with it. The three different formats contained in the csv file are (using random numbers):

833 "79, 869" "56-57, 568"

If it is just a single number (833), I want to add it to my ArrayList. If it is two numbers separated by a comma and surrounded by quotation marks ("79, 869"), I want to parse out the first of the two numbers (79) and add it to the ArrayList. If it is three numbers surrounded by quotation marks, where the first two numbers are separated by a dash and the third follows a comma ("56-57, 568"), then I want to parse out the third number (568) and add it to the ArrayList.

I am having trouble using str.contains() to determine if the string on a given line contains a dash or not. Can anyone offer me some help? Here is what I have so far:

private static void getFile(String filePath) throws java.io.IOException {
    BufferedReader reader = new BufferedReader(new FileReader(filePath));
    String str;

    while ((str = reader.readLine()) != null) {

        if(str.endsWith("\"")){
            if (str.contains(charDash)){
                System.out.println(str);
            }
        }

    }

}

Thanks!

A: 

Will

    if (str.indexOf(charDash.toString()) > -1){
        System.out.println(str);
    }

do the trick?

By the way, this is marginally faster than contains(), because contains() is itself implemented by calling indexOf().

Garis Suero
A: 

Will this work?

if(str.contains("-")) {
    System.out.println(str);
} 

I wonder if the charDash variable is not what you are expecting it to be.

martyhu
+1  A: 

I recommend using the version of indexOf that takes a char rather than a String, since that overload is typically faster. (It is a simple loop over the characters, without a nested loop.)

I.e.

    if (str.indexOf('-') != -1) {
        System.out.println(str);
    }

(Note the single quotes, so this is a char, rather than a string.)

But then you have to split the line and parse the individual values. At present, you are testing if the whole line ends with a quote, which is probably not what you want.
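One way to do that splitting, as a rough sketch: it assumes fields are separated by whitespace and that a quoted field never contains an embedded quote. The names `FieldSplitter` and `splitFields` are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FieldSplitter {
    // Either a quoted field (group 1, quotes stripped)
    // or a bare run of non-whitespace characters (group 2).
    private static final Pattern FIELD = Pattern.compile("\"([^\"]*)\"|(\\S+)");

    // Hypothetical helper: returns the fields of one line, without their quotes.
    static List<String> splitFields(String line) {
        List<String> fields = new ArrayList<>();
        Matcher m = FIELD.matcher(line);
        while (m.find()) {
            fields.add(m.group(1) != null ? m.group(1) : m.group(2));
        }
        return fields;
    }

    public static void main(String[] args) {
        for (String field : splitFields("833 \"79, 869\" \"56-57, 568\"")) {
            // The dash test now applies to one field at a time, not the whole line.
            System.out.println(field + " -> has dash: " + (field.indexOf('-') != -1));
        }
    }
}
```

With the fields separated like this, the dash and comma checks can be made per field instead of on the raw line.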

mdma
A: 

I think three regexes would be your best bet - because with a match, you also get the bit you're interested in. I suck at regex, but something along the lines of:

.*\-.*, (.+)

(.+), .*

and

(.+)

ought to do the trick (in order, because the final pattern matches anything including the first two).
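A sketch along those lines in Java, trying the patterns in order. The patterns here are my own tightened variants: the capture is narrowed to digits so a surrounding quote can't leak into the result, and the second pattern captures the number before the comma, as the question asks. `ThreePatterns` and `extract` are illustrative names, and the input is assumed to be a single field.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ThreePatterns {
    // Most specific first, mirroring the order suggested above.
    private static final Pattern[] PATTERNS = {
        Pattern.compile("\"?\\d+-\\d+,\\s*(\\d+)\"?"), // "56-57, 568" -> 568
        Pattern.compile("\"?(\\d+),\\s*\\d+\"?"),      // "79, 869"    -> 79
        Pattern.compile("(\\d+)")                      // 833          -> 833
    };

    // Returns the number of interest from one field, whichever shape it has.
    static int extract(String field) {
        String trimmed = field.trim();
        for (Pattern p : PATTERNS) {
            Matcher m = p.matcher(trimmed);
            if (m.matches()) {
                return Integer.parseInt(m.group(1));
            }
        }
        throw new IllegalArgumentException("Unrecognized field: " + field);
    }

    public static void main(String[] args) {
        System.out.println(extract("833"));
        System.out.println(extract("\"79, 869\""));
        System.out.println(extract("\"56-57, 568\""));
    }
}
```

Using matches() rather than find() means each pattern has to account for the whole field, so a malformed field is rejected instead of being half-parsed.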

Carl Manaster
+1  A: 

The following code works for me (note: I wrote it with no optimization in mind - it's just for testing purposes):

import java.util.ArrayList;

public class NumberParser {

    public static void main(String[] args) {
        // Print each extracted number on its own line.
        for (String number : getNumbers()) {
            System.out.println(number);
        }
    }

    private static ArrayList<String> getNumbers() {
        // Sample lines, one per format (quotes already stripped).
        ArrayList<String> lines = new ArrayList<String>();
        lines.add("833");
        lines.add("79, 869");
        lines.add("56-57, 568");

        ArrayList<String> numbers = new ArrayList<String>();

        for (String thisString : lines) {
            if (thisString.contains("-")) {
                // "a-b, c": keep c, skipping the comma and the space after it.
                numbers.add(thisString.substring(thisString.indexOf(",") + 2));
            } else if (thisString.contains(",")) {
                // "a, b": keep a, everything before the comma.
                numbers.add(thisString.substring(0, thisString.indexOf(",")));
            } else {
                // A single number: keep it as-is.
                numbers.add(thisString);
            }
        }

        return numbers;
    }
}

Output:

833
79
568
Leniel Macaferi
+1  A: 

Although it gets a lot of hate these days, I still really like the StringTokenizer for this kind of stuff. You can set it up to return the delimiters as tokens too and, at least to me, it makes the processing trivial without touching regexes.

You'd create it using ",- as your delimiters, then just kick it off in a loop.

StringTokenizer st = new StringTokenizer(line, "\",-", true);

Then you set up a loop:

while (st.hasMoreTokens()) {
    String token = st.nextToken();

Each case becomes its own little part of the loop:

// Use punctuation to set flags that tell you how to interpret the numbers.
// Compare with equals(): == on Strings tests identity, not content.
if (token.equals("\"")) {
    isQuoted = !isQuoted;
} else if (token.equals(",")) {
    ...
} else if (...) {
    ...
} else { // The punctuation has been dealt with, must be a number group
    // Apply flags to determine how to parse this number.
}
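To make that concrete, here is one possible way to fill in the loop. This is only a sketch under my own assumptions: the flags `sawDash`, `sawComma` and `firstInQuotes` and the class name are invented for illustration, every non-punctuation token is assumed to be a single integer, and an unquoted dash (a negative number) is not handled.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

class TokenizerParser {
    static List<Integer> parse(String line) {
        List<Integer> numbers = new ArrayList<>();
        // 'true' makes the tokenizer hand back the delimiters as tokens too.
        StringTokenizer st = new StringTokenizer(line, "\",-", true);
        boolean isQuoted = false, sawDash = false, sawComma = false;
        Integer firstInQuotes = null; // first number seen inside the current quotes

        while (st.hasMoreTokens()) {
            String token = st.nextToken();
            if (token.equals("\"")) {
                // Closing quote of an "a, b" group: keep the first number.
                if (isQuoted && !sawDash && firstInQuotes != null) {
                    numbers.add(firstInQuotes);
                }
                isQuoted = !isQuoted;
                sawDash = false;
                sawComma = false;
                firstInQuotes = null;
            } else if (token.equals("-")) {
                sawDash = true;
            } else if (token.equals(",")) {
                sawComma = true;
            } else {
                String text = token.trim();
                if (text.isEmpty()) {
                    continue; // only whitespace between fields
                }
                int value = Integer.parseInt(text);
                if (!isQuoted) {
                    numbers.add(value);          // bare number
                } else if (!sawComma && firstInQuotes == null) {
                    firstInQuotes = value;       // might be the "a" of "a, b"
                } else if (sawComma && sawDash) {
                    numbers.add(value);          // the "c" of "a-b, c"
                }
            }
        }
        return numbers;
    }

    public static void main(String[] args) {
        System.out.println(parse("833 \"79, 869\" \"56-57, 568\""));
    }
}
```

The first number inside quotes has to be held back until the closing quote, because at the time it is read there is no way to know whether a dash will follow.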

I realize that StringTokenizer is considered outdated now, but I'm not really sure why. Parsing with regular expressions can't be faster, though I have to admit that split has a pretty sweet syntax.

I guess if you and everyone you work with are really comfortable with regular expressions, you could replace that with split and just iterate over the resulting array. I'm not sure how to get split to return the punctuation, though. It's probably that "+" thing from the other answers, but I never trust that some character I'm passing to a regular expression won't do something utterly unexpected.
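For what it's worth, one way to get split to keep the punctuation is a zero-width lookaround pattern, which splits next to a delimiter without consuming it. A small sketch, assuming Java 8 or later (older versions may add an empty leading element); `SplitKeepingDelimiters` and `tokens` are made-up names.

```java
import java.util.Arrays;
import java.util.List;

class SplitKeepingDelimiters {
    // Split at every position just after or just before one of: " , -
    // Lookarounds match zero characters, so the delimiters stay in the output.
    private static final String BOUNDARY = "(?<=[\",-])|(?=[\",-])";

    static List<String> tokens(String line) {
        return Arrays.asList(line.split(BOUNDARY));
    }

    public static void main(String[] args) {
        // Each delimiter comes back as its own element, like StringTokenizer
        // with returnDelims set to true.
        for (String token : tokens("\"56-57, 568\"")) {
            System.out.println("[" + token + "]");
        }
    }
}
```

The resulting array can then be fed through the same flag-based loop as the tokenizer version.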

Bill K