ansaurus

Question

Parsing issue with comma-separated CSV file

Answer 1

A:

You shouldn't use awk here. Use Python's csv module, Perl's Text::CSV or Text::CSV_XS module, or another real CSV parser.
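As a rough illustration of the csv-module suggestion, here is a minimal sketch. It assumes the wanted value is the third field and that fields may be preceded by a space after the comma, as in the sample line discussed in this thread. The helper name `third_column` is made up for this example. `skipinitialspace=True` is needed so that a quote following ", " is still treated as the start of a quoted field.

```python
import csv
import io


def third_column(lines):
    """Yield the third field of every CSV record in `lines`.

    `lines` is any iterable of text lines (an open file, a StringIO, a list).
    Assumption for this sketch: the wanted column is always the third one.
    """
    # skipinitialspace=True ignores blanks right after a delimiter, so
    # `, "454,fgdfg"` is parsed as one quoted field, not split on its comma.
    reader = csv.reader(lines, skipinitialspace=True)
    for row in reader:
        if len(row) >= 3:  # skip short or empty records
            yield row[2]


sample = io.StringIO(
    '"sdfsdfsd, sfsdf", "454,fgdfg blah , words ", I_want_this_column,sdfgdg\n'
)
print(list(third_column(sample)))
```

To run it on a real file, pass an open file object instead of the `StringIO`, e.g. `with open("file", newline="") as fh: ...`.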

Related question - http://stackoverflow.com/questions/314384/parse-csv-file-using-gawk

Leonid Shvechikov 2010-03-18 13:05:30

Answer 2

+2 A:

Unless you have specific reasons for using awk, I would recommend using a CSV parsing library. Many scripting languages have one built-in (or at least available) and they'll save you from these headaches.

Benjamin Oakes 2010-03-18 13:06:21

Answer 3

+1 A:

If your first column is always quoted:

 $ awk 'BEGIN{ FS="\042[ ]*," } { m=split($2,a,","); print a[3] } ' file
 I_want_this_column

If the column you want is always the second to last:

$ awk -F"," '{print $(NF-1)}' file
 I_want_this_column

You can try this demo script to break down the columns:

awk 'BEGIN{ FS="," }
{
   for(i=1;i<=NF;i++){
      # save a normal field (only when not inside a quoted field)
      if(f==0 && $i !~ /^[ ]*\042|[ ]*\042[ ]*$/){
        a[++j]=$i
      }
      # closing quote at the end: finish the quoted field
      if(f==1 && $i ~ /[ ]*\042[ ]*$/){
        s=s","$i
        a[++j]=s
        # reset
        s="";f=0
      }
      # opening quote in front: start a quoted field
      if($i ~ /^[ ]*\042/){
        s=s $i
        f=1
      }
      # middle piece of a quoted field: keep accumulating
      if(f==1 && ( $i !~/\042/ ) ){
         s=s","$i
      }
   }
}
END{
  # print columns
  for(p=1;p<=j;p++){
     print "Field "p,": "a[p]
  }
} ' file

Output:

$ cat file
"sdfsdfsd, sfsdf", "454,fgdfg blah , words ", I_want_this_column,sdfgdg

$ ./shell.sh
Field 1 : "sdfsdfsd, sfsdf"
Field 2 :  "454,fgdfg blah , words "
Field 3 :  I_want_this_column
Field 4 : sdfgdg

ghostdog74 2010-03-18 13:27:32

That is NOT always the case: the first column may NOT contain a comma.

vehomzzz 2010-03-18 13:32:33

Answer 4

A:

If you can't avoid awk, this piece of code does the job you need:

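BEGIN {FS=",";}

{
        f=0;    # 1 while inside a quoted field
        j=0;    # number of fields collected so far
        for (i = 1; i <=NF ; ++i) {
                if (f) {
                        # continuation of a quoted field: restore the comma
                        a[j] = a[j] "," $(i);
                        if ($(i) ~ "\"$") {
                                f = 0;
                        }
                }
                else {
                        ++j;
                        a[j] = $(i);
                        # opening quote with no closing quote yet
                        if ((a[j] ~ "^\"[^\"]*$")) {
                                f = 1;
                        }
                }
        }
        for (i = 1; i <= j; ++i) {
                # strip the enclosing quotes and unescape doubled quotes
                gsub("^\"","",a[i]);
                gsub("\"$","",a[i]);
                gsub("\"\"","\"",a[i]);
                print "i = \"" a[i] "\"";
        }
}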

Giuseppe Guerrini 2010-03-18 21:22:21

It breaks when the comma has spaces before or after it, e.g. try it on this data: `"sdfsdfsd, sfsdf" , "454,fgdfg", I_want_this_column`

ghostdog74 2010-03-19 02:18:26

The original question stated 'FS =","', so I guess spaces are not an issue.

Giuseppe Guerrini 2010-03-19 07:45:47
