Hi there, I need to reorganize a large CSV file. The first column, which is currently a 6-digit number, needs to be split into its individual digits, using commas as the field separator.

For example, I need this:

022250,10:50 AM,274,22,50
022255,11:55 AM,275,22,55

turned into this:

0,2,2,2,5,0,10:50 AM,274,22,50
0,2,2,2,5,5,11:55 AM,275,22,55

Let me know what you think!

Thanks!

+2  A: 

I think this might work. The split function (at least in the version I am running) splits the value into individual characters if the third parameter is an empty string.

  BEGIN{ FS="," }
  {
     n = split( $1, a, "" );
     for ( i = 1; i <= n; i++ )
        printf("%s,", a[i] );

     sep = "";
     for ( i = 2; i <= NF; i++ )
        {
        printf( "%s%s", sep, $i );
        sep = ",";
        }
     printf("\n");
  }
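Since this answer depends on how split() treats an empty separator, it may be worth checking your own awk first. This is only a minimal sketch: the function name count_split_pieces is made up for illustration, and the input is the first value from the question.

```shell
# Print how many pieces split() produces for a 6-character string
# when the separator is the empty string.
count_split_pieces() {
    printf '022250\n' | awk '{ print split($0, a, "") }'
}

count_split_pieces
```

A result of 6 means the script above should work as written. Any other number suggests that this awk does not split on an empty separator.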
Mark Wilkins
I am not sure how to use this answer. Can you explain? Thanks!
wizkid84
@wizkid84: Put the script into a file (e.g., splitit.awk) then run it with the command: awk -f splitit.awk <inputfile>
Mark Wilkins
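The steps from the comment above can be sketched end to end as follows. The temporary directory and the input.csv name are assumptions for illustration; splitit.awk is the name suggested in the comment, and the script body is the one from the answer, so it still relies on split() breaking on an empty separator.

```shell
run_splitit() {
    workdir=$(mktemp -d) || return 1

    # Save the awk program from the answer into its own file.
    cat > "$workdir/splitit.awk" <<'EOF'
BEGIN { FS = "," }
{
    n = split($1, a, "")
    for (i = 1; i <= n; i++)
        printf("%s,", a[i])

    sep = ""
    for (i = 2; i <= NF; i++) {
        printf("%s%s", sep, $i)
        sep = ","
    }
    printf("\n")
}
EOF

    # Sample rows copied from the question.
    printf '%s\n' '022250,10:50 AM,274,22,50' '022255,11:55 AM,275,22,55' > "$workdir/input.csv"

    # Run the saved program against the input file.
    awk -f "$workdir/splitit.awk" "$workdir/input.csv"
    rm -rf "$workdir"
}

run_splitit
```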
@Mark Wilkins: or run it as a one-liner: `awk '<<script goes here>>' <input file>` (in the same way as in my answer).
Jefromi
@Jefromi: That's true. My tendency is always to put them in files since it makes them a little easier to read. But if it is in a script, it is nicer to embed it directly in the script rather than have yet another file hanging around. Long way of saying good point! :)
Mark Wilkins
+2  A: 

It's a lot shorter in perl:

perl -F, -ane '$,=","; print split("",$F[0]), @F[1..$#F]' <file>

Since you don't know Perl, here is a quick explanation. -F, sets the input field separator to a comma (like awk's -F). -a activates auto-split (into the array @F), and -n implicitly wraps the code in a while (<>) { ... } loop, which reads input line by line. -e indicates that the next argument is the script to run. $, is the output field separator (it gets set on every iteration of the loop this way, but oh well). split does what its name suggests, and you can see how the array is indexed and sliced. print, when given lists as arguments like this, prints all their elements joined by the output field separator.

In awk:

awk -F, '{n=split($1,a,""); for (i=1;i<=n;i++) {printf("%s,",a[i])}; for (i=2;i<NF;i++) {printf("%s,",$i)}; print $NF}' <file>
Jefromi
Wow I need to learn perl. Thanks!
wizkid84
"Perl lead to hashes, hashes lead to hate, hate leads to suffering"
Jed Daniels
A: 

Here's a variation on a theme. One thing to note is that it prints the remaining fields without using a loop: clearing $1 makes awk rebuild $0 with OFS, so the record then starts with the comma that separates the last digit from the rest. Another is that since you're looping over the characters in the first field anyway, you can do it without the null-delimiter feature of split() (which may not be present in some versions of AWK):

awk -F, 'BEGIN{OFS=","} {len=length($1); for (i=1;i<len; i++) {printf "%s,", substr($1,i,1)}; printf "%s", substr($1,len,1);$1=""; print $0}' filename

As a script:

BEGIN {FS = OFS = ","}
{
    len = length($1)
    for (i = 1; i < len; i++)
        printf "%s,", substr($1, i, 1)
    printf "%s", substr($1, len, 1)
    $1 = ""
    print $0
}
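The loop-free printing of the remaining fields can be seen in isolation with a small sketch. The function name show_rebuilt_record and the shortened sample row are assumptions for illustration only.

```shell
# Clearing $1 makes awk rebuild $0 from its fields, joined by OFS.
# The now-empty first field leaves a leading comma on the rebuilt record.
show_rebuilt_record() {
    printf '022250,10:50 AM,274\n' |
        awk -F, 'BEGIN { OFS = "," } { $1 = ""; print $0 }'
}

show_rebuilt_record   # prints: ,10:50 AM,274
```

That leading comma is why the last digit is printed with a plain "%s" and no separator of its own.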
Dennis Williamson
+1  A: 

Here's another way in awk:

$ awk -F"," '{gsub(".",",&",$1);sub("^,","",$1)}1' OFS="," file
0,2,2,2,5,0,10:50 AM,274,22,50
0,2,2,2,5,5,11:55 AM,275,22,55
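For readers new to gsub(), the same idea can be spelled out step by step. This is a sketch rather than the exact command above: the function name split_digits_gsub is hypothetical, the trailing 1 pattern is written as an explicit print, and OFS is set in a BEGIN block instead of on the command line.

```shell
split_digits_gsub() {
    awk -F, '
    BEGIN { OFS = "," }
    {
        gsub(/./, ",&", $1)   # put a comma before every character: 022250 -> ,0,2,2,2,5,0
        sub(/^,/, "", $1)     # drop the comma in front of the first character
        print                 # $0 was rebuilt with OFS when $1 changed
    }'
}

printf '022250,10:50 AM,274,22,50\n' | split_digits_gsub
```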
ghostdog74