I have a file something like this:

1111,K1
2222,L2
3333,LT50
4444,K2
1111,LT50
5555,IA
6666,NA
1111,NA
2222,LT10

The output I need is:

1111,K1,LT50,NA
2222,L2,LT10
3333,LT50
4444,K2
5555,IA
6666,NA

The number in the first column may repeat any number of times, but the output I need should be sorted and unique, with all values for a key collected on one line.

+1  A: 

Here is a readable attempt using a non-standard tool, the SQLite shell, with an in-memory database:

echo 'create table tmp (a int, b text);
.separator ,
.import file.txt tmp
.output out.txt
SELECT a, group_concat(b) FROM tmp GROUP BY a ORDER BY a ASC;
.output stdout
.quit' | sqlite3
Benoit
+4  A: 
awk -F"," '{a[$1]=a[$1]FS$2}END{for(i in a) print i a[i]}' file | sort

If you have a big file, you can try flushing the collected items every few lines, e.g. every 50000. Note that a key appearing in more than one chunk will then show up on more than one output line, so the partial results need a final merge pass.

BEGIN{FS=","}
{ a[$1]=a[$1]FS$2 }
NR%50000==0 {
  for(i in a) { print i a[i] }
  delete a  # flush the array so it won't take up memory
}
END{
  for(i in a){ print i a[i] }
}
ghostdog74
`| sort` was requested too
Unreason
ghostdog74 Thanks for your reply; your script works fine, but I have one problem: my file has more than 20Lakhs rows. The for loop you used may take a long time. Do you have any suggestion for this?
gyrous
what is 20Lakhs rows? awk is a pretty fast text processing tool. I highly doubt it will be slow for your problem.
ghostdog74
more than 20Lakhs rows in a file..
gyrous
like i said, what is 20Lakhs? what does Lakhs mean??? put those in numbers.... 20000 lines?? 2000000 lines??
ghostdog74
2000000. sorry that is 2 million..
gyrous
so how long does it take to run the script on your system? 10 mins?? 30 mins??
ghostdog74
Still running after more than 1:30 hours..
gyrous
well, another approach, since you have a big file, is to break the file up and perform the operation on each piece separately. Then at the end of it, combine the results.
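The split-and-merge idea could be sketched in shell like this; the file names and the 2-line chunk size are illustrative assumptions (use something like 500000 lines per chunk for a real file):

```shell
# Hypothetical sketch of split-and-merge; names and chunk size are examples.
printf '1111,K1\n2222,L2\n1111,LT50\n2222,LT10\n' > file   # sample input

split -l 2 file chunk_                    # break the input into pieces
for f in chunk_*; do                      # group within each chunk
  awk -F"," '{a[$1]=a[$1]","$2} END{for(i in a) print i a[i]}' "$f" > "$f.grp"
done

# Merge pass: the partial outputs are already "key,v1,v2" lines, so strip
# the key, append the remainder under that key, and sort the final result.
cat chunk_*.grp |
awk -F"," '{k=$1; sub(/^[^,]*/,""); a[k]=a[k]$0} END{for(i in a) print i a[i]}' |
sort > combined
```

The merge pass is the same grouping trick run once more, so keys that were split across chunks end up on a single line.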
ghostdog74
A: 

This is a solution in Python. The script reads data from stdin.

#!/usr/bin/env python
import sys

d = {}
for line in sys.stdin:
    key, value = line.strip().split(',')
    d.setdefault(key, []).append(value)
for key in sorted(d):
    print("%s,%s" % (key, ','.join(d[key])))
Paweł
A: 

Here's one in Perl, but it isn't going to be particularly efficient:

#!/usr/bin/perl -w
use strict;
my %lines;
while (<>) {
    chomp;
    my ($key, $value) = split /,/;
    $lines{$key} .= "," if $lines{$key};
    $lines{$key} .= $value;
}

for my $key (sort keys %lines) {
    print "$key,$lines{$key}\n";
}

Use like this:

$ ./command <file >newfile

You will likely have better luck with a multiple-pass solution, though. I don't really have time to write that for you. Here's an outline:

  1. Grab and remove the first line from the file.
  2. Parse through the rest of the file, concatenating any matching line and removing it.
  3. At the end of the file, output your new long line.
  4. If the file still has content, loop back to 1.
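The outline above could look something like this as a rough shell sketch (O(n^2) passes over the file in the worst case; file names are examples, and the sample input is created inline for illustration):

```shell
# Rough sketch of the multi-pass outline; "file", "work", "out" are examples.
printf '1111,K1\n2222,L2\n1111,LT50\n' > file    # sample input

cp file work
: > out
while [ -s work ]; do
  key=$(head -n 1 work | cut -d, -f1)            # 1. grab the first line's key
  # 2. collect every value for that key, joined with commas
  vals=$(grep "^$key," work | cut -d, -f2- | paste -s -d, -)
  printf '%s,%s\n' "$key" "$vals" >> out         # 3. emit one combined line
  grep -v "^$key," work > work.tmp || true       # 4. drop processed lines, loop
  mv work.tmp work
done
```

Note this emits keys in order of first appearance, not sorted; pipe the result through `sort` if sorted output is needed.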
Jonathan