I have a file something like this:

1111,K1
2222,L2
3333,LT50
4444,K2
1111,LT50
5555,IA
6666,NA
1111,NA
2222,LT10

The output I need is:

1111,K1,LT50,NA
2222,L2,LT10
3333,LT50
4444,K2
5555,IA
6666,NA

The number in the first column may repeat any number of times, but the output I need should be sorted and unique, with all values for a key collected on one line.

+1  A: 

Here is a readable attempt using a non-standard tool, the SQLite shell, with an in-memory database:

echo 'create table tmp (a int, b text);
.separator ,
.import file.txt tmp
.output out.txt
SELECT a, group_concat(b) FROM tmp GROUP BY a ORDER BY a ASC;
.output stdout
.quit' | sqlite3
Benoit
+4  A: 
awk -F"," '{a[$1]=a[$1]FS$2}END{for(i in a) print i a[i]}' file | sort

If you have a big file, you can try flushing the collected items every few lines, e.g. every 50000. Note that a key appearing in more than one chunk will then show up on more than one output line, so the partial results need a final merge pass.

BEGIN{FS=","}
{ a[$1]=a[$1]FS$2 }
NR%50000==0 {
  for(i in a) { print i a[i] }
  delete a  # flush the array so it won't take up memory
}
END{
  for(i in a){ print i a[i] }
}
ghostdog74
`| sort` was requested too
Unreason
ghostdog74 Thanks for your reply; your script works fine, but I have one problem: my file has more than 20Lakhs rows. The for loop you used may take a long time. Do you have any suggestion for this?
gyrous
what is 20Lakhs rows? awk is a pretty fast text processing tool. I highly doubt it will be slow for your problem.
ghostdog74
more than 20Lakhs rows in a file..
gyrous
like i said, what is 20Lakhs? what does Lakhs mean??? put those in numbers.... 20000 lines?? 2000000 lines??
ghostdog74
2000000. sorry that is 2 million..
gyrous
so how long does it take to run the script on your system? 10 mins?? 30 mins??
ghostdog74
Still running after more than 1:30 hours..
gyrous
well, another approach, since you have a big file, is to break the file up and perform the operation on each piece separately. Then at the end of it, combine the results.
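The split-and-merge idea could be sketched in shell like this; the file names and the 2-line chunk size are illustrative assumptions (use something like 500000 lines per chunk for a real file):

```shell
# Hypothetical sketch of split-and-merge; names and chunk size are examples.
printf '1111,K1\n2222,L2\n1111,LT50\n2222,LT10\n' > file   # sample input

split -l 2 file chunk_                    # break the input into pieces
for f in chunk_*; do                      # group within each chunk
  awk -F"," '{a[$1]=a[$1]","$2} END{for(i in a) print i a[i]}' "$f" > "$f.grp"
done

# Merge pass: the partial outputs are already "key,v1,v2" lines, so strip
# the key, append the remainder under that key, and sort the final result.
cat chunk_*.grp |
awk -F"," '{k=$1; sub(/^[^,]*/,""); a[k]=a[k]$0} END{for(i in a) print i a[i]}' |
sort > combined
```

The merge pass is the same grouping trick run once more, so keys that were split across chunks end up on a single line.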
ghostdog74
A: 

This is a solution in Python. The script reads data from stdin.

#!/usr/bin/env python
import sys

d = {}
for line in sys.stdin:
    key, value = line.strip().split(',')
    d.setdefault(key, []).append(value)
for key in sorted(d):
    print("%s,%s" % (key, ','.join(d[key])))
Paweł
A: 

Here's one in Perl, but it isn't going to be particularly efficient:

#!/usr/bin/perl -w
use strict;
my %lines;
while (<>) {
    chomp;
    my ($key, $value) = split /,/;
    $lines{$key} .= "," if $lines{$key};
    $lines{$key} .= $value;
}

for my $key (sort keys %lines) {
    print "$key,$lines{$key}\n";
}

Use like this:

$ ./command <file >newfile

You will likely have better luck with a multiple-pass solution, though. I don't really have time to write that for you. Here's an outline:

  1. Grab and remove the first line from the file.
  2. Parse through the rest of the file, concatenating any matching line and removing it.
  3. At the end of the file, output your new long line.
  4. If the file still has content, loop back to 1.
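The outline above could look something like this as a rough shell sketch (O(n^2) passes over the file in the worst case; file names are examples, and the sample input is created inline for illustration):

```shell
# Rough sketch of the multi-pass outline; "file", "work", "out" are examples.
printf '1111,K1\n2222,L2\n1111,LT50\n' > file    # sample input

cp file work
: > out
while [ -s work ]; do
  key=$(head -n 1 work | cut -d, -f1)            # 1. grab the first line's key
  # 2. collect every value for that key, joined with commas
  vals=$(grep "^$key," work | cut -d, -f2- | paste -s -d, -)
  printf '%s,%s\n' "$key" "$vals" >> out         # 3. emit one combined line
  grep -v "^$key," work > work.tmp || true       # 4. drop processed lines, loop
  mv work.tmp work
done
```

Note this emits keys in order of first appearance, not sorted; pipe the result through `sort` if sorted output is needed.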
Jonathan