Hi,

I'm using the excellent UNIX 'comm' command-line utility in an application I developed on a BSD platform (OS X). When I deployed to my Linux production server, I found out that, sadly, Ubuntu Linux's 'comm' utility does not accept the -i flag, which tells comm to compare lines case-insensitively. Apparently the POSIX standard does not require the -i option.

So... I'm in a bind. I really need the -i option that works so well on BSD. I've even gone so far as to try compiling the BSD comm.c source code on the Linux box:

http://svn.freebsd.org/viewvc/base/user/luigi/ipfw3-head/usr.bin/comm/comm.c?view=markup&pathrev=200559

but I got:

me@host:~$ gcc comm.c 
comm.c: In function ‘getline’:
comm.c:195: warning: assignment makes pointer from integer without a cast
comm.c: In function ‘wcsicoll’:
comm.c:264: warning: assignment makes pointer from integer without a cast
comm.c:270: warning: assignment makes pointer from integer without a cast
/tmp/ccrvPbfz.o: In function `getline':
comm.c:(.text+0x421): undefined reference to `reallocf'
/tmp/ccrvPbfz.o: In function `wcsicoll':
comm.c:(.text+0x691): undefined reference to `reallocf'
comm.c:(.text+0x6ef): undefined reference to `reallocf'
collect2: ld returned 1 exit status

Does anyone have any suggestions as to how to get a version of comm on Linux that supports 'comm -i'?

Thanks!

A: 

You can add the following in comm.c:

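/* Stand-in for BSD reallocf(): like realloc(), but frees the original
   block when the reallocation fails, so p = reallocf(p, n) cannot leak. */
void *reallocf(void *ptr, size_t size)
{
    void *ret = realloc(ptr, size);
    /* realloc(ptr, 0) may already have freed ptr and returned NULL,
       so only free on a genuine failure (size != 0) to avoid a double free. */
    if (ret == NULL && ptr != NULL && size != 0) {
        free(ptr);
    }
    return ret;
}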

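You should be able to compile it then. Put it near the top of comm.c, after the #include lines, so that it is declared before getline() and wcsicoll() call it; otherwise the compiler implicitly assumes reallocf() returns int, which is what the "assignment makes pointer from integer without a cast" warnings are about. Also make sure comm.c has #include <stdlib.h> (it probably does already).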

Your compilation fails because BSD comm.c uses reallocf(), which is a BSD extension rather than a standard C function. But it is easy to write yourself.

Alok
Worked!! Thanks very much.
Sam Magister
Unfortunately, it does not handle Latin-1 encoded files, though UTF-8 encoded files work fine. Any ideas on how to handle that? The error is: `comm: /path/to/file: Invalid or incomplete multibyte or wide character`
Sam Magister
@Sam: what is your `LANG` environment variable? Check with `echo $LANG`. You should experiment with it: set it to `C`, or `en_US.UTF-8`, etc.
Alok
A: 
Does anyone have any suggestions as to how to get a version of comm on Linux that supports 'comm -i'?

Not quite that, but have you checked whether the join utility could satisfy your requirements? join does have the -i option on Linux...

DevSolar
Interesting utility but I need comm. Thanks though!
Sam Magister
A: 

@OP, there's no need to go to such lengths as compiling the source yourself. Here's an alternative suggestion. Since you want a case-insensitive comparison, you can convert both files to lowercase (or uppercase) with another tool such as tr before you pass them to comm. Because comm expects sorted input and changing the case can change the sort order, re-sort after converting:

tr '[A-Z]' '[a-z]' < file1 | sort > temp1
tr '[A-Z]' '[a-z]' < file2 | sort > temp2
comm temp1 temp2
ghostdog74
Yup, I considered that, but I actually need to have the case folded during comparison but then retain the original case of the characters in the result. Thanks for the suggestion!
Sam Magister
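If compiling the BSD source is not an option, the requirement above (fold case while comparing, but keep the original case in the output) amounts to a small merge over two sorted files. Below is a minimal C sketch, not the BSD implementation: `comm_icase` and `cmp_fold` are hypothetical names, it assumes single-byte (ASCII) text, and it assumes both inputs were sorted with `LC_ALL=C sort -f`, whose case folding `cmp_fold` is meant to mirror. For lines that match ignoring case, it prints the spelling from the first file.

```c
/* Sketch of a case-insensitive comm(1): lines are compared with case
 * folded, but printed with their original case.
 * Assumptions: single-byte/ASCII text; both inputs sorted with
 * `LC_ALL=C sort -f`. Names are hypothetical, not from BSD comm.c. */
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>

/* Fold to uppercase and compare as unsigned bytes, intended to match
 * the ordering of `LC_ALL=C sort -f`. */
static int cmp_fold(const char *s, const char *t)
{
    const unsigned char *a = (const unsigned char *)s;
    const unsigned char *b = (const unsigned char *)t;
    while (*a != '\0' && toupper(*a) == toupper(*b)) {
        a++;
        b++;
    }
    return toupper(*a) - toupper(*b);
}

/* Read one line with POSIX getline(), stripping the newline.
 * Returns 1 if a line was read, 0 at EOF or on error. */
static int read_line(FILE *fp, char **buf, size_t *cap)
{
    ssize_t n = getline(buf, cap, fp);
    if (n < 0)
        return 0;
    if (n > 0 && (*buf)[n - 1] == '\n')
        (*buf)[n - 1] = '\0';
    return 1;
}

/* Column 1: only in a; column 2 (one tab): only in b;
 * column 3 (two tabs): in both, using a's spelling. */
void comm_icase(FILE *a, FILE *b, FILE *out)
{
    assert(a != NULL && b != NULL && out != NULL);
    char *la = NULL, *lb = NULL;
    size_t ca = 0, cb = 0;
    int ha = read_line(a, &la, &ca);
    int hb = read_line(b, &lb, &cb);

    while (ha && hb) {
        int c = cmp_fold(la, lb);
        if (c < 0) {
            fprintf(out, "%s\n", la);
            ha = read_line(a, &la, &ca);
        } else if (c > 0) {
            fprintf(out, "\t%s\n", lb);
            hb = read_line(b, &lb, &cb);
        } else {
            fprintf(out, "\t\t%s\n", la);
            ha = read_line(a, &la, &ca);
            hb = read_line(b, &lb, &cb);
        }
    }
    /* One input is exhausted; the rest of the other is unique to it. */
    for (; ha; ha = read_line(a, &la, &ca))
        fprintf(out, "%s\n", la);
    for (; hb; hb = read_line(b, &lb, &cb))
        fprintf(out, "\t%s\n", lb);

    free(la);
    free(lb);
}
```

To use it as a command, a small main() could fopen() argv[1] and argv[2], check both for NULL, and call comm_icase(a, b, stdout). Unlike comm, this sketch has no -1/-2/-3 options and does not decode multibyte text, so the Latin-1 vs UTF-8 issue discussed above simply does not arise (bytes are compared as-is).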
A: 

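You could try concatenating both files, sorting the result case-insensitively, and piping it to uniq -c -i, e.g. `cat file1 file2 | sort -f | uniq -c -i` (uniq only merges adjacent lines, so the input has to be sorted first). It will show every line from both files, with its number of occurrences in the first column. As long as neither original file contains repeated lines, every line whose count is greater than 1 appears in both files.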

Hope it helps!

Jesus