I have a C file which I copied from somewhere else, but it has a lot of comments like below:

int matrix[20];
/* generate data */
for (index = 0 ;index < 20; index++)
matrix[index] = index + 1;
/* print original data */
for (index = 0; index < 5 ;index++)

How can I delete all the comments enclosed by /* and */? Sometimes a comment spans 4-5 lines, and I need to delete all of those lines.

Basically, I need to delete all text between /* and */, even when newlines (\n) occur in between. Please help me do this using sed, awk or perl.

+11  A: 

See perlfaq6. It's quite a complex scenario.

$/ = undef;
$_ = <>;
s#/\*[^*]*\*+([^/*][^*]*\*+)*/|("(\\.|[^"\\])*"|'(\\.|[^'\\])*'|.[^/"'\\]*)#defined $2 ? $2 : ""#gse;
print;

A word of warning: once you've done this, do you have a test scenario to prove to yourself that you've removed only the comments and nothing valuable? If you're running such a powerful regexp, I'd ensure some sort of test (even if you simply record the behaviour before and afterwards).
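
For anyone who wants to experiment with the idea behind the perlfaq6 regex outside Perl, here is a minimal Python sketch of the same alternation: match either a comment, or a string/character literal (or any other run of text), and keep only the latter. The name strip_c_comments is my own and does not come from perlfaq6. Like the original, it does not attempt to handle // comments or backslash line continuations.

```python
import re

# First alternative: a /* ... */ comment (not captured).
# Second alternative (group 1): a string literal, a char literal,
# or a run of ordinary text that cannot start a comment or literal.
_PATTERN = re.compile(
    r"""/\*[^*]*\*+(?:[^/*][^*]*\*+)*/        # a C block comment
      | ( "(?:\\.|[^"\\])*"                  # double-quoted string literal
        | '(?:\\.|[^'\\])*'                  # character literal
        | .[^/"'\\]*                         # any other run of text
        )""",
    re.DOTALL | re.VERBOSE,
)


def strip_c_comments(source: str) -> str:
    """Drop block comments; keep literals and all other text unchanged."""
    # group(1) is None when the comment alternative matched.
    return _PATTERN.sub(lambda m: m.group(1) or "", source)


if __name__ == "__main__":
    sample = 'int a; /* c1 */ char *s = "/* not a comment */"; /* multi\nline */ int b;\n'
    print(strip_c_comments(sample), end="")
```

Because string literals are consumed by their own alternative, a /* inside a string is never seen by the comment alternative.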

Brian Agnew
Just check that the binaries created by compiling are identical (modulo timestamps or other build identification).
ephemient
That may well be the simplest solution
Brian Agnew
Agreed, I would never do this on code I cared about unless I had unit tests in place to verify its correctness after filtering it.
Ether
A: 

A very simplistic example using gawk. Please test it thoroughly before relying on it. It relies on a multi-character record separator (RS), which POSIX awk does not guarantee. Of course, it doesn't handle the other comment style, // (C++ and C99).

$ more file
int matrix[20];
/* generate data */
for (index = 0 ;index < 20; index++)
matrix[index] = index + 1;
/* print original data */
for (index = 0; index < 5 ;index++)
/*
function(){
 blah blah
}
*/
float a;
float b;

$ awk -vRS='*/' '{ gsub(/\/\*.*/,"")}1' file
int matrix[20];


for (index = 0 ;index < 20; index++)
matrix[index] = index + 1;


for (index = 0; index < 5 ;index++)


float a;
float b;
ghostdog74
for some reason this is not working on my machine:( `cat testint matrix[20];/* generate data */for (index = 0 ;index < 20; index++)matrix[index] = index + 1;/* print original data */` and the output is `awk -vRS='*/' '{ gsub(/\/\*.*/,"")}1' testint matrix[20];/ generate data/for (index = 0 ;index < 20; index++)matrix[index] = index + 1;/ print original data/`
Vijay Sarathi
As I already indicated, this uses gawk. Do you have gawk?
ghostdog74
Sorry, the comment is so messed up that I didn't notice you had included the output. I see you still have /generate data/ and /print original data/. As you can see from my output, it works for me.
ghostdog74
If you still can't get it to work, there's the perl solution below that you can try.
ghostdog74
+2  A: 

Try this on the command line (replacing 'file-names' with the list of files that need to be processed):

perl -i -wpe 'BEGIN{undef $/} s!/\*.*?\*/!!sg' file-names

This program changes the files in-place (overwriting the original file with the corrected output). If you just want the output without changing the original files, omit the '-i' switch.

Explanation:

perl -- call the perl interpreter
-i      switch to 'change-in-place' mode.
-w      print warnings to STDERR (if there are any)
 p      read the files and print $_ for each record; like while(<>){ ...; print $_;}
 e      process the following argument as a program (once for each input record)

BEGIN{undef $/} --- process whole files instead of individual lines.
s!      search and replace ...
  /\*     the starting /* marker
  .*?     followed by any text (non-greedy search)
  \*/     followed by the */ marker
!!      replace by the empty string (i.e. remove comments)  
  s     treat newline characters \n like normal characters (remove multi-line comments)
   g    repeat as necessary to process all comments.

file-names   list of files to be processed.
Yaakov Belch
See the perlfaq to understand why this is so very wrong.
brian d foy
@brian Accepted: This is only an approximate solution.
Yaakov Belch
+6  A: 

Take a look at the strip_comments routine in Inline::Filters:

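sub strip_comments {
    my ($txt, $opn, $cls, @quotes) = @_;
    my $i = -1;
    while (++$i < length $txt) {
        my $closer;
        # Skip over quoted literals so comment markers inside them are left alone.
        # skip_quoted() is a helper from the same module (not shown here).
        if (grep {my $r=substr($txt,$i,length($_)) eq $_; $closer=$_ if $r; $r}
            @quotes) {
            $i = skip_quoted($txt, $i, $closer);
            next;
        }
        if (substr($txt, $i, length($opn)) eq $opn) {
            my $e = index($txt, $cls, $i) + length($cls);
            # Blank out the comment but keep its newlines, so line numbers are preserved.
            substr($txt, $i, $e-$i) =~ s/[^\n]/ /g;
            $i--;
            next;
        }
    }
    return $txt;
}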
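
To make the scanning approach easier to try out, here is a rough Python sketch of the same idea: walk the text once, copy quoted literals through untouched, and overwrite each comment with spaces while keeping its newlines. It is not a line-by-line port of Inline::Filters. The names blank_comments and skip_quoted are my own (the latter only mirrors the role of the Perl helper), and it assumes single-character quote delimiters.

```python
def skip_quoted(text: str, i: int, quote: str) -> int:
    """Return the index of the quote that closes the literal starting at i."""
    j = i + 1
    while j < len(text):
        if text[j] == "\\":
            j += 2              # skip the escaped character
            continue
        if text[j] == quote:
            return j
        j += 1
    return len(text) - 1        # unterminated literal: stop at end of input


def blank_comments(text: str, opener: str = "/*", closer: str = "*/",
                   quotes: tuple = ('"', "'")) -> str:
    """Replace every comment with spaces, preserving newlines and length."""
    out = []
    i = 0
    while i < len(text):
        ch = text[i]
        if ch in quotes:
            # Copy the whole literal through unchanged.
            end = skip_quoted(text, i, ch)
            out.append(text[i:end + 1])
            i = end + 1
        elif text.startswith(opener, i):
            close_at = text.find(closer, i + len(opener))
            # An unterminated comment runs to the end of the input.
            end = len(text) if close_at == -1 else close_at + len(closer)
            out.append("".join("\n" if c == "\n" else " " for c in text[i:end]))
            i = end
        else:
            out.append(ch)
            i += 1
    return "".join(out)


if __name__ == "__main__":
    print(blank_comments('a /* x\ny */ b "/* s */" c'))
```

Keeping the output the same length as the input means compiler diagnostics for the stripped file still point at the same lines and columns as the original.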
Sinan Ünür
+4  A: 

Consider:

printf("... /* ...");
int matrix[20];
printf("... */ ...");

In other words: I wouldn't use regex for this task, unless you're doing a replace-once and are positive that the above does not occur.
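
To make the failure concrete, here is a small Python sketch that applies the naive non-greedy pattern (the same idea as a plain /\*.*?\*/ substitution) to the snippet above. The names NAIVE and source are just for illustration.

```python
import re

# Naive pattern: everything from the first /* to the nearest */,
# with no awareness of string literals.
NAIVE = re.compile(r"/\*.*?\*/", re.DOTALL)

source = (
    'printf("... /* ...");\n'
    'int matrix[20];\n'
    'printf("... */ ...");\n'
)

result = NAIVE.sub("", source)
print(result, end="")
```

The declaration of matrix disappears, because the "comment" is taken to start inside the first string literal and end inside the second.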

Bart Kiers
+21  A: 

Why not just use the c preprocessor to do this? Why are you confining yourself to a home-grown regex?

[Edit] This approach also handles Bart's printf(".../*...") scenario cleanly.

Example:

[File: t.c]
/* This is a comment */
int main () {
    /* 
     * This
     * is 
     * a
     * multiline
     * comment
     */
    int f = 42;
    /*
     * More comments
     */
    return 0;
}


$ cpp -P t.c
int main () {







    int f = 42;



    return 0;
}

Or you can remove the whitespace and condense everything

$ cpp -P t.c | egrep -v "^[ \t]*$"
int main () {
    int f = 42;
    return 0;
}

No use re-inventing the wheel, is there?

[Edit] If you don't want this approach to expand included files and macros, cpp provides flags for that. Consider:

[File: t.c]

#include <stdio.h>
int main () {
    int f = 42;
    printf("   /*  ");
    printf("   */  ");
    return 0;
}


$ cpp -P -fpreprocessed t.c | grep -v "^[ \t]*$"
#include <stdio.h>
int main () {
    int f = 42;
    printf("   /*  ");
    printf("   */  ");
    return 0;
}

There is a slight caveat: macro expansion can be avoided this way, but the original macro definitions (the #define lines) are stripped from the source.

ezpz
Yes, this is what I'd use!
Bart Kiers
The preprocessor has a (potentially undesirable) "side effect": it also processes macros, expands included files, and so on...
RaphaelSP
You can get rid of macro expansion by `-fpreprocessed`. I'll update to mention this
ezpz
-1 again. That is not a *slight* caveat if you expect the source code to compile after removing comments.
Sinan Ünür
This caveat can be fixed: `perl -wpe 's/^\s*#define/#include#define/' your-file.c | cpp -P - -fpreprocessed | perl -wpe 's/#include#define/#define/'`. This turns any #defines into (somewhat invalid) #includes that pass through the preprocessor, to be converted back to correct #defines afterwards. (If you agree, please add this to the answer itself.)
Yaakov Belch
+4  A: 

Please do not use cpp for this unless you understand the ramifications:

$ cat t.c
#include <stdio.h>

#define MSG "Hello World"

int main(void) {
    /* ANNOY: print MSG using the puts function */
    puts(MSG);
    return 0;
}

Now, let's run it through cpp:

$ cpp -P t.c -fpreprocessed


#include <stdio.h>



int main(void) {


    puts(MSG);
    return 0;
}

Clearly, this file is no longer going to compile.

Sinan Ünür
Well, not after you add the `-fpreprocessed` flag, anyway.
Hasturkun
@Hasturkun and if you don't add -fpreprocessed, `#include <stdio.h>` will be expanded.
Sinan Ünür
I tried this: `perl -wpe 's/^\s*#define/#include#define/' your-file.c | cpp -P - -fpreprocessed | perl -wpe 's/#include#define/#define/'`. This turns any #defines into (somewhat invalid) #includes that pass through the preprocessor, to be converted back to correct #defines afterwards.
Yaakov Belch