



I have a C file which I copied from somewhere else, but it has a lot of comments like below:

int matrix[20];
/* generate data */
for (index = 0 ;index < 20; index++)
matrix[index] = index + 1;
/* print original data */
for (index = 0; index < 5 ;index++)

How can I delete all the comments enclosed by /* and */? Sometimes the comments span 4-5 lines, and I need to delete all of those lines.

Basically, I need to delete all text between /* and */, including any newlines (\n) in between. Please help me do this using one of sed, awk or perl.

See perlfaq6. It's quite a complex scenario.

$/ = undef;
$_ = <>;
s#/\*[^*]*\*+([^/*][^*]*\*+)*/|("(\\.|[^"\\])*"|'(\\.|[^'\\])*'|.[^/"'\\]*)#defined $2 ? $2 : ""#gse;
print;

A word of warning: once you've done this, do you have a test scenario to prove to yourself that you've removed only the comments and nothing valuable? If you're running such a powerful regexp, I'd ensure some sort of test (even if you simply record the behaviour before and afterwards).

Just check that the binaries created by compiling are identical (modulo timestamps or other build identification).
That may well be the simplest solution
Agreed, I would never do this on code I cared about unless I had unit tests in place to verify its correctness after filtering it.

A very simplistic example using gawk. Please test it thoroughly before relying on it. Note that it doesn't handle the other comment style, // (as in C++ and C99).

$ more file
int matrix[20];
/* generate data */
for (index = 0 ;index < 20; index++)
matrix[index] = index + 1;
/* print original data */
for (index = 0; index < 5 ;index++)
 blah blah
float a;
float b;

$ awk -vRS='*/' '{ gsub(/\/\*.*/,"")}1' file
int matrix[20];

for (index = 0 ;index < 20; index++)
matrix[index] = index + 1;

for (index = 0; index < 5 ;index++)

float a;
float b;
For some reason this is not working on my machine :( `cat test` shows `int matrix[20];/* generate data */for (index = 0 ;index < 20; index++)matrix[index] = index + 1;/* print original data */`, and the output of `awk -vRS='*/' '{ gsub(/\/\*.*/,"")}1' test` is `int matrix[20];/ generate data/for (index = 0 ;index < 20; index++)matrix[index] = index + 1;/ print original data/`
I already indicated that this uses gawk. Do you have gawk?
Sorry, the comment is so messed up that I didn't notice you had included the output. I see you still have /generate data/ and /print original data/. As you can see from my output, it works for me.
If you still can't get it to work, there's the Perl solution below that you can try.
Try this on the command line (replacing 'file-names' with the list of files that need to be processed):

perl -i -wpe 'BEGIN{undef $/} s!/\*.*?\*/!!sg' file-names

This program changes the files in-place (overwriting the original file with the corrected output). If you just want the output without changing the original files, omit the '-i' switch.


perl -- call the perl interpreter
-i      switch to 'change-in-place' mode.
-w      print warnings to STDERR (if there are any)
 p      read the files and print $_ for each record; like while(<>){ ...; print $_;}
 e      process the following argument as a program (once for each input record)

BEGIN{undef $/} --- process whole files instead of individual lines.
s!      search and replace ...
  /\*     the starting /* marker
  .*?     followed by any text (non-greedy match)
  \*/     followed by the */ marker
!!      replace by the empty string (i.e. remove comments)  
  s     treat newline characters \n like normal characters (remove multi-line comments)
   g    repeat as necessary to process all comments.

file-names   list of files to be processed.
See the perlfaq to understand why this is so very wrong.
@brian Accepted: This is only an approximate solution.
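To make the "approximate" part concrete, here is a minimal C sketch of the same pairing rule the non-greedy substitution applies: each `/*` is matched with the next `*/`, with no awareness of string literals. The function name `naive_strip` is made up for illustration and is not from any library.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

// Hypothetical helper: removes the text between each "/*" and the next "*/",
// without any knowledge of string literals. Returns a malloc'd string that
// the caller must free, or NULL if allocation fails.
char *naive_strip(const char *src)
{
    assert(src != NULL);
    size_t len = strlen(src);
    char *out = malloc(len + 1);      // the result is never longer than the input
    if (out == NULL)
        return NULL;

    size_t o = 0;
    const char *p = src;
    for (;;) {
        const char *open = strstr(p, "/*");
        const char *close = open ? strstr(open + 2, "*/") : NULL;
        if (close == NULL) {          // no complete comment left: copy the tail
            size_t n = strlen(p);
            memcpy(out + o, p, n);
            o += n;
            break;
        }
        size_t n = (size_t)(open - p);
        memcpy(out + o, p, n);        // keep the text before the opener
        o += n;
        p = close + 2;                // resume right after the closer
    }
    out[o] = '\0';
    return out;
}
```

Fed two printf calls whose string literals contain `/*` and `*/` respectively, this removes everything between the two markers, including any real code in between, which is the failure mode perlfaq6 warns about.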
Take a look at the strip_comments routine in Inline::Filters:

sub strip_comments {
    my ($txt, $opn, $cls, @quotes) = @_;
    my $i = -1;
    while (++$i < length $txt) {
        my $closer;
        # Skip quoted strings so that comment markers inside them are ignored.
        if (grep {my $r=substr($txt,$i,length($_)) eq $_; $closer=$_ if $r; $r}
            @quotes) {
            $i = skip_quoted($txt, $i, $closer);   # helper from Inline::Filters
        }
        # Blank out the comment but keep its newlines.
        if (substr($txt, $i, length($opn)) eq $opn) {
            my $e = index($txt, $cls, $i) + length($cls);
            substr($txt, $i, $e-$i) =~ s/[^\n]/ /g;
        }
    }
    return $txt;
}

Consider what happens when the comment markers appear inside string literals:

printf("... /* ...");
int matrix[20];
printf("... */ ...");

In other words: I wouldn't use regex for this task, unless you're doing a replace-once and are positive that the above does not occur.
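As a rough illustration of the non-regex route, the sketch below scans the text character by character and tracks whether it is inside a string or character literal, so comment markers in literals are left alone. The name `strip_block_comments` is hypothetical. It assumes only /* */ comments matter: it ignores // comments, trigraphs and line continuations, and it deletes comments outright rather than replacing each with a space as a real compiler would.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

// Hypothetical helper: returns a malloc'd copy of src with block comments
// removed. Markers inside "..." or '...' literals are copied unchanged.
// The caller frees the result; NULL is returned if allocation fails.
char *strip_block_comments(const char *src)
{
    assert(src != NULL);
    size_t len = strlen(src);
    char *out = malloc(len + 1);      // the result is never longer than the input
    if (out == NULL)
        return NULL;

    size_t o = 0;
    char quote = 0;                   // 0 = plain code, otherwise the open quote char

    for (size_t i = 0; i < len; i++) {
        char c = src[i];
        if (quote) {
            out[o++] = c;
            if (c == '\\' && i + 1 < len)
                out[o++] = src[++i];  // copy the escaped character verbatim
            else if (c == quote)
                quote = 0;            // literal ends here
        } else if (c == '"' || c == '\'') {
            quote = c;                // literal starts here
            out[o++] = c;
        } else if (c == '/' && i + 1 < len && src[i + 1] == '*') {
            const char *end = strstr(src + i + 2, "*/");
            if (end == NULL)
                break;                // unterminated comment: drop the rest
            i = (size_t)(end - src) + 1;  // the loop's i++ steps past the '/'
        } else {
            out[o++] = c;
        }
    }
    out[o] = '\0';
    return out;
}
```

With this, the three-line printf example above comes back unchanged, while an ordinary comment between statements is removed. Because nothing is inserted where a comment used to be, input such as `a/**/b` would be fused into `ab`, so the before/after compile check suggested earlier in the thread is still worth doing.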

Why not just use the c preprocessor to do this? Why are you confining yourself to a home-grown regex?

[Edit] This approach also handles Bart's printf(".../*...") scenario cleanly.


[File: t.c]
/* This is a comment */
int main () {
    /*
     * This
     * is 
     * a
     * multiline
     * comment
     */
    int f = 42;
    /*
     * More comments
     */
    return 0;
}


$ cpp -P t.c
int main () {

    int f = 42;

    return 0;
}

Or you can remove the whitespace and condense everything

$ cpp -P t.c | egrep -v "^[ \t]*$"
int main () {
    int f = 42;
    return 0;
}

No use re-inventing the wheel, is there?

[Edit] If you don't want this approach to expand included files and macros, cpp provides flags for this. Consider:

[File: t.c]

#include <stdio.h>
int main () {
    int f = 42;
    printf("   /*  ");
    printf("   */  ");
    return 0;
}


$ cpp -P -fpreprocessed t.c | grep -v "^[ \t]*$"
#include <stdio.h>
int main () {
    int f = 42;
    printf("   /*  ");
    printf("   */  ");
    return 0;
}

There is a slight caveat in that macro expansion can be avoided, but the original definition of the macro is stripped from the source.

Yes, this is what I'd use!
The preprocessor has a (potentially undesirable) "side-effect": it also processes macros, includes included files, and so on...
You can get rid of macro expansion by `-fpreprocessed`. I'll update to mention this
-1 again. That is not a *slight* caveat if you expect the source code to compile after removing comments.
This caveat can be fixed: `perl -wpe 's/^\s*#define/#include#define/' your-file.c | cpp -P - -fpreprocessed | perl -wpe 's/#include#define/#define/'`. This turns any #defines into (somewhat invalid) #includes that pass through the preprocessor, to be converted back to correct #defines later. (If you agree, please add this to the answer itself.)
Please do not use cpp for this unless you understand the ramifications:

$ cat t.c
#include <stdio.h>

#define MSG "Hello World"

int main(void) {
    /* ANNOY: print MSG using the puts function */
    puts(MSG);
    return 0;
}

Now, let's run it through cpp:

$ cpp -P t.c -fpreprocessed

#include <stdio.h>

int main(void) {

    puts(MSG);
    return 0;
}

Clearly, this file is no longer going to compile.

well, not after you add the `-fpreprocessed` flag, anyway
@Hasturkun and if you don't add -fpreprocessed, `#include <stdio.h>` will be expanded.
I tried this: `perl -wpe 's/^\s*#define/#include#define/' your-file.c | cpp -P - -fpreprocessed | perl -wpe 's/#include#define/#define/'`. This turns any #defines into (somewhat invalid) #includes that pass through the preprocessor, to be converted back to correct #defines later.
