I don't know if you're familiar with sed, the UNIX-based (but Windows-available) stream editor, but I've found a sed script here which will remove C/C++ comments from a file. It's very smart; for example, it will ignore '//' and '/*' if they appear inside a string literal. From within Python, it can be run with the following code:
import subprocess
# source_code is a string with the source code.
# -f tells sed to read its commands from the script file.
process = subprocess.Popen(['sed', '-f', '/path/to/remccoms3.sed'],
                           stdin=subprocess.PIPE, stdout=subprocess.PIPE,
                           universal_newlines=True)  # exchange text, not bytes
# Send the source to sed's stdin and collect what it writes to stdout.
stripped_code, _ = process.communicate(source_code)
return_code = process.returncode
In this program, source_code is the variable holding the C/C++ source code, and stripped_code will end up holding the same code with the comments removed. communicate() writes source_code to sed's standard input and returns whatever sed writes to its standard output. Of course, if you have the file on disk, you could instead pass open file handles as stdin and stdout (the input file opened in read mode, the output file in write mode) and call process.wait() rather than communicate(). remccoms3.sed is the file from the above link, and it should be saved in a readable location on disk. sed is also available on Windows, and comes installed by default on most GNU/Linux distros and Mac OS X.
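For the on-disk case, a minimal sketch might look like the following. It hands open file handles to the child process as its standard input and output, so the source never has to be loaded into a Python string. The helper name filter_file and the file names in the usage comment are hypothetical, chosen only for illustration; the sed invocation assumes the script was saved as described above.

```python
import subprocess

def filter_file(command, input_path, output_path):
    """Run `command` with input_path as its stdin and output_path as its stdout.

    Returns the command's exit status (0 normally means success).
    """
    with open(input_path, 'r') as src, open(output_path, 'w') as dst:
        # subprocess.call waits for the command to finish before returning.
        return subprocess.call(command, stdin=src, stdout=dst)

# Hypothetical usage; all paths are placeholders:
# filter_file(['sed', '-f', '/path/to/remccoms3.sed'], 'input.c', 'stripped.c')
```

Taking the command as a list keeps the helper independent of sed itself, so the same function works with any filter that reads stdin and writes stdout.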
This will probably be better than a pure Python solution; no need to reinvent the wheel.