I have a text file with several lines in the following format:

gatename #outputs #inputs list_of_inputs_separated_by_spaces * gate_id

Example:
nand 3 2 10 11 * G0 (The two inputs to the nand gate are 10 and 11)
or 2 1 10 * G1 (The only input to the or gate is gate 10)

What I need to do is rewrite the contents so that the #outputs column is eliminated, giving this end result:

gatename #inputs list_of_inputs_separated_by_spaces * gate_id
nand 2 10 11 * G0
or 1 10 * G1

I tried using Eclipse's find-and-replace function (with a regex as the find parameter, which didn't work), but it ended up mangling the gatename. I am considering writing a Python script that iterates over each line of the text file. What I need help with is determining the appropriate regex.

+1  A: 

Personally, if the document is this structured, I wouldn't bother with a regex.

Just loop through the file, split each line on the " " character, and omit the second entry.
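A minimal Python sketch of that split-and-omit approach might look like the following. The function name drop_outputs_column is made up for illustration, and the sketch assumes it is acceptable to collapse runs of whitespace into single spaces (it uses split() with no argument rather than split(" ")).

```python
def drop_outputs_column(line):
    """Return the line with its second whitespace-separated field removed."""
    fields = line.split()          # split on any run of whitespace
    if len(fields) < 2:            # nothing to drop on blank or one-field lines
        return line.rstrip("\n")
    del fields[1]                  # omit the #outputs column
    return " ".join(fields)


# Sample lines taken from the question.
sample = ["nand 3 2 10 11 * G0", "or 2 1 10 * G1"]
for line in sample:
    print(drop_outputs_column(line))
```

To process a real file, the same function could be applied to each line read from an open file object and the results written to a new file.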

Mitchel Sellers
+2  A: 

Something like...:

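import fileinput, re
# Rewrite the files named on the command line in place, dropping the second column.
for theline in fileinput.input(inplace=True):
  print(re.sub(r'(\w+\s+)\d+\s+(.*)', r'\1\2', theline), end='')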

...should meet your needs.

Alex Martelli
A: 

I don't know what platform you're running Eclipse on, but if it's Linux or you have Cygwin, cut is very fast:

cut -d" " --complement -f2 "$FILE"

This uses a space as the delimiter and selects the complement of the second field, that is, every field except the second. (--complement is a GNU cut option.)

If you really want to use a regular expression, you can do something like this:

sed -r 's/^ *([^ ]+) +[^ ]+ +(.+)/\1 \2/' "$FILE"

You could easily use the same expression in Python or Perl, of course, but Mitchel's right: splitting is easy (unless the lines are extremely long, in which case it wastes time splitting fields you don't need).
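If splitting long lines is a concern, one option in Python is to cap the number of splits so that only the first two fields are separated and the rest of the line is left untouched. This is only a sketch: the name drop_second_field is hypothetical, and it assumes lines with fewer than three fields should pass through unchanged.

```python
def drop_second_field(line):
    """Remove the second field, splitting at most twice from the left."""
    stripped = line.rstrip("\n")
    # maxsplit=2 yields [name, outputs, everything_else] without
    # splitting the remaining fields.
    parts = stripped.split(None, 2)
    if len(parts) < 3:
        return stripped            # assumption: leave short lines as they are
    name, _outputs, rest = parts
    return name + " " + rest


print(drop_second_field("nand 3 2 10 11 * G0"))
print(drop_second_field("or 2 1 10 * G1"))
```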

Jefromi
I'd suggest changing all the asterisks in that regex to plus signs — you need at least one space to delimit columns and at least one non-space to *be* a column. It'll make things considerably faster if it encounters lines which don't match.
Ben Blank
Careless mistake on my part, thanks.
Jefromi
+4  A: 

This is basically what the cut utility is for:

cut -d " " -f 1,3-

(update: I forgot the -f option, sorry.)

This takes a file, considers fields delimited by spaces, and outputs the first, third and following fields.

(If you're on Windows, you should install these Unix-style utilities anyway; they can be incredibly useful.)

Using a regex, you could replace (\w+) \d+ (.*) with $1 $2. Something like:

sed -r -e "s/([^ ]+) [0-9]+ (.*)/\1 \2/" file

or

perl -p -e 's/(\w+) \d+ (.*)/$1 $2/' file
Tim Sylvester
+1  A: 

You can indeed use Eclipse's find and replace feature, using the following:

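Find: ^([a-z]+) \d+
Replace with: $1

With the "Regular expressions" option checked, this matches the gatename at the beginning of each line (^([a-z]+)) followed by the #outputs value ( \d+), and replaces the match with just the captured gatename ($1). Using \d+ rather than \d also handles output counts of more than one digit.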
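The same anchored find-and-replace idea can be tried out in Python before running it in an editor. This is a sketch rather than a reproduction of Eclipse's behavior: it uses re.MULTILINE so that ^ matches at the start of every line, writes the group reference in Python's \1 form, and uses \d+ on the assumption that the output count may have more than one digit.

```python
import re

# Gatename at the start of a line, then a space and the #outputs value.
PATTERN = re.compile(r"^([a-z]+) \d+", re.MULTILINE)

# Sample text built from the lines in the question.
text = "nand 3 2 10 11 * G0\nor 2 1 10 * G1\n"

# Keep only the captured gatename; the rest of each line is untouched.
result = PATTERN.sub(r"\1", text)
print(result, end="")
```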

JG