ansaurus

Question

Renaming contents of text file using Regular Expressions

Answer 1

+1 A:

Personally, if the document is this well structured, I wouldn't bother with a regex.

Just loop through the file, split each line on the " " character, drop the second entry, and join the rest back together.
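A minimal Python sketch of that loop, assuming whitespace-separated columns and that the second column is the one to drop. The function name `drop_second_field` and the sample lines are made up for illustration, not taken from the question:

```python
def drop_second_field(line):
    """Return the line with its second whitespace-separated field removed."""
    fields = line.split()          # split on any run of whitespace
    if len(fields) < 2:            # nothing to drop; leave the text alone
        return line.rstrip("\n")
    return " ".join(fields[:1] + fields[2:])


# Hypothetical sample input; the real file format is an assumption.
sample = ["nand 2 a b c", "nor 1 x y", "single"]
for line in sample:
    print(drop_second_field(line))
```

Note that `split()` followed by `" ".join(...)` collapses runs of spaces into a single space, which may or may not matter for your file.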

Mitchel Sellers 2009-09-21 22:01:34

Answer 2

+2 A:

Something like...:

import fileinput
import re
for theline in fileinput.input(inplace=True):  # print() output is written back into the file
    print(re.sub(r'(\w+\s+)\d+\s+(.*)', r'\1\2', theline), end='')

...should meet your needs.

Alex Martelli 2009-09-21 22:04:05

Answer 3

A:

I don't know what platform you're using Eclipse on, but if it's Linux or you have Cygwin, cut is very fast!

cut -d" " --complement -f2 "$FILE"

This uses a space as the delimiter and selects the complement of the second field, i.e. every field except the second. (--complement is a GNU cut option.)

If you really want to use a regular expression, you can do something like this:

sed -r 's/^ *([^ ]+) +[^ ]+ +(.+)/\1 \2/' "$FILE"

You could easily use the same expression in Python or Perl, of course, but Mitchel's right: splitting is easy. (The exception is a very long line, where a plain split wastes time breaking up fields you never look at.)
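For what it's worth, here is a rough Python translation of that sed expression, plus a split-based variant that stops after the second field so the tail of a long line is never broken up. Both function names are hypothetical, and the column layout is an assumption:

```python
import re

# Same pattern as the sed command: first field, skipped second field, rest.
PATTERN = re.compile(r'^ *([^ ]+) +[^ ]+ +(.+)')


def drop_second_regex(line):
    # Lines that do not match are returned unchanged, as sed would leave them.
    return PATTERN.sub(r'\1 \2', line)


def drop_second_maxsplit(line):
    # maxsplit=2 stops after two splits, so the remainder stays one string.
    parts = line.split(None, 2)
    if len(parts) < 3:
        return line
    return parts[0] + " " + parts[2]


print(drop_second_regex("nand 2 a b c"))
print(drop_second_maxsplit("nand 2 a b c"))
```

The `maxsplit` version is probably the cheapest option when lines are long, since only the first two delimiters are ever examined.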

Jefromi 2009-09-21 22:04:46

I'd suggest changing all the asterisks in that regex to plus signs — you need at least one space to delimit columns and at least one non-space to *be* a column. It'll make things considerably faster if it encounters lines which don't match.

Ben Blank 2009-09-21 22:51:51

Careless mistake on my part, thanks.

Jefromi 2009-09-22 00:41:07

Answer 4

+4 A:

This is basically what the cut utility is for:

cut -d " " -f 1,3- file

(update: I forgot the -f option, sorry.)

This takes a file, considers fields delimited by spaces, and outputs the first, third and following fields.
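One thing to watch for: cut treats every single space as a delimiter, so two spaces in a row produce an empty field. A small Python sketch that mimics this behaviour for one line (the function name and sample lines are made up for illustration):

```python
def cut_fields_1_3_onward(line):
    """Mimic `cut -d " " -f 1,3-` for one line (without its newline)."""
    fields = line.split(" ")       # every single space is a delimiter
    if len(fields) == 1:           # cut prints delimiter-free lines unchanged
        return line
    return " ".join(fields[:1] + fields[2:])


print(cut_fields_1_3_onward("nand 2 a b"))    # one space between columns
print(cut_fields_1_3_onward("nand  2 a b"))   # two spaces: field 2 is empty
```

With the second sample the number survives, because the field that gets dropped is the empty one between the two spaces. If your columns are padded with several spaces, squeeze them first (or split on whitespace runs instead).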

(If you're on Windows, you should have these Unix-style utilities anyway; they can be incredibly useful.)

Using a regex, you could replace (\w+) \d+ (.*) with $1 $2. Something like:

sed -r -e "s/([^ ]+) [0-9]+ (.*)/\1 \2/" file

or

perl -p -e 's/(\w+) \d+ (.*)/$1 $2/' file

Tim Sylvester 2009-09-21 22:07:09

Answer 5

+1 A:

You can indeed use Eclipse's find and replace feature (with the "Regular expressions" option checked), using the following:

Find: ^([a-z]+) \d+
Replace with: $1

This matches the gatename at the beginning of each line (^([a-z]+)) followed by the output ( \d+), and replaces the pair with just the matched gatename ($1).
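If you want to check the effect outside Eclipse, a rough Python equivalent looks like this. The sample text is made up, `\d+` is used here so that multi-digit outputs are covered (an assumption about the data), and Python writes the group reference in the replacement as `\1`:

```python
import re

# Hypothetical buffer contents; the real file layout is an assumption.
text = "nand 2 a b c\nnor 1 x y\n"

# re.MULTILINE makes ^ match at the start of every line, not just the string.
result = re.sub(r'^([a-z]+) \d+', r'\1', text, flags=re.MULTILINE)
print(result, end="")
```

Each line keeps its gatename and everything after the number; only the space and digits following the gatename are removed.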

JG 2009-09-21 22:14:39
