ansaurus

Question

Match everything that isn't a number followed by a letter

Answer 1

+5 A:

You could match using /(\d+[A-Z])/

gnarf 2010-07-27 21:56:20

Indeed. Rather than removing everything else, in this case it's simpler just to match what you want and spit that out.

Kevin Ballard 2010-07-27 22:23:37
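
For illustration, here is a minimal sketch of the match-and-print approach suggested above. Python is an assumption (the question names no language), and the function name and sample text are hypothetical. The capture group is dropped because the whole match is the code.

```python
import re

# Pattern from the answer above: one or more digits followed by one capital letter.
CODE_PATTERN = re.compile(r"\d+[A-Z]")

def extract_codes(text):
    """Return every digits-then-capital-letter match found in text, in order."""
    return CODE_PATTERN.findall(text)

# Hypothetical sample input, for illustration only.
sample = "foo 12A bar 34 baz 56B"
print(extract_codes(sample))  # ['12A', '56B']
```

The bare number 34 is skipped because no capital letter follows it.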

Answer 2

A:

A simple solution is to write a script that scans the file line by line or word by word, depending on how the occ codes appear in it, checks for matches (possibly using a regex), and writes them to another file.

You COULD use a single regex match on the entire document and iterate over the results, but that could pose problems depending on the size of the file.

Derek Litz 2010-07-27 22:01:37
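
A rough sketch of the line-by-line approach described above, assuming Python and the digits-plus-capital-letter pattern from Answer 1. The function name `copy_codes` and the sample data are hypothetical. Only one line is held in memory at a time, which avoids the file-size concern.

```python
import io
import re

# Assumed pattern: one or more digits followed by one capital letter.
CODE_PATTERN = re.compile(r"\d+[A-Z]")

def copy_codes(src, dst):
    """Read src one line at a time and write each match to dst on its own line.

    Returns the number of codes written.
    """
    count = 0
    for line in src:
        for code in CODE_PATTERN.findall(line):
            dst.write(code + "\n")
            count += 1
    return count

# In-memory stand-ins for the input and output files (illustration only).
src = io.StringIO("foo 12A bar\nnothing here 34\nbaz 56B 7C\n")
dst = io.StringIO()
copy_codes(src, dst)
print(dst.getvalue(), end="")
```

File objects returned by `open()` can be passed in place of the `StringIO` objects, since both are iterated line by line.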

Answer 3

A:

Here's a crude attempt to remove everything except the desired codes using sed. (Note that I interpret "number" to mean a string of one or more digits, no decimal point or leading minus sign.)

sed -e 's/\([A-Z]\)[^0-9]*/\1/g' -e 's/[0-9]*[^0-9A-Z][^0-9A-Z]*//g' -e 's/[0-9]*$//' -e '/^$/d' < filename

The first command removes any non-digit characters that follow a capital letter (a digit there may be the beginning of another code), the second removes any number followed by something other than a capital letter, the third removes trailing numbers and the fourth removes blank lines.

I've run some tests and this seems to work pretty well. I'll happily amend it if anyone can find a case where it fails.

Beta 2010-07-27 23:36:34
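
To make the four steps easier to follow, here is a sketch that mirrors them in Python with `re.sub`. It follows the steps as described in the text above rather than translating the sed command character for character. The function names are hypothetical.

```python
import re

def keep_codes(line):
    """Apply the three substitution steps to one line and return what is left."""
    # Step 1: after a capital letter, drop everything that is not a digit.
    line = re.sub(r"([A-Z])[^0-9]*", r"\1", line)
    # Step 2: drop a (possibly empty) number followed by at least one
    # character that is neither a digit nor a capital letter.
    line = re.sub(r"[0-9]*[^0-9A-Z]+", "", line)
    # Step 3: drop digits left over at the end of the line.
    line = re.sub(r"[0-9]+$", "", line)
    return line

def filter_lines(lines):
    """Step 4: keep only the lines that are non-empty after the substitutions."""
    result = []
    for raw in lines:
        kept = keep_codes(raw.rstrip("\n"))
        if kept:
            result.append(kept)
    return result

print(filter_lines(["abc 45 def\n", "x 7Q y 89\n"]))  # ['7Q']
```

Two limitations show up when tracing it: codes on the same line are run together (`foo 12A bar 34 baz 56B` becomes `12A56B`), and a capital letter with no number in front of it survives (`Foo 12A` becomes `F12A`).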
