Good day members,

I have an input file with close to 2000 rows of numeric digits. From every row I want to extract "the second to the eighth digit from the right" into a separate file, with a comma appended to each result, as shown.

Example: input.txt

00000000000001303275310752
00000000000001827380519015
00000000000000800081610361
00000000000000449481894004
00000000000000449481894004
00000000000001812612607514

Expected result: newfile.txt

7531075,
8051901,
8161036,
8189400,
8189400,
1260751,

I'm guessing something like 'sed' can be used to solve my problem, but I'm not quite sure how to go about it. I'm connected to a machine running Solaris 5.10. I'd appreciate it if someone could guide me with a brief explanation.

regards,

novice.

+2  A: 

For fixed width input, try:

cut -c19-25 input.txt | sed 's/$/,/'

which is to say, extract the 19th to 25th characters of each line of input.txt (the second to eighth digits from the right of a 26-character row) and then replace the end of line with a comma.

If you have variable length lines, you will need something a little different.
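If the rows vary in length, one option is to count from the right-hand end of each row rather than from the left. Below is a minimal Python sketch of that idea, assuming every row has at least eight characters; the function name `second_to_eighth_from_right` and the sample rows are made up for illustration.

```python
def second_to_eighth_from_right(row):
    """Return the 2nd through 8th characters counted from the right end."""
    row = row.rstrip("\n")  # drop the line terminator so it is not counted
    return row[-8:-1]       # skip the last character, keep the 7 before it


# Two rows of different lengths that end in the same digits.
rows = ["00000000000001303275310752\n", "1303275310752\n"]
for row in rows:
    print(second_to_eighth_from_right(row) + ",")
# prints "7531075," twice
```

Because the slice is anchored at the end of the row, the number of leading characters does not matter.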

Emil
Thanks... Works like a charm!
novice
+1  A: 

You can strip the leading zeros with:

sed 's/^0*//'

Thus something like:

sed 's/^0*//' input.txt | sed 's/$/,/'

should work.

dcruz
While a useful trick, it isn't quite what the poster is asking
Adam Batkin
Thanks dcruz. Noteworthy utility too. I'm sure I'll be looking for something like this in the future when trying to manipulate data :)
novice
+1  A: 

Try:

perl -pe 's/^.*(\d{7})\d$/$1,/' < input.txt

Or if you don't like regular expressions:

perl -pe '$_ = substr($_,-9,-2) . ",\n"' < input.txt

This will work for any fixed or variable length line.
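One caveat worth checking: the offsets -9 and -2 count the trailing newline as the last character of the line. The sketch below illustrates the same index arithmetic in Python rather than Perl (Python slices take an end index, whereas Perl's `substr` takes a length, but the selected characters are the same here). The variable names are hypothetical, and the unterminated-row case is an assumption about how some input files end.

```python
row = "00000000000001303275310752"

# With the newline still attached, index -1 is "\n" and -2 is the last
# digit, so [-9:-2] covers the 2nd to 8th digits from the right.
with_newline = (row + "\n")[-9:-2]

# Without a trailing newline (e.g. an unterminated last row), the same
# offsets slide one digit to the left.
no_newline_shifted = row[-9:-2]

# Stripping the terminator first and using [-8:-1] handles both cases.
stripped_first = row.rstrip("\n")[-8:-1]

print(with_newline, no_newline_shifted, stripped_first)
# prints: 7531075 2753107 7531075
```

If the last row of the file might lack a newline, stripping the terminator before slicing is the safer of the two forms.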

Adam Batkin
Thanks Adam, for the Perl implementation. It gives me the result too, albeit without the 'comma' :D
novice
Heh, missed that, but that is a quick fix (done now)
Adam Batkin
A: 

Here is a solution in Python; it should be intuitive:

$ cat data2
00000000000001303275310752
00000000000001827380519015
00000000000000800081610361
00000000000000449481894004
00000000000000449481894004
00000000000001812612607514

$ cat digits.py
import sys
for line in sys.stdin:
    # line still ends with '\n', so -9:-2 picks the 2nd to 8th digits from the right
    print('%s,' % line[-9:-2])

$ python digits.py < data2
7531075,
8051901,
8161036,
8189400,
8189400,
1260751,
Hai Vu