I need to find a faster way to number lines in a file in a specific way, using tools like awk and sed. I need the first character of each line to be a number that cycles like this: 1,2,3,1,2,3,1,2,3, and so on.

For example, if the input was this:

line 1
line 2
line 3
line 4
line 5
line 6
line 7

The output needs to look like this:

1line 1
2line 2
3line 3
1line 4
2line 5
3line 6
1line 7

Here is a chunk of what I have. $lines is the number of lines in the data file divided by 3, so for a file of 21000 lines I run this loop 7000 times.

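export i=0
while [ $i -le $lines ]
do
    # First and last line numbers of the next block of three
    export start=`expr $i \* 3 + 1`
    export end=`expr $start + 2`
    # Extract lines start..end, then number them 1..3 within the block
    awk "NR==$start,NR==$end" "$1" | awk '{printf("%d%s\n", NR,$0)}' >> data.out
    export i=`expr $i + 1`
done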

Basically, this grabs three lines at a time, numbers them, and appends them to an output file. It's slow... and then some! I don't know of another, faster way to do this. Any thoughts?
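For a rough sense of the cost: each of the roughly 7000 iterations starts two awk processes, and the first awk scans the whole file, because a range pattern such as NR==1,NR==3 does not make awk stop reading once the range ends. That works out to about:

$$\underbrace{7000}_{\text{iterations}} \times \underbrace{21000}_{\text{lines per scan}} = 147{,}000{,}000\ \text{line reads}, \qquad 7000 \times 2 = 14000\ \text{awk launches}$$

A single pass over the file, by contrast, reads each of the 21000 lines once and starts one process.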

+2  A: 

Perl comes to mind:

perl -pe '$_ = (($.-1)%3)+1 . $_'

should work. No doubt there is an awk equivalent. Basically, ((line# - 1) MOD 3) + 1.
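The formula ((line# - 1) MOD 3) + 1 is easy to check on the sample input. Below is a minimal Python sketch; the function name cycle_number and its period parameter are hypothetical, introduced only for illustration.

```python
def cycle_number(line_no: int, period: int = 3) -> int:
    """Return ((line_no - 1) mod period) + 1 for a 1-based line number."""
    # Subtracting 1 makes the first line map to 0 before the modulo,
    # and adding 1 afterwards shifts the range from 0..period-1 to 1..period.
    return (line_no - 1) % period + 1


# Line numbers 1..7 from the question's example input.
print([cycle_number(n) for n in range(1, 8)])  # [1, 2, 3, 1, 2, 3, 1]
```

Without the "- 1" the sequence would start at 2 and print 0 on every third line, which is why both adjustments are needed.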

derobert
perl -ne 'printf "%d%s", (($.-1)%3)+1, $_' :-D
Jonathan Leffler
Jonathan: Why use printf? Seems like derobert's answer is more straightforward.
Jon Ericson
Because using printf() doesn't modify $_; there might even be a time saving, though it is unlikely to be sufficient to worry about.
Jonathan Leffler
If you want optimization, print will probably be faster than printf.
derobert
+3  A: 

Try the nl command.

See http://www.rt.com/man/nl.1.html

The nl utility reads lines from the named file or the standard input if the file argument is omitted, applies a configurable line numbering filter operation and writes the result to the standard output.

edit: No, that's wrong, my apologies. The nl command doesn't have an option for restarting the numbering every n lines; it only has an option for restarting the numbering after it finds a pattern. I'll make this answer a community wiki answer because it might help someone to know about nl.

Bill Karwin
Love the Unix tools attempt at an answer in a scripting question. There is also "cat -n" as a less polished nl. And for the reflective student of sed, the following can be modified to get the exact answer desired: http://www.gnu.org/software/sed/manual/sed.html#cat-_002dn
jaredor
A: 
awk '{printf "%d%s\n", ((NR-1) % 3) + 1, $0;}' "$@"
Jonathan Leffler
+4  A: 

It's slow because you are reading the same lines over and over. Also, you are starting up an awk process only to shut it down and start another one. Better to do the whole thing in one shot:

awk '{print ((NR-1)%3)+1 $0}' "$1" > data.out

If you prefer to have a space after the number:

awk '{print ((NR-1)%3)+1, $0}' "$1" > data.out
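The same one-pass idea, sketched in Python for comparison. This assumes input arrives as an iterable of newline-terminated lines; the function name number_cyclically and its period/sep parameters are hypothetical. Instead of computing a modulo, it pairs each line with a label drawn from itertools.cycle.

```python
import io
import itertools
from typing import Iterable, TextIO


def number_cyclically(lines: Iterable[str], out: TextIO,
                      period: int = 3, sep: str = "") -> None:
    """Prefix each line with 1..period, repeating, reading the input once."""
    # itertools.cycle yields 1, 2, ..., period, 1, 2, ... forever;
    # zip stops as soon as the input lines run out.
    labels = itertools.cycle(range(1, period + 1))
    for label, line in zip(labels, lines):
        out.write(f"{label}{sep}{line}")


# Demo on the question's sample input; sys.stdin and sys.stdout
# could be passed instead to filter a real file.
sample = io.StringIO("".join(f"line {n}\n" for n in range(1, 8)))
result = io.StringIO()
number_cyclically(sample, result)
print(result.getvalue(), end="")
```

sep="" matches the first awk command (no space after the number); sep=" " matches the second one, where the comma inserts awk's default output separator.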
Jon Ericson
+1  A: 

Python

import sys

for count, line in enumerate(sys.stdin):
    # count starts at 0, so count % 3 cycles 0,1,2; adding 1 gives 1,2,3
    sys.stdout.write("%d%s" % (1 + (count % 3), line))
S.Lott
A: 

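This should solve the problem. In awk, $_ prints the whole line only because the unset variable _ evaluates to 0, which makes $_ the same as $0; writing $0 says the same thing more clearly.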

# cat input
line 1
line 2
line 3
line 4
line 5
line 6
line 7
# awk '{print ((NR-1)%3+1) $_}' < input
1line 1
2line 2
3line 3
1line 4
2line 5
3line 6
1line 7
Ganesh M
A: 

You don't need to leave bash for this:

i=0; while read -r; do echo "$((i++ % 3 + 1))$REPLY"; done < input
PEZ