I can guarantee you that bash alone won't be any faster than sed for this task. Starting up external processes in bash is a generally bad idea but only if you do it a lot.
So, if you're starting a sed process for every line in your input, I'd be concerned. But you're not. You only need to start one sed which will do all the work for you.
You may however find that the following sed will be a bit faster than your version:
(whatever) | sed 's/...$//'
All this does is remove the last three characters on each line, rather than substituting the whole line with a shorter version of itself. More modern RE engines may well be able to optimise your command, but why take the risk?
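As a quick sanity check of what that expression does, here's a small sketch. The strip_last3 name and the sample input are made up for illustration; only the sed expression itself comes from the command above.

```bash
#!/usr/bin/env bash
# strip_last3 is a hypothetical wrapper name used only for this example.
strip_last3() {
    # '...' matches any three characters; '$' anchors them to the end of the line.
    sed 's/...$//'
}

printf '%s\n' '123456789abc' 'hello.XXX' 'ab' | strip_last3
```

This prints 123456789, hello. and ab on separate lines. Note the last one: a line with fewer than three characters doesn't match the pattern at all, so it passes through unchanged.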
To be honest, about the only way I can think of that would be faster would be to hand-craft your own C-based filter program. And the only reason that may be faster than sed is because you can take advantage of the extra knowledge you have on your processing needs (sed has to allow for generalised processing so may be slower because of that).
Don't forget the optimisation mantra: "Measure, don't guess!"
If you really want to do this one line at a time in bash (and I still maintain that it's a bad idea), you can use:
pax> line=123456789abc
pax> line2=${line%%???}
pax> echo ${line2}
123456789
pax> _
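If you did go down that path for a whole file, the loop would look something like the sketch below. The strip_last3_bash name and the sample input are hypothetical. I've used a single % rather than %%, since the two behave identically for a fixed-width pattern like ???.

```bash
#!/usr/bin/env bash
# Hypothetical helper: strip the last three characters of every line on stdin
# using only shell built-ins (no external process per line).
strip_last3_bash() {
    local line
    # IFS= and -r keep leading whitespace and backslashes intact;
    # the || test handles a final line that has no trailing newline.
    while IFS= read -r line || [ -n "$line" ]; do
        printf '%s\n' "${line%???}"
    done
}

printf '%s\n' '123456789abc' 'ab' | strip_last3_bash
```

As with the sed version, a line shorter than three characters (the ab here) doesn't match the pattern and comes out untouched.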
You may also want to investigate whether you actually need a speed improvement. If you process the lines as one big chunk, you'll see that sed is plenty fast. Type in the following:
#!/usr/bin/bash
echo This is a pretty chunky line with three bad characters at the end.XXX >qq1
for i in 4 16 64 256 1024 4096 16384 65536 ; do
cat qq1 qq1 >qq2
cat qq2 qq2 >qq1
done
head -n 20000 qq1 >qq2
wc -l qq2
date
time sed 's/...$//' qq2 >qq1
date
head -n 3 qq1
and run it. Here's the output on my (not very fast at all) R40 laptop:
pax> ./chk.sh
20000 qq2
Sat Jul 24 13:09:15 WAST 2010
real 0m0.851s
user 0m0.781s
sys 0m0.050s
Sat Jul 24 13:09:16 WAST 2010
This is a pretty chunky line with three bad characters at the end.
This is a pretty chunky line with three bad characters at the end.
This is a pretty chunky line with three bad characters at the end.
That's 20,000 lines in under a second, pretty good for something that's only done every hour.
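In the spirit of measuring rather than guessing, here's a rough sketch for timing the single sed process against a pure-bash read loop on the same input. The file names (cmp_in, cmp_sed, cmp_bash) and the make_input helper are made up for this example, and I'm deliberately not quoting any timings since they'll depend entirely on your machine.

```bash
#!/usr/bin/env bash
# Hypothetical helper: write N copies of the test line to the given file.
make_input() {
    local n=$1 file=$2 i
    for ((i = 0; i < n; i++)); do
        echo 'This is a pretty chunky line with three bad characters at the end.XXX'
    done >"$file"
}

make_input 20000 cmp_in

# One sed process for the whole file.
time sed 's/...$//' cmp_in >cmp_sed

# One line at a time using only bash built-ins.
time while IFS= read -r line; do
    printf '%s\n' "${line%???}"
done <cmp_in >cmp_bash

# Both approaches should produce byte-identical output.
cmp cmp_sed cmp_bash && echo 'outputs match'
```

Run it a few times and compare the real figures before deciding whether either approach is worth changing.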