I've a text file with 2 million lines. Each line has some transaction information.

e.g.

23848923748, sample text, feild2 , 12/12/2008

etc

What I want to do is create a new file from a certain unique transaction number onwards. So I want to split the file at the line where this number exists.

How can I do this from the command line?

I can find the line by doing this:

cat myfile.txt | grep 23423423423
A: 

It's not a pretty solution, but how about using grep's -A parameter?

Like this:

mc@zolty:/tmp$ cat a
1
2
3
4
5
6
7
mc@zolty:/tmp$ cat a | grep 3 -A1000000
3
4
5
6
7

The only problem I see in this solution is the 1000000 magic number. Probably someone will know the answer without using such a trick.
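
One way to avoid the magic number, as a rough sketch: take the file's own line count from wc -l and pass that as the -A value, since no match can have more trailing lines than the file contains. The function name from_match_grep is made up for this illustration, and -A is a GNU/BSD grep option rather than a POSIX one.

```shell
# from_match_grep is a hypothetical helper name, not a standard tool.
# Prints everything from the first line matching the pattern to the end of the file.
from_match_grep() {
    pattern=$1
    file=$2
    # The file's line count is a safe upper bound for the -A context.
    # The arithmetic expansion strips any padding some wc implementations emit.
    total=$(( $(wc -l < "$file") ))
    grep -A "$total" -- "$pattern" "$file"
}
```

Usage would look like: from_match_grep 23423423423 myfile.txt > newfile.txt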

Grzegorz Oledzki
A: 

You can probably get the line number using grep and then use tail to print the file from that point into your output file.

Sorry I don't have actual code to show, but hopefully the idea is clear.
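
For what it's worth, the idea could be sketched roughly as below. The function name from_match_tail is hypothetical; the sketch keeps only the first match (via head -n 1) and returns a non-zero status when nothing matches, both of which are choices made here rather than anything stated above.

```shell
# from_match_tail is a hypothetical helper name used only for this sketch.
from_match_tail() {
    pattern=$1
    file=$2
    # grep -n prints "lineno:line"; head keeps the first match, cut keeps the number.
    start=$(grep -n -- "$pattern" "$file" | head -n 1 | cut -d: -f1)
    # No match: print nothing and fail rather than call tail with an empty offset.
    [ -n "$start" ] || return 1
    # tail -n +N prints the file starting at line N.
    tail -n "+$start" "$file"
}
```

Usage would look like: from_match_tail 23423423423 myfile.txt > newfile.txt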

Assaf Lavie
+3  A: 

Use sed like this:

sed '/23423423423/,$!d' myfile.txt

Just confirm that the unique transaction number cannot appear as a pattern in some other part of the line (especially, before the correctly matching line) in your file.
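
If the number can show up elsewhere in a line, one hedged way to tighten the match is to anchor it. The sketch below assumes the transaction number starts the line and is immediately followed by a comma, as in the sample line in the question; from_txn is a made-up name, and the number is assumed to contain digits only.

```shell
# from_txn is a hypothetical helper; it assumes the transaction number
# starts the line and is followed by a comma, and contains only digits.
from_txn() {
    txn=$1
    file=$2
    # -n suppresses sed's default output; p prints the range running from the
    # first line that begins with "<txn>," through the last line ($).
    sed -n "/^${txn},/,\$p" "$file"
}
```

Usage would look like: from_txn 23423423423 myfile.txt > newfile.txt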


There is already a 'perl' answer here, so I'll give one more AWK way :-)

awk 'BEGIN{skip=1} /number/ {skip=0} {if (skip!=1) print $0}' myfile.txt
nik
It shouldn't appear twice, but just in case it did, how could I amend it so it works from the first occurrence to the end of the file?
Derek Organ
Find some constant pattern that restricts the match to the transaction number itself. For example, is the number the first thing on the line? (Then match "^number".) Is it prefixed or suffixed with whitespace or, say, the ':' character? (Try "number:", etc.)
nik
`awk '/23423423423/,0{print}'` is shorter -- in fact, you can even throw out `{print}`, as that's the default action.
ephemient
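
A small sketch of why the range form works: an awk range pattern starts printing at the first line that matches the start condition and stops when the end condition becomes true, and a constant 0 never does, so the range stays open to the end of the file. The wrapper name from_match_awk and the variable pat are made up for illustration.

```shell
# from_match_awk is a hypothetical wrapper around the awk range pattern.
from_match_awk() {
    pattern=$1
    file=$2
    # Start condition: the line matches pat. End condition: 0, which is never
    # true, so the range never closes. With no action given, awk prints the line.
    awk -v pat="$pattern" '$0 ~ pat, 0' "$file"
}
```

Usage would look like: from_match_awk 23423423423 myfile.txt > newfile.txt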
A: 

I would write a quick Perl script, frankly. It's invaluable for anything like this (relatively simple issues) and as soon as something more complex rears its head (as it will do!) then you'll need the extra power.

Something like:

#!/usr/bin/perl
use strict;
use warnings;

my $out = 0;
while (<STDIN>) {
   # Start printing at the first line containing the transaction number.
   $out = 1 if /23423423423/;
   print $_ if $out;
}

and run it using:

$ perl mysplit.pl < input > output

Not tested, I'm afraid.

Brian Agnew
Shorter: perl -ne 'print if /23423423423/ .. eof()'
ephemient
That's better. I was aware you could do that but had forgotten the details etc.
Brian Agnew
I modified this slightly to get it to work (and also to ignore case when searching for a text string). I changed the if statement to: if ($_ =~ /stevens/i) { $out = 1; } Hope that's of interest to someone.
DBMarcos99
+1  A: 

On a random file in my tmp directory, this is how I output everything from the line matching popd onwards in a file named tmp.sh:

tail -n+`grep -n popd tmp.sh | cut -f 1 -d:` tmp.sh

tail -n+X prints from line X onwards; grep -n prefixes each matching line with its line number (lineno:line), and cut extracts just lineno from grep's output.

So for your case it would be:

 tail -n+`grep -n 23423423423 myfile.txt | cut -f 1 -d:` myfile.txt

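This works as long as the number matches exactly one line. If it could match more than one, pipe grep through head -n 1 before cut so that only the first line number reaches tail.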

Mark Rushakoff
cheers, that worked a charm.
Derek Organ
Well, to be more specific, this worked: tail -n+`grep -n 23423423423 myfile.txt | cut -f 1 -d:` myfile.txt > newfile.txt
Derek Organ
@Derek, I was surprised to see you preferred a tail+grep+cut over a simple stream edit...
nik
@nik I'm sure yours works as well, but I actually understood this one, so it's my lack of knowledge that made me choose this answer. That said, this worked as expected and quickly.
Derek Organ