Hi there,

I am receiving a stream of data (text format) from an external server and would like to pass it on to a script line by line. The file is being appended to continuously. What is the ideal method for this? Would Perl's IO::Socket do? Eventually this data has to pass through a (reusable) PHP program and land in a MySQL database.

The question is: how do I open a file that is continuously being updated?

TIA jk

+3  A: 

Perhaps a named pipe would help you?
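
A minimal sketch of the named-pipe idea in Python, assuming a Unix-like system and that whatever receives the remote stream can write it into the FIFO. The function names and any path you pass in are hypothetical, not something from the question.

```python
import os


def ensure_fifo(path):
    """Create the named pipe at `path` if nothing exists there yet (Unix only)."""
    if not os.path.exists(path):
        os.mkfifo(path)


def read_fifo_lines(path):
    """Yield lines from the named pipe until every writer has closed it."""
    # open() blocks here until some process opens the FIFO for writing.
    with open(path, "r") as fifo:
        # Iteration blocks while the pipe is empty and ends at end-of-file,
        # which happens once the last writer closes its end.
        for line in fifo:
            yield line.rstrip("\n")
```

A consumer would call ensure_fifo(path) once and then loop over read_fifo_lines(path), handing each line to the next stage. To keep following after a writer disconnects, wrap that loop in an outer loop that reopens the pipe.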

Alex Reynolds
+7  A: 

In Perl, you can use seek and tell to read from a continuously growing file. It might look something like this (borrowed liberally from perldoc -f seek):

open(FH,'<',$the_file) || handle_error();  # typical open call
for (;;) {
    while (<FH>) {
        # ... process $_ and do something with it ...
    }
    # eof reached on FH, but wait a second and maybe there will be more output
    sleep 1;
    seek FH, 0, 1;      # this clears the eof flag on FH
}
mobrule
Yeah, works the same in Python (or C, for that matter;-).
Alex Martelli
Right. seek and tell are just file system library functions. They'll do the same thing in any language that has them.
mobrule
A: 

In Python it is pretty straightforward:

import time

f = open('teste.txt', 'r')
for line in f:  # read all lines already in the file
    print(line.strip())

# keep waiting forever for more lines
while True:
    line = f.readline()  # try to read one more line
    if line:  # got something
        print('got data:', line.strip())
    else:
        time.sleep(1)  # nothing new yet; wait a second so we don't fry the CPU needlessly
nosklo
Not sure why I got negative votes.
nosklo
A: 

The solutions that read the whole file in order to seek to the end perform poorly. If this runs under Linux, I would suggest simply renaming the log file. You can then scan all the entries in the renamed file while the original file name is filled again with new data. After scanning the whole renamed file, delete it or move it wherever you like. This way you get something like logrotate, but for scanning newly arriving data.
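
A rough sketch of that rename-and-scan cycle in Python. It assumes the writer reopens the original path for each append (otherwise it keeps writing into the renamed file) and that both paths are on the same filesystem. The function name, the handle_line callback, and the ".processing" suffix are made up for illustration.

```python
import os


def drain_by_rename(path, handle_line, work_suffix=".processing"):
    """Move `path` aside, pass each of its lines to `handle_line`, then delete it.

    Returns the number of lines processed (0 if `path` does not exist yet).
    """
    work_path = path + work_suffix
    try:
        # On POSIX a rename within one filesystem is atomic, so new data
        # written to `path` afterwards lands in a fresh file.
        os.rename(path, work_path)
    except FileNotFoundError:
        return 0  # nothing has arrived since the last pass

    count = 0
    with open(work_path, "r") as fh:
        for line in fh:
            handle_line(line.rstrip("\n"))
            count += 1

    os.remove(work_path)  # or move it to an archive directory instead
    return count
```

Calling this from a loop with a short sleep between passes gives the periodic scan described above.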

FractalizeR
+2  A: 

In Perl there are a couple of modules that make tailing a file easier: IO::Tail and File::Tail. One uses a callback, the other uses a blocking read, so it just depends on which suits your needs better. There are likely other tailing modules as well, but these are the two that came to mind.

IO::Tail - follow the tail of files/stream

 use IO::Tail;
 my $tail = IO::Tail->new();
 $tail->add('test.log', \&callback);
 $tail->check();
 $tail->loop();

File::Tail - Perl extension for reading from continuously updated files

use File::Tail;
my $file = File::Tail->new("/some/log/file");
while (defined(my $line= $file->read)) {
    print $line;
}
mikegrb
What is there in callback sub for IO::Tail?
Space
A: 

You talk about opening a file, and ask about IO::Socket. These aren't quite the same thing, even if deep down you're going to be reading data off a file descriptor.

If you can access the remote stream from a named pipe or FIFO, then you can just open it as an ordinary file. It will block when nothing is available, and return whenever there is data that needs to be drained. You may, or may not, need to bring File::Tail to bear on the problem of not losing data if the sender runs too far ahead of you.

On the other hand, if you're opening a socket directly to the other server (which seems more likely), IO::Socket is not going to work out of the box as there is no getline method available. You would have to read and buffer block-by-block and then dole it out line by line through an intermediate holding pen.
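
The read-and-buffer approach can be sketched like this in Python rather than Perl, purely as an illustration of the idea. The function name, the chunk size, and the UTF-8 encoding are assumptions, not anything the remote server is known to use.

```python
import socket


def iter_socket_lines(sock, chunk_size=4096, encoding="utf-8"):
    """Yield complete lines from a connected socket, buffering partial ones."""
    buffer = b""  # the "holding pen" for data not yet terminated by a newline
    while True:
        chunk = sock.recv(chunk_size)
        if not chunk:  # empty read: the peer closed the connection
            break
        buffer += chunk
        # Everything before the last newline is complete; the final piece
        # (possibly empty) stays in the buffer until more data arrives.
        *complete, buffer = buffer.split(b"\n")
        for raw in complete:
            yield raw.rstrip(b"\r").decode(encoding)
    if buffer:  # trailing data that never received its newline
        yield buffer.decode(encoding)
```

With a socket from socket.create_connection((host, port)), a loop over iter_socket_lines(sock) receives one line at a time no matter how the data was split into packets.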

You could pull out the socket descriptor into an IO::Handle, and use getline() on that. Something like:

use IO::Socket::INET;
use IO::Handle;

my $sock = IO::Socket::INET->new(
    PeerAddr => '172.0.0.1',
    PeerPort => 1337,
    Proto    => 'tcp'
) or die $!;

my $io = IO::Handle->new;
$io->fdopen(fileno($sock),"r") or die $!;

while (defined( my $data = $io->getline() )) {
    chomp $data;
    # do something
}

You may have to perform a handshake in order to start receiving packets, but that's another matter.

dland