I'm reading and processing a stream of input in Perl from a regular filehandle, which may be STDIN (originally the ARGV filehandle, i.e. the while(<>) construct). However, I need to analyze a significant portion of the input in order to detect which of four different but extremely similar formats it is encoded in (different ASCII encodings of FASTQ quality scores). Once I've decided which format the data is in, I need to go back and parse those lines a second time to actually read the data.
So I need to read the first 500 lines or so of the stream twice. Or, to look at it another way, I need to read the first 500 lines, and then "put them back" so I can read them again. Since I may be reading from STDIN, I can't just seek back to the beginning. And the files are huge, so I can't just read everything into memory (although reading those first 500 lines into memory is ok). What's the best way to do this?
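To make the "put them back" idea concrete, here is a minimal sketch of what I have in mind, not a tested solution. The name peek_lines is a hypothetical helper of my own, not a built-in or CPAN function. It buffers up to a given number of lines from any filehandle (seekable or not) and hands back a closure that replays the buffered lines before continuing with the rest of the stream, so only the peeked lines are ever held in memory.

```perl
use strict;
use warnings;

# Hypothetical helper (not a built-in): read up to $max lines from $fh,
# then return (\@peeked, $next). Calling $next->() yields the buffered
# lines first, then continues reading from $fh; it returns undef at EOF.
sub peek_lines {
    my ($fh, $max) = @_;
    my @buffer;
    while (@buffer < $max) {
        my $line = <$fh>;
        last unless defined $line;    # stream shorter than $max lines
        push @buffer, $line;
    }
    my @peeked = @buffer;             # separate copy for format detection
    my $next = sub {
        return shift @buffer if @buffer;   # replay buffered lines first
        return scalar <$fh>;               # then one line at a time from the stream
    };
    return (\@peeked, $next);
}

# Intended usage (detect_format is a placeholder for the real detection code):
#   my ($peeked, $next) = peek_lines(\*STDIN, 500);
#   my $format = detect_format($peeked);
#   while (defined(my $line = $next->())) {
#       # parse $line according to $format
#   }
```

The drawback is that everything downstream has to call $next->() instead of reading the filehandle directly, which is why I'm asking whether there is a cleaner way.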
Alternatively, can I duplicate the input stream somehow?
Edit: Wait a minute. I just realized that I can't process the input as one big stream anymore, because I have to detect each file's format independently. So I can't use ARGV. The rest of the question still stands, though.