I have a file with n lines (n is above 100 million).
I want to output a file containing only 1 line in 10. I can't split the file into ten parts and keep only one part, because the selection must be a little more random: later I have to do a statistical analysis, so I can't afford to introduce a strong bias into the data.
I was thinking of reading the file and, for each record, outputting it if the record number mod 10 equals 0.
The constraints are:
It's a Windows (likely hardened) computer, possibly XP, Vista, or Windows Server 2003.
No development tools are available.
No network, USB, or CD-ROM; in other words, no external communication.
Therefore I was thinking of a Windows batch file (I can't assume PowerShell, and VBScript is likely to have been removed). At the moment I am looking at the FOR /F command. Still, I am not an expert and I don't know how to achieve this.
Thank you Paul for your answer. I reformatted the answer (with Hosam's help) to put it in a batch file:
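@echo off
setlocal
rem Number each non-empty line, then keep the lines whose number ends in 0 (every 10th line).
rem The pattern is quoted so that cmd does not consume ^ as its escape character.
findstr /N . inputFile | findstr /R "^[0-9]*0:" >temporaryFile
rem Strip the "number:" prefix; the whole loop is redirected so outputFile is not overwritten on each line.
(FOR /F "tokens=1,* delims=:" %%i in (temporaryFile) do echo.%%j) >outputFile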
Thanks to quux and Pax for the similar alternative solution. However, after a quick test on a larger file, Paul's answer is roughly 8 times faster. I guess the evaluation (in SET) is rather slow, even if the logic seems brilliant.
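
For comparison, here is a minimal sketch of what a counter-based approach using SET might look like. It is only my assumption about the general shape of such a solution, not the actual code from quux or Pax; inputFile and outputFile are placeholder names, and count and r are illustrative variable names.

```batch
@echo off
setlocal enabledelayedexpansion
rem Hypothetical sketch: count lines with SET /A and keep every 10th one.
rem Caveats: FOR /F skips empty lines and lines starting with ";",
rem and delayed expansion mangles lines that contain "!".
set /a count=0
(for /f "usebackq delims=" %%L in ("inputFile") do (
    set /a count+=1
    rem In a batch file, %% is the literal modulo operator for SET /A.
    set /a "r=count %% 10"
    if !r! equ 0 echo.%%L
)) >outputFile
```

Here the arithmetic is evaluated once per input line inside the loop, which would be consistent with it being slower than letting findstr do the filtering in a single pass.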