I am already using this example of how to read large data files in PHP line by line.

Now, what I'd like to do is obtain the total number of rows in the file so that I can display a percentage complete, or at least show the total number of rows, to give some idea of how much processing is left to be done.

Is there a way to get the total number of rows without reading in the entire file twice? (once to count the rows and once to do the processing)

+11  A: 

Poor man's answer:

No, but you can estimate. Calculate the average line size from a sample (use the first 250 lines) and go with that.

estNumOfLines = sizeOfFile / avgLineSize
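A rough sketch of that estimate in PHP might look like the following. The function name estimateLineCount and its parameters are made up for illustration; only the idea of sampling the first 250 lines comes from the answer above.

```php
<?php
// Hypothetical helper (not from the original answer): estimates the total
// line count from the average byte length of the first $sampleLines lines.
function estimateLineCount(string $path, int $sampleLines = 250): int
{
    $fileSize = filesize($path);
    $handle = fopen($path, 'r');
    if ($handle === false || $fileSize === false) {
        throw new RuntimeException("Cannot read $path");
    }

    $sampledBytes = 0;
    $sampled = 0;
    while ($sampled < $sampleLines && ($line = fgets($handle)) !== false) {
        $sampledBytes += strlen($line); // byte length, newline included
        $sampled++;
    }
    fclose($handle);

    if ($sampled === 0) {
        return 0; // empty file, nothing to estimate
    }

    // estNumOfLines = sizeOfFile / avgLineSize
    $avgLineSize = $sampledBytes / $sampled;
    return (int) round($fileSize / $avgLineSize);
}
```

The result is only as good as the sample: if line lengths later in the file differ a lot from the first few hundred, the estimate will drift.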

You could also store the number of lines in the file when you create it...

Alternatively, you could display the number of KB processed, and that would be perfectly accurate.

altCognito
+1 :-D nice answer, hehe (see mine below)
Itay Moav
+1 for your bolded suggestion (the only reasonable approach, IMO)
rmeador
I vote for just displaying the amount of data processed. Would probably be the best solution.
Kibbee
The entire process is based on the number of products processed, and I agree that usually the amount of actual data processed is preferable, but in this case the number of products processed makes more sense.
Failpunk
+1  A: 

How would you know the number of pages in a book, without counting them?
You would measure the width of a page and the width of the book and divide one by the other.

Same here: calculate the average line length from the first few lines, then do the same math with the file size...

Itay Moav
Page size is constant. Line width in bytes, with UTF-8 (or similar) special characters, is not.
OIS
Average... It is all averages when you do such calculations (available bandwidth, available CPU resources, etc.). When was the last time you saw ANY progress bar that got the timing right? Besides, the solution of counting the bytes themselves is better; I wrote this because he asked about lines.
Itay Moav
+4  A: 

You can determine the size of the file, then gauge your progress through it by adding up the size of your reads:

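$fname = 'foofile.txt';
$fsize = filesize($fname);
$bytesRead = 0;
$handle = fopen($fname, "r") or die("Couldn't get handle");
// fgets() returns false at end of file, which ends the loop
while (($buffer = fgets($handle, 4096)) !== false) {
  // Process buffer here..
  // Add the bytes actually read; a line is rarely exactly 4096 bytes long
  $bytesRead += strlen($buffer);
  echo round($bytesRead / $fsize * 100, 1) . " percent read.\n";
}
fclose($handle);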

Note: code adapted from referenced answer

vezult
+1 for the code :)
altCognito
Thanks for posting the code, even though I didn't pick your solution, this will be handy for future use!
Failpunk
+3  A: 

Is there any reason you need to count rows and not bytes? If all you want to know is "percent done", just track it by the number of bytes read / total bytes.
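One way to sketch this while still processing line by line is to ask the file handle for its byte offset after each read. The function name processWithProgress and the callback shape below are assumptions for illustration, not part of the question's code.

```php
<?php
// Hypothetical helper: reads $path line by line and passes each line to
// $onLine together with the percentage of the file's bytes consumed so far.
function processWithProgress(string $path, callable $onLine): void
{
    $total = filesize($path);
    $handle = fopen($path, 'r');
    if ($handle === false) {
        throw new RuntimeException("Cannot open $path");
    }

    while (($line = fgets($handle)) !== false) {
        // ftell() reports the current byte offset within the file
        $percent = $total > 0 ? ftell($handle) / $total * 100 : 100.0;
        $onLine($line, $percent);
    }
    fclose($handle);
}

// Example call (file name is a placeholder):
// processWithProgress('foofile.txt', function ($line, $percent) {
//     printf("%.1f%% done\n", $percent);
// });
```

Because the offset comes from the handle itself, the percentage stays accurate regardless of line length or multi-byte characters.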

KenE
The processing is going to be done on a very large third-party product file on a line-by-line (product-by-product) basis.
Failpunk
Use the byte-by-byte approach suggested by vezult.
OIS
A: 

Use the Linux command "wc -l filename.txt". This will output the number of lines in the file.
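From PHP, this could be wrapped roughly as below. It is only a sketch and assumes a Unix-like system where wc is on the path and exec() is not disabled. The function name countLinesWithWc is made up for illustration. Note that wc still reads the whole file once, and that wc -l counts newline characters, so a final line without a trailing newline is not counted.

```php
<?php
// Hypothetical helper: asks wc for the line count on a Unix-like system.
function countLinesWithWc(string $path): int
{
    // Redirecting the file into wc makes it print only the number,
    // without the file name after it.
    $command = 'wc -l < ' . escapeshellarg($path);
    $output = [];
    $status = 0;
    exec($command, $output, $status);

    if ($status !== 0 || !isset($output[0])) {
        throw new RuntimeException("wc failed for $path");
    }
    // Some wc implementations pad the number with spaces
    return (int) trim($output[0]);
}
```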

majestiq