views: 98
answers: 4

I am reading a file containing around 50k lines using the file() function in PHP. However, it's giving an out-of-memory error, since the contents of the file are stored in memory as an array. Is there any other way?

Also, the lengths of the stored lines are variable.

@Gordon: Here's the code. Also, the file is 700 kB, not MB.

private static function readScoreFile($scoreFile)
{
    // file() loads the whole file into memory as an array of lines
    $file = file($scoreFile);
    $relations = array();
    $lineCount = count($file);

    // Start at 1, skipping the first line
    for ($i = 1; $i < $lineCount; $i++)
    {
        // Each line is tab-separated: pwId_1, pwId_2, score
        $relation = explode("\t", trim($file[$i]));
        $relation = array(
            'pwId_1' => $relation[0],
            'pwId_2' => $relation[1],
            'score'  => $relation[2],
        );
        if ($relation['score'] > 0)
        {
            $relations[] = $relation;
        }
    }

    unset($file);
    return $relations;
}
+6  A: 

Use fopen, fread and fclose to read a file sequentially:

$handle = fopen($filename, 'r');
if ($handle) {
    while (!feof($handle)) {
        echo fread($handle, 8192);
    }
    fclose($handle);
}
Gumbo
This doesn't work; I want to read line by line. It's returning multiple lines on each fread (I guess 8192 bytes).
Chetan
Replace fread with fgets: fgets — Gets line from file pointer.
Killer_X
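
A minimal sketch of that substitution, reusing $filename and the loop shape from the answer above:

$handle = fopen($filename, 'r');
if ($handle) {
    while (($line = fgets($handle)) !== false) {
        echo $line; // fgets() returns one line per call, trailing newline included
    }
    fclose($handle);
}
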
You can use an intermediate variable $line to store the bytes of each line, and then echo $line. fread is probably one of the most efficient ways to stream the file, so read the results of fread (appending them to $line) until you find a line break. Then do whatever you want with that line, set $line = "", and resume appending the results of fread to $line.
luiscubal
The issue is that the lines are of variable length, so in some places I get half a line.
Chetan
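
A minimal sketch of luiscubal's buffering approach, which also copes with the variable-length lines ($filename is assumed as above, and echo stands in for the real per-line processing):

$handle = fopen($filename, 'r');
$line = '';
while (!feof($handle)) {
    // Append the next chunk and split out any complete lines it contains
    $parts = explode("\n", $line . fread($handle, 8192));
    $line = array_pop($parts); // keep the trailing partial line for the next read
    foreach ($parts as $completeLine) {
        echo $completeLine . "\n"; // a complete line, however long it was
    }
}
if ($line !== '') {
    echo $line; // final line without a trailing newline
}
fclose($handle);
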
+4  A: 

EDIT after the update of the question and the comments on fabjoa's answer:

There is definitely something fishy if a 700 kB file eats up 140 MB of memory with the code you gave (you could unset $relation at the end of each iteration, though). Consider using a debugger to step through it and see what happens. You might also want to consider rewriting the code to use SplFileObject's CSV functions (or their procedural cousins):

SplFileObject::setCsvControl example

$file = new SplFileObject("data.csv");
$file->setFlags(SplFileObject::READ_CSV);
$file->setCsvControl('|');
foreach ($file as $row) {
    list ($fruit, $quantity) = $row;
    // Do something with values
}
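
A minimal sketch adapting this to the tab-separated score file from the question (assuming the same three columns and, as in the question's loop, a first line to skip); only one line is held in memory at a time:

$file = new SplFileObject($scoreFile);
$file->setFlags(SplFileObject::READ_CSV);
$file->setCsvControl("\t");
$relations = array();
foreach ($file as $i => $row) {
    if ($i === 0 || count($row) < 3) {
        continue; // skip the first line and any blank or short rows
    }
    if ($row[2] > 0) {
        $relations[] = array('pwId_1' => $row[0], 'pwId_2' => $row[1], 'score' => $row[2]);
    }
}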

For an OOP approach to iterate over the file, try SplFileObject:

SplFileObject::fgets example

$file = new SplFileObject("file.txt");
while (!$file->eof()) {
    echo $file->fgets();
}

SplFileObject::next example

// Read through file line by line
$file = new SplFileObject("misc.txt");
while (!$file->eof()) {
    echo $file->current();
    $file->next();
}

or even

foreach (new SplFileObject("misc.txt") as $line) {
    echo $line;
}

Pretty much related (if not duplicate):

Gordon
I think this can still potentially use a big chunk of memory, as I think it continues to read until it finds an end-of-line.
Artefacto
Same as above; I want to read line by line (terminated by \n).
Chetan
@Artefacto well, you can still use `SplFileObject::setMaxLineLen` if that is an issue.
Gordon
@Gordon Right. I see my familiarity with SplFileObject could be improved :p
Artefacto
@Gordon Why not foreach then? `foreach (new SplFileObject("misc.txt") as $line) { ... }`
Artefacto
@Artefacto because I am lazily copying the examples from the PHP Manual ;)
Gordon
@Gordon, don't be lazy.
salathe
@salathe It's too hot not to be. Add better examples to the docs ;) (j/k)
Gordon
@Gordon, I agree with the too hot (as good an excuse as any)!! :-)
salathe
@Gordon (and @Chetan), since the file contains TSV then the "CSV" reading capabilities of `SplFileObject` might be of some practical use. :)
salathe
@salathe already added ;)
Gordon
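
For reference, a minimal sketch of the setMaxLineLen cap mentioned above (4096 is an arbitrary value):

$file = new SplFileObject("misc.txt");
$file->setMaxLineLen(4096); // no single read pulls in more than this many bytes
while (!$file->eof()) {
    echo $file->fgets();
}
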
A: 

Allocate more memory for the duration of the operation, with something like ini_set('memory_limit', '16M');. Don't forget to go back to the initial memory allocation once the operation is done.
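
A minimal sketch, using a plain readScoreFile($scoreFile) call as a stand-in for the memory-hungry operation:

$oldLimit = ini_get('memory_limit');
ini_set('memory_limit', '16M');         // allocate more memory for the operation
$relations = readScoreFile($scoreFile); // the heavy work goes here
ini_set('memory_limit', $oldLimit);     // go back to the initial allocation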

fabjoa
I'm pretty sure that you don't have to reset the memory limit after the operation; it only applies to the currently running script.
George Marian
I am already using 140 MB of memory (there is a lot of stuff going on apart from reading the file).
Chetan
@Chetan this sounds fishy to me. 50k lines ain't that much. The [King James Bible](http://www.gutenberg.org/etext/26361) has around 20k lines, is 1 MB in plain text, and only takes up ~3 MB when read in with file(). What is the total size in bytes of your file?
Gordon
@Gordon The file is like 700 MB; however, it's a TSV file. After reading the file, I am splitting each line and storing it in an array. So that's like an array of 30k x 5, which is why it's taking so much memory, I guess.
Chetan
@Chetan are you sure you are not leaking any memory somewhere? Try unsetting unused variables, especially while looping. Maybe you can post some of your code for us to see.
Gordon
@Gordon - Have added the code and some more details.
Chetan
@Chetan thanks. There is definitely something wrong if the file is just 700 kB, though.
Gordon
A: 

If you don't know the maximum line length and you are not comfortable using a magic number for it, then you'll need to do an initial scan of the file to determine the maximum line length.
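
A minimal sketch of such an initial scan ($filename is assumed; fgets() without a length argument reads whole lines, one at a time):

    // Determine the longest line so fgets() can later be given a safe length
    $maxLength = 0;
    $handle = fopen($filename, 'r');
    while (($line = fgets($handle)) !== false) {
        $maxLength = max($maxLength, strlen($line));
    }
    fclose($handle);
    $length = $maxLength + 1; // fgets() reads at most $length - 1 bytes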

Other than that the following code should help you out:

    // $length is a large number or calculated from an initial file scan
    $handle = fopen($filename, 'r');
    while (!feof($handle)) {
        $buffer = fgets($handle, $length); // reads at most $length - 1 bytes, stopping at a newline
        echo $buffer;
    }
    fclose($handle);
zaf