views: 96

answers: 5
I am importing a CSV file with more than 5,000 records in it. What I am currently doing is getting all the file content as an array and saving the records to the database one by one. But in case of script failure, the whole process will run again, and if I start checking them one by one against the database it will use lots of queries, so I thought to keep the imported values in the session temporarily.

Is it good practice to keep that many records in the session, or is there another way to do this?

Thank you.

A: 

If you are using PostgreSQL, you can insert them all with a single query using pg_copy_from, or you can use pg_put_line as shown in the manual's example (COPY FROM stdin), which I found very useful when importing tons of data.
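
A minimal sketch of the pg_copy_from approach; the connection string, CSV path and table name (import_table) are placeholders, not from the original answer:

    <?php
    // Sketch: bulk-load a CSV into PostgreSQL with a single COPY
    // instead of one INSERT per row.
    $conn = pg_connect('host=localhost dbname=mydb user=me password=secret');

    $rows = array();
    if (($fh = fopen('data.csv', 'r')) !== false) {
        while (($fields = fgetcsv($fh)) !== false) {
            // pg_copy_from() expects one tab-delimited string per row
            $rows[] = implode("\t", $fields);
        }
        fclose($fh);
    }

    if (!pg_copy_from($conn, 'import_table', $rows)) {
        die('COPY failed: ' . pg_last_error($conn));
    }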

If you use MySQL, you'll have to do multiple inserts. Remember to wrap them in a transaction, so that if a query fails the whole batch is rolled back and you can start over. Note that 5,000 rows is not that large! You should, however, be aware of the max_execution_time constraint, which will kill your script after a number of seconds.
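
Whether you may lift that limit depends on your host; a quick sketch of checking and raising it at the top of the import script:

    <?php
    // Sketch: inspect and remove the execution time limit for this
    // run only. Hosts may still enforce limits elsewhere (e.g. at
    // the web server), so treat this as best-effort.
    echo 'max_execution_time: ' . ini_get('max_execution_time') . "s\n";
    set_time_limit(0); // 0 = no time limit for this script run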

As far as the SESSION is concerned, I believe that you are limited by the maximum amount of memory a script can use (memory_limit in php.ini). Session data is saved in files, so you should also consider disk space usage if many clients are connected.

Palantir
+1  A: 

It's not a good idea imho, since the session data will be serialized/unserialized on every page request, even for requests unrelated to the action you are performing.

I suggest using the following solution:

  1. Keep the CSV file lying around somewhere
  2. Begin a transaction
  3. Run the inserts
  4. Commit after all inserts are done
  5. End of transaction

Link: MySQL Transaction Syntax

If something fails, the inserts will be rolled back, so you know you can safely redo them without having to worry about duplicate data.
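
A rough PDO sketch of this recipe, assuming a transactional table engine such as InnoDB; the table name, columns and CSV path are placeholders:

    <?php
    $pdo = new PDO('mysql:host=localhost;dbname=mydb', 'user', 'pass');
    $pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);

    $stmt = $pdo->prepare('INSERT INTO records (name, value) VALUES (?, ?)');

    $pdo->beginTransaction();                        // step 2
    try {
        $fh = fopen('data.csv', 'r');                // step 1: file on disk
        while (($row = fgetcsv($fh)) !== false) {
            $stmt->execute(array($row[0], $row[1])); // step 3
        }
        fclose($fh);
        $pdo->commit();                              // steps 4 and 5
    } catch (Exception $e) {
        $pdo->rollBack(); // nothing was written, safe to rerun
        throw $e;
    }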

dbemerlin
+1 For trying to solve the actual problem.
Gumbo
I don't have control over the queries, as I'm using the Elgg framework. But this is a really good way to solve this kind of problem. Thanks
Chetan sharma
+2  A: 

If you have to do this task in stages (and there are a couple of suggestions here for improving the way you do things in a single pass), don't hold the CSV data in $_SESSION... that's pointless overhead, because you already have the CSV file on disk anyway, and it just adds a lot of serialization/unserialization work every time the session data is written.

You're processing the CSV records one at a time, so keep a count of how many you've successfully processed in $_SESSION. If the script times out or barfs, then restart and read how many you've already processed so you know where in the file to restart.
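
A sketch of what that counter might look like; insert_record() stands in for whatever save logic you use:

    <?php
    // Sketch: only the number of processed rows lives in the session,
    // never the CSV data itself.
    session_start();
    $done = isset($_SESSION['csv_rows_done']) ? $_SESSION['csv_rows_done'] : 0;

    $fh = fopen('data.csv', 'r');
    for ($i = 0; $i < $done; $i++) {
        fgetcsv($fh);                         // skip rows already imported
    }

    while (($row = fgetcsv($fh)) !== false) {
        insert_record($row);                  // placeholder for your insert
        $_SESSION['csv_rows_done'] = ++$done; // checkpoint after each row
    }
    fclose($fh);
    unset($_SESSION['csv_rows_done']);        // finished, clear checkpoint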

Mark Baker
It's a good idea, working on this. +1
Chetan sharma
+1  A: 

What can be the maximum size for the $_SESSION?

The session is loaded into memory at run time, so it's limited by the memory_limit setting in php.ini.

Is it good practice to keep that many records in the session?

No, for the reasons you describe; it will also have a big impact on performance.

Or is there another way to do this?

It depends what you are trying to achieve. Most databases can import CSV files directly or come with tools that will do it faster and more efficiently than PHP code.
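
For example, MySQL can parse the CSV itself via LOAD DATA INFILE, and PHP only has to trigger it. A rough sketch, assuming the server and driver allow LOCAL INFILE; the path and table name are placeholders:

    <?php
    $pdo = new PDO(
        'mysql:host=localhost;dbname=mydb',
        'user',
        'pass',
        array(PDO::MYSQL_ATTR_LOCAL_INFILE => true)
    );

    // One statement; the server does the CSV parsing and inserting
    $pdo->exec(
        "LOAD DATA LOCAL INFILE '/path/to/data.csv'
         INTO TABLE records
         FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '\"'
         LINES TERMINATED BY '\\n'
         IGNORE 1 LINES" // skip a header row, if the CSV has one
    );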

C.

symcbean
A: 

To answer the actual question (Somebody just asked a duplicate, but deleted it in favour of this question)

The default session data handler stores its data in temporary files. In theory, those files can be as large as the file system allows.

However, as @symcbean points out, session data is auto-loaded into the script's memory when the session is initialized. This severely limits how much you should store in session data. Also, loading lots of data has a massive impact on performance.

If you have huge amounts of data you need to store connected to a session, I would recommend using temporary files named after the current session ID. You can then deal with those files as needed, as far as possible within the limits of the script's memory_limit.
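
A sketch of that idea; the file name prefix and the serialized payload are illustrative only:

    <?php
    session_start();

    // One scratch file per visitor, keyed by the session ID
    $tmpFile = sys_get_temp_dir() . '/csvimport_' . session_id() . '.dat';

    // Write intermediate results to disk instead of into $_SESSION
    $progress = array('rows_done' => 1234);
    file_put_contents($tmpFile, serialize($progress));

    // ...in a later request of the same session, read it back:
    $progress = unserialize(file_get_contents($tmpFile));

    // Clean up once the import is finished
    unlink($tmpFile);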

Pekka