For a new import feature for my users, I'm using fgetcsv to read CSV files generated by social book cataloging sites like LibraryThing and Shelfari, and then I run a series of Amazon queries to look up the ISBNs found in the files. I want users to be able to confirm the titles that match certain criteria and then add them to their local bookshelves.

Some of these files will have hundreds or thousands of records and I can't average more than 1 lookup per second with Amazon. I also want users to confirm we've matched their books correctly. I'm thinking that I should process the CSV file in chunks of 10 or 20 records, and display 'hits' for users to confirm. But I can't figure out how to do this effectively.

I can read the CSV file into an array, selecting only ISBNs, for example, and I know I can use a simple loop on the array to test 10 or 20 records against Amazon. But how do I allow the user to accept or reject the batch of records and then review 10 or 20 more without running fgetcsv again on the CSV file with a page reload?
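For concreteness, here is a minimal sketch of that first step. The file name and the column index are made up for illustration; real LibraryThing and Shelfari exports will differ:

    <?php
    // Hypothetical sketch: collect ISBNs from an uploaded CSV with fgetcsv.
    // The file name and the ISBN column index are assumptions.
    $isbns = array();
    $handle = fopen('librarything_export.csv', 'r');
    if ($handle !== false) {
        fgetcsv($handle); // skip the header row
        while (($row = fgetcsv($handle)) !== false) {
            $isbn = isset($row[5]) ? trim($row[5]) : ''; // assume column 5 holds the ISBN
            if ($isbn !== '') {
                $isbns[] = $isbn;
            }
        }
        fclose($handle);
    }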

Is there some simple way to allow an array to persist between page loads? Or can I perhaps pause to accept user input within the loop itself?

A: 

Why not just use a 'delayed' import method? Allow the CSV import, store the data in a 'temporary storage' database, and on the backend look up the ISBNs via your Amazon process.
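A rough sketch of that staging step, assuming a PDO connection in $pdo and a hypothetical import_queue table with user_id, isbn, status, and title columns (none of these names come from the question; they're placeholders):

    <?php
    // Sketch only: queue each CSV row for background processing.
    // $pdo, $uploadedCsvPath, $userId, the table name, and the column
    // layout are all assumptions for the sake of illustration.
    $stmt = $pdo->prepare(
        'INSERT INTO import_queue (user_id, isbn, status) VALUES (:user_id, :isbn, :status)'
    );
    $handle = fopen($uploadedCsvPath, 'r');
    fgetcsv($handle); // skip the header row
    while (($row = fgetcsv($handle)) !== false) {
        $stmt->execute(array(
            ':user_id' => $userId,
            ':isbn'    => isset($row[5]) ? trim($row[5]) : '', // assumed ISBN column
            ':status'  => 'pending',
        ));
    }
    fclose($handle);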

The user can then be shown a message along the lines of "your records are being processed; you will be asked to review them once we have finished validating them."

At that point they can go through your listing locally and not be limited by Amazon's one-lookup-per-second rate. I doubt a user would want to sit there while 10 or 20 records are processed, then wait again for the next page, and the next.

So the process would play out like this:

  1. The user imports their data, while on the backend a cron job or worker process goes through the records one by one so the user doesn't have to wait (a sketch of such a worker follows this list).
  2. The user is prompted to come back and verify the data, or returns after some period of time (after being notified, etc.).
  3. The user goes through the paginated list of data and validates it; upon acceptance you move the accepted/valid entries into your final database (live, valid data).
  4. If the user wants to stop at record 100 of 100,000, you give them that option, and they effectively have a validation 'queue' they can return to.
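As a rough illustration of the background step in item 1, a worker along these lines could be run from cron. It assumes the same hypothetical import_queue table as above, and lookupAmazonByIsbn() is a placeholder for whatever Amazon lookup you already have, not a real API:

    <?php
    // Hypothetical cron worker: process pending rows one at a time while
    // respecting the roughly 1-lookup-per-second limit. The table, columns,
    // status values, and lookupAmazonByIsbn() are placeholders.
    $rows = $pdo->query(
        "SELECT id, isbn FROM import_queue WHERE status = 'pending' LIMIT 100"
    )->fetchAll(PDO::FETCH_ASSOC);

    $update = $pdo->prepare(
        'UPDATE import_queue SET status = :status, title = :title WHERE id = :id'
    );

    foreach ($rows as $row) {
        $match = lookupAmazonByIsbn($row['isbn']); // placeholder for the Amazon query
        $update->execute(array(
            ':status' => $match ? 'matched' : 'not_found',
            ':title'  => $match ? $match['title'] : null,
            ':id'     => $row['id'],
        ));
        sleep(1); // stay under the 1-request-per-second limit
    }

The paginated review page in item 3 would then just select that user's rows with status 'matched' and, on acceptance, copy them into the live bookshelf table.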

How's that sound? It's a bit more work, but it seems like the best approach for handling large imports like this.

Jakub
I had been considering a batch method too, but I take your point about users waiting, especially if they have hundreds of records. That process looks good and I'll look into implementing it. Thanks!
mandel