tags:

views:

31

answers:

3

I have a ftp repository that is currently at 2761 files (PDF files). I have a MySQL table (a list of those files) that's actually at 29k+ files (it hasn't been parsed since a recent upgrade). I'd like to provide admins with a one-click script that will do the following:

  • 1) Compare the "existing" filenames with the rows in the database table
  • 2) Delete any rows that are not in the existing filesystem
  • 3) Add a new row for a file that doesn't appear in the database table

This usually is handled by an AppleScript/FolderAction/Perl script method, but it's not perfect (it chokes sometimes when large numbers of files are added at a time - like on heavy news nights).

It takes about 10-20 seconds to build the file list from the FTP repository (using $file_list = ftp_nlist($conn_id,$target_dir) ), and I'm not sure how to best go about comparing with the DB table (I'm positive that an "WHERE NOT IN (big_fat_list)" would be a nightmare query to run).

Any suggestions?

+1  A: 

Load the list of filenames into another table, then perform a couple of queries that fulfill your requirements.

Ignacio Vazquez-Abrams
Yeah, I got that answer elsewhere, I think that might have to be the route.I just added an index on the "filename" column in order to speed things up.
Chad Edge
A: 

Yup that is the solution. I propose you to use pdo prepared insert statement to reduce time. or do what mysqldump does, generate insert into table(column1,column2, ... ) values(), (), (), ... ; insert into ...

you would have to check the maximun values list in mysql site.

A: 

I usually dump the recursive directory list with dates and file sizes to a temporary table. Then I remove items not found:

delete
from A
where not exists (
    select null as nothing
    from   temp b
    where  a.key = b.key )

Then I update items already there (for file sizes, CRCs):

update a set nonkeyfield1 = b.nonkeyfield1, nonkeyfield2 = b.nonkeyfield2
from   a join temp b on a.key = b.key

Then I add items found:

insert into A ( field, list)
select field, list
from   temp b
where  not exists (
    select null as nothing
    from   A
    where  b.key = a.key )

This is from memory, so test it first before you fly. The select null as nothing keeps you from wasting RAM while you check things.

Dr. Zim