views:

103

answers:

2

I have read through the solutions to similar problems, but they all seem to involve scripts and extra tools. I'm hoping my problem simple enough to avoid that.

So the user uploads a csv of next week's data. It gets inserted into the DB, no problem.

BUT

an hour later he gets feedback from everyone, and must make updates accordingly. He updates the csv and goes to upload it to the DB.

Right now, the system I'm using checks to see if the data for that week is already there, and if it is, pulls all of that data from the DB, a script finds the differences and sends them out, and after all of this, the data the old data is deleted and replaced with the new data.

Obviously, it is a lot easier to just wipe it clean and reenter the data, but not the best method, especially if there are lots of changes or tons of data. But I have to know WHAT changes have been made to send out alerts. But I don't want a transaction log, as the alerts only need to be sent out the one time and after that, the old data is useless.

So!

Is there a smart way to compare the new data to the already existing data, get only the rows that are changed/deleted/added, and make those changes? Right now it seems like I could do an update, but then I won't get any response on what has changed...

Thanks!

Quick Edit:

No foreign keys are currently in use. This will soon change, but it shouldn't make a difference, because the foreign keys will only point to who the data effects and thus won't need to be changed. As far as primary keys go, that does present a bit of a dilemma:

The data in question is everyone's work schedule. So it would be nice (for specific applications of this schedule beyond simple output) for each shift to have a key. But the problem is, let's say that user1 was late on Monday. The tardiness is recorded in a separate table and is tied to the shift using the shift key. But if on Tuesday there is some need to make some changes to the week already in progress, my fear is that it will become too difficult to insure that all entries in the DB that have already happened (and thus may have associations that shouldn't be broken) will get re-keyed in the process. Unfortunately, it is not as simple as only updating all events occurring AFTER the current time, as this would add work (and thus make it less marketable) to the people who do the uploading. Basically, they make the schedule on one program, export it to a CSV, and then upload it on a web page for all of the webapps that need that data. So it is simply much easier for them (and less stressful for everyone involved) to do the same routine every time of exporting the entire week and uploading it.

So my biggest concern is to make the upload script as smart as possible on both ends. It doesn't get bloated trying to find the changes, it can find the changes no matter the input AND none of the data that is unchanged risks getting re-keyed.

Here's a related question:

Suppose Joe User was schedule to wash dishes from 7:00 PM to 8:00 PM, but the new
data has him working 6:45 PM to 8:30 PM.  Has the shift been changed? Or has the old
one been deleted and a new one added?

And another one:

Say Jane was schedule to work 1:00 PM to 3:00 PM, but now everyone has a mandatory
staff meeting at 2:00 to 3:00. Has she lost one shift and gained two? Or has one
shift changed and she gained one?

I'm really interested in knowing how this kind of data is typically handled/approached, more than specific answers to the above.

Again, thank you.

A: 

If you have a unique key on one of the fields, you can use:

LOAD DATA LOCAL INFILE '/path/to/data.csv' REPLACE INTO TABLE table_name
Eran Galperin
I'd recommend checking if LOAD DATA (...) REPLACE uses the same semantics as MySQL 'REPLACE' command. 'REPLACE' performs 'DELETE' and then 'INSERT', but it causes 'ON DELETE' triggers to be called, and foregin keys checks to be performed, and 'ON DELETE CASCADE' removes foreign keys...
Abgan
It acts exactly the same.
Eran Galperin
A: 

Right now, the system I'm using checks to see if the data for that week is already there, and if it is, pulls all of that data from the DB, a script finds the differences and sends them out, and after all of this, the data the old data is deleted and replaced with the new data.

So your script knows the differences, right? And you don't want to use some extra extra tools, apart from your script and MySQL, right?

I'm quite convinced that MySQL doesn't offer any 'diff' tool by itself, so the best you can achieve is making new CSV file for updates only. I mean - it should contain only changed rows. Updating would be quicker, and all changed data would be easily available.

Abgan