We have two servers, one of which belongs to our customer. The customer gives us URLs of XML/JSON exports of his clients' information from his CMS, and our task is to write import scripts that load this data into the web app we're developing.

I've always done it like this:

INSERT INTO customers (name,address) VALUES ('John Doe', 'NY') ON DUPLICATE KEY UPDATE name='John Doe', address='NY'

As far as I know, this is the best solution performance-wise...

But this does NOT solve the problem of deleted records. What if a client is deleted from the customer's database and no longer appears in the export - how should I handle that?

Should I first TRUNCATE the whole table and then refill it? Or should I load all records into a PHP array, walk through it, and delete the records that aren't in the XML/JSON? (Rough sketch of that second idea below.)
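
For illustration, something along these lines, using a staging table instead of the PHP array (the staging table and the id column are made up just to show the idea, assuming the export carries each client's primary key):

CREATE TEMPORARY TABLE customers_import (id INT UNSIGNED PRIMARY KEY);

-- for each record in the export: upsert it and remember its id
INSERT INTO customers (id, name, address) VALUES (1, 'John Doe', 'NY')
    ON DUPLICATE KEY UPDATE name = 'John Doe', address = 'NY';
INSERT INTO customers_import (id) VALUES (1);

-- afterwards, delete everything that was not present in the export
DELETE FROM customers WHERE id NOT IN (SELECT id FROM customers_import);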

I think there must be a better solution.

I'm interested in the best solution performance-wise, because we have to import many thousands of records and the whole import can take a long time.

A: 

Assuming you are using MySQL, the only SQL I know anything about:

Is it true that the export of your customer's CMS always contains all of his current customer data? If so, then yes, IMO it is best to drop or truncate the 'customers' table; that is, to just throw away yesterday's customer table and reconstruct it today from the beginning.

But you can't use 'insert': it will take ~28 hours per day to insert thousands of customer rows. So forget about 'insert'.

Instead, add rows into 'customers' with 'load data local infile': first write a temp disk file 'cust_data.txt' of all the customer data, with column data separated somehow (perhaps by commas), and then say something like:

load data local infile 'cust_data.txt' replace into table customers fields terminated by ',' lines terminated by '\n';
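
If the brief moment where the table is empty during the reload is a concern, one possible variant (just a sketch; the customers_new/customers_old names are made up) is to load into a fresh copy of the table and then swap it in atomically:

create table customers_new like customers;

load data local infile 'cust_data.txt' into table customers_new fields terminated by ',' lines terminated by '\n';

-- rename table swaps both names in one atomic step, so readers never see an empty table
rename table customers to customers_old, customers_new to customers;
drop table customers_old;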

Can you structure the query such that you can use your client's output file directly, without first staging it into 'cust_data.txt'? That would be the answer to a maiden's prayer.

It should be fast enough for you: you will be amazed!

ref: http://dev.mysql.com/doc/refman/5.0/en/load-data.html

Pete Wilson
But for a few minutes right after the TRUNCATE, our webapp would be serving incomplete data... :( And I don't believe that replacing the normal SQL queries with the import query can save that much time... Or maybe that step itself can, but we have to call many URLs and fetch XML/JSON from them, and that is what really takes the time. My priority is also the integrity and completeness of the data.
Radek Šimko
Truncating the table each time means that you risk losing data - not what you want to do if it's intended for backup purposes.
symcbean
Just asking, not arguing: how/where does truncating the table each time risk data loss? I have to believe that the "current" state of customer info is always around somewhere, intact and complete, at Radek's client; and is therefore available to reconstruct the 'customer' table any time. What am I missing? Thanks!
Pete Wilson
+1  A: 

I'm interested in the best solution performance-wise

If it's MySQL at the client, use MySQL replication - the client as the master and your end as the slave. You can either use a direct feed (you'd probably want to run this across a VPN) or run in disconnected mode (they send you the bin logs to roll forward).
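
For reference, a minimal sketch of the slave side (host, credentials and binlog coordinates are placeholders you'd get from the customer):

-- run on your (slave) server; all values below are placeholders
CHANGE MASTER TO
    MASTER_HOST = 'cms.customer.example',
    MASTER_USER = 'repl',
    MASTER_PASSWORD = 'secret',
    MASTER_LOG_FILE = 'mysql-bin.000001',
    MASTER_LOG_POS = 4;
START SLAVE;

-- check that both replication threads are running
SHOW SLAVE STATUS;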

The customer gives us URLs of XML/JSON exports of his clients' information from his CMS

This is a really dumb idea - and sounds like you're trying to make the solution fit the problem (which it doesn't). HTTP is not the medium for transferring large data files across the internet. It also means that the remote server must do rather a lot of work just to make the data available (assuming it can even identify what data needs to be replicated - and as you point out, that is currently failing to work for deleted records). The latter point is true regardless of the network protocol.

You certainly can't copy large amounts of data directly across at a lower level in the stack than the database (e.g. trying to use rsync to replicate data files), because the local mirror will nearly always be inconsistent.

C.

symcbean
A: 

If your customer can export the data as a CSV file, you can use SQL Data Examiner http://www.sqlaccessories.com/SQL_Data_Examiner to update records in the target database (insert/update/delete) using the CSV file as the source.

SQLDev