Hi All,

I need SQL code to solve the table-combination problem described below:

Table old data: table old

    name     version    status    lastupdate      ID
    A        0.1        on        6/8/2010        1
    B        0.1        on        6/8/2010        2
    C        0.1        on        6/8/2010        3
    D        0.1        on        6/8/2010        4
    E        0.1        on        6/8/2010        5
    F        0.1        on        6/8/2010        6
    G        0.1        on        6/8/2010        7

Table new data: table new

    name     version    status    lastupdate     ID         
    A        0.1        on        6/18/2010                
                                                           #B entry deleted
    C        0.3        on        6/18/2010                #version_updated
    C1       0.1        on        6/18/2010                #new_added
    D        0.1        on        6/18/2010                
    E        0.1        off       6/18/2010                #status_updated
    F        0.1        on        6/18/2010                
    G        0.1        on        6/18/2010                
    H        0.1        on        6/18/2010                #new_added
    H1       0.1        on        6/18/2010                #new_added

The differences between the new data and the old data:

B entry deleted

C entry version updated

E entry status updated

C1/H/H1 entry new added

What I want is to always keep the ID - name mapping from the old data table no matter how the data changes later, i.e. each name always has a unique ID number bound to it.

If an entry has been updated, update the data; if an entry is newly added, insert it into the table and assign it a new unique ID. If an entry was deleted, delete it and do not reuse its ID later.

However, I can only write simple SELECT or UPDATE statements, so this code may be too hard for me to write. I hope someone with expertise can give me some direction. No details on the differences between SQL variants are needed; sample code in standard SQL is enough.

Thanks in advance!

Rgs

KC

======== I have listed my draft SQL here, but I am not sure whether it works. Could someone with expertise please comment? Thanks!

1.duplicate old table as tmp for store updates

create table tmp as select * from old

2.update into tmp where the "name" is same in old and new table

update tmp where name in (select name from new)

3.insert different "name" (old vs new) into tmp and assign new ID

insert into tmp (name version status lastupdate ID) set idvar = max(select max(id) from tmp) + 1 select * from (select new.name new.version new.status new.lastupdate new.ID from old, new where old.name <> new.name)

4. delete the deleted entries from tmp table (such as B)

delete from tmp where (select ???)

+1  A: 

Let me start from the end:

In #4 you would delete all rows in tmp; what you wanted to say there is WHERE tmp.name NOT IN (SELECT name FROM new); similarly #3 is not correct syntax, but if it was it would try to insert all rows.
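A minimal sketch of the corrected conditions, assuming SQLite driven from Python's sqlite3 module. The table names old_data and new_data stand in for table old and table new, the statements run against the ID-carrying table directly rather than a tmp copy, and the AUTOINCREMENT column is an assumption used so that a deleted ID is never handed out again:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Assumed schema: AUTOINCREMENT makes SQLite never reuse a deleted ID.
cur.execute("""CREATE TABLE old_data (
    ID INTEGER PRIMARY KEY AUTOINCREMENT,
    name TEXT UNIQUE, version TEXT, status TEXT, lastupdate TEXT)""")
cur.execute("""CREATE TABLE new_data (
    name TEXT, version TEXT, status TEXT, lastupdate TEXT)""")

# Sample data from the question.
cur.executemany(
    "INSERT INTO old_data (name, version, status, lastupdate) "
    "VALUES (?, '0.1', 'on', '6/8/2010')",
    [(n,) for n in "ABCDEFG"])
cur.executemany("INSERT INTO new_data VALUES (?, ?, ?, '6/18/2010')", [
    ("A", "0.1", "on"), ("C", "0.3", "on"), ("C1", "0.1", "on"),
    ("D", "0.1", "on"), ("E", "0.1", "off"), ("F", "0.1", "on"),
    ("G", "0.1", "on"), ("H", "0.1", "on"), ("H1", "0.1", "on")])

# Update rows whose name exists in both tables; the ID column is untouched.
cur.execute("""UPDATE old_data SET
    version    = (SELECT version    FROM new_data WHERE new_data.name = old_data.name),
    status     = (SELECT status     FROM new_data WHERE new_data.name = old_data.name),
    lastupdate = (SELECT lastupdate FROM new_data WHERE new_data.name = old_data.name)
  WHERE name IN (SELECT name FROM new_data)""")

# Insert only the names missing from old_data; each one gets a fresh ID.
cur.execute("""INSERT INTO old_data (name, version, status, lastupdate)
    SELECT name, version, status, lastupdate FROM new_data
    WHERE name NOT IN (SELECT name FROM old_data)
    ORDER BY name""")

# Delete only the names that no longer appear in new_data.
cur.execute("DELETE FROM old_data WHERE name NOT IN (SELECT name FROM new_data)")
conn.commit()

rows = cur.execute(
    "SELECT ID, name, version, status FROM old_data ORDER BY ID").fetchall()
for row in rows:
    print(row)
```

With the sample data, B's ID 2 disappears and is not reassigned, while C1, H and H1 receive IDs above the previous maximum.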

Regarding #2, why not use auto increment on the ID?

Regarding #1, if your tmp table is the same as new the queries #2-#4 make no sense, unless you change (update, insert, delete) new table in some way.

But (!), if you do update the table new and it has an auto increment field on ID and if you are properly updating the table (using ID) from the application then your whole procedure is unnecessary (!).

So, the important thing is that you should not design the system to work like above.

To get the concept of updating data in the database from the application side take a look at examples here (php/mysql).

Also, to get the syntax correct on your queries go through the basic version of SET, INSERT, DELETE and SELECT commands (no way around this).

Unreason
Thanks for your correction, Unreason. Regarding auto increment: if someone inserts entries that are not expected, they will also get an ID, and it will be hard to track them down later. To play it safe and simple, I want the data to be validated before the ID is assigned. Because the data is updated/inserted without an ID, it will be much easier to add a verification mechanism once I get the main idea of how to do this in SQL.
K. C
i agree that you should not accept the data that is not expected in your database, but again, you should check the data before inserting it. you can do that from application (in application code) or from database (checks and other integrity rules, plus triggers); or best at both places - at application layer for performance and at database for peace of mind (bit more maintenance though). this has nothing to do with having temporary table. it is very, very likely that the temporary table in your case is not only redundant, but will lead to all sort of complexities.
Unreason
at the end you will basically end up synchronising two tables and that can be either very complicated in true multi user with error handling scenarios (especially if atomicity, consistency, isolation and durability are important) or, on the other hand (and this is most likely in your case) simply not necessary.
Unreason
Unreason, thanks, your comments are very useful for designing the application. So far the project is only in the modeling phase: I am trying to put together a quick-and-dirty solution that works, so I can see where the problems are and what can be improved later. I am collecting the pieces of knowledge I need because SQL is one of my weak points. To be frank, I know only a little about what SQL can do and very little about the syntax details, so I wonder if you could help me get the syntax right, if that is OK with you, so I can learn from the correct code.
K. C
Regarding the tmp table: the system is not really multi-user and is not fully automatic, so I think there is no need for much complexity. It only needs a simple interface for a few users with an exclusive lock (only a single operation can touch the database at a time), so I can ensure that no one touches the old/new tables while the tmp table exists. When everything has gone well in an update cycle, I will rename the "tmp" table to "old", replacing the existing one. That gives me the data I need and the starting point for the next update cycle.
K. C
As for learning SQL, there are lots of good syntax references and samples at http://dev.mysql.com/doc/refman/5.1/en/tutorial.html (as stated before, there is no way around this). As for insisting on tmp/old, if you followed the examples at the link provided you should have either realized that you don't need it (it WILL be more complex) or come up with valid reasons to use such an approach (so far I have not heard any). Renaming/replacing tables during a normal update cycle is like changing the engine of your car at every gas station.
Unreason
+1  A: 

You never mentioned what DBMS you are using but if you are using SQL Server, one really good one is the SQL MERGE statement. See: http://www.mssqltips.com/tip.asp?tip=1704

The MERGE statement basically works as separate insert, update, and delete statements all within the same statement. You specify a "Source" record set and a "Target" table, and the join between the two. You then specify the type of data modification that is to occur when the records between the two data are matched or are not matched. MERGE is very useful, especially when it comes to loading data warehouse tables, which can be very large and require specific actions to be taken when rows are or are not present.

Example:

MERGE Products AS TARGET
USING UpdatedProducts AS SOURCE 
ON (TARGET.ProductID = SOURCE.ProductID) 
--When records are matched, update 
--the records if there is any change
WHEN MATCHED AND TARGET.ProductName <> SOURCE.ProductName 
OR TARGET.Rate <> SOURCE.Rate THEN 
UPDATE SET TARGET.ProductName = SOURCE.ProductName, 
TARGET.Rate = SOURCE.Rate 
--When no records are matched, insert
--the incoming records from source
--table to target table
WHEN NOT MATCHED BY TARGET THEN 
INSERT (ProductID, ProductName, Rate) 
VALUES (SOURCE.ProductID, SOURCE.ProductName, SOURCE.Rate)
--When there is a row that exists in target table and
--same record does not exist in source table
--then delete this record from target table
WHEN NOT MATCHED BY SOURCE THEN 
DELETE
--$action specifies a column of type nvarchar(10) 
--in the OUTPUT clause that returns one of three 
--values for each row: 'INSERT', 'UPDATE', or 'DELETE', 
--according to the action that was performed on that row
OUTPUT $action, 
DELETED.ProductID AS TargetProductID, 
DELETED.ProductName AS TargetProductName, 
DELETED.Rate AS TargetRate, 
INSERTED.ProductID AS SourceProductID, 
INSERTED.ProductName AS SourceProductName, 
INSERTED.Rate AS SourceRate; 
SELECT @@ROWCOUNT;
GO
Nate Zaugg
I believe the target table won't have the ID. I think what OP has is a table (table old) with ID and associated data. OP then gets the new set of associated data. OP wants to make these changes in the original table by matching on the name column. Of course I could be reading the whole thing wrong :-)
potatopeelings
+1  A: 

Note - if you are concerned about performance you can skip this whole answer :-)

If you can redesign, have 2 tables: one with the data and the other with the name - ID linkage. Something like

table_original

name     version    status    lastupdate
A        0.1        on        6/8/2010
B        0.1        on        6/8/2010
C        0.1        on        6/8/2010
D        0.1        on        6/8/2010
E        0.1        on        6/8/2010
F        0.1        on        6/8/2010
G        0.1        on        6/8/2010

and name_id

name     ID 
A        1 
B        2 
C        3 
D        4 
E        5 
F        6 
G        7

When you get the table_new with the new set of data

  1. TRUNCATE table_original
  2. INSERT INTO name_id (names from table_new not in name_id)
  3. copy table_new to table_original

Note : I think there's a bit of ambiguity about the deletion here

If the entry was deleted, delete the entry and do not reuse that ID later.

If name A gets deleted, and it turns up again in a later set of updates do you want to a. reuse the original ID tagged to A, or b. generate a new ID?

If it's b. you need a column Deleted? in name_id and a last step

4 . set Deleted? = Y where name not in table_original

and 2. would exclude Deleted? = Y records.

You could also do the same thing without the name_id table based on the logic that the only thing you need from table_old is the name - ID links. Everything else you need is in table_new.
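The steps above can be sketched roughly as follows, assuming SQLite via Python's sqlite3 module and option b (a name that was deleted and later returns gets a new ID). The column name deleted stands in for "Deleted?", the helper names load and refresh are hypothetical, and DELETE without a WHERE clause is used because SQLite has no TRUNCATE statement:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# name_id keeps the permanent name - ID links; 'deleted' stands in for "Deleted?".
cur.execute("""CREATE TABLE name_id (
    ID INTEGER PRIMARY KEY AUTOINCREMENT,
    name TEXT NOT NULL,
    deleted TEXT NOT NULL DEFAULT 'N')""")
cols = "(name TEXT, version TEXT, status TEXT, lastupdate TEXT)"
cur.execute("CREATE TABLE table_original " + cols)
cur.execute("CREATE TABLE table_new " + cols)


def load(cur, names):
    """Hypothetical helper: replace table_new with a fresh set of names."""
    cur.execute("DELETE FROM table_new")
    cur.executemany(
        "INSERT INTO table_new VALUES (?, '0.1', 'on', '6/18/2010')",
        [(n,) for n in names])


def refresh(cur):
    """Apply steps 1-4 for the data currently in table_new."""
    # 1. Empty table_original.
    cur.execute("DELETE FROM table_original")
    # 2. Assign an ID to every name without a live (non-deleted) link.
    cur.execute("""INSERT INTO name_id (name)
        SELECT name FROM table_new
        WHERE name NOT IN (SELECT name FROM name_id WHERE deleted = 'N')
        ORDER BY name""")
    # 3. Copy table_new to table_original.
    cur.execute("INSERT INTO table_original SELECT * FROM table_new")
    # 4. Flag links whose name is no longer present.
    cur.execute("""UPDATE name_id SET deleted = 'Y'
        WHERE deleted = 'N'
          AND name NOT IN (SELECT name FROM table_original)""")


load(cur, ["A", "B"]); refresh(cur)   # A and B get their first IDs
load(cur, ["A"]);      refresh(cur)   # B disappears and is flagged
load(cur, ["A", "B"]); refresh(cur)   # B returns and gets a new ID
conn.commit()

links = cur.execute("SELECT ID, name, deleted FROM name_id ORDER BY ID").fetchall()
print(links)
```

For option a (reuse the original ID when a name returns), step 2 would compare against all names in name_id and step 4 could be dropped.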

potatopeelings
thanks, could you take time read my comment below when u free?
K. C
A: 

Thanks all for your warm-hearted help. Let me describe my problem more clearly: I want to write a plug-in style module in Python, so that at runtime one can insert a plug-in and run it quickly, and also disable or delete it, something like Firefox add-ons. Generally speaking, a plug-in system may have the basic properties below:

name
version
active status 
other necessary information for invoking
UID           // it is most important for me : something like a hash to  
              // share with other program to indicate this plug-in.  
              // For some reason, the "name" string is not suitable for 
              // my system although it is also unique, then I think an int type 
              // is better one because it is simple enough.

Here comes my problem: how do I assign one unique UID to each of those plug-ins and keep them tracked in a sqlite database no matter how they are rearranged later (deleted, upgraded, deactivated, reordered), so that each name has one unique int number bound to it?

The system may be split later. For example, there may be 2 instances running on 2 standalone servers; if I later want to migrate them to one bigger server, merging the 2 plug-in tables will be a problem if I use AUTOINCREMENT as the UID, as it leaves me little room to keep things clear.

I know all of your suggestions are right for designing a database properly. However, in my case I just need a quick sketch to get things working. The database part is not the most important one; well, maybe it is, because without it the rest can fail.

This problem also stems from my earlier question below. I would really appreciate it if someone could give a quick solution. http://stackoverflow.com/questions/3028200/how-can-i-assign-a-sequence-value-to-a-sqlite-field-when-the-field-uid-value-is-n

Rgs.

K. C
Could you use a new calculated column (<Servername> + Autoincrement) value as the unique ID? Note - you might want to incorporate this comment in the Question.
potatopeelings
Thanks, but that may not be better than a hash string. For example, if I want to collect the status of all the plug-ins and send it to another machine as serial data, the unique ID can be used like this: 011110111111010101010. The plug-in status is expressed as 0/1 and the UID int number is used as the position (the slot number in this stream), something like a telecom protocol. That's why I use neither the name string nor a hash...
K. C
and , how to merge questions I have asked here? just delete duplicated one or there's better way?
K. C
A: 

a drafted approach, I have no idea if it works fine......

CREATE TRIGGER auto_next_id AFTER INSERT ON table FOR EACH ROW BEGIN UPDATE table SET uid = max(uid) + 1 ; END;
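As written, the draft would likely not run in SQLite: table is a reserved word, max(uid) cannot be used directly in SET, and the missing WHERE clause would touch every row. A minimal working sketch follows, assuming SQLite via Python's sqlite3 module; the table names plugins and uid_counter are hypothetical. A separate counter row is used instead of max(uid) + 1 so that deleting the highest uid does not free it for reuse:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Hypothetical tables: plugins holds the entries, uid_counter the last uid issued.
cur.execute("CREATE TABLE plugins (name TEXT UNIQUE, version TEXT, uid INTEGER)")
cur.execute("CREATE TABLE uid_counter (last_uid INTEGER NOT NULL)")
cur.execute("INSERT INTO uid_counter VALUES (0)")

# After each insert without a uid, bump the counter and stamp only the new row.
cur.execute("""
CREATE TRIGGER auto_next_id AFTER INSERT ON plugins
FOR EACH ROW WHEN NEW.uid IS NULL
BEGIN
    UPDATE uid_counter SET last_uid = last_uid + 1;
    UPDATE plugins SET uid = (SELECT last_uid FROM uid_counter)
    WHERE rowid = NEW.rowid;
END
""")

for name in ("a", "b", "c"):
    cur.execute("INSERT INTO plugins (name, version) VALUES (?, '0.1')", (name,))

# Delete the plug-in holding the highest uid, then add another one.
cur.execute("DELETE FROM plugins WHERE name = 'c'")
cur.execute("INSERT INTO plugins (name, version) VALUES ('d', '0.1')")
conn.commit()

uids = dict(cur.execute("SELECT name, uid FROM plugins").fetchall())
print(uids)
```

Rows inserted with an explicit uid skip the trigger, which leaves room for validating data before an ID is assigned.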

K. C
+1  A: 

This works in Informix and gives exactly the display you require. Same or similar should work in MySQL, one would think. The trick here is to get the union of all names into a temp table and left join on that so that the values from the other two can be compared.

SELECT DISTINCT name FROM old
UNION
SELECT DISTINCT name FROM new
INTO TEMP _tmp;

SELECT 
  CASE WHEN b.name IS NULL THEN ''
       ELSE aa.name
       END AS name, 
  CASE WHEN b.version IS NULL THEN ''
       WHEN a.version = b.version THEN a.version 
       ELSE b.version
       END AS version,
  CASE WHEN a.status = b.status THEN a.status 
       WHEN b.status IS NULL THEN ''
       ELSE b.status
       END AS status,
  CASE WHEN a.lastupdate = b.lastupdate THEN a.lastupdate 
       WHEN b.lastupdate IS NULL THEN null
       ELSE b.lastupdate
       END AS lastupdate,
  CASE WHEN a.name IS NULL THEN '#new_added'
       WHEN b.name IS NULL THEN '#' || aa.name || ' entry deleted'
       WHEN a.version <> b.version THEN '#version_updated'
       WHEN a.status <> b.status THEN '#status_updated'
       ELSE ''
  END AS change
  FROM _tmp aa
  LEFT JOIN old a
         ON a.name = aa.name
  LEFT JOIN new b
         ON b.name = aa.name;
Wolfe
A: 

If I understood well what you need based on the comments in the two tables, I think you can simplify a lot your problem if you don't merge or update the old table because what you need is table new with the IDs in table old when they exist and new IDs when they do not exist, right?

New records: table new has the new records already - OK (but they need a new ID)

Deleted records: they are not in table new - OK

Updated records: already updated in table new - OK (need to copy ID from table old)

Unmodified records: already in table new - OK (need to copy ID from table old)

So the only thing you need to do is to: (a) copy the IDs from table old to table new when they exist (b) create new IDs in table new when they do not exist in table old (c) copy table new to table old.

(a) UPDATE new SET ID = IFNULL((SELECT ID FROM old WHERE new.name = old.name),0);

(b) UPDATE new SET ID = FUNCTION_TO_GENERATE_ID(new.name) WHERE ID = 0;

(c) Drop table old; CREATE TABLE old (select * from new);

As I don't know which SQL database you are using, in (b) you can use an sql function to generate the unique id depending on the database. With SQL Server, newid(), With postgresql (not too old versions), now() seems a good choice as its precision looks sufficient (but not in other databases as MySQL for example as I think the precision is limited to seconds)

Edit: Sorry, I hadn't seen you're using sqlite and python. In this case you can use str(uuid.uuid4()) function (uuid module) in python to generate the uuid and fill the ID in new table where ID = 0 in step (b). This way you'll be able to join 2 independent databases if needed without conflicts on the IDs.
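A minimal sketch of steps (a) to (c) for sqlite and python, under a few assumptions: the tables are named old_data and new_data here, the ID column is TEXT so it can hold a UUID string, and a missing ID is represented by NULL rather than 0:

```python
import sqlite3
import uuid

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Assumed schema: ID is TEXT so it can store a UUID string.
cur.execute("CREATE TABLE old_data (name TEXT, version TEXT, ID TEXT)")
cur.execute("CREATE TABLE new_data (name TEXT, version TEXT, ID TEXT)")
cur.executemany(
    "INSERT INTO old_data VALUES (?, '0.1', ?)",
    [("A", str(uuid.uuid4())), ("B", str(uuid.uuid4()))])
cur.executemany(
    "INSERT INTO new_data (name, version) VALUES (?, ?)",
    [("A", "0.2"), ("C", "0.1")])          # B deleted, C newly added
old_ids = dict(cur.execute("SELECT name, ID FROM old_data").fetchall())

# (a) Copy existing IDs from old_data; names without a match stay NULL.
cur.execute("""UPDATE new_data SET ID =
    (SELECT ID FROM old_data WHERE old_data.name = new_data.name)""")

# (b) Generate a fresh UUID for every row that still has no ID.
missing = cur.execute("SELECT name FROM new_data WHERE ID IS NULL").fetchall()
for (name,) in missing:
    cur.execute("UPDATE new_data SET ID = ? WHERE name = ?",
                (str(uuid.uuid4()), name))

# (c) Replace old_data with the contents of new_data.
cur.execute("DROP TABLE old_data")
cur.execute("CREATE TABLE old_data AS SELECT * FROM new_data")
conn.commit()

result = dict(cur.execute("SELECT name, ID FROM old_data").fetchall())
print(result)
```

Here A keeps its original ID, B is gone, and C receives a newly generated one.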

laurent-rpnet
A: 

Why don't you use a UUID for this? Generate it once for a plug-in, and incorporate/keep it into the plug-in, not into the DB. Now that you mention python, here's how to generate it:

import uuid
UID = str(uuid.uuid4()) # this will yield new UUID string

Sure, it does not guarantee global uniqueness, but the chance of getting the same string twice in your project is very low.

t7ko