views:

49

answers:

2

Short descriptio:

Need hashing algorithm solution in php for large number of text values.


Long description.

PRODUCT_OWNER_TABLE
serial_number (auto_inc), product_name, owner_id

OWNER_TABLE
owner_id (auto_inc), owener_name

I need to maintain a database of 200000 unique products and their owners (AND all subsequent changes to ownership). Each product has one owner, but an owner may have MANY different products. Owner names are "Adam Smith", "John Reeves", etc, just text values (quite likely to be unicode as well).

I want to optimize the database design, so what i was thinking was, every week when i run this script, it fetchs the owner of a proudct, then checks against a table i suppose similar to PRODUCT_OWNER_TABLE, fetching the owner_id. It then looks up owner_id in OWNER_TABLE. If it matches, then its the same, so it moves on. The problem is when its different...

To optimize the database, i think i should be checking against the other "owner_name" entries in OWNER_TABLE to see if that value exists there. If it does, then i should use that owner_id. If it doesnt, then i should add another entry.

Note that there is nothing special about the "name". as long as i maintain the correct linkagaes AND make the OWNER_TABLE "read-only, append-new" type table - I should be able create a historical archive of ownership.

I need to do this check for 200000 entries, with i dont know how many unique owner names (~50000?). I think i need a hashing solution - the OWNER_TABLE wont be sorted, so search algos wont be optimal.

programming language is PHP. database is MYSQL.

A: 

What you propose is complex, difficult to maintain, redundant and will not improve the functionality nor performance.

This is a simple one to many relationship - so the product table should contain a foreign key referencing the owner table (which your data seems to have). But I can't imagine where you would ever have a scenario where you have a non-null value in the foreign key which does not reference a value in the owner table.

The problem is somewhat complicated by your use of surrogate keys (auto increment values). From your description, the real unique identifier of the owner is the owner_name - in which case, and since MySQL does not support constraints per se, is to create a unique index on owner.owner_name - and always try to insert a value there whenever you add a product or change the the owner of a product - you can specify that the query ignores duplicates - then if the affected_rows returned by this query is 1, fetch the insert_id, otherwise poll the key already stored.

C.

symcbean
A: 

+1 200000 records is not so big MySQL can handle much more. IMHO the only design to have here is the simplest and most efficient : one-to-many relationship with indexes on key (both primary on owner table and foreign on product table).

If your optimization aims to get results faster or reduce server load, and if your records changes or are deleted/reinserted you can try OPTIMIZE

OPTIMIZE TABLE `Owner`;
OPTIMIZE TABLE `Product`;

If you get weird sorting or wrong results, your database is corrupt. This do not happen on regular MySQL on fully functionnal system, but you can rebuild the whole tables.

(Links are provided for mysql 5.0, adjust to get your version documentation)

Benoit