In my database schema I have an entity that is identified by an identifier stored in its own table. The identifier can be reused, so there is a one-to-many relationship between the identifier and the entity. Example: a person can have a nickname. Nicknames are not unique and can be shared among many people. So the schema might look like:

PERSON
id
name
nickname_id

NICKNAME
id
name

The issue is that when inserting a new person, I have to first query NICKNAME to see if the nickname exists. If it doesn't then I have to create a row in NICKNAME. When inserting many persons, this can be slow as each person insertion results in a query to NICKNAME.

I could optimize large insertions by first querying Nickname for all the nicknames. JPA query language:

SELECT n FROM NICKNAME n WHERE n.name IN ('Krusty', 'Doppy', 'Flash', etc)

And then create the new nicknames as necessary, followed by setting nickname_id on the persons.

This complicates the software a bit, as it has to temporarily store nicknames in memory. Furthermore, some databases limit the number of parameters in an IN clause (SQL Server allows 2100 or so), so I have to perform multiple queries.
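To make the chunking idea concrete, here is a minimal sketch in Python using the standard-library sqlite3 module rather than JPA. The helper name `fetch_nickname_ids`, the lowercase table name, and the default chunk size of 500 are all assumptions for illustration; the point is only that the lookup is split into several IN queries, each staying under the parameter limit.

```python
import sqlite3

def fetch_nickname_ids(conn, names, chunk_size=500):
    """Return {name: id} for the nicknames that already exist.

    The lookup is split into chunks so that no single IN clause
    exceeds the database's parameter limit (chunk_size is an assumption).
    """
    unique = sorted(set(names))
    found = {}
    for start in range(0, len(unique), chunk_size):
        chunk = unique[start:start + chunk_size]
        placeholders = ",".join("?" * len(chunk))
        cursor = conn.execute(
            f"SELECT name, id FROM nickname WHERE name IN ({placeholders})",
            chunk,
        )
        for name, nickname_id in cursor:
            found[name] = nickname_id
    return found

# Demo on an in-memory database with two pre-existing nicknames.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE nickname (id INTEGER PRIMARY KEY, name TEXT UNIQUE)")
conn.executemany("INSERT INTO nickname (name) VALUES (?)", [("Krusty",), ("Flash",)])

existing = fetch_nickname_ids(conn, ["Krusty", "Doppy", "Flash"], chunk_size=2)
missing = {"Krusty", "Doppy", "Flash"} - existing.keys()
```

Anything left in `missing` would then be inserted in one batch before the persons are written.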

I'm curious how others deal with this issue. More specifically, when a database is normalized and an entity has a relationship with another, inserting a new entity basically requires checking the other entity. For large inserts this can be slow unless the operation is lifted into the code domain. Is there some way to auto-insert the related table rows?

FYI I'm using Hibernate's implementation of JPA

+1  A: 

I'm not sure if an ORM can handle this, but in straight SQL you could:

  1. Create a temp table of name/nickname pairs,
  2. INSERT INTO NicknameTable SELECT DISTINCT Nickname FROM temp WHERE Nickname NOT IN (SELECT Nickname FROM NicknameTable)
  3. Insert into the main table, knowing the nickname exists.
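The three steps can be sketched end to end as follows. This is an illustration only, run against SQLite through Python's sqlite3 module; the table and column names (`staging`, `nickname`, `person`) and the sample rows are assumptions, not the asker's real schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE nickname (id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE);
    CREATE TABLE person (
        id INTEGER PRIMARY KEY,
        name TEXT NOT NULL,
        nickname_id INTEGER NOT NULL REFERENCES nickname(id)
    );
    CREATE TEMP TABLE staging (name TEXT NOT NULL, nickname TEXT NOT NULL);
    INSERT INTO nickname (name) VALUES ('Krusty');
""")

# Step 1: bulk-load the incoming name/nickname pairs into the staging table.
incoming = [("Herschel", "Krusty"), ("Barry", "Flash"), ("Wally", "Flash")]
conn.executemany("INSERT INTO staging (name, nickname) VALUES (?, ?)", incoming)

# Step 2: create only the nicknames that do not exist yet (one statement).
conn.execute("""
    INSERT INTO nickname (name)
    SELECT DISTINCT nickname FROM staging
    WHERE nickname NOT IN (SELECT name FROM nickname)
""")

# Step 3: insert the persons, resolving each nickname to its id with a join.
conn.execute("""
    INSERT INTO person (name, nickname_id)
    SELECT s.name, n.id
    FROM staging AS s
    JOIN nickname AS n ON n.name = s.nickname
""")
conn.commit()

nickname_count = conn.execute("SELECT COUNT(*) FROM nickname").fetchone()[0]
person_count = conn.execute("SELECT COUNT(*) FROM person").fetchone()[0]
```

The whole batch costs three statements regardless of how many persons are loaded, instead of one lookup per person.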

In your example, you can just have a NULLable nickname column without another table, unless a person can have more than one nickname.

le dorfier
This is certainly how I would handle this, except I'd use a LEFT JOIN instead of NOT IN, as joins tend to perform better (at least in SQL Server).
HLGEM
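For comparison, the LEFT JOIN form of the "insert missing nicknames" step might look like the sketch below (again SQLite via Python's sqlite3, with assumed table names `staging` and `nickname`). Whether it actually outperforms NOT IN depends on the database and its optimizer; one behavioral difference is that NOT IN matches nothing if the subquery returns a NULL, which the join form avoids.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE nickname (id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE);
    CREATE TEMP TABLE staging (name TEXT NOT NULL, nickname TEXT NOT NULL);
    INSERT INTO nickname (name) VALUES ('Krusty');
    INSERT INTO staging (name, nickname) VALUES
        ('Herschel', 'Krusty'), ('Barry', 'Flash'), ('Wally', 'Flash');
""")

# Anti-join: keep only staging nicknames with no matching nickname row.
conn.execute("""
    INSERT INTO nickname (name)
    SELECT DISTINCT s.nickname
    FROM staging AS s
    LEFT JOIN nickname AS n ON n.name = s.nickname
    WHERE n.id IS NULL
""")

names = [row[0] for row in conn.execute("SELECT name FROM nickname ORDER BY name")]
```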
A: 

Truthfully? I'd make nickname a varchar column in the Person table, and forget about the Nickname table. Nickname is an attribute of a person, not a separate entity.

Is this a simplified example, and your 'identifiers' really do benefit from the entity-relationships?

edit: Okay, understood this is just an artificial example. The question is a good one, because it comes up often enough.

MySQL supports a form of INSERT statement with an optional "...ON DUPLICATE KEY UPDATE..." clause. This is a MySQL extension rather than standard SQL, so support for the syntax varies by database brand. If you add a UNIQUE constraint to the identifier name in the Nickname table, a duplicate entry will invoke the UPDATE part of the clause (you can do a dummy update instead of changing anything).

CREATE TABLE Nickname (
  id SERIAL PRIMARY KEY,
  name VARCHAR(20) UNIQUE
);

INSERT INTO Nickname (name) VALUES ('Bill')
  ON DUPLICATE KEY UPDATE name = name;
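
The same "insert unless it already exists" idea can be tried out with Python's sqlite3 module. Note that this sketch swaps in SQLite's `INSERT OR IGNORE` in place of MySQL's ON DUPLICATE KEY UPDATE, since the latter is not available in SQLite; the helper name `get_or_create_nickname` is hypothetical.

```python
import sqlite3

def get_or_create_nickname(conn, name):
    """Insert the nickname if it is new, then return its id either way."""
    # The UNIQUE constraint on name makes the duplicate insert a no-op.
    conn.execute("INSERT OR IGNORE INTO nickname (name) VALUES (?)", (name,))
    row = conn.execute("SELECT id FROM nickname WHERE name = ?", (name,)).fetchone()
    return row[0]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE nickname (id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE)"
)

first = get_or_create_nickname(conn, "Bill")
second = get_or_create_nickname(conn, "Bill")  # duplicate: no new row
total = conn.execute("SELECT COUNT(*) FROM nickname").fetchone()[0]
```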
Bill Karwin
My person-nickname schema was just an example. My question is how to insert large quantity of data that has a relationship with another entity (table).
Steve Kuo
A: 
INSERT INTO Person(Name, NicknameID)
    VALUES(:name, (SELECT id FROM Nickname WHERE Name = :nickname))

If the INSERT fails because the nickname doesn't exist (the subquery then yields NULL, so this relies on NicknameID being declared NOT NULL), insert the nickname and then retry the person record.

I'm assuming that :name and :nickname identify host variables containing the user's name and nickname, and that the person.id column will be assigned a value automatically when it is omitted from the SQL. Adapt to suit your circumstances.

If you think most nicknames will in fact be unique, you could simply attempt to insert the nickname unconditionally, but ignore the error that occurs if the nickname already exists.
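
A rough sketch of the "try the insert, fall back on failure" approach, using Python's sqlite3 module. It assumes `nickname_id` is declared NOT NULL so that a missing nickname raises an error; the function name `insert_person` and the schema are illustrative only.

```python
import sqlite3

INSERT_PERSON = """
    INSERT INTO person (name, nickname_id)
    VALUES (:name, (SELECT id FROM nickname WHERE name = :nickname))
"""

def insert_person(conn, name, nickname):
    params = {"name": name, "nickname": nickname}
    try:
        conn.execute(INSERT_PERSON, params)
    except sqlite3.IntegrityError:
        # Assumed cause: the subquery found no nickname, yielded NULL, and
        # violated NOT NULL. Create the nickname, then retry the person.
        conn.execute(
            "INSERT INTO nickname (name) VALUES (:nickname)",
            {"nickname": nickname},
        )
        conn.execute(INSERT_PERSON, params)

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE nickname (id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE)"
)
conn.execute("""
    CREATE TABLE person (
        id INTEGER PRIMARY KEY,
        name TEXT NOT NULL,
        nickname_id INTEGER NOT NULL REFERENCES nickname(id)
    )
""")

insert_person(conn, "Barry", "Flash")  # nickname missing: takes the fallback
insert_person(conn, "Wally", "Flash")  # nickname exists: single insert
rows = conn.execute("""
    SELECT p.name, n.name FROM person AS p
    JOIN nickname AS n ON n.id = p.nickname_id
    ORDER BY p.id
""").fetchall()
```

This costs one statement per person in the common case and three when the nickname is new, so it pays off mainly when most nicknames already exist.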

Jonathan Leffler
A: 

Alternatively, perhaps a 'MERGE' statement could help? It offers the option of inserting a new value or updating an existing value. Syntax and support vary by DB, but it is possibly more common than the 'ON DUPLICATE' option.

andora