tags:

views:

210

answers:

2

Hi!

I have two tables which looks like this:

News: (ID, Title, TagID)
Tags: (ID, Tag)

Each news can only have one tag. What is the most effective way to handle inserts to the news table? The Tags table has like 50 000 rows.

I'm only doing bulk inserts of approx. 300 news at a time, around 2 times per hour. I assume that i need some in-memory cache for the tags?

If the tag is not in the tags table, i need to insert it and set TagID to the newly inserted id.

Hope you'll get the idea!

+1  A: 

What version of SQL Server are you using in the background?

If you're using SQL Server 2008, I would recommend bulk-loading the tags and news for each day into a temporary working table, and then using the MERGE statement to update the actual Tags and News table from those working tables. I'd use the C# "SqlBulkCopy" class for that.

MERGE allows you to easily insert only those items that have changed, and possibly update those that already exist, all in one single, handy SQL statement.

If you're on SQL Server 2005 or below, you can do basically the same, but you'll have to write some code (C# or T-SQL) to manually check what needs to be inserted from your temp bulkload tables, and what is already present.

Marc

marc_s
A: 

I presume with each news item you'll get a list of strings that are the supposed "tags". From the structure you've given, you can only have one tag on each news item? That seems unusual, but the below applies anyway.

If your Tags table has an index on it, the searches will be really fast, and the database will take care of the caching anyway, so don't worry about the caching. You'll be amazed how much the database can speed things up when you have indexes in the right place

Do a select from Tags where Tag = whatever1 (do this for each tag), each time if no rows returned insert it, otherwise use the id you've found to do it. Run the proc on each INSERT.

Ilya Tchivilev
Thanks, I've tested this and it works great.
alexn