views:

43

answers:

1

I have two SQL tables with data that I would like to compare. The tables have the following structures:

id, title,url

I want to compare the tables by title where if the strings are similar or equal it would be noted by writing the records to a file. Someone suggested not doing it in SQL and copying the data and using it in .net as lists to compare.

Any suggestions on how to do this?

a typical example for title:

Table 1 - KPTH Fox 44 via

Table 2 - KPTH Fox44

I tried using LIKE in sql with wildcard '%' but his only works for these type of examples and wont work on

ABC

MSabC

+2  A: 

This would be way easier to do it in the DB and then return the dataset of items that match. Assuming the two tables are identical and you are matching on just title your SQL would look like:

SELECT [id], [title], [url]
FROM [Table1] t1
JOIN [Table2] t2 ON t1.[title] = t2.[title]

Then iterate through each record in the result set and write whatever fields you want to a file.

If you want to just stick to SQL, you could run the query and then just save the outout to a file using a query tool for your DB like Management Studio or something.

EDIT: Based on the 'similar' requirement you could use the SOUNDEX command in SQL to compare if they sound similar. See MSDN for more info regarding the SOUNDEX command.

EDIT 2: To add more information with respect to you added requirements: In SQL this similar requirment might be difficult. You could parse the title into words and then store them into a temp table and see how many match up and then based on a threshhold to assign a match condition. It looks like your data also stores a URL so you could possibly use this as well and check to the base URL to see if they match first before doing the weighted title check. I am not totally clear what outcome you are looking for other than the title.

It's really not a very simple task that I think someone will be able to just give you an answer. I am sure a prof at a university is working on this problem though.

Kelsey
I think the difficulty in using a pure SQL solution such as this is that it only tests for equality. It doesn't apply as well to fuzzy similarity analysis.
kbrimington
@kbrimington I see your point. He could try the `SOUNDEX` command in SQL which might meet the 'similar' requirement. I have editted my answer to reflect it.
Kelsey
SOUNDEX...is this similar to COMPARE in vb.net.
vbNewbie
@vbNewbie read the MSDN link I posted on it, he it is again: http://msdn.microsoft.com/en-us/library/ms187384.aspx I don't think it is anything like `Compare` since it is a 'sounds like' approach to comparison.
Kelsey
@Kelsey: +1 for the `SOUNDEX` reference. To be honest, I'd never heard of it before. Reading the docs, I can think of a few uses for it at work right now. Thanks.
kbrimington
Looking at the examples added by the OP, it appears that a less-phonetic approach to comparison is necessary, but I'm leaving the upvote for the high-quality response.
kbrimington