ansaurus

Question

A question about desining a faster algorithm in T-SQL for a CASE that I will be telling.

Answer 1

A:

(something like) this might help:

UPDATE grp SET
  name = l.name
FROM grp s, grp l
WHERE LEN(s.name) < LEN(l.name)
  AND s.name = LEFT(l.name, LEN(s.name))

although you're going to harm yourself with those duplicates...

just somebody 2009-12-02 10:22:09

Didn't work, it lasts more than 1 minute and I stopped it.

stckvrflw 2009-12-02 11:36:39

there's no way in hell a table with frigging 9000 records would take that long. you're not telling us something important.

just somebody 2009-12-02 11:44:32

Here is how I used your codeSELECT s.IDFROM grp s, grp lWHERE LEN(s.GRUP) < LEN(l.GRUP) AND s.GRUP = LEFT(l.GRUP, LEN(s.GRUP))That took 1 min 4 seconds. Returned 385.323 id's for group names. If I will do it with UPDATE it takes much much more longer.

stckvrflw 2009-12-02 12:11:45

how many rows are in the table?

just somebody 2009-12-02 12:15:34

9005 rows and 7 columns.

stckvrflw 2009-12-02 12:19:46

@just somebody: Not quite a cross join, but close enough. The group name field probably has multiple copies of both the short and long names for all the groups. Say there are for group A, there are 20 rows, 10 with the short name, 10 with the long name. That yields 100 rows from the join criteria for that one group name.

Shannon Severance 2009-12-05 00:43:53

Answer 2

+1 A:

This should be relatively simple to solve in a set based manner:

UPDATE a
SET  a.yourfield = b.yourfield 
FROM yourtable a 
INNER JOIN yourtable b on b.yourfield LIKE a.yourfield + '%' AND b.yourfield <> a.yourfield

That will pattern match but exclude a match to itself - it is less than ideal however and I am guessing after this process you will remove duplicates or something, since having a table with multiple duplicates and no other distinguishing values is somewhat bizarre.

Andrew 2009-12-02 10:26:12

I tried your code with SELECT instead of UPDATE, and even this way it lasts more than 1 minute, and I had to stop it manuelly.

stckvrflw 2009-12-02 11:35:43

You have indexed the field I take it?

Andrew 2009-12-02 11:58:59

I didn't got what you mean. I used your code like that. SELECT a.IDFROM yourtable a INNER JOIN yourtable b on b.GRUP LIKE a.GRUP + '%' AND b.GRUP <> a.GRUPIt didn't stopped, and I stopped it at 1 minute 30 seconds manually, it returned 365.868 records and was continuing. Btw a normal select like SELECT * FROM myTable returns the table in 0 seconds. So I don't have problem with my database.

stckvrflw 2009-12-02 12:19:06

Select * from yourtable probably is returning in 0 seconds since a table of 9000 rows will be sitting in memory and cached - especially given all the testing you have done - that does not categorically represent 'there is no problem with my database'.Tell us what indexes are on the table, both clustered and non-clustered and which fields are included.Also tell us whether each test is being started clean with DBCC DropCleanBuffers and DBCC FreeProcCache, or whether you are using a warmed up cache.

Andrew 2009-12-02 12:27:15

I have 2 primary keys for the table, I have no idea about the things you say. I didn't know that there was caching in sql-server, and I am not sure how will I check about it. I read the definitons of dbcc dropcleanbuffers but I didn't get it clearly.

stckvrflw 2009-12-02 13:16:27

SQL Server uses a data cache extensivly based on the memory available within the server, so given the data quantities you are most likely to have your entire table in memory at present and not on the disk - that will affect results.You can not have 2 primary keys - so you either mean a primary key that is a composite of two fields, or you have a primary key and a second Unique constraint being enforced.Will be best if you script the table out and the indexing and include it within the question, we can then partially duplicate your situation on our own servers.

Andrew 2009-12-02 13:31:22

Answer 3

A:

Here is my code, which lasts about 3 minutes (Increasing exponantially fast when grup names increasing). I can't use it either since I shouldn't keep database busy all this 3 minutes, and it doesn't have future since query execution time is increasing rapidly.

UPDATE MYTABLE
SET GRUP=ISNULL(C.BGrup,MYTABLE.Grup)

FROM 
MYTABLE
FULL OUTER JOIN
(
SELECT
AGRUP, BGRUP
FROM
(
    SELECT 
    DISTINCT GRUP AS AGRUP
    FROM 
    MYTABLE
    WHERE GRUP NOT LIKE '%(%(%)%(%)%(%)%'
) A
LEFT JOIN
(
    SELECT 
    DISTINCT 
    REPLACE(GRUP,'_','/') AS BGRUP
    FROM 
    MYTABLE
    WHERE GRUP LIKE '%(%(%)%(%)%(%)%'
) B
ON 

AGRUP = SUBSTRING(BGRUP,0,LEN(AGRUP)+2)

WHERE 
AGRUP <> '' AND 
BGRUP <> '' AND
) C
ON MYTABLE.Grup=C.AGrup

stckvrflw 2009-12-02 11:39:40

I don't understand what "REPLACE(GRUP,'_','/') AS BGRUP" Is doing, since the example of desired output still had underscores, and not slashies. See the row for: XTM/YTM SUB GR. Also, third from the bottom, "BGRUP <> '' AND" Looks like a syntax error.

Shannon Severance 2009-12-05 00:52:56

Answer 4

A:

Using SQL Server 2000.

Test data:

create table #t (grup varchar(200))

insert into #t values ('OPEN SYSTEMS SUB GR')
insert into #t values ('OPEN SYSTEMS SUB GR (GM/BTIB(1111)/BTITDBL(2222)/BTVY(4444)/ACSVTYSAG')
insert into #t values ('INFRASTRUCTURE SOFT SUB GR')
insert into #t values ('INFRASTRUCTURE SOFT SUB GR (GM/BTIB(1111)/BTUGBL(3333)/BTUGBL(3333)/BTAUSGAG')
insert into #t values ('MAIN SERVER ONLİNE SYS SUB GR (GM_BTIB(1111)_BTITDBL(2222)_BTSY(5555)_ANBOSAG')
insert into #t values ('MAIN SERVER SUB GR')
insert into #t values ('MAIN SERVER SUB GR (GM_BTIB(1111)_BTITDBL(2222)_BTVY(4444)_ANBVTYSAG')
insert into #t values ('XTM/YTM SUB GR')
insert into #t values ('XTM/YTM SUB GR (GM_BTIB(1111)_BTUGBL(3333)_BTAU(6666)_BTABAG')
insert into #t values ('CARDS SUB GR (GM_BTIB(1111)_BTUGBL(3333)_BTKOU(7777)_BTBKAG')
insert into #t values ('SYSTEMS DEV. SUB GR (GM_BTIB(1111)_BTSGBL(8888)_BTPB(9999)_BBASGAG')
insert into #t values ('PERSONAL B. SUB GR')
insert into #t values ('PERSONAL B. SUB GR (GM/BTIB(1111)/BTUGBL(3333)/BTAU(6666)/BTBISAG')

insert into #t
select L.*
from #t L
cross join #t R

insert into #t
select L.*
from #t L
cross join #t R

Giving vver 33,000 test rows.

Step 1, build temp table to hold the conversion from short group names to long group names:

select substring(grup, 1, charindex(' (', grup)) as short_grup
    , grup as long_grup
into #Lookup
from #t
WHERE GRUP LIKE '%(%(%)%(%)%(%)%'
group by grup

Step 2, use the above temp table in an update statement:

update target
    set target.grup = source.long_grup
from #t target
inner join #Lookup source
on target.grup = source.short_grup

NOTE: I tried to use an inline view in single step, but it is taking much longer (Currently at 7+ minutes):

update target
    set target.grup = source.long_grup
from #t target
inner join (select substring(grup, 1, charindex(' (', grup)) as short_grup
        , grup as long_grup
    from #t
    WHERE GRUP LIKE '%(%(%)%(%)%(%)%'
    group by grup) source
on target.grup = source.short_grup

Other things to think about

If you group names are stable, build a permenant table with the short to long name mapping, instead of creating a temp table each time.
Indexes on the short name and grup may speed up the join. Test to see.
Can the system putting in the short names be fixed?

Shannon Severance 2009-12-05 01:18:24

ansaurus

tags:

views:

answers:

A question about desining a faster algorithm in T-SQL for a CASE that I will be telling.

related questions