views:

113

answers:

4

Hello people. Here is the CASE

I have an sql table consisting of just one data column, consisting of some group names. Here how the table looks:

OPEN SYSTEMS SUB GR 
OPEN SYSTEMS SUB GR (GM/BTIB(1111)/BTITDBL(2222)/BTVY(4444)/ACSVTYSAG
INFRASTRUCTURE SOFT SUB GR 
INFRASTRUCTURE SOFT SUB GR (GM/BTIB(1111)/BTUGBL(3333)/BTUGBL(3333)/BTAUSGAG
MAIN SERVER ONLİNE SYS SUB GR (GM_BTIB(1111)_BTITDBL(2222)_BTSY(5555)_ANBOSAG
MAIN SERVER SUB GR 
MAIN SERVER SUB GR (GM_BTIB(1111)_BTITDBL(2222)_BTVY(4444)_ANBVTYSAG
XTM/YTM SUB GR 
XTM/YTM SUB GR (GM_BTIB(1111)_BTUGBL(3333)_BTAU(6666)_BTABAG
CARDS SUB GR (GM_BTIB(1111)_BTUGBL(3333)_BTKOU(7777)_BTBKAG
SYSTEMS DEV. SUB GR (GM_BTIB(1111)_BTSGBL(8888)_BTPB(9999)_BBASGAG
PERSONAL B. SUB GR 
PERSONAL B. SUB GR (GM/BTIB(1111)/BTUGBL(3333)/BTAU(6666)/BTBISAG

And continues like that. As you see, some of the groups have duplicate names like

PERSONAL B. SUB GR 
PERSONAL B. SUB GR (GM/BTIB(1111)/BTUGBL(3333)/BTAU(6666)/BTBISAG

What I am tring to do is, I want to update the "short versions of groupnames" with "longer versions of them". As an example PERSONAL B. SUB GR will be replaced with , PERSONAL B. SUB GR (GM/BTIB(1111)/BTUGBL(3333)/BTAU(6666)/BTBISAG Like the table will look like the following after the update.

OPEN SYSTEMS SUB GR (GM/BTIB(1111)/BTITDBL(2222)/BTVY(4444)/ACSVTYSAG
OPEN SYSTEMS SUB GR (GM/BTIB(1111)/BTITDBL(2222)/BTVY(4444)/ACSVTYSAG
INFRASTRUCTURE SOFT SUB GR (GM/BTIB(1111)/BTUGBL(3333)/BTUGBL(3333)/BTAUSGAG
INFRASTRUCTURE SOFT SUB GR (GM/BTIB(1111)/BTUGBL(3333)/BTUGBL(3333)/BTAUSGAG
MAIN SERVER ONLİNE SYS SUB GR (GM_BTIB(1111)_BTITDBL(2222)_BTSY(5555)_ANBOSAG
MAIN SERVER SUB GR (GM_BTIB(1111)_BTITDBL(2222)_BTVY(4444)_ANBVTYSAG
MAIN SERVER SUB GR (GM_BTIB(1111)_BTITDBL(2222)_BTVY(4444)_ANBVTYSAG
XTM/YTM SUB GR (GM_BTIB(1111)_BTUGBL(3333)_BTAU(6666)_BTABAG
XTM/YTM SUB GR (GM_BTIB(1111)_BTUGBL(3333)_BTAU(6666)_BTABAG
CARDS SUB GR (GM_BTIB(1111)_BTUGBL(3333)_BTKOU(7777)_BTBKAG
SYSTEMS DEV. SUB GR (GM_BTIB(1111)_BTSGBL(8888)_BTPB(9999)_BBASGAG
PERSONAL B. SUB GR (GM/BTIB(1111)/BTUGBL(3333)/BTAU(6666)/BTBISAG
PERSONAL B. SUB GR (GM/BTIB(1111)/BTUGBL(3333)/BTAU(6666)/BTBISAG

My table consists of 9000 records, and I have an algorithm which does that, however it works slow , like about 3 minutes to complete the query, however what I am tring to do is something simple. I need a faster algorithm.

Thanks for helps.

A: 

(something like) this might help:

UPDATE grp SET
  name = l.name
FROM grp s, grp l
WHERE LEN(s.name) < LEN(l.name)
  AND s.name = LEFT(l.name, LEN(s.name))

although you're going to harm yourself with those duplicates...

just somebody
Didn't work, it lasts more than 1 minute and I stopped it.
stckvrflw
there's no way in hell a table with frigging 9000 records would take that long. you're not telling us something important.
just somebody
Here is how I used your codeSELECT s.IDFROM grp s, grp lWHERE LEN(s.GRUP) < LEN(l.GRUP) AND s.GRUP = LEFT(l.GRUP, LEN(s.GRUP))That took 1 min 4 seconds. Returned 385.323 id's for group names. If I will do it with UPDATE it takes much much more longer.
stckvrflw
how many rows are in the table?
just somebody
9005 rows and 7 columns.
stckvrflw
@just somebody: Not quite a cross join, but close enough. The group name field probably has multiple copies of both the short and long names for all the groups. Say there are for group A, there are 20 rows, 10 with the short name, 10 with the long name. That yields 100 rows from the join criteria for that one group name.
Shannon Severance
+1  A: 

This should be relatively simple to solve in a set based manner:

UPDATE a
SET  a.yourfield = b.yourfield 
FROM yourtable a 
INNER JOIN yourtable b on b.yourfield LIKE a.yourfield + '%' AND b.yourfield <> a.yourfield

That will pattern match but exclude a match to itself - it is less than ideal however and I am guessing after this process you will remove duplicates or something, since having a table with multiple duplicates and no other distinguishing values is somewhat bizarre.

Andrew
I tried your code with SELECT instead of UPDATE, and even this way it lasts more than 1 minute, and I had to stop it manuelly.
stckvrflw
You have indexed the field I take it?
Andrew
I didn't got what you mean. I used your code like that. SELECT a.IDFROM yourtable a INNER JOIN yourtable b on b.GRUP LIKE a.GRUP + '%' AND b.GRUP <> a.GRUPIt didn't stopped, and I stopped it at 1 minute 30 seconds manually, it returned 365.868 records and was continuing. Btw a normal select like SELECT * FROM myTable returns the table in 0 seconds. So I don't have problem with my database.
stckvrflw
Select * from yourtable probably is returning in 0 seconds since a table of 9000 rows will be sitting in memory and cached - especially given all the testing you have done - that does not categorically represent 'there is no problem with my database'.Tell us what indexes are on the table, both clustered and non-clustered and which fields are included.Also tell us whether each test is being started clean with DBCC DropCleanBuffers and DBCC FreeProcCache, or whether you are using a warmed up cache.
Andrew
I have 2 primary keys for the table, I have no idea about the things you say. I didn't know that there was caching in sql-server, and I am not sure how will I check about it. I read the definitons of dbcc dropcleanbuffers but I didn't get it clearly.
stckvrflw
SQL Server uses a data cache extensivly based on the memory available within the server, so given the data quantities you are most likely to have your entire table in memory at present and not on the disk - that will affect results.You can not have 2 primary keys - so you either mean a primary key that is a composite of two fields, or you have a primary key and a second Unique constraint being enforced.Will be best if you script the table out and the indexing and include it within the question, we can then partially duplicate your situation on our own servers.
Andrew
A: 

Here is my code, which lasts about 3 minutes (Increasing exponantially fast when grup names increasing). I can't use it either since I shouldn't keep database busy all this 3 minutes, and it doesn't have future since query execution time is increasing rapidly.

UPDATE MYTABLE
SET GRUP=ISNULL(C.BGrup,MYTABLE.Grup)

FROM 
MYTABLE
FULL OUTER JOIN
(
SELECT
AGRUP, BGRUP
FROM
(
    SELECT 
    DISTINCT GRUP AS AGRUP
    FROM 
    MYTABLE
    WHERE GRUP NOT LIKE '%(%(%)%(%)%(%)%'
) A
LEFT JOIN
(
    SELECT 
    DISTINCT 
    REPLACE(GRUP,'_','/') AS BGRUP
    FROM 
    MYTABLE
    WHERE GRUP LIKE '%(%(%)%(%)%(%)%'
) B
ON 

AGRUP = SUBSTRING(BGRUP,0,LEN(AGRUP)+2)

WHERE 
AGRUP <> '' AND 
BGRUP <> '' AND
) C
ON MYTABLE.Grup=C.AGrup
stckvrflw
I don't understand what "REPLACE(GRUP,'_','/') AS BGRUP" Is doing, since the example of desired output still had underscores, and not slashies. See the row for: XTM/YTM SUB GR. Also, third from the bottom, "BGRUP <> '' AND" Looks like a syntax error.
Shannon Severance
A: 

Using SQL Server 2000.

Test data:

create table #t (grup varchar(200))

insert into #t values ('OPEN SYSTEMS SUB GR')
insert into #t values ('OPEN SYSTEMS SUB GR (GM/BTIB(1111)/BTITDBL(2222)/BTVY(4444)/ACSVTYSAG')
insert into #t values ('INFRASTRUCTURE SOFT SUB GR')
insert into #t values ('INFRASTRUCTURE SOFT SUB GR (GM/BTIB(1111)/BTUGBL(3333)/BTUGBL(3333)/BTAUSGAG')
insert into #t values ('MAIN SERVER ONLİNE SYS SUB GR (GM_BTIB(1111)_BTITDBL(2222)_BTSY(5555)_ANBOSAG')
insert into #t values ('MAIN SERVER SUB GR')
insert into #t values ('MAIN SERVER SUB GR (GM_BTIB(1111)_BTITDBL(2222)_BTVY(4444)_ANBVTYSAG')
insert into #t values ('XTM/YTM SUB GR')
insert into #t values ('XTM/YTM SUB GR (GM_BTIB(1111)_BTUGBL(3333)_BTAU(6666)_BTABAG')
insert into #t values ('CARDS SUB GR (GM_BTIB(1111)_BTUGBL(3333)_BTKOU(7777)_BTBKAG')
insert into #t values ('SYSTEMS DEV. SUB GR (GM_BTIB(1111)_BTSGBL(8888)_BTPB(9999)_BBASGAG')
insert into #t values ('PERSONAL B. SUB GR')
insert into #t values ('PERSONAL B. SUB GR (GM/BTIB(1111)/BTUGBL(3333)/BTAU(6666)/BTBISAG')

insert into #t
select L.*
from #t L
cross join #t R

insert into #t
select L.*
from #t L
cross join #t R

Giving vver 33,000 test rows.

Step 1, build temp table to hold the conversion from short group names to long group names:

select substring(grup, 1, charindex(' (', grup)) as short_grup
    , grup as long_grup
into #Lookup
from #t
WHERE GRUP LIKE '%(%(%)%(%)%(%)%'
group by grup

Step 2, use the above temp table in an update statement:

update target
    set target.grup = source.long_grup
from #t target
inner join #Lookup source
on target.grup = source.short_grup

NOTE: I tried to use an inline view in single step, but it is taking much longer (Currently at 7+ minutes):

update target
    set target.grup = source.long_grup
from #t target
inner join (select substring(grup, 1, charindex(' (', grup)) as short_grup
        , grup as long_grup
    from #t
    WHERE GRUP LIKE '%(%(%)%(%)%(%)%'
    group by grup) source
on target.grup = source.short_grup

Other things to think about

  1. If you group names are stable, build a permenant table with the short to long name mapping, instead of creating a temp table each time.
  2. Indexes on the short name and grup may speed up the join. Test to see.
  3. Can the system putting in the short names be fixed?
Shannon Severance