tags:

views:

108

answers:

3

Hello, I have a big table with 300,000 records. This table has a integer value called "velocity" and it`s value is from 0 to 100. In the firsts records, the value is 0 and I want to remove. I want to remove from the query, the records where the velocity field repeats more than 10 times. For example:

0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 2 3 5 10 12 13 15 20 30 20 15 10 8 5 2 1 0 0 0 0 4 5 10 20...

[-------remove this-----------].......................................................................[---------] <- do not remove this

Thanks

A: 

I'd just rip through the records sequentially, with a variable sized window that expands and contracts to comprehend identical values. Whenever the size is >= 10 when the value changes, delete the rows using the primary keys.

You can put BEGIN TRAN and COMMIT TRAN at the beginning and end of the DELETE statements to make things reasonably efficient.

le dorfier
Note: Since the operation is dependent on record ordering, this isn't really a set-based problem. So using SQL is is going to be awkward. Probably better to use your procedural language and look at it as ano ordered list.
le dorfier
+3  A: 

The easiest way to do this is with a loop. You can write a stored procedure that iterates through the records, or you might do it outside of the database. I'd do it like that if this needs to be done once. If this is a continuous process, it's better to make sure that the extra data is just not inserted into the database in the first place.

Anyway, if you insist on doing this in pure SQL, without stored procedures with loops, you can use a query like this:

set @groupnum=0;

select 
  GroupNum,
  count(*) as RecsInGroup 
from
(
    select 
      t1.id as Id,
      t1.velocity as velocity1,
      t2.velocity as velocity2,  
      if(t1.velocity<>t2.velocity,@groupnum:=@groupnum+1,@groupnum) as GroupNum
    from 
      VelocityTable as t1
    join
      VelocityTable as t2  
    on
      t1.id=t2.id-1
) as groups
group by
  GroupNum  
having RecsInGroup>10

What happens here?

Step 1

The inner query just selects all records in your table, but splits the data in sequential groups.

So, using your example, it does this:

velocity : 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 2 3 5 10 12 13 15 20 30 20 15 10  8  5  2  1  0  0  0  0  4  5 10 20
Groupnum : 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 2 3 4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 18 18 18 19 20 21 22

It does that by joining the table to itself, by linking subsequent records in the table. Every time the left and right velocity are different, the GroupNum is increased. Otherwise it's left unchanged.

Step 2

The result if the query is wrapped in an outer query, and grouped by GroupNum. Again, using your example it would result in this:

GroupNum,RecsInGroup
0,15 // !!
1,1
2,1
3,1
4,1
5,1
6,1
7,1
8,1
9,1
10,1
11,1
12,1
13,1
14,1
15,1
16,1
17,1
18,4 // !!
19,1
20,1
21,1

By Adding the having RecsInGroup>10 clause, the result becomes this:

GroupNum,RecsInGroup
0,15

Now, with this list of GroupNum's you can delete records.

Step 3

With the query above you have:

  1. A list of all your records, with an added GroupNum column.
  2. The list of GroupNum's that need to be removed.

Deleting the records should be easy at this point.

Wouter van Nifterick
A: 

Hello Wouter, thank you very much. I'm allmost there, but i tried it with a mySQL View as table source and it's not working (unkown table xxx). I can't use the whole table because it's have more than 19 millions records, I just need the record from a specific day, vehicle plate and city.