I have a table which contains sales order data (order number, product number, sales price, etc.).

However, the table is littered with corrections and various other invalid data. One of the main issues is that corrections were entered by adding a new row with a negative total equal to the amount of a previous order. The salespeople were not always thorough: they often gave the correction a new order number, or didn't even list the product number in it.

I would like to delete all rows with a negative total, each along with its matching positive-total row (or, failing that, any one other row with the inverse total).

My first thought was to simply delete all negative-total rows and any positive rows with the opposite total. However, since many negative totals match several positive orders, this mistakenly deletes lots of positive rows.

How can I delete all rows with a negative total, along with one row for each that has the inverse total?

+1  A: 

Depending on how much data there is, I would just do it the brute-force way.

Select all the negative-total rows into a temp table.

Use a cursor to go through each row, then query the database for a single match (using maybe max() on a timestamp, order number, or whatever primary key you might have). Delete that one "matching" row.

Then delete all the negative rows.

No doubt you can use a subquery and do it in one statement, but by the time I figured it out and tested it, I would have the job done using the above :)
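
The loop described above can be sketched roughly as follows. This is only an illustration on a toy table, using Python's built-in sqlite3 as a stand-in for the real database; the table name `orders`, the columns `orderID` and `total`, and the helper `cancel_corrections` are assumptions, and totals are stored as integers to keep the equality comparison exact.

```python
import sqlite3

def cancel_corrections(conn):
    """Delete every negative row plus at most one positive row with the inverse total."""
    cur = conn.cursor()
    # Snapshot the negative rows first (this list plays the role of the temp table).
    negatives = cur.execute(
        "SELECT orderID, total FROM orders WHERE total < 0 ORDER BY orderID"
    ).fetchall()
    for _neg_id, neg_total in negatives:
        # Pick a single match: the highest orderID still present with the inverse total.
        match = cur.execute(
            "SELECT MAX(orderID) FROM orders WHERE total = ?", (-neg_total,)
        ).fetchone()[0]
        if match is not None:  # MAX() yields NULL when no positive row is left
            cur.execute("DELETE FROM orders WHERE orderID = ?", (match,))
    # Finally drop the negative rows themselves.
    cur.execute("DELETE FROM orders WHERE total < 0")
    conn.commit()

# Toy data (hypothetical): three orders of 50, two corrections of -50,
# and one correction of -30 that has no matching order.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (orderID INTEGER PRIMARY KEY, total INTEGER)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [(1, 50), (2, 50), (3, 50), (4, -50), (5, 20), (6, -50), (7, -30)],
)
cancel_corrections(conn)
remaining = conn.execute(
    "SELECT orderID, total FROM orders ORDER BY orderID"
).fetchall()
print(remaining)
```

Because each matched row is deleted before the next lookup, two corrections of the same amount cancel two different orders rather than the same one twice.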

MikeW
A: 

What is the shared identifier that links the two rows? Without one, you can't do this, because there is nothing to link the rows.

Anyway, it would be something like:

DELETE FROM MyTable
WHERE EXISTS (
    -- LinkID stands for whatever shared identifier links a correction to its order
    SELECT 1 FROM MyTable M2
    WHERE M2.LinkID = MyTable.LinkID
    GROUP BY M2.LinkID
    HAVING SUM(M2.ValueCol) < 0
    )
gbn
A: 

I'd run the inner SELECT once without the wrapping DELETE, to check that the data looks OK before executing, but I'm pretty sure this would be fine:

DELETE FROM
   orders
WHERE
   orderID IN (
       SELECT
          orderID
       FROM (
          SELECT 
             MIN(orderID) orderID, total
          FROM
             orders
          WHERE
             total IN (
                SELECT
                   total * -1
                FROM
                   orders
                WHERE
                   total < 0
             )
          GROUP BY
             total
          ) derived
   );

DELETE FROM
    orders
WHERE
    total < 0;
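
One caveat: because the inner query groups by total, it picks only one positive row per distinct total, so two corrections of the same amount would cancel just one order. A possible way around that is to delete the k-th positive row of a given total only when at least k negatives with the inverse total exist. The sketch below tries this on a toy table, with Python's built-in sqlite3 standing in for the real database; the sample data, the integer totals, and the temp table name `to_delete` are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (orderID INTEGER PRIMARY KEY, total INTEGER)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [(1, 50), (2, 50), (3, 50), (4, -50), (5, -50), (6, 20), (7, -30), (8, 30)],
)

# Collect the positive rows to cancel in a temp table so they can be
# inspected before anything is deleted.
conn.execute("""
    CREATE TEMP TABLE to_delete AS
    SELECT p.orderID
    FROM orders AS p
    WHERE p.total > 0
      -- rank of this row among positive rows with the same total
      AND (SELECT COUNT(*) FROM orders AS p2
           WHERE p2.total = p.total AND p2.orderID <= p.orderID)
          <=
          -- number of corrections carrying the inverse total
          (SELECT COUNT(*) FROM orders AS n
           WHERE n.total = -p.total)
""")
doomed = [row[0] for row in conn.execute(
    "SELECT orderID FROM to_delete ORDER BY orderID")]

# Remove the paired positive rows, then every negative row.
conn.execute("DELETE FROM orders WHERE orderID IN (SELECT orderID FROM to_delete)")
conn.execute("DELETE FROM orders WHERE total < 0")
conn.commit()

remaining = conn.execute(
    "SELECT orderID, total FROM orders ORDER BY orderID"
).fetchall()
print(doomed, remaining)
```

Here the two corrections of -50 remove two of the three orders of 50, and the unmatched order of 20 is left alone.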
David Hedlund
A: 

Data cleanup tasks are painful no matter what. From what you've described, there is not enough information to fully automate this task. This is typical for data cleanup.

First you need to have a talk with your immediate manager and let him know the magnitude of the problem. It's not your fault the data is all screwed up, and it will take time to fix it without losing any valid information and without interrupting the sales operations.

The most important tip about data cleanup is that it's more trouble than it's worth to try to automate fully. Your strategy should be to reduce the problem by taking care of the easy cases, until you can do the remainder manually. There will always be complex edge cases, and trying to handle them all with clever SQL is an exercise in diminishing returns.

  1. Take care of the low-hanging fruit, where the negative "correction" has a valid order number, so you can make a strong correlation to the order it is intended to cancel.

  2. Create a correlation between the remaining negatives and the most recent single order rows with the same quantity. Use other columns to correlate them if you can, for instance if the correction is entered by the same salesperson who entered the original order.

  3. The next stage would be to delete negatives where the order number is valid, but it maps to multiple rows that sum up to the total value.

  4. Then start on matching negatives without order numbers to multiple rows that sum up to the value in the correction. This can be tricky to automate, but by this time the number of negatives might be few enough that you can do it manually, by eyeballing them one by one.
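
As a rough illustration of the first stage, the sketch below pairs each negative row with a positive row that shares its order number and has the inverse total, deletes both, and then lists the negatives left for the later stages. It runs on a toy table with Python's built-in sqlite3 standing in for the real database; the columns `id`, `order_no`, and `total` and the sample rows are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, order_no TEXT, total INTEGER)"
)
conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", [
    (1, "A100", 50),
    (2, "A100", -50),  # correction that reuses the original order number
    (3, "A101", 50),
    (4, "X999", -50),  # correction entered under a new order number
    (5, "A102", 75),
])

# Stage 1: strong correlation, same order number and inverse total.
pairs = conn.execute("""
    SELECT n.id, MIN(p.id)
    FROM orders AS n
    JOIN orders AS p
      ON p.order_no = n.order_no AND p.total = -n.total
    WHERE n.total < 0
    GROUP BY n.id
""").fetchall()

# Delete both halves of every pair found.
ids = [row_id for pair in pairs for row_id in pair]
conn.executemany("DELETE FROM orders WHERE id = ?", [(i,) for i in ids])
conn.commit()

# Whatever is still negative goes on to the later, looser stages.
leftover_negatives = conn.execute(
    "SELECT id, order_no, total FROM orders WHERE total < 0 ORDER BY id"
).fetchall()
print(pairs, leftover_negatives)
```

If one order number carries several identical corrections, this query would pair them all with the same positive row, so such cases are probably better left for the later stages or for manual review.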

The other tip is that SQL Anywhere appears to have a multi-table DELETE syntax. I don't use SQL Anywhere, but I found this in the online docs:

Syntax

DELETE [ row-limitation ] 
  [ FROM ] [ owner.]table-expression
  [ FROM table-list [,...] ]
  [ WHERE search-condition ]
  [ ORDER BY { expression | integer } [ ASC | DESC ], ... ]
  [ OPTION( query-hint, ... ) ]

It looks like the first FROM clause lists the table you want to delete rows in. The second FROM clause allows you to do joins for purposes of restricting the rows. Since you're likely to be doing self-joins, remember that you need to give an alias (aka correlation name) in the first FROM to avoid ambiguity.

Bill Karwin