tags:

views:

804

answers:

7

I have a table, with 2 important columns DocEntry, WebId

Sample data is like

DocEntry  WebId
1         S001
2         S002
3         S003
4         S005

Now as we can notice here, in column WebId, S004 is missing. How can we locate such missing numbers, with a query.

Further explanation:

The web id should be in increasing order like, S001, S002, S003, S004, S005 if any number in between is missing, than that is the missing number. I do not have any separate table for the possible entries, as that is not practical. I've to find the missing numbers on a month to month basis, taking the starting and ending value of each month as the boundaries and than finding the missing numbers, if any.

A: 

You need to define what you mean by "missing". You can't expect your database server to understand this abstract concept. Perhaps a stored procedure is the best way to go, since then you can define your logic more precisely.

Brian
Your comment was valid. The question has been clarified. I suggest removing this answer to avoid down-votes.
Jonathan Leffler
+1  A: 

Unless you have a specific number layout already defined (it looks like you are), have a table with all of the possibilities (not very time efficient though) and you can do something like this:

Get one table with all the possiblities of name PossibleEntries and then do this:

SELECT pe.WebID from PossibleEntries pe WHERE pe.WebID Not In (Select WebID from SampleData)

I think that should work, but I don't know how efficient it is. I agree with above. If the numbers aren't sequential, you won't be able to do this.

Rob
+1  A: 

Personally, I'd do this in PHP or whatever programming language you're using with SQL. If you aren't able to have a separate table with every possible value (why not, by the way?) then the approach I'd take would be to do a straightforward query to get the values that are in the table:

select WebID from table order by WebID;

and then use a simple loop to find which ones are missing. E.g., in php:

$values = Array();
$query = "select WebID from table order by WebID;";
$dataset = mysql_query ($query) or die (mysql_error());
while ($data = mysql_fetch_assoc($dataset))
{
  $values[$data['WebID'] = 1;
}

$last_line = $data['WebID'];
$matches = Array();
ereg("S([0-9]+)", $last_line, $matches))

$max_value = $matches[0];
$missing = Array();

for ($count = 0; $count < $max_value; $count ++)
{
  if (!isset($values[$count])
  { 
    echo "value $count is missing\n";
    $missing[$count] = true;
  }
}

I haven't tested it, but if you do happen to be using PHP, then that might do what you want.

Ben

Ben
+1  A: 

There's a standard trick for generating integers that requires you to create a 10 row utility table viz:

create table Pivot (i int)

insert into Pivot values (0)
insert into Pivot values (1)
insert into Pivot values (2) 

/* ... down to */

insert into Pivot values (9)

Once you've done this, then, for example

select u.i + 10*t.i + 100*h.i from Pivot u, Pivot t, Pivot h

will get you all the numbers 0 to 999.

Add a where clause to restrict you between a range, and some string functions will get you to the PossibleEntries table in Robs answer above.

cindi
That's moderately hard work.
Jonathan Leffler
A: 

(Aside: Why do people in general (Rahul is not the only one, by any stretch of the imagination) omit the name of their table from the question?)

It is very difficult to do in a relational way because it inherently relies on the ordering of data and relational algebra works on (unordered) sets. I take it that we should assume that there is no significance to the DocID column and it cannot be used to help solve the problem.

In the example, you have S003 and S005 and are missing S004. How do we tell that there is a missing value? Presumably, because there is a comparison operation that tells us 'less than', 'equal', 'greater than', and also because there is a difference function that tells us that the gap between S003 and S005 is 2. Let's assume that '>' and friends do the comparison (works here for character strings), and that you can produce a stored procedure webid_diff() which takes two WebID values and returns the difference.

Then, you could write a query such as:

SELECT a.webid, MIN(b.webid) AS min_next
    FROM AnonymousTable AS a, AnonymousTable AS b
    WHERE a.webid < b.webid
    GROUP BY a.webid;

This uses a non-equijoin between the table and itself to find the minimum successor WebID value for each item.

With that as the core, we can then filter the result to select only those rows for which the gap between WebID and Min_Next is more than one. So, I think we get (1st attempt):

SELECT x.webid, y.min_next, webid_diff(x.webid, y.min_next) AS gap
    FROM AnonymousTable AS x,
         (SELECT a.webid, MIN(b.webid) AS min_next
             FROM AnonymousTable AS a, AnonymousTable AS b
             WHERE a.webid < b.webid
             GROUP BY a.webid
         ) AS y
    WHERE x.webid = y.webid
      AND webid_diff(x.webid, y.min_next) > 1;

Is the join at the outer level actually getting us anything useful? I don't think so, so we can remove it, leading to (2nd attempt):

SELECT y.webid, y.min_next, webid_diff(y.webid, y.min_next) AS gap
    FROM (SELECT a.webid, MIN(b.webid) AS min_next
             FROM AnonymousTable AS a, AnonymousTable AS b
             WHERE a.webid < b.webid
             GROUP BY a.webid
         ) AS y
    WHERE webid_diff(y.webid, y.min_next) > 1;

This does work. Trying to put the webid_diff() function into the inner query gives me problems - at least the GAP expression would have to be included in the GROUP BY clause, but that then is going to give the wrong answer.

The HAVING clause is used to apply filter conditions to aggregates, so it looks a bit as though the query might be reducible to:

SELECT a.webid, MIN(b.webid) AS min_next, webid_diff(a.webid, b.webid) AS gap
    FROM AnonymousTable AS a, AnonymousTable AS b
    WHERE a.webid < b.webid
    GROUP BY a.webid
    HAVING webid_diff(a.webid, b.webid) > 1;

However, this doesn't work (for me, with my DBMS - IBM Informix Dynamic Server) because webid_diff() is not an aggregate.

Here's the code I used for the webid_diff() function (you'd have to adjust to suit the syntax of your DBMS), and the auxilliary webid_num() function:

CREATE FUNCTION webid_num(a CHAR(4)) RETURNING INTEGER;
    DEFINE i INTEGER;
    LET i = substr(a, 2, 3);
    RETURN i;
END FUNCTION;

CREATE FUNCTION webid_diff(a CHAR(4), b CHAR(4)) RETURNING INTEGER;
    DEFINE i, j INTEGER;
    LET i = webid_num(a);
    LET j = webid_num(b);
    RETURN (j - i);
END FUNCTION;
Jonathan Leffler
A: 

My guess is that you're database has a serious design flaw, since it looks like your WebID is really at least two columns that you've combined together. The numeric part obviously has some sort of meaning since you want it to be sequential, but if that's the case then what does the "S" mean? As a result of this design flaw, the solution to your problem is going to be more complex than it needs to be. Also, that you state that it's not "practical" to store data that is important to the database is a big red flag.

Setting that aside, the following query should give you any missing values:

SELECT
     (
          SELECT
               SUBSTRING(MAX(T4.WebID), 1, 1) +
               RIGHT('000' + CAST(CAST(SUBSTRING(MAX(T4.WebID), 2, 3) AS INT) + 1 AS VARCHAR), 3)
           FROM My_Table T4
           WHERE T4.WebID < T1.WebID
     ) AS min_range,
     SUBSTRING(T1.WebID, 1, 1) + RIGHT('000' + CAST(CAST(SUBSTRING(T1.WebID, 2, 3) AS INT) - 1 AS VARCHAR), 3) AS max_range
FROM
     My_Table T1
LEFT OUTER JOIN My_Table T2 ON
     T2.WebID = SUBSTRING(T1.WebID, 1, 1) +
                RIGHT('000' + CAST(CAST(SUBSTRING(T1.WebID, 2, 3) AS INT) - 1 AS VARCHAR), 3)

WHERE
     T2.WebID IS NULL AND
     T1.WebID <> (SELECT MIN(WebID) FROM My_Table)

It gives you a start and end for each range of missing values, rather than a distinct list of each one. To get that you'll need a table of numbers which cindi and Rob have covered.

Tom H.
+3  A: 

A very simple approach :)

mysql> select * from test;
+----------+-------+
| DocEntry | WebId |
+----------+-------+
| 1        | S001  |
| 2        | S002  |
| 3        | S003  |
| 4        | S005  |
| 5        | S006  |
| 6        | S007  |
| 7        | S008  |
| 8        | S010  |
+----------+-------+
8 rows in set (0,00 sec)

mysql> SELECT right(t1.webid,3) +1 as missing_WebId FROM test t1 left join test t2 on right(t1.webid,3)+1 = right(t2.webid,3) where t2.webid is null;
+---------------+
| missing_WebId |
+---------------+
| 4             |
| 9             |
| 11            |
+---------------+
3 rows in set (0,01 sec)

good luck, Maurice

Maupie