tags:

views:

46

answers:

2

I have a table with three filled columns named "Name", "City" and "Occupation". I want to create a new column in the same table that contains the number of people who have the same occupation.

"Name" | "City" | "Occupation"
------------------------------
Amy    | Berlin | Plumber
Bob    | Berlin | Plumber
Carol  | Berlin | Lawyer
David  | London | Plumber

I want to have a table that contains:

"Name" | "City" | "Occupation" | "Number"
---------------------------------------
Amy    | Berlin | Plumber      | 2
Bob    | Berlin | Plumber      | 2
Carol  | Berlin | Lawyer       | 1
David  | London | Plumber      | 1

How does the SQL Query that creates the new columns have to look like? I want to actually create a new column in the database that I can access later.

+1  A: 

Simple self-join:

SELECT t0.Name, t0.City, t0.Occupation, COUNT(*) AS Number
FROM sometable AS t0
JOIN sometable AS t1 ON t1.Occupation=t0.Occupation
GROUP BY t0.Name, t0.City, t0.Occupation

If Name is a primary key you can just group by that alone instead, since the other columns would have a functional dependency on it. Of course Name wouldn't normally be a very good primary key.

(You might need COUNT(*)-1 if you want the number of other people doing the job rather than the total. It's unclear; the numbers in your example don't add up either way.)

If you must change your schema (and I wouldn't recommend this kind of denormalisation unless you are absolutely certain you need it; see comments), you can do it with an UPDATE JOINed to Michael's subselect:

ALTER TABLE sometable ADD COLUMN Number INTEGER NOT NULL;
UPDATE sometable AS t0 JOIN (
    SELECT Occupation, COUNT(*) AS Number
    FROM sometable
    GROUP BY Occupation
) AS t1 ON t1.Occupation=t0.Occupation
SET t0.Number= t1.Number;
bobince
Thanks, I would however like to have the Number written into row because it would probably be to expensive to do a join every time that I need the number.
Christian
Premature optimisation: don't change your schema until you need to. With an index on `Occupation` this will be very fast; I can do a similar self-join-and-count query over a web forum database of 500k posts in about 5s. Michael's query is even faster (though for LIMITed results, it's slower); which is best depends on what exactly you've got in your database, but with suitable indexes both will be fast.
bobince
I have 20,000,000 entries in my database and I'm looking to make 100,000s of queries, I need performance.
Christian
+2  A: 
select tbl.name, tbl.city, tbl.occupation,  x.number
from tbl
join
(
    select occupation, count(*) as number
    from tbl
    group by occupation
) as x on x.occupation = tbl.occupation
Michael Buen