views:

54

answers:

4

Hopefully this is less complex than I think.

I have one table of companies, and another table of jobs, and a third table with that contains a single entry for each employee in each job from each company. NOTE: Some companies won't have employees in some jobs, and some companies will have more than one employee in some jobs.

The company table has a companyid and companyname field, the job table has a jobid and jobtitle field, and the employee table has employeeid, companyid, jobid and employeename fields.

I want to build a table like this:

      +-----------+-----------+-----------+
      | Company A | Company B | Company C |
------+-----------+-----------+-----------+
Job A | Emp 1     | Emp 2     |           |
------+-----------+-----------+-----------+
Job B | Emp 3     |           | Emp 4     |
      |           |           | Emp 5     |
------+-----------+-----------+-----------+
Job C |           | Emp 6     |           |
      |           | Emp 7     |           |
      |           | Emp 8     |           |
------+-----------+-----------+-----------+

I had previously been looping through a result set of jobs, and for each job, looping through a result set of each company, and for each company, looping through each employee and printing it in a table (gross, but performance was not supposed to be a consideration). The app has grown in popularity, and now we have 100 companies and hundreds of jobs, and the server is crapping out (all the id fields are indexed).

Any suggestions on how to write a single query to get this data? I don't need the company names or job titles (obviously), but I do need some way to identify where each row from the result should be printed. I'm imagining a result set that just contained a long list of joined employees, and I could write a loop to use the companyid and employeeid values to tell me when to create a new cell or table row. This works as long as there aren't ZERO employees; I would need a NULL employee name for that I think? Am I completely on the wrong track?

Thanks in advance for any ideas!

A: 

I think Joins might be the answer you are looking for.

Josh K
Left Joins actually... this might not exist, that might not exist... Result is a slow query
SeanJA
@SeanJA: result doesn't have to be slow if the tables are properly indexed.
outis
It will still (usually) be slower than getting the data back and processing it into a table using code.
SeanJA
If the tables are set up properly you will save yourself a lot of time and effort crafting the proper query rather then pulling lots of data and using the server to process it into a table.
Josh K
I was not implying that he should pull out all of the data from all of the tables and then process it all in whatever language he is using, Limiting the queries, Indexing things, not displaying all of the data at once `[show more results for this company]`... He is not going to be able to get a result set that looks exactly like he wants it without doing some processing.
SeanJA
@SeanJA: post-query processing is definitely called for. You should post the details as an answer.
outis
Your discussion above is appreciated-- so far I agree that post-processing is necessary. One query and thousands of PHP comparisons seems to be running faster than thousands of queries. I was hoping there would be some magic bullet to save me *some* of the post work, and this thread has been educational!
iopener
+2  A: 

The trick isn't so much in the query (something like

SELECT jobtitle, companyname, employeename 
  FROM employee AS e
    RIGHT JOIN company AS c ON e.companyid = c.companyid
    RIGHT JOIN job AS j ON e.jobid = j.jobid
  WHERE ...
  ORDER BY j.jobtitle, c.companyname

) as how you handle the result. For each result row, start a new table cell when the company changes, and start a new table row when the job title changes. The trickiest part is determining when you need to output a close tag.

outis
Thanks for this outis-- I've got this working and it looks good on my development server, but as soon as I put in on my shared hosting server (16M memory, 30 seconds execution), it runs out of memory. I'm going to check my PHP for leaks.
iopener
@iopener: if possible, try an unbuffered query (http://stackoverflow.com/questions/131139/streaming-large-result-sets-with-mysql-4-1-x-connector-j; http://stackoverflow.com/questions/1448661/django-unbuffered-mysql-query; if using PHP, PDO are unbuffered by default) so that the script doesn't have to hold the entire result set (though you won't be able to get the count of result rows until the ent). Also try SeanJA's suggestion to limit/paginate the results. Profile the query to see how it performs, both in the script and on its own (http://dev.mysql.com/doc/refman/5.1/en/explain.html).
outis
In the end, I inner-joined the tables together, and allowed NULL matches to be skipped. In my PHP, I looped through each combination of companyid and jobid, and checked if it matched the current result record. If so, I print it; if not, I close the cell and move to the next combination of companyid and jobid. Since the data is sorted in the same order as the cells are printed, it worked great: a page that couldn't finish in two minutes with 32MB of memory is now finishing in 6 seconds with 16MB.Thanks to everybody for your input!
iopener
+2  A: 

I think this might be closer to what you're asking for:

SELECT j.jobtitle, e1.employeeName, e2.employeename, e3.employeename
FROM jobs AS j
LEFT OUTER JOIN employee AS e1 ON e1.jobid = j.jobid AND e1.companyid = 1
LEFT OUTER JOIN employee AS e2 ON e2.jobid = j.jobid AND e2.companyid = 2
LEFT OUTER JOIN employee AS e3 ON e3.jobid = j.jobid AND e3.companyid = 3
ORDER BY j.jobtitle

The LEFT OUTER JOIN will give you NULL where there is no match.

You can see how it might get ugly after you have added a few companies. If you have indexed your id columns, this will not be slow. Don't fear the join - databases are made for it!

Andrew
I get the gist of this query, and it's a very clever idea. I generated the SQL in PHP code, printed it out and ran it in phpMyAdmin, and it ran for over 15 minutes without finishing. I wonder if it has to do with the large number of NULL values there are in the data?
iopener
A: 

Replace the variables to the corresponding company ids and increase @@group_concat_max_len if required:

SELECT
    jobtitle as '',
    GROUP_CONCAT(IF(companyid=@companyA_id, employeename,NULL) SEPARATOR '\n') as `Company A`,
    GROUP_CONCAT(IF(companyid=@companyB_id, employeename,NULL) SEPARATOR '\n') as `Company B`,
    GROUP_CONCAT(IF(companyid=@companyC_id, employeename,NULL) SEPARATOR '\n') as `Company C`
FROM employee
LEFT JOIN company
    USING (companyid)
LEFT JOIN job
    USING (jobid)
GROUP BY jobid;

UPD: Seems like you do not even need a JOIN to the company table, thus it might be simplified to:

SELECT
    jobtitle as '',
    GROUP_CONCAT(IF(companyid=@companyA_id, employeename,NULL) SEPARATOR '\n') as `Company A`,
    GROUP_CONCAT(IF(companyid=@companyB_id, employeename,NULL) SEPARATOR '\n') as `Company B`,
    GROUP_CONCAT(IF(companyid=@companyC_id, employeename,NULL) SEPARATOR '\n') as `Company C`
FROM employee
LEFT JOIN job
    USING (jobid)
GROUP BY jobid;

But you will probably need an index on companyid field in employee table in that case.

newtover