ansaurus

Question

Questions every good Database/SQL developer should be able to answer

Answer 1

+21 A:

The different types of JOINs:

INNER JOIN
LEFT and RIGHT OUTER JOIN
FULL JOIN
CROSS JOIN

See Jeff Atwood's Visual Explanation of JOINs

What is a key? A candidate key? A primary key? An alternate key? A foreign key?
What is an index and how does it help your database?
What are the data types available and when to use which ones?

marc_s 2010-01-22 19:27:37

If you don't thoroughly understand joins, you can't be a database programmer. This would be the first question and I wouldn't bother asking any others if the answer was wrong.

HLGEM 2010-01-22 20:03:33

Understanding set theory in general is a prerequisite to becoming a good sql developer. So I would add unions and intersections to your list as well.

Logicalmind 2010-01-22 20:44:22

Joins?! Nuthin' a good denormalized, monolithic table of *epic* proportion can't handle... :p

OMG Ponies 2010-01-22 22:48:34

Let's not insult the set theorists - grokking unions and intersections doesn't equate with "understanding set theory"

klochner 2010-01-22 23:06:10

Clearly we've got a set theory phd in the house. You should have quoted "understanding set theory in general". Some people would equate a "general understanding" with "basic concepts", like those found here http://en.wikipedia.org/wiki/Set_theory#Basic_concepts

Logicalmind 2010-01-22 23:51:48

Sorry, OCD kicking in. I would have gone with "understanding of basic set operations". "General" suggests to me some familiarity with the deeper concepts, especially when paired with "theory", but yes I'm being overly critical and the gist of your comment was right.

klochner 2010-01-23 00:25:45

When asking about different types of joins, don't ask interviewee to explain them but present a set of problems that requires the interviewee use each to solve. Many "database developers" can explain joins but not write them or know when to apply them.

Sam 2010-01-23 05:01:24

Answer 2

A:

Explain the difference between an inner and outer join.
What is a Cartesian product?
Explain 3rd normal form

keithwarren7 2010-01-22 19:28:11

Answer 3

+1 A:

Give an example where denomralization is preferable.

(I like this one because people come rampaging out of college looking to put everything into 3rd normal form)

Will 2010-01-22 19:31:01

hehe been there :)

R van Rijn 2010-01-22 19:31:58

"A log file" is a good answer. You might wish to record actions that change the state of the database, but if you link the log table to other tables the true original context of the actions can be lost. And don't even mention cascading deletes...

Will 2010-01-22 19:33:26

should a log file even be in the database, though? *Sir, I question your question!*

Jeff Atwood 2010-01-22 19:34:33

@jeff you're actually questioning my answer! And I didn't say it was a well designed answer, just a good one. Cut me some slack!

Will 2010-01-22 19:36:52

Denormalization is preferable when the importance of fetching complex data quickly is greater than the cost of cleaning up a database with bollixed data integrity.

Bill Karwin 2010-01-22 20:07:16

@Jeff Ever heard of Sarbanes Oxley? :) There are other reasons to do so, but the answer to your question is, "sometimes, but only when you have no other choice".

Dane 2010-03-31 07:21:34

Answer 4

+1 A:

Why should we hire you when we have a sophisticated application using a properly-optimized ORM and implementing caching systems such as memcached?

This is a serious question, they should be able to justify their existence. As Jeff Atwood likes to say "Hardware is Cheap, Programmers are Expensive"

jcm 2010-01-22 19:31:13

*ORM* and *optimized* is an oxymoron. An ORM makes development more efficient, not runtime performance. And it's nice to use memcache, but you still need someone analyzing what data is most advantageous to store in the caching layer.

Bill Karwin 2010-01-22 19:37:07

@Bill: that's an excellent description of ORMs

gbn 2010-01-22 19:46:19

@Bill so all of that hibernate profiling, eager/lazy loading, batch fetching, none of that is optimization? And while it is nice to have someone run statistics on the data to see what is putting the cache to use best, wouldn't that be more of a DBA job than a SQL Developer? Heck, I don't see why developers can't do that themselves as part of the optimization process.

jcm 2010-01-22 20:07:55

@jcm: No, they aren't optimizations. Those features you mention are *compensating* for the inefficiency caused by using an ORM. They would be needed if you weren't using an ORM. :-)

Bill Karwin 2010-01-22 20:16:10

@Bill I do understand that writing pure SQL would be faster than a well-optimized ORM implementation, albeit not by as much as many would like to think. Using an ORM also makes the developers lives much easier and more cost-effective. I simply cannot see how hiring a full-time sql developer is cost-effective in 99.9% of deployments.

jcm 2010-01-22 20:20:05

@jcm: Yes, as I said above, an ORM makes development more efficient. Developer time is the biggest single expense for a software project, so it's often cost-effective to use an ORM. A shorter development schedule can be more important to the project than producing code with good performance, but that doesn't mean your users will be happy.

Bill Karwin 2010-01-22 20:36:00

But development time can't even compare to the extra cost to the business of inefficient code. If I cost a 1,000 users five muntes a day due to inefficient code, wasn't it worth it to the business to actually spend the extra two minutes to write efficient code in the first place. And frankly I don't see where ORMs save that much time in development when the developer actually is an expert in SQL.

HLGEM 2010-01-22 21:18:37

@HLGEM: In fact, it seems most businesses would rather shorten time-to-market, even it means they produce poor code. They may think they'll always have the opportunity to fix pain points later. I'm not *endorsing* that practice, just observing that it's common. :-)

Bill Karwin 2010-01-22 22:50:28

Answer 5

A:

Explain possible constraints on tables
Explain views (and materialized)
Explain sequences
Explain triggers

R van Rijn 2010-01-22 19:36:10

Answer 6

+4 A:

At our company, instead of asking a lot of SQL questions that anyone with a good memory can answer, we created a SQL Developers test. The test is designed to have the candidate put together a solid schema with normalization and RI considerations, check constraints etc. And then be able to create some queries to produce results sets we're looking for. They create all this against a brief design specification we give them. They are allowed to do this at home, and take as much time as they need (within reason).

Randy Minder 2010-01-22 19:38:41

Answer 7

+9 A:

Here are a few:

What is normalization and why is it important?
What are some situations where you would de-normalize data?
What is a transaction and why is it important?
What is referential integrity and why is it important?
What steps would to take to investigate reports of slow database performance?

Phil Sandler 2010-01-22 19:39:54

Answer 8

+7 A:

What is sql injection and how do you prevent it?

What is a cursor and when would you use it (or not) and why?

DyingCactus 2010-01-22 19:42:35

Answer 9

+4 A:

Knowing not to use, and WHY not to use:

SELECT *

Jack Marchetti 2010-01-22 19:44:28

I do that a lot. It saves typing, and I don't have to remember the col names. What's wrong with that?

gary 2010-01-22 20:14:15

@gary: Risks returning unnecessary data, hides column name change errors...

OMG Ponies 2010-01-22 20:20:28

@gary, should not be used in production code because if the underlying data structure changes it will break your query. Having said that I'm as guilty as the next developer of doing so from time to time - but only when I know the sytem is under my total control

Cruachan 2010-01-22 20:25:52

@gary: If you have text/blob columns and are returning multiple rows, you will be putting unnecessary strain on the system.

Craig Young 2010-01-22 22:09:08

Ahem... how about the fact that unless you maintain massive indexes including every single column, `SELECT *` will prevent efficient index operations using a covering index? This is the most important reason!

Aaronaught 2010-01-23 00:25:04

It's completely valid to use `select *` inside an exists clause. Some developers put `select 1` in these cases, but the RDBMS will optimize it just the same. So `select 1` is a case of preliminary (and wrong) optimization.

Jordão 2010-01-25 18:41:37

Answer 10

A:

What are the downsides of using adhoc/on-the-fly SQL statements and what would you do instead?

This area can have a huge impact on performance and security of the db. Get this wrong, and you could end up in a world of pain.

In what situations would you use adhoc/on-the-fly SQL statements?

Because there is always an exception to the rule :)

AdaTheDev 2010-01-22 19:45:45

Answer 11

+5 A:

What is the difference between a clustered index and a nonclustered index?

Another question I would ask that is not for a specific server would be:

What is a deadlock?

Jose Chama 2010-01-22 19:47:41

AFAIK, that terminology is only applicable to MySQL and SQL Server

OMG Ponies 2010-01-22 20:18:21

OMG is right. This is db-specific. Other db's have the same feature. Oracle has index organized tables which are the same as a clustered index. Not only that, but there is almost no difference between a clustered index and a nonclustered index that includes every column in the table. The real point of the question, I assume, is to see if the person knows about the different physical storage characteristics of index types.

Logicalmind 2010-01-22 20:38:44

What do other vendors call it?

Mark Canlas 2010-01-22 21:00:45

@OMG - AFAIK clustered indices originated in Sybase, a version of which MSSQL was until (at least) 6.5, after which MS pretty much recoded the whole thing; one of their better efforts all round. Sybase ASE still looks and feels much like 6.5 on the surface.

Mike Woodhouse 2010-01-22 21:29:17

@Mark Canlas: Oracle doesn't distinguish, they're all "indexes" - no clustered/nonclustered. It automatically creates an index for a primary key if an index for the column doesn't already exist.

OMG Ponies 2010-01-22 21:55:42

@Mike Woodhouse: I understand that SQL Server is based on Sybase, but that's still only three database products that use the terminology. Just saying it's not the same for all.

OMG Ponies 2010-01-22 21:56:47

Trivia: the whole concept of indexes, including syntax and terminology, is not specified in the SQL standard! It's an implementation detail. It's amazing there's as much similarity between vendors as there is.

Bill Karwin 2010-01-22 22:54:57

Answer 12

+1 A:

What database types had caused to you, as a developer, more trouble to understand and debug? Expected answer, IMHO, experience with issues using different date/timestamp types and BLOBs.
When is convenient to use bitmap indexes?

JuanZe 2010-01-22 19:52:30

Answer 13

+5 A:

I would give a badly written query and ask them how they would go about performance tuning it.

I would ask about set theory. If you don't understand operating in sets, you can't effectively query a relational database.

I would give them some cursor examples and ask how they would rewrite them to make them set-based.

If the job involved imports and exports I would ask questions about SSIS (or other tools involved in doing this used by other datbases). If it involved writing reports, I would want to know that they understand aggregates and grouping (As well as Crystal Reports or SSRS or whatever ereporting tool you use).

I would ask the difference in results between these three queries:

select  a.field1
        , a.field2
        , b.field3
from table1 a
join table2 b
    on a.id = b.id
where a.field5 = 'test'
    and b.field3 = 1

select  a.field1
        , a.field2
        , b.field3
from table1 a
left join table2 b
    on a.id = b.id
where a.field5 = 'test'
    and b.field3 = 1

select  a.field1
        , a.field2
        , b.field3
from table1 a
left join table2 b
    on a.id = b.id and b.field3 = 1
where a.field5 = 'test'

HLGEM 2010-01-22 20:01:46

I plead guilty: what is the difference between the last two? I thought they effectively were the same.

fireeyedboy 2010-01-22 21:21:39

The last two will return differnt results, the first two will return the same results. Putting a condition on table b in the where clause converts it to an inner join because the entire result set has to meet that condition which it won't if there is no matching record in table b. (Unless that condition is something like where b.id is null.) Can't tell you how many times I've had to fix this kind of code (just this morning as a matter of fact).

HLGEM 2010-01-22 21:37:30

I can tell you what's the same about them: they all have strange comma placement (I understand the reason for it, but it takes a little getting used to).

DanM 2010-01-22 22:52:29

Answer 14

+2 A:

Compare and contrast the differences between a sql/rdbms solution and nosql solution. You can't claim to be an expert in any technology without knowing its strengths and weaknesses as compared to its competitors.

Logicalmind 2010-01-22 20:28:09

Answer 15

+19 A:

A reprint of my answer here, as general guidelines for topics.

Basics

SELECTing columns from a table
Aggregates Part 1: COUNT, SUM, MAX/MIN
Aggregates Part 2: DISTINCT, GROUP BY, HAVING

Intermediate

JOINs, ANSI-89 and ANSI-92 syntax
UNION vs UNION ALL
NULL handling: COALESCE & Native NULL handling
Subqueries: IN, EXISTS, and inline views
Subqueries: Correlated
WITH syntax: Subquery Factoring/CTE
Views

Advanced Topics

Functions, Stored Procedures, Packages
Pivoting data: CASE & PIVOT syntax
Hierarchical Queries
Cursors: Implicit and Explicit
Triggers
Dynamic SQL
Materialized Views
Query Optimization: Indexes
Query Optimization: Explain Plans
Query Optimization: Profiling
Data Modelling: Normal Forms, 1 through 3
Data Modelling: Primary & Foreign Keys
Data Modelling: Table Constraints
Data Modelling: Link/Corrollary Tables
Full Text Searching
XML
Isolation Levels
Entity Relationship Diagrams (ERDs), Logical and Physical
Transactions: COMMIT, ROLLBACK, Error Handling

OMG Ponies 2010-01-22 20:40:08

Personally, I think several of the "Advanced" items should go under "Intermediate" or maybe "Competent." "Advanced" would be denormalizing, partitioning, query/index hints, replication, locking and deadlocks, and all that crazy stuff we wish we didn't have to deal with. Nevertheless, good list!

Aaronaught 2010-01-23 00:33:48

OMG Ponies 2010-01-23 01:22:21

Piffle. I don't believe in labels; you're a DBA whether you accept the title or not. ;)

Aaronaught 2010-01-23 01:28:33

@Aaronaught: I'm a DBA if I have the *access* :p

OMG Ponies 2010-01-23 01:39:59

I would put Transactions as basic - if you don't know what to do then any data saved cannot be trusted

Mark 2010-01-24 18:31:54

Answer 16

A:

What explain plan does and how interpret the results you get from it.

David Oneill 2010-01-22 21:15:52

Answer 17

A:

How do you detect and resolve concurrency issues in the application layer?
What locking paradigms are typically available, and discuss their pros and cons.
Discuss NULL values and related issues.
What is the largest database system you've worked on in terms of: # tables, # rows, # users.

Also the following platform specific (SQL Server) questions:

Discuss IDENTITY columns.
What is the timestamp datatype used for?

Craig Young 2010-01-22 21:32:22

Answer 18

A:

"Why should every SELECT always include DISTINCT ?"

Erwin Smout 2010-01-22 23:09:49

I don't understand your point. Could you expand on what you mean?

Logicalmind 2010-01-22 23:53:53

Unless the intention is to check if I would challenge that garbage statement; if someone were to seriously ask me that question, I'd tell them I'm no longer interested in the job.

Craig Young 2010-01-23 09:57:51

Okay, it seems Mr. Smout is serious about this. :/ Before you do any more damage, please take note: Your claim is **wrong**, in fact the converse in more true! 1) Suppose your query is correct, and does not return any duplicates, then including DISTINCT simply forces the RDBMS to check your result (zero benefit, and a lot of additional processing). 2) Suppose your query is incorrect, and _does_ return duplicates, then including DISTINCT simply **hides* the problem (again with additional processing). It would be better to spot the problem and fix your query... it'll run faster that way.

Craig Young 2010-01-24 16:30:30

Answer 19

+1 A:

An interesting question would involve relational division, or how to express a "for all" relationship, which would require nested not exists clauses.

The question comes straigh from this link.

Given the following tables, representing pilots that can fly planes and planes in a hangar:

create table PilotSkills (
  pilot_name char(15) not null,
  plane_name char(15) not null
)

create table Hangar (
  plane_name char(15) not null
)

Select the names of the pilots who can fly every plane in the hangar.

The answer:

select distinct pilot_name
from PilotSkills as ps1 
where not exists (
  select * from hangar
  where not exists (
    select * from PilotSkills as ps2 where 
      ps1.pilot_name = ps2.pilot_name and 
      ps2.plane_name = hangar.plane_name
  )
)

Or ...

Select all stack overflow users that have accepted answers in questions tagged with the 10 most popular programming languages.

The (possible) answer (assuming an Accepted_Answers view and a Target_Language_Tags table with the desired tags):

select distinct u.user_name
from Users as u
join Accepted_Answers as a1 on u.user_id = a1.user_id
where not exists (
  select * from Target_Language_Tags t
  where not exists (
    select * 
      from Accepted_Answers as a2
      join Questions as q on a2.question_id = q.question_id
      join Question_Tags as qt on qt.question_id = q.question_id 
    where 
      qt.tag_name = t.tag_name and
      a1.user_id = a2.user_id
  )
)

Jordão 2010-01-23 00:01:29

It is most certainly an interesting question. However, I doubt whether there are many companies that would need their developers to deal with such scenarios on a day-to-day basis.

Craig Young 2010-01-24 17:07:46

Fair enough, but I think that answering this question shows a good deal of logic reasoning, which is essential for a great database developer.

Jordão 2010-01-25 15:34:15

Answer 20

A:

The application is in use 24 hours a day. Your maintenance / update window is 2 hours every month, how do you plan to minimise disruption?

Tim Jarvis 2010-01-23 00:08:40

Answer 21

A:

"I don't understand your point. Could you expand on what you mean?"

It's one of a whole bunch of possible questions that reveals whether the answerer has any knowledge about fundamentals.

Relations cannot, and never need to, contain duplicate rows. As Codd once said : "if something is true, then saying it twice won't make it any truer.".

Erwin Smout 2010-01-24 15:22:30

You've added this as an additional answer, when really it looks like it should be a comment on your original answer. PS: It seems you're serious about wanting to always include DISTINCT in a SELECT clause; so **please**, **please** read my followup comment before you cause any more damage in your development endeavours.

Craig Young 2010-01-24 16:17:42

Answer 22

+2 A:

Craig Young 2010-01-24 16:48:04

Answer 23

+1 A:

Almost everything is mentioned here. I would like to share one question which I was asked by a senior manager on database. I found the question pretty interesting and if you think about it deeply, it sort of has lot of meaning in it.

Question was - How would you describe database to your 5 year old kid ?

Shamik 2010-01-25 17:21:37

ansaurus

tags:

views:

answers:

Questions every good Database/SQL developer should be able to answer

Basics

Intermediate

Advanced Topics

related questions