views:

1629

answers:

23

I was going through Questions every good .Net developer should be able to answer and was highly impressed with the content and approach of this question and so in the same spirit, I am asking this question for Database/SQL Developer.

What questions do you think should a good Database/SQL programmer be able to respond to?

EDIT : I am marking this question as community wiki as it is not user specific and it aims to serve programming community at large.

Looking forward for some amazing responses.

NOTE : Please answer questions too as suggested in the comments so that people could learn something new too regarding the language.

+21  A: 

The different types of JOINs:

  • INNER JOIN
  • LEFT and RIGHT OUTER JOIN
  • FULL JOIN
  • CROSS JOIN

See Jeff Atwood's Visual Explanation of JOINs

  • What is a key? A candidate key? A primary key? An alternate key? A foreign key?
  • What is an index and how does it help your database?

  • What are the data types available and when to use which ones?

marc_s
If you don't thoroughly understand joins, you can't be a database programmer. This would be the first question and I wouldn't bother asking any others if the answer was wrong.
HLGEM
Understanding set theory in general is a prerequisite to becoming a good sql developer. So I would add unions and intersections to your list as well.
Logicalmind
Joins?! Nuthin' a good denormalized, monolithic table of *epic* proportion can't handle... :p
OMG Ponies
Let's not insult the set theorists - grokking unions and intersections doesn't equate with "understanding set theory"
klochner
Clearly we've got a set theory phd in the house. You should have quoted "understanding set theory in general". Some people would equate a "general understanding" with "basic concepts", like those found here http://en.wikipedia.org/wiki/Set_theory#Basic_concepts
Logicalmind
Sorry, OCD kicking in. I would have gone with "understanding of basic set operations". "General" suggests to me some familiarity with the deeper concepts, especially when paired with "theory", but yes I'm being overly critical and the gist of your comment was right.
klochner
When asking about different types of joins, don't ask interviewee to explain them but present a set of problems that requires the interviewee use each to solve. Many "database developers" can explain joins but not write them or know when to apply them.
Sam
A: 
  • Explain the difference between an inner and outer join.
  • What is a Cartesian product?
  • Explain 3rd normal form
keithwarren7
+1  A: 

Give an example where denomralization is preferable.

(I like this one because people come rampaging out of college looking to put everything into 3rd normal form)

Will
hehe been there :)
R van Rijn
"A log file" is a good answer. You might wish to record actions that change the state of the database, but if you link the log table to other tables the true original context of the actions can be lost. And don't even mention cascading deletes...
Will
should a log file even be in the database, though? *Sir, I question your question!*
Jeff Atwood
@jeff you're actually questioning my answer! And I didn't say it was a well designed answer, just a good one. Cut me some slack!
Will
Denormalization is preferable when the importance of fetching complex data quickly is greater than the cost of cleaning up a database with bollixed data integrity.
Bill Karwin
@Jeff Ever heard of Sarbanes Oxley? :) There are other reasons to do so, but the answer to your question is, "sometimes, but only when you have no other choice".
Dane
+1  A: 

Why should we hire you when we have a sophisticated application using a properly-optimized ORM and implementing caching systems such as memcached?

This is a serious question, they should be able to justify their existence. As Jeff Atwood likes to say "Hardware is Cheap, Programmers are Expensive"

jcm
*ORM* and *optimized* is an oxymoron. An ORM makes development more efficient, not runtime performance. And it's nice to use memcache, but you still need someone analyzing what data is most advantageous to store in the caching layer.
Bill Karwin
@Bill: that's an excellent description of ORMs
gbn
@Bill so all of that hibernate profiling, eager/lazy loading, batch fetching, none of that is optimization? And while it is nice to have someone run statistics on the data to see what is putting the cache to use best, wouldn't that be more of a DBA job than a SQL Developer? Heck, I don't see why developers can't do that themselves as part of the optimization process.
jcm
@jcm: No, they aren't optimizations. Those features you mention are *compensating* for the inefficiency caused by using an ORM. They would be needed if you weren't using an ORM. :-)
Bill Karwin
@Bill I do understand that writing pure SQL would be faster than a well-optimized ORM implementation, albeit not by as much as many would like to think. Using an ORM also makes the developers lives much easier and more cost-effective. I simply cannot see how hiring a full-time sql developer is cost-effective in 99.9% of deployments.
jcm
@jcm: Yes, as I said above, an ORM makes development more efficient. Developer time is the biggest single expense for a software project, so it's often cost-effective to use an ORM. A shorter development schedule can be more important to the project than producing code with good performance, but that doesn't mean your users will be happy.
Bill Karwin
But development time can't even compare to the extra cost to the business of inefficient code. If I cost a 1,000 users five muntes a day due to inefficient code, wasn't it worth it to the business to actually spend the extra two minutes to write efficient code in the first place. And frankly I don't see where ORMs save that much time in development when the developer actually is an expert in SQL.
HLGEM
@HLGEM: In fact, it seems most businesses would rather shorten time-to-market, even it means they produce poor code. They may think they'll always have the opportunity to fix pain points later. I'm not *endorsing* that practice, just observing that it's common. :-)
Bill Karwin
A: 
  • Explain possible constraints on tables
  • Explain views (and materialized)
  • Explain sequences
  • Explain triggers
R van Rijn
+4  A: 

At our company, instead of asking a lot of SQL questions that anyone with a good memory can answer, we created a SQL Developers test. The test is designed to have the candidate put together a solid schema with normalization and RI considerations, check constraints etc. And then be able to create some queries to produce results sets we're looking for. They create all this against a brief design specification we give them. They are allowed to do this at home, and take as much time as they need (within reason).

Randy Minder
+9  A: 

Here are a few:

  • What is normalization and why is it important?
  • What are some situations where you would de-normalize data?
  • What is a transaction and why is it important?
  • What is referential integrity and why is it important?
  • What steps would to take to investigate reports of slow database performance?
Phil Sandler
+7  A: 

What is sql injection and how do you prevent it?

What is a cursor and when would you use it (or not) and why?

DyingCactus
+4  A: 

Knowing not to use, and WHY not to use:

SELECT *
Jack Marchetti
I do that a lot. It saves typing, and I don't have to remember the col names. What's wrong with that?
gary
@gary: Risks returning unnecessary data, hides column name change errors...
OMG Ponies
@gary, should not be used in production code because if the underlying data structure changes it will break your query. Having said that I'm as guilty as the next developer of doing so from time to time - but only when I know the sytem is under my total control
Cruachan
@gary: If you have text/blob columns and are returning multiple rows, you will be putting unnecessary strain on the system.
Craig Young
Ahem... how about the fact that unless you maintain massive indexes including every single column, `SELECT *` will prevent efficient index operations using a covering index? This is the most important reason!
Aaronaught
It's completely valid to use `select *` inside an exists clause. Some developers put `select 1` in these cases, but the RDBMS will optimize it just the same. So `select 1` is a case of preliminary (and wrong) optimization.
Jordão
A: 
  • What are the downsides of using adhoc/on-the-fly SQL statements and what would you do instead?

This area can have a huge impact on performance and security of the db. Get this wrong, and you could end up in a world of pain.

  • In what situations would you use adhoc/on-the-fly SQL statements?

Because there is always an exception to the rule :)

AdaTheDev
+5  A: 

What is the difference between a clustered index and a nonclustered index?

Another question I would ask that is not for a specific server would be:

What is a deadlock?

Jose Chama
AFAIK, that terminology is only applicable to MySQL and SQL Server
OMG Ponies
OMG is right. This is db-specific. Other db's have the same feature. Oracle has index organized tables which are the same as a clustered index. Not only that, but there is almost no difference between a clustered index and a nonclustered index that includes every column in the table. The real point of the question, I assume, is to see if the person knows about the different physical storage characteristics of index types.
Logicalmind
What do other vendors call it?
Mark Canlas
@OMG - AFAIK clustered indices originated in Sybase, a version of which MSSQL was until (at least) 6.5, after which MS pretty much recoded the whole thing; one of their better efforts all round. Sybase ASE still looks and feels much like 6.5 on the surface.
Mike Woodhouse
@Mark Canlas: Oracle doesn't distinguish, they're all "indexes" - no clustered/nonclustered. It automatically creates an index for a primary key if an index for the column doesn't already exist.
OMG Ponies
@Mike Woodhouse: I understand that SQL Server is based on Sybase, but that's still only three database products that use the terminology. Just saying it's not the same for all.
OMG Ponies
Trivia: the whole concept of indexes, including syntax and terminology, is not specified in the SQL standard! It's an implementation detail. It's amazing there's as much similarity between vendors as there is.
Bill Karwin
+1  A: 
  • What database types had caused to you, as a developer, more trouble to understand and debug? Expected answer, IMHO, experience with issues using different date/timestamp types and BLOBs.

  • When is convenient to use bitmap indexes?

JuanZe
+5  A: 

I would give a badly written query and ask them how they would go about performance tuning it.

I would ask about set theory. If you don't understand operating in sets, you can't effectively query a relational database.

I would give them some cursor examples and ask how they would rewrite them to make them set-based.

If the job involved imports and exports I would ask questions about SSIS (or other tools involved in doing this used by other datbases). If it involved writing reports, I would want to know that they understand aggregates and grouping (As well as Crystal Reports or SSRS or whatever ereporting tool you use).

I would ask the difference in results between these three queries:

select  a.field1
        , a.field2
        , b.field3
from table1 a
join table2 b
    on a.id = b.id
where a.field5 = 'test'
    and b.field3 = 1

select  a.field1
        , a.field2
        , b.field3
from table1 a
left join table2 b
    on a.id = b.id
where a.field5 = 'test'
    and b.field3 = 1

select  a.field1
        , a.field2
        , b.field3
from table1 a
left join table2 b
    on a.id = b.id and b.field3 = 1
where a.field5 = 'test'
HLGEM
I plead guilty: what is the difference between the last two? I thought they effectively were the same.
fireeyedboy
The last two will return differnt results, the first two will return the same results. Putting a condition on table b in the where clause converts it to an inner join because the entire result set has to meet that condition which it won't if there is no matching record in table b. (Unless that condition is something like where b.id is null.) Can't tell you how many times I've had to fix this kind of code (just this morning as a matter of fact).
HLGEM
I can tell you what's the same about them: they all have strange comma placement (I understand the reason for it, but it takes a little getting used to).
DanM
+2  A: 

Compare and contrast the differences between a sql/rdbms solution and nosql solution. You can't claim to be an expert in any technology without knowing its strengths and weaknesses as compared to its competitors.

Logicalmind
+19  A: 

A reprint of my answer here, as general guidelines for topics.

Basics


  1. SELECTing columns from a table
  2. Aggregates Part 1: COUNT, SUM, MAX/MIN
  3. Aggregates Part 2: DISTINCT, GROUP BY, HAVING

Intermediate


  1. JOINs, ANSI-89 and ANSI-92 syntax
  2. UNION vs UNION ALL
  3. NULL handling: COALESCE & Native NULL handling
  4. Subqueries: IN, EXISTS, and inline views
  5. Subqueries: Correlated
  6. WITH syntax: Subquery Factoring/CTE
  7. Views

Advanced Topics


  • Functions, Stored Procedures, Packages
  • Pivoting data: CASE & PIVOT syntax
  • Hierarchical Queries
  • Cursors: Implicit and Explicit
  • Triggers
  • Dynamic SQL
  • Materialized Views
  • Query Optimization: Indexes
  • Query Optimization: Explain Plans
  • Query Optimization: Profiling
  • Data Modelling: Normal Forms, 1 through 3
  • Data Modelling: Primary & Foreign Keys
  • Data Modelling: Table Constraints
  • Data Modelling: Link/Corrollary Tables
  • Full Text Searching
  • XML
  • Isolation Levels
  • Entity Relationship Diagrams (ERDs), Logical and Physical
  • Transactions: COMMIT, ROLLBACK, Error Handling
OMG Ponies
Personally, I think several of the "Advanced" items should go under "Intermediate" or maybe "Competent." "Advanced" would be denormalizing, partitioning, query/index hints, replication, locking and deadlocks, and all that crazy stuff we wish we didn't have to deal with. Nevertheless, good list!
Aaronaught
OMG Ponies
Piffle. I don't believe in labels; you're a DBA whether you accept the title or not. ;)
Aaronaught
@Aaronaught: I'm a DBA if I have the *access* :p
OMG Ponies
I would put Transactions as basic - if you don't know what to do then any data saved cannot be trusted
Mark
A: 

What explain plan does and how interpret the results you get from it.

David Oneill
A: 
  • How do you detect and resolve concurrency issues in the application layer?
  • What locking paradigms are typically available, and discuss their pros and cons.
  • Discuss NULL values and related issues.
  • What is the largest database system you've worked on in terms of: # tables, # rows, # users.

Also the following platform specific (SQL Server) questions:

  • Discuss IDENTITY columns.
  • What is the timestamp datatype used for?
Craig Young
A: 

"Why should every SELECT always include DISTINCT ?"

Erwin Smout
I don't understand your point. Could you expand on what you mean?
Logicalmind
Unless the intention is to check if I would challenge that garbage statement; if someone were to seriously ask me that question, I'd tell them I'm no longer interested in the job.
Craig Young
Okay, it seems Mr. Smout is serious about this. :/ Before you do any more damage, please take note: Your claim is **wrong**, in fact the converse in more true! 1) Suppose your query is correct, and does not return any duplicates, then including DISTINCT simply forces the RDBMS to check your result (zero benefit, and a lot of additional processing). 2) Suppose your query is incorrect, and _does_ return duplicates, then including DISTINCT simply **hides* the problem (again with additional processing). It would be better to spot the problem and fix your query... it'll run faster that way.
Craig Young
+1  A: 

An interesting question would involve relational division, or how to express a "for all" relationship, which would require nested not exists clauses.

The question comes straigh from this link.

Given the following tables, representing pilots that can fly planes and planes in a hangar:

create table PilotSkills (
  pilot_name char(15) not null,
  plane_name char(15) not null
)

create table Hangar (
  plane_name char(15) not null
)

Select the names of the pilots who can fly every plane in the hangar.

The answer:

select distinct pilot_name
from PilotSkills as ps1 
where not exists (
  select * from hangar
  where not exists (
    select * from PilotSkills as ps2 where 
      ps1.pilot_name = ps2.pilot_name and 
      ps2.plane_name = hangar.plane_name
  )
)

Or ...

Select all stack overflow users that have accepted answers in questions tagged with the 10 most popular programming languages.

The (possible) answer (assuming an Accepted_Answers view and a Target_Language_Tags table with the desired tags):

select distinct u.user_name
from Users as u
join Accepted_Answers as a1 on u.user_id = a1.user_id
where not exists (
  select * from Target_Language_Tags t
  where not exists (
    select * 
      from Accepted_Answers as a2
      join Questions as q on a2.question_id = q.question_id
      join Question_Tags as qt on qt.question_id = q.question_id 
    where 
      qt.tag_name = t.tag_name and
      a1.user_id = a2.user_id
  )
)
Jordão
It is most certainly an interesting question. However, I doubt whether there are many companies that would need their developers to deal with such scenarios on a day-to-day basis.
Craig Young
Fair enough, but I think that answering this question shows a good deal of logic reasoning, which is essential for a great database developer.
Jordão
A: 

The application is in use 24 hours a day. Your maintenance / update window is 2 hours every month, how do you plan to minimise disruption?

Tim Jarvis
A: 

"I don't understand your point. Could you expand on what you mean?"

It's one of a whole bunch of possible questions that reveals whether the answerer has any knowledge about fundamentals.

Relations cannot, and never need to, contain duplicate rows. As Codd once said : "if something is true, then saying it twice won't make it any truer.".

Erwin Smout
You've added this as an additional answer, when really it looks like it should be a comment on your original answer. PS: It seems you're serious about wanting to always include DISTINCT in a SELECT clause; so **please**, **please** read my followup comment before you cause any more damage in your development endeavours.
Craig Young
+2  A: 
Craig Young
+1  A: 

Almost everything is mentioned here. I would like to share one question which I was asked by a senior manager on database. I found the question pretty interesting and if you think about it deeply, it sort of has lot of meaning in it.

Question was - How would you describe database to your 5 year old kid ?

Shamik