views:

161

answers:

9

In general, I think I do alright when it comes to coding in programming languages, but I think I'm missing something huge when it comes to databases.

I see job ads requesting knowledge of MySQL, MSSQL, Oracle, etc. but I'm at a loss to determine what the differences would be.

You see, like so many new programmers, I tend to treat my databases as a dumping ground for data. Most of what I do comes down to relatively simple SQL (INSERT this, SELECT that, DELETE this_other_thing), which is mostly independent of the engine I'm using (with minor exceptions, of course, mostly minor tweaks for syntax).

Could someone explain some common use cases for databases where the specific platform comes into play?

I'm sure things like stored procedures are a big one, but (a) these are mostly written in a specific language (T-SQL, etc) which would be a different job ad requirement than the specific RDBMS itself, and (b) I've heard from various sources that stored procedures are on their way out and that in a lot of cases they shouldn't be used now anyway. I believe Jeff Atwood is a member of this camp.

Thanks.


The above concepts do not vary much for MySQL, SQL Server, Oracle, etc.

With this question, I'm mostly trying to determine the important difference between these. I.e. why would a job ad demand n years experience with MySQL when most common use cases are relatively stable across RDBMS platforms.

CRUD statements, joins, indexes.. all of these are relatively straightforward within the confines of a certain engine. The concepts are easily transferable if you know a different RDBMS.

What I'm looking for are the specifics which would cause an employer to specify a specific engine rather than "experience using common database engines."

+2  A: 

After the CRUD statements, to be an effective DB programmer I think some of the most important things to understand are JOIN statements. Understand the difference between LEFT and RIGHT, OUTER and INNER joins, and know when to use each. Most importantly, know what the database is actually constructing when it performs a JOIN.

For me, the Wikipedia article was very helpful.

Also, indexing is very important - this is how relational databases can perform fast queries. Understand how to use them and what happens under the hood.

Wikipedia article on DB indexing.

You should also know how to construct a many-to-one relationship (using foreign keys) and a many-to-many relationship (using join tables).

I know that in your question you're asking about specific DB implementations, but if you're to be taken literally and you only know about SELECT, INSERT, UPDATE, and DELETE, then the above concepts will be far more valuable than learning the intricacies of a particular implementation.

danben
I would say the first thing to know is the normal forms.. If you have a bad structure in the first place to work with, it will be difficult. It's just an addition to your answer and not a disapproval ;)
Mike Gleason jr Couturier
+2  A: 

I believe that the essential knowledge about databases should be:

The above concepts do not vary much between MySQL, SQL Server, Oracle, Postgres, and other relational database systems. However you'd find a different set of concepts for the now-popular NoSQL databases, such as CouchDB, MongoDB, SimpleDB, Cassandra, Bigtable, and many others.

Daniel Vassallo
"The above concepts do not vary much for MySQL, SQL Server, Oracle, etc." - I was trying to determine the important *difference* between these. I.e. why would a job ad demand n years experience with MySQL when most common use cases are relatively stable across RDBMS platforms.
Junior Programmer
@Junior Programmer: probably because it's a half DB admin / half programmer job, even though they don't tell you.
JRL
I'm sure that if you're an expert MySQL developer with 10 years experience, you would do fine in one of those jobs... When it comes to database administration, the story may be slightly different, even though not that much in my opinion.
Daniel Vassallo
@JRL: That's a decent assumption, but I see these requirements (often nice-to-haves, but still..) everywhere. I partially assume that they're just putting it down as a requirement because they use it and HR drones are just.. well, drones.
Junior Programmer
A: 

Don't forget relation schemas, Primary and foreign keys and how they are related. To start with DB, I would use MySql and MSSQL as these are most common in the market. I take Oracle as more advanced and complex db

Stewie Griffin
A: 

Hi,

As for the level of differences there are between vendors, it is because SQL is a standard (http://en.wikipedia.org/wiki/SQL#Standardization) and vendors implement that std differently.

Each of these vendors try to offer extras to have the crowd by their side... that's why you see functions available to one and not the other. But sometimes that function make its way into the standard so its not always a bad thing.

For stored proc. I would agree as ORMs and practices of today tend to do a greater separation of concerns by removing business logic from the database and considering it "only" a repository.

My 2 cents

Mike Gleason jr Couturier
+1  A: 

It's not just stored procs and functions. Each database has fundamental differences and quirks that are important to understand even though SQL works more or less the same.

Examples:

  • Oracle and MySQL handle locking differently, in different situations.
  • Oracle doesn't have autoincrementing primary keys like MySQL and SQL Server.
  • Subtle vendor-specific behavior, like the way Oracle does sorting for VARCHARs differently depending on locale.

If you really want to improve your applications, you eventually have to become familiar with the details about how your specific database works. Most of the time it doesn't make a lot of difference, but when it does matter, it usually makes a big difference, especially when it comes to performance.

Ken Liu
A: 

I see job ads requesting knowledge of MySQL, MSSQL, Oracle, etc. but I'm at a loss to determine what the differences would be.

I'm what's called a SQL Developer. You won't see the differences much when you are doing run of the mill database work (CRUD). However the differences become quite apparent when you are dealing with the databases own brand of SQL.

When talking SQL outside of the standards, there are 4 distinctive types of commands. These are:

  • Data Manipulation Language (DML)
  • Data Definition Language (DDL)
  • Data Control Language (DCL)
  • Transactional Control Language (TCL)

The biggest differences come in the last two, DCL and TCL. Those have a LOT of database specific non-standard SQL commands. The first two, DML and DDL, are very similar across any database that use the relational model.

Also the bigger database vendors have nicknamed their SQL implementation. Here's a short sample:

  • SQL Server : T-SQL
  • Oracle : PL-SQL
  • PostgreSQL : P-SQL or NG-SQL
  • Firebird : IB-SQL
  • MySQL : mSQL

The list goes on, but you get the point. Wikipedia has good articles on the different command acronyms.

I have found that most employers won't be able to articulate this, because most will use non-technical managers and/or HR to do the hiring. They are basically being told by the tech managers that the new hires need to know X technology. This and also, because the majority are too lazy to hire for intelligence, instead they fall back on the "We have X, so darn it, we need to hire somebody that knows X!" meme. The differences are actually not that hard to learn, for the people who frequent StackOverflow. I'm confident that anybody here can learn these fairly fast.

hectorsosajr
+1  A: 

Some things which seem to come up when talking with my Database-keen colleagues:

  • Row vs page vs table locking escalation when doing multiple complex joins, implies sometimes doing very different things on different vendors dbs. This is where the theory is really hitting the tarmac and often it is non-intuitive.
  • Differences between how cursors are best used on different vendor db implementations
  • Odd stuff in the stored proc language variants, like how best to handle failure cases
  • Differences in how temporary tables and views are best used depending on the underlying implementations.

All of these kind of things don't really matter until you are trying to solve something that either has to - Run very fast - Contain lots and lots of data - Gets very big and complex (i.e. multiple queries hitting same tables simultaneously)

These are the kinds of things that DBAs should be helping with, so depends on if you are aiming to be a DBA or a programmer. None of the above have really hurt me yet, because I've not worked on db-intensive systems, but I've worked near a few, and the programmers on those end up knowing a lot about the internals, restrictions, and good features about the specific database they are using.

Best way to get knowledge like that (other than on the job) is to read the manuals or hang out with people that already know and ask them about it.

DaveC
A: 

Even something as simple as an auto-incrementing primary key can be very different in Oracle, mysql, and SQL Server.

Some other important differences:

  • SQL Server makes a distinction between clustering key and primary key; other database do not. This choice comes with major performance implications.

  • SQL Server allows the SET @Total = Total = @Total + Amount syntax for fast computations of things like running totals. mysql lets you use a user variable in a similar way (I think). In other databases you'd probably have to use a correlated subquery. Huge difference in performance.

  • SQL Server can generate "sequential GUIDs" with newsequentialid. I'm not sure which other databases have this feature, but as with the above two points, there are significant performance implications to using a traditional GUID as opposed to a sequential or comb.

  • Oracle's CONNECT BY is a very useful and pretty unique syntax. Common Table Expressions in SQL Server and mysql are similar but not exactly the same.

  • Support for ranking/ordering functions varies vastly across different databases. I'm constantly posting answers here invoking ROW_NUMBER. A lot of queries are much harder to write without this - but at the same time, abusing it can hurt performance.

  • XML support is all over the map. Most databases have reasonably good support for it now, but both syntax and semantics are completely different on every platform.

  • Date/time handling can be quite different. Oracle has several different date/time-related types, some including time zone information. In general, Oracle is way better than other databases at managing temporal data, and has several features that you will miss if you switch. Until recently, Microsoft didn't have the date and time types, just datetime, which was much harder to normalize.

  • Spatial types are different and/or nonexistent in different databases. mysql exposes an entire OpenGIS model; Microsoft's support is a bit more basic but still competent. Oracle has it, but it's a little hard to find information on, and it's some sort of optional add-on. I think DB2 is starting to get it, but support is still a little spotty.

  • mysql actually lets you choose how to store an index (i.e. btree or hash). This is also an important performance consideration.

  • SQL Server allows you to INCLUDE columns in an index - very important for performance.

  • Oracle allows you to create function-based indexes, bitmap indexes, and so on. These can be pretty difficult to wrap your head around.

  • Oracle can perform "skip seeks" in very specific situations, something that I don't believe is supported in other databases (yet). This might factor into how you order index columns.

  • SQL Server has CLR types/functions/aggregates. Obviously not supported in any other database product.

  • Trigger support varies significantly. SQL Server has AFTER and INSTEAD OF. mysql has BEFORE and AFTER. Oracle has all of those and more. These all behave quite differently.

I'm sure that there are many, many more differences, but that should give you at least a basic idea of why 5 years of experience with Oracle is completely different from 5 years of experience with SQL Server.

Aaronaught
A: 

That databases are encoded collections of assertions of fact. That the logical structure of the tables corresponds to the syntactical structure of those "assertions of fact". That Normalization theory helps you find the most optimal logical structure of the database, by minimizing redundancy, i.e. minimizing the possibility for contradictions in said assertions of fact to occur. That database constraints are really nothing else than business rules, expressed in a formal way and in terms of the components of the database. That really every and any business rule can be expressed as a database constraint. That therefore, it is possible for the DBMS to enforce any and every business rule you can imagine. That there is a very important difference between logical design and physical design. That SQL and SQL systems are, eurhm, not really helpful (and that's putting it mildly), in supporting developers to recognise this important distinction. That SQL and SQL systems are, eurhm, significantly deficient (and that's putting it mildly), in their support for database constraints. That these latter two examples are a very good illustration of the importance of the difference between a model (Codd's RM) and its implementation (some particular SQL system). As far as relational database technology is concerned, the latters deviate ever more propostrously from the former.

And whatever else I forgot to remember.

Erwin Smout