views:

237

answers:

4

So I've read a couple books on SQL and I've used SQL and created databases, but really nothing advanced, a couple tables, a couple joins, that sort of thing.

So now I am working on a project that is pretty big. The database is going to have multiple tables and we will be processing 40'000 to 80'000 rows of data daily.

Now I find myself lacking some basic SQL concepts, for instance:

I have a dataset with files and file details. The files are listed in one table and the file details are listed in another. We also have a build table and fourth table that joins the build table, file table and file details table together.

The problems are like, "How do I process the files and have the correct file details be associated with files and associated with a build?

Even my university didn't cover this and now I feel I am very out my depth.

Lets take one more example. Lets take the File Table. In my application I have 40'000 files, I need to add files that don't already exist to the file table. Do I have to search for each individual entry and add if it does not exist and skip if it does?

So my real question is, where is a good place to read up on general concepts?

+4  A: 

The best books I've found on the subject are the "SQL for Smarties" series by Joe Celko. They cover the basics starting from normalization all the way to how to represent different data structures in SQL.

LPCRoy
http://stackoverflow.com/questions/406760/whats-your-most-controversial-programming-opinion/406764#406764
Otávio Décio
+1 Celko is strongly opinionated, and not everyone agrees with everything he says. Read his books anyway. They're helpful if you take them with a grain of salt. He'll force you to think about a lot of stuff you need to think about.
Bill Karwin
+1  A: 

To answer your final question first, if you want to learn more about SQL concerpts there are a number of very good books. As LPCRoy mentioned, just anything from Joe Celko is good. Also, for set theory in particular there is a book called "Applied Mathematics for Database Professionals" that I strongly recommend. Also, do not neglect the forums. I am a big fan of the forum and articles at SqlServerCentral.com personally.

Now, your more specific questions were about interacting with the file system and processing files (or at least lists of file names) from the database. Most RDBMS have ways to do that in stored procedures, but exactly how depends on which system you are using. Also, even in the ones that would permit you to do that, I would not recommend it.

I would instead recommend using a different programming language such as Python or C# to parse your files and then enter the needed data into your database. Note that some newer RDBMS' such as MS Sql Server 2008 permit you to directly link documents to the database or even store documents inside the database, if you want. Even if that is your ultimate goal you are probably best using an external language such as Python or C# to insert it into the database.

TimothyAWiseman
A: 

The question you asked seems to have a simple join query solution and maybe a little bit of querying. Do you intend to do these operations in PL/SQL block or in code? Are you looking for some code-db interaction concepts or purely database level?

I usually find head first series a informative and a fun way of learning. http://headfirstlabs.com/books/hfsql/

Priyank
A: 
  • SQL and Relational Theory - How to Write Accurate SQL Code, by Chris J. Date; from the Preface:

    SQL is ubiquitous. But SQL is hard to use: It's complicated, confusing, and error prone—much more so, I venture to suggest, than its apologists would have you believe. In order to have any hope of writing SQL code that you can be sure is accurate, therefore (meaning it does exactly what it's supposed to do, no more and no less), you must follow some appropriate discipline—and it's the thesis of this book that using SQL relationally is the discipline you need. But what does this mean? Isn't SQL relational anyway?

    Well, it's true that SQL is the standard language for use with relational databases—but that fact in itself doesn't make it relational. The sad truth is, SQL departs from relational theory in all too many ways; duplicate rows and nulls are two obvious examples, but they're not the only ones. As a consequence, it gives you rope to hang yourself with, as it were. So if you don't want to hang yourself, you need to understand relational theory (what it is and why); you need to know about SQL's departures from that theory; and you need to know how to avoid the problems they can cause. In a word, you need to use SQL relationally. Then you can behave as if SQL truly were relational, and you can enjoy the benefits of working with what is, in effect, a truly relational system.

    Now, a book like this wouldn't be needed if everyone was using SQL relationally already—but they aren't. On the contrary, I observe much bad practice in current SQL usage. I even observe such practice being recommended, in textbooks and similar publications, by writers who really ought to know better (no names, no pack drill); in fact, a review of the literature in this regard is a pretty dispiriting exercise. The relational model first saw the light of day in 1969, and yet here we are, almost 40 years on, and it still doesn't seem to be very well understood by the database community at large. Partly for such reasons, this book uses the relational model itself as an organizing principle; it explains various features of the model in depth, and shows in every case how best to use SQL to implement the feature in question. ...

  • From there, if you want to deepen the subject, Applied Mathematics for Database Professionals, by Lex de Haan (RIP) and Toon Koppelaars; from the Forward, by Hugh Darwen and Chris J. Date (again):

    We welcome this contribution to the database literature. It is another book on the theory and practice of relational databases, but this one is interestingly different. The bulk of the book is devoted to a treatment of the theory. The treatment is not only rigorous and mathematical, but also rather more approachable than some other texts of this kind. The authors clearly recognize, as we do, the importance of logic and mathematics if database study is to be taken seriously. They have done a good job of describing a certain formalism developed by their former teachers, Bert de Brock and Frans Remmen. This formalism includes some ideas that will be novel to many readers, even those who already have a degree of familiarity with the subject. A particularly interesting novel idea, to us, is the formalization of updating and transactions in Chapter 10. ...

  • Practical Issues in Database Management: A Reference for the Thinking Practitioner, by Fabian Pascal:

    Preface

    The computer industry —and its database sector in particular— resembles the fashion industry: it is driven by fads. And more often than not, vendors profit from accelerated obsolescence on which fads are predicated. It's the users, however, not the vendors, who pay trough the nose. The vendors, helped by the trade media, can profitably exploit ignorance and obscure serious product deficiencies and questionable practices they induce by simply luring users to the next fad—the Internet being just the latest one. ... [My note: The book was published in 2000, but there is no sign that the WWW in particular is advanced beyond its fad functioning mode.]

  • An Introduction to Relational Database Theory, by Hugh Darwen, this one is free, it can be downloaded directly from its site.

MaD70