tags:

views:

429

answers:

6

I've embarked on a project which is encouraging me to expand my knowledge of SQL. I've already learned quite a bit, but I've gotten myself to a point where I can see a problem, but I don't know enough to properly research a solution. I've hit SO, Google, and the MySQL docs, but either I'm asking the wrong questions or I don't know how to ask the right ones. My data is structured with three primary tables, one of which has M:N relationships with the other two. I'm storing those relationships in another pair of tables, as I'm under the impression that's the "best way":

books (book_id INT PRIMARY KEY, book_title VARCHAR)
authors (author_id INT PRIMARY KEY, author_name VARCHAR)
subjects (subject_id INT PRIMARY KEY, subject_name VARCHAR)
book_authors (book_id INT, author_id INT)
book_subjects (book_id INT, subject_id INT)

(The first three tables actually have more than two columns, but they aren't relevant.)

Edit:

Obviously, I suck at asking clear questions, but then I already knew that. :-)

The issue I'm trying to solve is how to most effectively/efficiently get the data from the database into my app. Once I have it there, I can rearrange it however I need, and I trust I can do that regardless of how the data is "shaped" coming out of the database. It would be trivial to do with five separate SELECT * FROM … statements and it would be trivial to do with with the cross-product "frankenjoin" I had posted earlier. However, I know enough about SQL to spot that the former robs the database engine of the opportunity to do its job. I don't know enough about SQL to say whether the latter is equally bad.

So how about this: instead of asking what's wrong with the solution I came up with, what would be your solution? If you had data related this way, how would you select it out of the database? (And for what reasons?) Would you perhaps even arrange the tables differently?

+2  A: 

What do you think should come back? You can't have lists inside a tuple coming back from a query, so your current results seems like the only way to get back all the data in one query.

UPDATE: SQL doesn't support hierarchical structures, so you'll have to build that yourself (assuming you aren't using an ORM tool that can do it for you). One way would be to make a trip to the DB to get all the books to process, then for each book make one trip for the authors of that book and one trip for the subjects of that book. That can be an awful lot of DB trips, though. Another way would be to bring back all the data (as in your question) and then build a hierarchical structure out of that. Which one is better depends on a lot of factors. I would probably go for the one DB trip solution, even though the client code is probably more complicated; the general rule is that roundtrips to the database are expensive, so I'd tend toward the solution that doesn't increase the number of such trips linearly with the number of results.

Hank Gay
+1: Without a clear concept of the expected results, the SQL cannot be written.
S.Lott
A large part of the problem is that I don't *know* what the results should look like, exactly. I think I'm trying to fit a hierarchical peg into a square hole. It just doesn't seem that having *everything* duplicated multiple times can possibly be the most efficient solution.
Ben Blank
It is. If we can't collectively imagine what the format should be otherwise. then that's what we're left with (as I would have expected, since that's the native simple single query output.)
le dorfier
+3  A: 

I think you probably want to do it as two queries, one for authors and one for subjects, however you could combine these into a single result using a UNION if you wanted.

SELECT books.book_id, books.book_title, 
      'author' as record_type, authors.author_name as record_value
LEFT JOIN book_authors ON books.book_id = book_authors.book_id
LEFT JOIN authors ON authors.item_id = book_authors.author_id
UNION
SELECT books.book_id, books.book_title,
       'subject' as record_type, subjects.subject_name as record_value
LEFT JOIN book_subjects ON books.book_id = book_subjects.book_id
LEFT JOIN subjects ON subjects.subject_id = book_subjects.subject_id;

I'm not sure that it really saves anything other than a round-trip to the database. I offer it only for the potential contribution to your knowledge not in hopes that it is helpful to your immediate problem.

tvanfosson
Huh, I've never actually seen a UNION before. I'll have to play around with this. :-)
Ben Blank
+1- You can put this sucker in a view, and then play around with whether it's faster to have the DB evaluate the record_type value, or to do it yourself in code.
Mike Sickler
A: 

You're getting exactly what you described, in the fashion any standard mainstream developer would accomplish the same thing. But apparently it doesn't match your mental expectation as to shape and size.

Start by taking 3 books, with varying numbers of authors and subjects, and just list what you would like to see. Then we can discuss how to get there.

I don't think you'll find any square grid that gets you there. Probably you want something more like a tree structure?

+ Title 1
+----+ Authors
     + ---- Author 1
     + ---- Author 2
+    + Topics
     + ---- Topic A
etc.

Anything along these lines I find is easiest to derive using mapping tools in another abstraction layer; although there are ways to twist relational data that can help, if you have the skills (e.g. analytical cubes and queries, returning dependent secondary queries, etc.) They usually involve proprietary extensions (e.g. LINQ); but you probably don't want to go to the more sophisticated (read: complex) stuff until you've thoroughly mastered stuff like this.

le dorfier
Yes, I'll be linking the records together into a hierarchy once I have them out of the database, but that's pretty easily adapted to however the data is shaped coming from the query. I'm just looking for the most effective/efficient way to select the data in the first place.
Ben Blank
A: 

I don't typically find it useful to get data back shaped this way. Multiple resultsets that I can relate to each other tend to be more useful. You won't have extraneous data in your resultset this way either (8 rows returned each with the same columns for the book).

Jim Petkus
A: 

You can return multiple result sets with MS SQL Server. SELECT * FROM books; SELECT * FROM book_authors; SELECT * FROM authors; SELECT * FROM book_subjects; SELECT * FROM subjects;

Then use the information to either load a .NET DataSet with relationships or create Entity Objects that encapsulate the relationships if not using .NET or DataSets.

If you are using .NET 3.5 then look at LINQ to SQL or LINQ to Entities to simplify the effort in creating the required code.

Matthew
+1  A: 

Creating some huge multi-joined table and refactoring that in your app defeats the purpose of having the database in the first place. You've got everything nicely normalized in your tables, why would you want to throw all that effort away?

What I would do is establish and maintain a connection to your database. When you need some data, create a select statement that joins only the tables you need and gives you only the data you need at that moment, send that to the database, and work with the results. If someone else is inserting and updating rows in the database while you're querying it, your results will be up to date.

If you need all the books by Emily Dickinson, then join your books and authors table. If you need all the authors that wrote books on pastry making, just join the authors and subjects table.

eaolson