I've got a problem that keeps coming up with normalized databases, and I'm looking for the best solution.

Suppose I've got an album information database. I want to set up the schema in a normalized fashion, so I set up two tables - albums, which has one listing for each album, and songs, which lists all songs contained by albums.

albums
------
aid
name

songs
-----
aid
sid
length

This setup is good for storing the data in a normalized fashion, as an album can contain any number of songs. However, accessing the data in an intuitive manner has now become a lot more difficult. A query which only grabs the information on a single album is simple, but how do you grab multiple albums at once in a single query?

Thus far, the best answer I have come up with is grouping by aid and converting the song information into arrays. For example, the result would look something like this:

aid, sids,      lengths
1,   [1, 2],    [1:04, 5:45]
2,   [3, 4, 5], [3:30, 4:30, 5:30]

When I want to work with the data, I have to then parse the sids and lengths, which seems a pointless exercise: I'm making the db concatenate a bunch of values just to separate them later.

My question: What is the best way to access a database with this sort of schema? Am I stuck with multiple arrays? Should I store the entirety of a song's information in an object and then put those songs into a single array, instead of having multiple arrays? Or is there a way of adding an arbitrary number of columns to the result set (sort of an infinite join) to accommodate N songs? I'm open to any ideas on how best to access the data.

I'm also concerned about efficiency, as these queries will be run often.

If it makes any difference, I'm using a PostgreSQL db along with a PHP front-end.

A: 

A join query asks the database to put the tables together, matching the IDs, and return a single table. That way the data can be dynamically shaped to the current task, something that non-normalized databases cannot do.

Phil
+2  A: 

I have difficulty seeing your point. What exactly do you mean by "how do you grab multiple albums at once in a single query"? What exactly do you have difficulties with?

Intuitively I would say:

SELECT
  a.aid    album_id,
  a.name   album_name,
  s.sid    song_id,
  s.name   song_name,
  s.length song_length
FROM
  albums a
  INNER JOIN songs s ON a.aid = s.aid
WHERE
  a.aid IN (1, 2, 3)

and

SELECT
  a.aid         album_id,
  a.name        album_name,
  COUNT(s.sid)  count_songs,
  SUM(s.length) sum_length   /* assuming you store an integer seconds value  */
FROM                         /* here, not a string containing '3:18' or such */
  albums a
  INNER JOIN songs s ON a.aid = s.aid
WHERE
  a.aid IN (1, 2, 3)
GROUP BY
  a.aid

Depending on what you want to know/display, either you query the database for aggregate information, or you calculate it yourself in your app from the result of query #1.

Depending on how much data is cached in your app and how long the queries take, one strategy can be faster than the other. I would recommend querying the DB, though. DBs are made for this kind of stuff.
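Just to make the difference concrete, here's a minimal sketch of both queries run from Python against an in-memory SQLite database standing in for PostgreSQL. The schema, album/song names, and data are all invented for illustration; lengths are stored as integer seconds, as the comment in the second query suggests:

```python
import sqlite3

# In-memory stand-in for the albums/songs schema (SQLite here purely for
# illustration; the queries above target PostgreSQL). Data is made up.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE albums (aid INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE songs  (aid INTEGER, sid INTEGER PRIMARY KEY,
                         name TEXT, length INTEGER);  -- length in seconds
    INSERT INTO albums VALUES (1, 'First'), (2, 'Second');
    INSERT INTO songs  VALUES (1, 1, 'Intro', 64), (1, 2, 'Outro', 345),
                              (2, 3, 'A', 210), (2, 4, 'B', 270),
                              (2, 5, 'C', 330);
""")

# Query #1: one row per song; the album columns repeat on every row.
rows = conn.execute("""
    SELECT a.aid, a.name, s.sid, s.name, s.length
    FROM albums a INNER JOIN songs s ON a.aid = s.aid
    WHERE a.aid IN (1, 2)
""").fetchall()

# Query #2: one row per album, with the aggregates computed by the DB.
agg = conn.execute("""
    SELECT a.aid, a.name, COUNT(s.sid), SUM(s.length)
    FROM albums a INNER JOIN songs s ON a.aid = s.aid
    WHERE a.aid IN (1, 2)
    GROUP BY a.aid, a.name
    ORDER BY a.aid
""").fetchall()

print(len(rows))  # 5 song rows
print(agg)        # [(1, 'First', 2, 409), (2, 'Second', 3, 810)]
```

The first result set is what your app would iterate over to show songs; the second is what you'd use for counts and totals, rather than recomputing them in PHP.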

Tomalak
I see your point, but I have issues with the first query, because you end up with a lot of repeated data - the album name is repeated many times. I'm trying to have my cake and eat it, too - I want the data to be as compact as possible, but that's not realistic without aggregates.
Daniel Lew
Leave off the album name from the first query. You have it in the second one (which probably comes first anyway), and your app can store some context as well. Other than that, I see your point as well. But I guess the repeated album name won't clog your performance too badly. ;-)
Tomalak
(Funnily enough I rephrased my second paragraph before posting to avoid the "you can't have your cake and eat it too" platitude :-D)
Tomalak
A: 
SELECT aid,GROUP_CONCAT(sid) FROM songs GROUP BY aid; 

+----+-------------------------+
|aid | GROUP_CONCAT(sid)       |
+----+-------------------------+
|  3 | 5,6,7                   |
+----+-------------------------+
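As it happens, SQLite ships the same GROUP_CONCAT function, so the query above can be sketched from Python's sqlite3 module (table and data invented for illustration; PostgreSQL itself would need the workaround mentioned in the comments):

```python
import sqlite3

# Illustration only: SQLite also provides GROUP_CONCAT, so it can stand in
# for the MySQL-flavoured query above.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE songs (aid INTEGER, sid INTEGER)")
conn.executemany("INSERT INTO songs VALUES (?, ?)", [(3, 5), (3, 6), (3, 7)])

result = conn.execute(
    "SELECT aid, GROUP_CONCAT(sid) FROM songs GROUP BY aid"
).fetchall()
print(result)  # one concatenated string per album, e.g. [(3, '5,6,7')]
```

Note the caveat the asker already raised: the app then has to split that string back apart.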
Lance Kidwell
My googling suggests that GROUP_CONCAT() is not supplied by PostgreSQL. However you can build it yourself using CREATE AGGREGATE.
j_random_hacker
Yes, that's true. I didn't notice the PostgreSQL part of the question.
Lance Kidwell
+2  A: 

I see your point, but I have issues with the first query, because you end up with a lot of repeated data - the album name is repeated many times. I'm trying to have my cake and eat it, too - I want the data to be as compact as possible, but that's not realistic without aggregates.

Ah, I understand your question now. You're asking how best to micro-optimize something that's actually not very expensive for most cases. And the solution you're toying with is actually going to be significantly less efficient than the "problem" it's trying to solve.

My advice would be to join the tables and return the columns you need. For anything less than 10,000 records returned, you won't notice any significant wire time penalty for handing back that AlbumName with each Song record.

If you notice it performing slowly in the field, then optimize it. But keep in mind that a lot of smart people have spent about 50 years of research making the "join the tables & return what you need" solution fast. I doubt you'll beat it with your home-rolled string concatenation/de-concatenation strategy.

Jason Kester
This is an example. The actual albums table will have approximately 10 columns that I'll want, and that's a lot of repeated data. I'm going with two queries instead. Also, no need to be condescending. I know that string concat/de-concat would be slow, which is why I posted the question. :P
Daniel Lew
You should also know that returning two recordsets will be slow. Certainly not worth it to avoid repeating 10 columns a few hundred times. Sorry if I sounded condescending. This seems to be new ground for you, and it's DB 101.
Jason Kester
+1  A: 

I agree with Jason Kester insofar as I think this is unlikely to really be a performance bottleneck in practice, even if you have 10 columns with repeated data. However, if you're bent on cutting out that repeated data then I'll suggest using 2 queries:

Query #1:

SELECT sid, length     -- And whatever other per-song fields you want
FROM songs
ORDER BY aid

Query #2:

SELECT aid, a.name, COUNT(*)
FROM albums a
JOIN songs s USING (aid)
GROUP BY aid, a.name
ORDER BY aid, a.name

The second query enables you to break up the output of the first query into segments appropriately. Note that this will only work reliably if you can assume that no changes will be made to the table between these two queries -- otherwise you'll need a transaction with SET TRANSACTION ISOLATION LEVEL SERIALIZABLE.

Again, the mere fact that you're using two separate queries is likely to make this slower overall as in most cases the doubled network latency + query parsing + query planning is likely to swamp the effective increase in network throughput. But at least you won't have that nasty horrible feeling of sending repeated data... :)
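A sketch of how the app-side stitching might look, with SQLite standing in for PostgreSQL and invented data. Query #2's per-album counts are used to slice query #1's row stream into per-album segments (I've added ", sid" to query #1's ORDER BY here only so the demo output is deterministic):

```python
import sqlite3
from itertools import islice

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE albums (aid INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE songs  (aid INTEGER, sid INTEGER PRIMARY KEY, length INTEGER);
    INSERT INTO albums VALUES (1, 'First'), (2, 'Second');
    INSERT INTO songs  VALUES (1, 1, 64), (1, 2, 345),
                              (2, 3, 210), (2, 4, 270), (2, 5, 330);
""")

# Query #1: all songs, ordered so each album's rows are contiguous.
songs_it = iter(conn.execute(
    "SELECT sid, length FROM songs ORDER BY aid, sid").fetchall())

# Query #2: per-album counts, used to segment query #1's output.
albums = conn.execute("""
    SELECT aid, a.name, COUNT(*)
    FROM albums a JOIN songs s USING (aid)
    GROUP BY aid, a.name
    ORDER BY aid, a.name
""").fetchall()

# Take COUNT(*) songs off the stream for each album in turn.
grouped = {name: list(islice(songs_it, n)) for aid, name, n in albums}
print(grouped)
# {'First': [(1, 64), (2, 345)], 'Second': [(3, 210), (4, 270), (5, 330)]}
```

The same slicing logic carries over to PHP; the point is simply that both queries must see a consistent snapshot, hence the transaction caveat above.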

j_random_hacker
A: 

I wouldn't break your normalisation for that. Leave the tables normalised and then use the approach described here to query them - http://stackoverflow.com/questions/43870/how-to-concatenate-strings-of-a-string-field-in-a-postgresql-group-by-query
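To give a feel for what such a hand-rolled concatenation aggregate looks like, here is a sketch using Python's sqlite3 create_aggregate hook as a stand-in for PostgreSQL's CREATE AGGREGATE (the aggregate name, table, and data are all invented for illustration):

```python
import sqlite3

# A minimal user-defined string-concat aggregate. PostgreSQL would define
# the equivalent in SQL with CREATE AGGREGATE; sqlite3's Python hook plays
# the same step/finalize role here.
class Concat:
    def __init__(self):
        self.parts = []

    def step(self, value):          # called once per row in the group
        self.parts.append(str(value))

    def finalize(self):             # called once per group
        return ",".join(self.parts)

conn = sqlite3.connect(":memory:")
conn.create_aggregate("concat_ids", 1, Concat)  # hypothetical name
conn.execute("CREATE TABLE songs (aid INTEGER, sid INTEGER)")
conn.executemany("INSERT INTO songs VALUES (?, ?)", [(3, 5), (3, 6), (3, 7)])

result = conn.execute(
    "SELECT aid, concat_ids(sid) FROM songs GROUP BY aid"
).fetchall()
print(result)  # e.g. [(3, '5,6,7')]
```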

Guy C