tags:

views:

82

answers:

3

I have a MySQL database with 3 tables:

  1. The main table of records called "tracks" (as in music)
  2. A tags table called "tags"
  3. A join table for the two called "taggings"

The tags table is basically a list of genres, which is pre-defined. A track can then have 1 or more tags (via the join table).

The idea is, that the user checks off the genres (tags) for which he or she want to find tracks. But I would also like the interface to reflect which tags are no longer "useful", that is, tags which are complementary to the currently selected ones.

Edit: What I missed was that I need to find tags that are complementary to the currently selected set of tags. See my comment below.

Example: the user selects the "rock" and "pop" tags, and is shown a list of tracks that match "rock" + "pop". But suppose there are no tracks in the database that also match "jazz". In that case, I'd like to disable the "jazz" tag in the interface, because "rock" + "pop" + "jazz" would give zero results.

Is there a clever way to do this with MySQL?

+1  A: 
select 
   tagid 
from 
   taggings 
where 
   trackid in ([list of, or subquery for your selected tracks])

Disable all the tags, unless the result contains their id. Or disable all tags, then re-enable those that are returned by this query. You could also do some restructuring and convert it to a 'not in' query, but that would normally be slower.

krdluzni
I think the point is to have the controls enabled and allowing the user to click them.
scottm
Well, yes, the user should be able to click the tag-controls that "make sense". Whether the more efficient thing is to disable all and then enable some, or to enable all and then disable some, I don't know. Depends on what the most efficient query is; find tags are used together with the curent ones, or find those that are _not_ used with the current ones.
Flambino
'Not In' queries are known to be inefficient since they usually require a full scan of the table, whereas 'In' only needs to search until the first hit. 'Not in' is easier to read and make sense of, but less efficient. How you handle the processing of that data after retrieval, I can't say without knowing more about your interface design.
krdluzni
Thanks for the tip - I'll avoid the NOT IN condition. As for the processing of the data, that's pretty straight forward so long as I can get some tag-ids from the database in a good way.
Flambino
A: 

This probably isn't the most efficient:

SELECT 
    TagId --to disable
FROM Tags 
WHERE TagId NOT IN(
    SELECT Distinct TagId FROM Taggings WHERE TrackId IN (
     SELECT TrackId FROM Taggings WHERE TagId in (1, 2)
    ) 
)
  1. Get all tracks matching currently selected tags.
  2. Take all tags assigned to those tracks
  3. Disable tags not in this list

Edit
What about this:

SELECT 
    DISTINCT TagId 
FROM 
    Taggings
GROUP BY 
    TagId,
    TrackId HAVING TrackId IN (
     SELECT 
      TrackId
     FROM 
      Taggings
     WHERE
      TagId in ( 1, 2)
    )

This should return all tags that should be enabled.

scottm
Hmmm.. that crashed my MySQL-client :-)Trouble is, that I have thousands of tracks, and I need the interface to be really responsive. While your answer probably would work, it seems too brute force, so to speak.I'm thinking, that it should be possible to find the tag_ids that are used together, simply from the -- short -- list of selected tags, rather than a list of thousands of track_ids.
Flambino
Yeah, I figured it was going to be a little rough. I don't think you'll be able to do it with just the tags. I do think you have to have the tracks to know what the remaining tag possibilities are.
scottm
Yeah, you're probably right. But now I'm thinking of adding another table with the different tag combinations for a quicker lookup. I'll let this question sit here for a while though and see what happens. But thanks for the input so far!
Flambino
@Flambino: That sparked off an idea... coming soon.
krdluzni
@krdluzni: Looking forward to it@scottm: Thanks, that runs OK, but it's still a little slow. 1.63 seconds for 1000 tracks with 16 possible tags.To the both of you, thanks for your help, but there's a friend pointed out a nasty complication - see my comment up above by the original question.
Flambino
A: 

Do you have control over how tags are added to the tracks? If you can hook into that, you can create some extra metadata.

Essentially a many-to-many join between tags. Every time you add a tag, you add a record to this join table containing an old-tag/new-tag pair (with the lower id as the first value to reduce duplication), for every one of the old tags on that track. IIRC, there's a way to set up the table to ignore duplicate inserts, rather than throwing errors. You'll also have to manage this table during deletion of tags, which will be more time-consuming, but that's probably a rare event.

With the above in place you have a simple query to identify tags that are still relevant:

select 
   tag_a
from
   related_tags
where
   tag_b in ([tags_already_in_search])
union
select
   tag_b
from
   related_tags
where
   tag_a in([tags_already_in_search])

This solution essentially shifts some of the processing time to the point in time where tags are added and removed.

krdluzni
Interesting! I'll try this out. And yes, I was thinking about creating a 2-column lookup table exactly like you describe, precisely because it'd move the server load to a different point in time, and because I have total control over when tags are added/removed/changed.
Flambino