views:

134

answers:

5

I need to change the way I am storing information in the DB. Because the query works slow with the old model I had developed.
General problem is following.
1) I have list of courses, and each course has list of tags describing general content of the course. For instance, the course called "Database Management Systems" could have following tags {sql, index, key, relation}.
2) I have professors who have tags also which generally describe what do they teach in their courses . For example, Barton {sql, php, apache,mysql}
I need to find all professors in the DB who match best to the specific selected course. Also I need to sort them by their weight of matching.
Question
The question is how to store this information in the DB and how to process this stored information in order to solve this problem.
This question appeared after I received a lot of negative critiques about my sql query here.

+3  A: 

Sounds like you should have the following tables:

  • Course - lists the courses.
  • Subject_area - a list of subjects that courses can cover, like "sql", "c++" etc.
  • Course_content - cross reference table between Course and Subject_area.
  • Professor - lists the professors.
  • Professor_expertise - cross reference table between Professor and Subject_area.

For example, you might have a Professor called "Prof. Brown" with a corresponding row in table Professor, and also subject areas called "sql", "java" and "optimisation algorithms" each of which Prof. Brown is interested in. Then there would be a corresponding row for each of those areas in Professor_expertise, and each one would reference Prof. Brown's row in table Professor, and also the appropriate row in table Subject_area.

Now suppose you have a course "SQL and Database Design", and it has subject areas "Database Design", "SQL", "Database Indexes", "Normalisation" and "Query optimisation". You could see which professors are suited to teach the course by issuing

SELECT
    Professor.Name,
    Professor.Id,
    MySubquery.NumMatches
FROM
    Professor
    JOIN (
        SELECT
            Professor,
            COUNT(*) AS NumMatches
        FROM
            Professor_expertise
        WHERE
            Subject_area_id IN (
                SELECT Course_content.Subject_area_Id
                FROM Course_content
                WHERE Course_content.Course_Id = x
            )
        GROUP BY
            Professor
    ) AS MySubquery
ORDER BY
    MySubquery.NumMatches DESC

where x is the ID number corresponding to the course.

Hammerite
A: 

Could you post the following:

  • Your table structure
  • Any indexes
  • The slow running query
Abe Miessler
+4  A: 

Well, I would start with something like these 5 tables:

Course (CourseID, CourseName, ...)
Professor (ProfID, ProfName, ...)
Tag (TagID, TagName)
CourseTag (CourseID, TagID)
ProfTag (ProfID, TagID)

and query it something like

SELECT ProfName, Count(PT.TagID) AS Weighting
FROM Professor P
INNER JOIN ProfTag PT ON P.ProfID = PT.ProfID
INNER JOIN CourseTag CT ON PT.TagID = CT.TagID
WHERE CT.CourseID = @SelectedCourse
GROUP BY ProfName

That's MS SQL Server syntax...don't know what you're using (but with php, probably not that :))

joelt
I use mysql as DB Management System
Bakhtiyor
A: 

In the simplest form, I'd suggest having 5 tables:

tbl_CourseList

CourseId - int, PK, identity

CourseName - varchar(100)

tbl_CourseContent

ContentId - int, PK, identity

CourseId - int, FK

Type - varchar(25)

tbl_Professors

ProfessorId - int, PK, identity

ProfessorName - varchar(100)

tbl_ProfessorExpertise

ExpertiseId - int, PK, identity

ProfessorId - int, FK

ExpertiseType - varchar(25)

ExpertiseWeight - int

tbl_ProfessorCourses

CourseId

ProfessorId

Hopefully that is self-explanatory...

Reagan Williams
+1  A: 

Here's what I suggest as your schema (primary keys bolded):

  • courses table: id and name
  • profs table: id and name
  • tags table: id and name
  • courseTags: tag_id and course_id (index on tag_id to speed up query)
  • profTags: tag_id and prof_id (index on tag_id to speed up query)

Then you can so something like:

SELECT profs.id, COUNT(*) AS matches 
FROM profs, profTags, courseTags 
WHERE profs.id=profTags.prof_id 
    AND profTags.tag_id=courseTags.tag_id 
    AND courseTags.course_id=[COURSE ID] 
GROUP BY profs.id 
ORDER BY matches DESC;

This query returns a list of prof IDs, ordered by the number of tag matches they have.

Ben S