I need to change the way I am storing information in the DB. Because the query works slow with the old model I had developed. 
General problem is following. 
1) I have list of courses, and each course has list of tags describing general content of the course. For instance, the course called "Database Management Systems" could have following tags {sql, index, key, relation}. 
2) I have professors who have tags also which generally describe what do they teach in their courses . For example, Barton {sql, php, apache,mysql}
I need to find all professors in the DB who match best to the specific selected course. Also I need to sort them by their weight of matching. 
Question
The question is how to store this information in the DB and how to process this stored information in order to solve this problem.
 This question appeared after I received a lot of negative critiques about my sql query here.
views:
134answers:
5Sounds like you should have the following tables:
- Course- lists the courses.
- Subject_area- a list of subjects that courses can cover, like "sql", "c++" etc.
- Course_content- cross reference table between- Courseand- Subject_area.
- Professor- lists the professors.
- Professor_expertise- cross reference table between- Professorand- Subject_area.
For example, you might have a Professor called "Prof. Brown" with a corresponding row in table Professor, and also subject areas called "sql", "java" and "optimisation algorithms" each of which Prof. Brown is interested in. Then there would be a corresponding row for each of those areas in Professor_expertise, and each one would reference Prof. Brown's row in table Professor, and also the appropriate row in table Subject_area.
Now suppose you have a course "SQL and Database Design", and it has subject areas "Database Design", "SQL", "Database Indexes", "Normalisation" and "Query optimisation". You could see which professors are suited to teach the course by issuing
SELECT
    Professor.Name,
    Professor.Id,
    MySubquery.NumMatches
FROM
    Professor
    JOIN (
        SELECT
            Professor,
            COUNT(*) AS NumMatches
        FROM
            Professor_expertise
        WHERE
            Subject_area_id IN (
                SELECT Course_content.Subject_area_Id
                FROM Course_content
                WHERE Course_content.Course_Id = x
            )
        GROUP BY
            Professor
    ) AS MySubquery
ORDER BY
    MySubquery.NumMatches DESC
where x is the ID number corresponding to the course.
Could you post the following:
- Your table structure
- Any indexes
- The slow running query
Well, I would start with something like these 5 tables:
Course (CourseID, CourseName, ...)
Professor (ProfID, ProfName, ...)
Tag (TagID, TagName)
CourseTag (CourseID, TagID)
ProfTag (ProfID, TagID)
and query it something like
SELECT ProfName, Count(PT.TagID) AS Weighting
FROM Professor P
INNER JOIN ProfTag PT ON P.ProfID = PT.ProfID
INNER JOIN CourseTag CT ON PT.TagID = CT.TagID
WHERE CT.CourseID = @SelectedCourse
GROUP BY ProfName
That's MS SQL Server syntax...don't know what you're using (but with php, probably not that :))
In the simplest form, I'd suggest having 5 tables:
tbl_CourseList
CourseId - int, PK, identity
CourseName - varchar(100)
tbl_CourseContent
ContentId - int, PK, identity
CourseId - int, FK
Type - varchar(25)
tbl_Professors
ProfessorId - int, PK, identity
ProfessorName - varchar(100)
tbl_ProfessorExpertise
ExpertiseId - int, PK, identity
ProfessorId - int, FK
ExpertiseType - varchar(25)
ExpertiseWeight - int
tbl_ProfessorCourses
CourseId
ProfessorId
Hopefully that is self-explanatory...
Here's what I suggest as your schema (primary keys bolded):
- courses table: idandname
- profs table: idandname
- tags table: idandname
- courseTags: tag_idandcourse_id(index ontag_idto speed up query)
- profTags: tag_idandprof_id(index ontag_idto speed up query)
Then you can so something like:
SELECT profs.id, COUNT(*) AS matches 
FROM profs, profTags, courseTags 
WHERE profs.id=profTags.prof_id 
    AND profTags.tag_id=courseTags.tag_id 
    AND courseTags.course_id=[COURSE ID] 
GROUP BY profs.id 
ORDER BY matches DESC;
This query returns a list of prof IDs, ordered by the number of tag matches they have.