I need to change the way I am storing information in the DB. Because the query works slow with the old model I had developed.
General problem is following.
1) I have list of courses, and each course has list of tags describing general content of the course. For instance, the course called "Database Management Systems" could have following tags {sql, index, key, relation}.
2) I have professors who have tags also which generally describe what do they teach in their courses . For example, Barton {sql, php, apache,mysql}
I need to find all professors in the DB who match best to the specific selected course. Also I need to sort them by their weight of matching.
Question
The question is how to store this information in the DB and how to process this stored information in order to solve this problem.
This question appeared after I received a lot of negative critiques about my sql query here.
views:
134answers:
5Sounds like you should have the following tables:
Course
- lists the courses.Subject_area
- a list of subjects that courses can cover, like "sql", "c++" etc.Course_content
- cross reference table betweenCourse
andSubject_area
.Professor
- lists the professors.Professor_expertise
- cross reference table betweenProfessor
andSubject_area
.
For example, you might have a Professor called "Prof. Brown" with a corresponding row in table Professor
, and also subject areas called "sql", "java" and "optimisation algorithms" each of which Prof. Brown is interested in. Then there would be a corresponding row for each of those areas in Professor_expertise
, and each one would reference Prof. Brown's row in table Professor
, and also the appropriate row in table Subject_area
.
Now suppose you have a course "SQL and Database Design", and it has subject areas "Database Design", "SQL", "Database Indexes", "Normalisation" and "Query optimisation". You could see which professors are suited to teach the course by issuing
SELECT
Professor.Name,
Professor.Id,
MySubquery.NumMatches
FROM
Professor
JOIN (
SELECT
Professor,
COUNT(*) AS NumMatches
FROM
Professor_expertise
WHERE
Subject_area_id IN (
SELECT Course_content.Subject_area_Id
FROM Course_content
WHERE Course_content.Course_Id = x
)
GROUP BY
Professor
) AS MySubquery
ORDER BY
MySubquery.NumMatches DESC
where x
is the ID number corresponding to the course.
Could you post the following:
- Your table structure
- Any indexes
- The slow running query
Well, I would start with something like these 5 tables:
Course (CourseID, CourseName, ...)
Professor (ProfID, ProfName, ...)
Tag (TagID, TagName)
CourseTag (CourseID, TagID)
ProfTag (ProfID, TagID)
and query it something like
SELECT ProfName, Count(PT.TagID) AS Weighting
FROM Professor P
INNER JOIN ProfTag PT ON P.ProfID = PT.ProfID
INNER JOIN CourseTag CT ON PT.TagID = CT.TagID
WHERE CT.CourseID = @SelectedCourse
GROUP BY ProfName
That's MS SQL Server syntax...don't know what you're using (but with php, probably not that :))
In the simplest form, I'd suggest having 5 tables:
tbl_CourseList
CourseId - int, PK, identity
CourseName - varchar(100)
tbl_CourseContent
ContentId - int, PK, identity
CourseId - int, FK
Type - varchar(25)
tbl_Professors
ProfessorId - int, PK, identity
ProfessorName - varchar(100)
tbl_ProfessorExpertise
ExpertiseId - int, PK, identity
ProfessorId - int, FK
ExpertiseType - varchar(25)
ExpertiseWeight - int
tbl_ProfessorCourses
CourseId
ProfessorId
Hopefully that is self-explanatory...
Here's what I suggest as your schema (primary keys bolded):
- courses table:
id
andname
- profs table:
id
andname
- tags table:
id
andname
- courseTags:
tag_id
andcourse_id
(index ontag_id
to speed up query) - profTags:
tag_id
andprof_id
(index ontag_id
to speed up query)
Then you can so something like:
SELECT profs.id, COUNT(*) AS matches
FROM profs, profTags, courseTags
WHERE profs.id=profTags.prof_id
AND profTags.tag_id=courseTags.tag_id
AND courseTags.course_id=[COURSE ID]
GROUP BY profs.id
ORDER BY matches DESC;
This query returns a list of prof IDs, ordered by the number of tag matches they have.