views:

28

answers:

3

I have a table called "scholarships" that has a field named "majors" which contains comma separated major names for the majors the scholarship is related too.

Let's say the field could contain one (or more separated by commas) of the following: business, agribusiness, business administration, international business.

If someone is searching "business" as the major, how can I select "business" as a match but not the others?

The closest I've come is this but I know it could be better - my regex ability isn't so hot.

SELECT scholarship_id, scholarship_award_name, scholarship_majors 
FROM scholarships 
WHERE scholarship_majors rlike '[, ][[:<:]]business[[:>:]][, ]'
OR scholarship_majors rlike '^[[:<:]]business[[:>:]][, ]'
OR scholarship_majors rlike '[, ][[:<:]]business[[:>:]]$'

I'm trying to catch the field if it begins with "business" or ", business" or "business ", "business, " but not "business administration" etc...

Any advice?

+1  A: 

Any advice?

Don't store the data in comma separated lists - that's denormalized data, and besides being difficult to isolate specifics, is also prone to bad data (typos, cases sensitivity...).

  1. Define a MAJORS table:

    • MAJOR_ID (primary key)
    • MAJOR_NAME
  2. Use a many-to-many table to join scholarships to one or more majors:

    SCHOLARSHIP_MAJORS

    • SCHOLARSHIP_ID (primary key, foreign key to SCHOLARSHIPS table)
    • MAJOR_ID (primary key, foreign key to MAJORS table)
  3. Use JOINs to get scholarships based on majors:

    SELECT s.scholarship_id, 
           s.scholarship_award_name, 
           m.major_name
      FROM SCHOLARSHIPS s
      JOIN SCHOLARSHIP_MAJORS sm ON sm.scholarship_id = s.scholarship_id
      JOIN MAJORS m ON m.major_id = sm.major_id
     WHERE m.major_name IN ('a', 'b', 'c')
    

...if you want the majors output in a comma separated list, use the GROUP_CONCAT function:

    SELECT s.scholarship_id, 
           s.scholarship_award_name, 
           GROUP_CONCAT(m.major_name) AS majors
      FROM SCHOLARSHIPS s
      JOIN SCHOLARSHIP_MAJORS sm ON sm.scholarship_id = s.scholarship_id
      JOIN MAJORS m ON m.major_id = sm.major_id
     WHERE m.major_name IN ('a', 'b', 'c')
  GROUP BY s.scholarship_id, s.scholarship_award_name
OMG Ponies
A: 

I was able to improve the sql by disallowing alpha characters like:

SELECT scholarship_id, scholarship_award_name, scholarship_majors 
FROM scholarships 
WHERE scholarship_majors rlike '[, ][^a-z][[:<:]]business[[:>:]][^a-z][, ]'
OR scholarship_majors rlike '^[[:<:]]business[[:>:]][^a-z][, ]'
OR scholarship_majors rlike '[, ][^a-z][[:<:]]business[[:>:]]$'

This seems to capture in the way I'm hoping for!

Still looking for any advice to improve this SQL statement.

sitesbyjoe
A: 

I spent some time fighting with regexp since the database I'm dealing with has a couple CSV type fields.

Benchmarking revealed that this is a good way with easier syntax:

SELECT * FROM table WHERE FIND_IN_SET('string', my_field)

The field in question HAS to be a CSV string field. Perfect solution for my problem. Yes, I acknowledge the many to many tables are a more normalized way.

sitesbyjoe