Some options are:
- Write a function that strips non-alphanumeric characters and spaces, and apply it to the tracklist column in the search query. One big downside is that your query won't be able to use any indices on tracklist.
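As a rough sketch of this first option, you can normalize the search term in PHP and apply the same stripping to the column in SQL. The helper name normalizeTitle is made up for illustration, and REGEXP_REPLACE assumes MySQL 8.0 or MariaDB 10.0.5 or later; on older servers you'd need a stored function instead.

```php
<?php
// Hypothetical helper: drop everything except ASCII letters and digits, then lowercase.
function normalizeTitle(string $s): string {
    return strtolower(preg_replace('/[^a-z0-9]+/i', '', $s));
}

// Assumes a server that provides REGEXP_REPLACE (MySQL 8.0+ / MariaDB 10.0.5+).
// Wrapping `tracklist` in function calls is what rules out index use.
$sql = "SELECT * FROM `mixes`
        WHERE LOWER(REGEXP_REPLACE(`tracklist`, '[^[:alnum:]]+', ''))
              LIKE CONCAT('%', ?, '%')";

// Bind this value to the placeholder when executing $sql.
$needle = normalizeTitle('Johnny, Hey (johnnny)');
echo $needle, "\n"; // johnnyheyjohnnny
```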
- Use some form of fuzzy matching. This will also most likely not take advantage of indices. There's Tom Haigh's suggestion, which will find the row in your example but may also return false positives. You could also use REGEXP/RLIKE, which don't perform fuzzy matching, but you can construct an expression that does:
$tracklist = preg_replace('/[^a-z]+/i', '[^[:alpha:]]+', $_REQUEST['did']);
$q = mysqli_query($id, "SELECT * FROM `mixes` WHERE `tracklist` REGEXP '$tracklist'");
Using the example $_REQUEST['did'] of 'johnny hey johnnny', the resulting query is "SELECT * FROM `mixes` WHERE `tracklist` REGEXP 'johnny[^[:alpha:]]+hey[^[:alpha:]]+johnnny'". The above ignores non-alphabetic characters. If you want to ignore non-alphanumeric characters, try one of the following (the second doesn't ignore '_'):
$tracklist=preg_replace('/[^a-z0-9]+/i', '[^[:alnum:]]+', $_REQUEST['did']);
$tracklist=preg_replace('/\W+/', '[^[:alnum:]_]+', $_REQUEST['did']);
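Here's a small sketch of the pattern construction wrapped in a function, so you can see what gets sent to REGEXP. The name tracklistPattern is hypothetical, and trimming punctuation from the ends is my own addition: without it, a trailing ")" in the search phrase would turn into a required run of non-alphanumeric characters at the end of the pattern.

```php
<?php
// Hypothetical helper: build a REGEXP pattern in which every run of
// non-alphanumeric characters in the phrase matches any such run.
function tracklistPattern(string $phrase): string {
    // Strip non-alphanumeric characters from both ends (an added assumption).
    $phrase = preg_replace('/^[^a-z0-9]+|[^a-z0-9]+$/i', '', $phrase);
    // Replace each remaining run with a negated POSIX character class.
    return preg_replace('/[^a-z0-9]+/i', '[^[:alnum:]]+', $phrase);
}

echo tracklistPattern('johnny hey johnnny'), "\n";
// johnny[^[:alnum:]]+hey[^[:alnum:]]+johnnny
echo tracklistPattern('Johnny, Hey (johnnny)'), "\n";
// Johnny[^[:alnum:]]+Hey[^[:alnum:]]+johnnny
```

Since the result contains only letters, digits and the fixed character class, it can't break out of the quoted SQL string, though binding it as a parameter is still the better habit.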
- Restructure the tables and processing logic. This is the most work but will produce the conceptually cleanest implementation. I'm guessing that tracklist is a $delim-separated (e.g. comma, semicolon, ...) list of track names. If so, mixes isn't even in first normal form. Create a new tracklist table:
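CREATE TABLE `tracklist` (
    mix INTEGER,
    position INTEGER NOT NULL,
    track INTEGER,
    PRIMARY KEY (mix, position),
    KEY (mix, track),
    KEY (track),
    -- MySQL parses but ignores inline REFERENCES clauses on columns,
    -- so the foreign keys are declared explicitly.
    FOREIGN KEY (mix) REFERENCES mixes (id),
    FOREIGN KEY (track) REFERENCES tracks (id)
) ENGINE=InnoDB;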
To find the mix for a given track, perform a join:
SELECT mixes.* FROM mixes
    JOIN tracklist
        ON mixes.id = tracklist.mix
    WHERE tracklist.track=$trackid
Through the use of indices, the above query will be faster than the other two options. The main downside to this approach is you'll have to perform more queries to get the tracklist for a mix. An upside is it can improve tracklist editing. With appropriately defined indices, this option can still be faster than your current DB design. For example,
SELECT t2.mix, t2.position, tracks.*
    FROM tracklist AS t1
    JOIN tracklist AS t2
        ON t1.mix = t2.mix
    JOIN tracks
        ON t2.track = tracks.id
    WHERE t1.track=:trackid
will be quite efficient if there are indices on tracklist.track, tracklist.mix and tracks.id, which is a very reasonable assumption, given the above definition of tracklist and that tracks.id is most likely a primary key column. MySQL only needs to examine (number of mixes containing track :trackid) + (total number of tracks in all mixes that contain :trackid) * 2 rows. If there are two mixes containing "Johnny, Hey (johnnny)", with 8 and 10 tracks respectively, MySQL will examine 38 rows total (20 from tracklist and 18 from tracks). Options 1 and 2 will need to examine (number of mixes) rows. As long as the number of mixes is much larger than the average mix size and the track you're looking for doesn't appear in many of the mixes, this option is faster than the other two.
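To make the arithmetic concrete, here's the estimate as a tiny function. The name rowsExamined is made up, and the formula is only the back-of-the-envelope count from the paragraph above, not what MySQL's optimizer would actually report.

```php
<?php
// Rough estimate from the text:
// (mixes containing the track) + 2 * (total tracks across those mixes).
// $mixSizes holds the track count of each mix that contains the track.
function rowsExamined(array $mixSizes): int {
    return count($mixSizes) + 2 * array_sum($mixSizes);
}

echo rowsExamined([8, 10]), "\n"; // 38
```

Running EXPLAIN on the real query is the way to check how many rows MySQL expects to touch in your data.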
Note that unless you sanitize $_REQUEST['did'] before constructing the query, your script is open to SQL injection ([2], [3]). The above example in option 2 that uses REGEXP is safe (sanitization is a side-effect of the preg_replace), but the safest, most general approach is to use prepared queries.
$q = mysqli_prepare($id,"SELECT * FROM `mixes` WHERE `tracklist` REGEXP ?");
$q->bind_param('s', $_REQUEST['did']);
$q->execute();
You can also reuse prepared queries, which means the DB won't need to reparse the query, thus improving speed.
function findMixFor($track) {
    global $id;
    // Prepared on the first call only; later calls reuse the same statement.
    static $q = null;
    if (is_null($q)) {
        $q = mysqli_prepare($id, "SELECT * FROM `mixes` WHERE `tracklist` REGEXP ?");
    }
    $q->bind_param('s', $track);
    $q->execute();
    return $q;
}
Note that findMixFor is just to illustrate reusing a prepared query. A proper implementation would isolate all the DB access in a data access layer and also handle errors.
You might also want to look into using PDO instead of mysqli. For one thing, its support for prepared queries is much richer. For another, it makes it easy to switch databases.
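For a taste of what that looks like, here's a minimal PDO sketch of the join from option 3 using a named placeholder. It runs against an in-memory SQLite database purely so it works without a server (this assumes the pdo_sqlite extension is available); the sample rows and titles are invented, and with MySQL you'd only change the DSN and credentials.

```php
<?php
// In-memory SQLite is used only so the sketch is self-contained;
// for MySQL you would pass a 'mysql:' DSN plus user and password instead.
$db = new PDO('sqlite::memory:');
$db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);

// Made-up sample data: two mixes, only the first contains track 7.
$db->exec('CREATE TABLE mixes (id INTEGER PRIMARY KEY, title TEXT)');
$db->exec('CREATE TABLE tracklist (mix INTEGER, position INTEGER, track INTEGER,
           PRIMARY KEY (mix, position))');
$db->exec("INSERT INTO mixes VALUES (1, 'Mix A'), (2, 'Mix B')");
$db->exec('INSERT INTO tracklist VALUES (1, 1, 7), (1, 2, 8), (2, 1, 9)');

// Named placeholder, bound when the statement is executed.
$stmt = $db->prepare(
    'SELECT mixes.* FROM mixes
     JOIN tracklist ON mixes.id = tracklist.mix
     WHERE tracklist.track = :trackid'
);
$stmt->execute([':trackid' => 7]);
$found = $stmt->fetchAll(PDO::FETCH_ASSOC);

print_r(array_column($found, 'title'));
```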