So I have this website with a search feature that searches a table in my MySQL database. The table currently has 1108 rows and contains music info such as Artist and Album. Since it's possible for any character to appear in an artist or album name, I urlencode each of those variables before adding them to the database. See below:

$artist = urlencode($_POST['artist']);
$album = urlencode($_POST['album']);

So now let's pretend that I have added a new entry to the database and it contains characters that needed to be urlencoded. The database shows it fine.

Now I want to go search.

Foreign characters work. You can see that here: http://albumarrrt.net/details.php?artist=Ai%20Otsuka (clicking the album link for each one works).

But now a few problems occur.

1 - If you search for '&' the search reads the %26 as nothing. It shows %26 in the address bar, but it reads it as nothing. Here is how it is being read:

$search = $_GET['search'];

if($search == '') {
    echo "Please enter a search term :(";
}

That is the only thing done with $search before it starts getting read by the database.

2 - If you search for a single or double quote, it does some weird stuff. Example:

Search for " and get: No matches found for "%5C%5C%26quot%3B"

Search for ' and get: No matches found for "%5C%5C%26%23039%3B"

I don't understand why it does this, because the database only contains the code for the quote and nothing else.
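Decoding the two strings from the "No matches found" messages shows what the search term looked like just before it was urlencoded. This is a minimal sketch; the guess about which functions added the extra characters is an assumption, since the full script isn't shown in the question.

```php
<?php
// The two terms reported by the "No matches found" messages.
$double = urldecode('%5C%5C%26quot%3B');
$single = urldecode('%5C%5C%26%23039%3B');

echo $double, "\n"; // \\&quot;
echo $single, "\n"; // \\&#039;

// Assumption: HTML escaping plus two rounds of backslash escaping ran
// before urlencode(). Rebuilding the term that way gives the same string.
$rebuilt = urlencode('\\\\' . htmlspecialchars('"', ENT_QUOTES));
var_dump($rebuilt === '%5C%5C%26quot%3B'); // bool(true)
```

The stored value for a bare double quote would be `urlencode('"')`, which is `%22`, so a term carrying the extra backslashes and the HTML entity can never match it.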

Those are the only two things I have found wrong with my search. Maybe I have just been looking at it too long and can't figure it out, but it perplexes me why it doesn't read '&' as anything.


On to my last question. My current search method separates each word, adds %'s around it, and then uses LIKE to find matches. Example:

Search: A bunch of Stuff (word). The MySQL query would be like:

SELECT * FROM TABLE WHERE (album LIKE '%A%' AND album LIKE '%bunch%' AND album LIKE '%of%' AND album LIKE '%Stuff%' AND album LIKE '%%28word%29%') OR (artist LIKE '%A%' AND artist LIKE '%bunch%' AND artist LIKE '%of%' AND artist LIKE '%Stuff%' AND artist LIKE '%%28word%29%')
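For reference, the same query shape can be built with placeholders rather than by concatenating the search words into the SQL. This is only a sketch: `buildLikeSearch` and the table name `albums` are hypothetical, and it assumes the columns hold the raw text rather than urlencoded text.

```php
<?php
// Hypothetical helper: returns [sql, params] for use with a prepared statement.
// Assumes a table named `albums` with raw (not urlencoded) text columns.
function buildLikeSearch(string $search): array
{
    $words = preg_split('/\s+/', trim($search), -1, PREG_SPLIT_NO_EMPTY);
    if (!$words) {
        throw new InvalidArgumentException('Empty search term');
    }

    $groups = [];
    $params = [];
    foreach (['album', 'artist'] as $column) {
        $clauses = [];
        foreach ($words as $word) {
            $clauses[] = "$column LIKE ?";
            // Escape \, % and _ so they match literally inside the pattern.
            $params[] = '%' . addcslashes($word, '\\%_') . '%';
        }
        $groups[] = '(' . implode(' AND ', $clauses) . ')';
    }

    return ['SELECT * FROM albums WHERE ' . implode(' OR ', $groups), $params];
}

[$sql, $params] = buildLikeSearch('A bunch');
echo $sql, "\n";
print_r($params);
```

With a PDO connection, the pair would be used as `$stmt = $pdo->prepare($sql); $stmt->execute($params);`. This removes the quoting problems, but not the cost: a pattern with a leading `%` still cannot use an ordinary index.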

Obviously this puts a lot of strain on the server, and I know that using LIKE statements to search a large table is a bad idea. What would be an alternative way of searching: FULLTEXT or some other method?

Sorry for the overwhelming amount of questions, but they all sorta go hand-in-hand with each other.


Edit: OK, I've fixed my database up, but I still have a few questions. Someone suggested converting my text from utf8 to plain UTF. How would I do this?

I am also still getting the problem with the & sign. For example, if you search for & on Google it works; on my site, however, the POST result for the search query shows nothing when searching for &.

+1  A: 

I don't see why you need to urlencode; I would simply use mysql_real_escape_string.

'&' is a separator in a URL query string, so it won't be passed to your script as part of the value unless you urlencode it first.
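The separator behaviour can be checked without a browser, since `parse_str()` splits a query string the same way PHP does when it fills `$_GET`. A small sketch with made-up search terms:

```php
<?php
// parse_str() parses a query string the way PHP populates $_GET.
parse_str('search=&', $raw);        // a bare "&" typed into the URL
parse_str('search=%26', $encoded);  // "&" encoded as %26
parse_str('search=' . urlencode('Simon & Garfunkel'), $full);

var_dump($raw['search']);     // string(0) ""
var_dump($encoded['search']); // string(1) "&"
var_dump($full['search']);    // string(17) "Simon & Garfunkel"
```

If `%26` already shows in the address bar and the term still arrives empty, the value is probably being altered somewhere after `$_GET` is read; that part of the script isn't visible in the question.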

Another problem with urlencode is the large number of extra characters. MySQL may silently truncate the artist or title if you haven't allowed for enough characters.
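To give a sense of how much longer the encoded form gets, here is a quick measurement. The three-character CJK name is an example value, not something taken from the database in question.

```php
<?php
// "\u{...}" escapes need PHP 7+. Three CJK characters, 3 bytes each in UTF-8.
$artist  = "\u{5927}\u{585A}\u{611B}";
$encoded = urlencode($artist);

echo strlen($artist), "\n";  // 9  (bytes in the raw UTF-8 string)
echo strlen($encoded), "\n"; // 27 (every byte becomes a %XX triplet)
```

A column sized for the raw name, say a hypothetical `VARCHAR(20)`, would not hold the 27-character encoded form. With strict SQL mode off, MySQL truncates the value and raises only a warning.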

DC

DeveloperChris
+1  A: 

Are you sure you don't want to be decoding the things coming from your URLs (and POSTs) before placing them in the database? If I were storing various strings, I would want to decode them to plain UTF or something and store them that way. Then I would re-encode them to display them. This might solve your search problem in and of itself.

Second, to speed up strings search access, you could create a strings table with all of your strings tokenized, and linked back to the strings that contain them. Then instead of doing a "like %$1%" you can say where $1 = stringTable.String and join against that ID. By no means count this as the optimal solution as I haven't done those performance tunes myself, it's just a suggestion.
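The token-table idea can be sketched in memory before committing to a schema. Everything here is hypothetical (the sample rows and the function names), and lowercasing is ASCII-only for simplicity; in MySQL the index would be a table of (token, row id) pairs joined back to the main table.

```php
<?php
// Hypothetical rows: id => "artist album" text.
$rows = [
    1 => 'Underworld Beaucoup Fish',
    2 => 'Ai Otsuka Love Punch',
    3 => 'Underworld Second Toughest in the Infants',
];

// Split text into lowercase word tokens (letters and digits only).
function tokenize(string $text): array
{
    return preg_split('/[^\p{L}\p{N}]+/u', strtolower($text), -1, PREG_SPLIT_NO_EMPTY);
}

// Map each token to the set of row ids that contain it.
function buildIndex(array $rows): array
{
    $index = [];
    foreach ($rows as $id => $text) {
        foreach (tokenize($text) as $token) {
            $index[$token][$id] = true;
        }
    }
    return $index;
}

// Return ids of rows that contain every word of the query (exact tokens).
function searchIndex(array $index, string $query): array
{
    $ids = null;
    foreach (tokenize($query) as $token) {
        $matches = $index[$token] ?? [];
        $ids = $ids === null ? $matches : array_intersect_key($ids, $matches);
    }
    return array_keys($ids ?? []);
}

$index = buildIndex($rows);
print_r(searchIndex($index, 'underworld fish'));
```

Lookups become exact matches on the token, which an ordinary index can serve. The trade-off is that partial words such as "underworl" no longer match anything.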

Zak
+2  A: 
  • First: don't urlencode data in the database. Urlencode data after you fetch it, as you output to HTML.

  • Second: do use query parameters when you use user-supplied values in SQL queries. Then you don't have to worry about quotes in the form data causing syntax errors or SQL injection risks.

  • Third: don't use the LIKE '%pattern%' hack; use a real fulltext search solution instead (either FULLTEXT or Lucene/Solr or Sphinx Search). It'll have performance hundreds or thousands of times better than ad-hoc text searching (depending on your volume of data).

    See the presentation I did for MySQL University: Practical Full Text Search in MySQL.
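The three points above might look roughly like this in PHP. It is a sketch under assumptions: a PDO connection to MySQL, a hypothetical table named `albums` with a FULLTEXT index on (artist, album), and hypothetical function names.

```php
<?php
// Assumes a PDO connection to MySQL and a FULLTEXT index on (artist, album)
// in a hypothetical table named `albums`.
function searchAlbums(PDO $pdo, string $term): array
{
    $stmt = $pdo->prepare(
        'SELECT artist, album FROM albums
         WHERE MATCH(artist, album) AGAINST (? IN NATURAL LANGUAGE MODE)'
    );
    $stmt->execute([$term]); // bound as a parameter; no manual escaping
    return $stmt->fetchAll(PDO::FETCH_ASSOC);
}

// Encode only at output time: rawurlencode() for the URL part,
// htmlspecialchars() for anything placed into HTML.
function detailsLink(string $artist): string
{
    $href = 'details.php?artist=' . rawurlencode($artist);
    return '<a href="' . htmlspecialchars($href, ENT_QUOTES, 'UTF-8') . '">'
        . htmlspecialchars($artist, ENT_QUOTES, 'UTF-8') . '</a>';
}

echo detailsLink('Simon & Garfunkel'), "\n";
// <a href="details.php?artist=Simon%20%26%20Garfunkel">Simon &amp; Garfunkel</a>
```

The database then stores the name exactly as the user typed it, and each output context gets its own encoding at the last moment.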

Bill Karwin
+1 for FULLTEXT/Lucene mention
AJ
I am having some trouble with FULLTEXT searching: SELECT * FROM TABLE WHERE MATCH(artist, album) AGAINST('underworl'); I noticed that the text has to match exactly. In the example above, "underworl" won't return results where there is an artist "underworld". Is this how it is supposed to be? Because that puts a damper on things if so. If not, how can I fix it?
Scotta
Yes, it matches whole words. Other technologies support stemming so "underworld" matches "underworlds" but matching against arbitrary substrings is not usually part of the solution. You may have to stick with `LIKE` predicates or else use an *inverted index*.
Bill Karwin
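For the specific case in the comment above, where the term is the start of a word, MySQL's boolean mode has a `*` truncation operator that matches word prefixes. The helper below is a hypothetical sketch (it needs PHP 7.4+ for the arrow function) that turns user input into such a query string; it does not help with matches in the middle of a word.

```php
<?php
// Hypothetical helper: turns "under worl" into "+under* +worl*" for use with
// MATCH(artist, album) AGAINST (? IN BOOLEAN MODE).
function toBooleanPrefixQuery(string $search): string
{
    // Replace characters that act as operators in boolean mode with spaces.
    $clean = preg_replace('/[+\-><()~*"@]+/', ' ', $search);
    $words = preg_split('/\s+/', trim($clean), -1, PREG_SPLIT_NO_EMPTY);

    // "+" requires the word, the trailing "*" allows any ending.
    return implode(' ', array_map(fn($w) => '+' . $w . '*', $words));
}

echo toBooleanPrefixQuery('underworl'), "\n"; // +underworl*
```

The result is meant to be bound as the parameter of a prepared `MATCH ... AGAINST (? IN BOOLEAN MODE)` query. Words shorter than the server's minimum token length are still ignored by the index, so very short terms may return nothing.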