views:

176

answers:

8

The following question is regarding the speed between selecting an exact match (example: INT) vs a "LIKE" match with a varchar.

Is there much difference? The main reason I'm asking this is because I'm trying to decide if it's a good idea to leave IDs out of my current project.

For example Instead of:

http://mysite.com/article/391239/this-is-an-entry

Change to:

http://mysite.com/article/this-is-an-entry

Do you think I'll experience any performance problems on the long run? Should I keep the ID's?

Note:

I would use LIKE to keep it easier for users to remember. For example, if they write "http://mysite.com/article/this-is-an" it would redirect to the correct.

Regarding the number of pages, lets say I'm around 79,230 and the app. is growing fast. Like lets say 1640 entries per day

+4  A: 

An INT comparison will be faster than a string (varchar) comparison. A LIKE comparison is even slower as it involves at least one wildcard.

Whether this is significant in your application is hard to tell from what you've told us. Unless it's really intensive, ie. you're doing gazillions of these comparisons, I'd go with clarity for your users.

Another thing to think about: are users always going to type the URL? Or are they simply going to use a search engine? These days I simply search, rather than try and remember a URL. Which would make this a non-issue for me as a user. What are you users like? Can you tell from your application how they access your site?

dave
But how fast? that's the question.
kuroir
Very fast... :) If you want be more accurate - it depends of machine, etc.
hsz
The main problem, is that the application is actually growing faster than expected. So at one point it'll have lets say 1,000,000 entries and still grow more and more. Maybe I should keep ints?
kuroir
Definitely you should.
hsz
Int is the only scalable option I'd say.
Skilldrick
+2  A: 

Firstly I think it doesn't really matter either way, yes it will be slower as a LIKE clause involves more work than a direct comparison, however the speed is negligible on normal sites.

This can be easily tested if you were to measure the time it took to execute your query, there are plenty of examples to help you in this department.

To move away slighty from your question, you have to ask yourself whether you even need to use a LIKE for this query, because 'this-is-an-entry' should be unique, right?

SELECT id, friendly_url, name, content FROM articles WHERE friendly_url = 'this-is-an-article';
ILMV
I would use LIKE to keep it easier for users to remember. For example, if they write "http://mysite.com/article/this-is-an" it would redirect to the correct.
kuroir
So what about when you have in your database: `this-is-an-article` and `this-is-an-unwanted-page` ?
hsz
I doubt very much that a user will simply type a URL from memory, most are picked up through Google or the users bookmarks.
ILMV
hsz +1, sure he can display a list of all matching articles, but that would pose a small yet significant usability problem.
ILMV
@hsz there are ways for disambiguation. @ILMV that's what I'm thinking really, not many will remember any part of the URL..
kuroir
@kuroir Of course, but in the @dave's case it will just learn users bad habbits to access elements with a names that do not exist.
hsz
I would simply advise keeping your int and usin the string part of the URL for SEO.
ILMV
+1  A: 

INT is much more faster.

In the string case I think you should not select query with LIKE but just with = because you look for this-is-an-entry, not for this-is-an-entry-and-something.

hsz
A: 

If you put an index on the varchar field it should be ok (performance wise), really depends on how many pages you are going to have. Also you have to be more careful and sanitize the string to prevent sql injections, e.g. only allow a-z, 0-9, -, _, etc in your query.

I would still prefer an integer id as it is faster and safer, change the format to something nicer like: http://mysite.com/article/21-this-is-an-entry.html

row1
A: 

As said, comparing INT < VARCHAR, and if the table is indexed on the field you're searching then that'll help too, as the server won't have to create a manual index on the fly.

One thing which will help validate your queries for speed and sense is EXPLAIN. You can use this to show which indexes your query is using, as well as execution times.

To answer your question, if it's possible to build your system using exact matches on the article ID (ie an INT) then it'll be much "lighter" than if you're trying to match the whole url using a LIKE statement. LIKE will obviously work, but I wouldn't want to run a large, high traffic site on it.

hfidgen
+1  A: 

A "SELECT * FROM x WHERE = 391239" query is going to be faster than "SELECT * FROM x WHERE = 'some-key'" which in turn is going to be faster than "SELECT * FROM x WHERE LIKE '%some-key%'" (presence of wild-cards isn't going to make a heap of difference.

How much faster? Twice as fast? - quite likely. Ten times as fast? stretching it but possible. The real questions here are 1) does it matter and 2) should you even be using LIKE in the first place.

1) Does it matter I'd probably say not. If you indeed have 391,239+ unique articles/pages - and assuming you get a comparable level of traffic, then this is probably just one of many scaling problems you are likely to encounter. However, I'd warrant this is not the case, and therefore you shouldn't worry about a million page views until you get to 1 million and one.

2) Should you even be using LIKE No. If the page/article title/name is part of the URL "slug", it has to be unique. If it's not, then you are shooting yourself in the foot in term of SEO and writing yourself a maintanence nightmare. If the title/name is unique, then you can just use a "WHERE title = 'some-page'", and making sure the title column has a unique index on.

Edit

You plan of using LIKE for the URL's is utterly utterly crazy. What happens if someone visits

yoursite.com/articles/the

Do you return a list of all the pages starting "the" ? What then happens if:

Author A creates

yoursite.com/articles/stackoverflow-is-massive

2 days later Author B creates

yoursite.com/articles/stackoverflow-is-massively-flawed

Not only will A be pretty angry that his article has been hi-jacked, all the perma-links he may have been sent out will be broken, and Google is going never going to give your articles any reasonable page-rank because the content keeps changing and effectively diluting itself.

Sometimes there is a pretty good reason you've never seen your amazing new "idea/feature/invention/time-saver" anywhere else before.

iAn
+1  A: 

There are a few things to consider:

The type of search performed on the DataBase will be an "index seek", search for single row using an index, most of the time.

This type of exact match operation on a single row is not significantly faster using ints than strings, they are basically the same cost, for any practical purpose.

What you can do is the following optimization, search the database using a exact match (no wildcards), this is as fast as using an int index. If there is no match do a fuzzy search (search using wildcards) this is more expensive, but on the other hand is more rare and can produce more than one result. A form of ranking results is needed if you want to go for best match.

Pseudocode:

  • Search for an exact match using a string: Article Like 'entry'
  • if (match is found) display page
  • if (match is not found) Search using wildcards
    • If (one apropriate match is found) display page
    • If (more relevant matches) display a "Did you tried to find ... page"
    • If (no matches) display error page

Note: keep in mind that fuzzy URLs are not recommended from a SEO perspective, because people can link your site using multiple URLs which will split your page rank instead of increase it.

Pop Catalin
A: 

An int comparison is faster than a text comparison; an equal to text comparison is faster than a like text comparison.

However, with correct indexes, LIKE 'Foo%' is quite faster than LIKE '%Foo%' and might be acceptable in few cases.

Salman A