ansaurus

Question

Answer 1

+1 A:

You may consider keeping one entry per tag instead of all tags as a string, so that you could do a select distinct among other things.

Wernight 2010-10-22 12:41:45

Yes, this is a good idea. Unfortunately, I did not design the database or the script and do not have the time to go through it and make the necessary changes. There are also thousands of entries in there already.

Alex Scott 2010-10-22 12:44:00

The easiest way I can see is to use a script to make the changes. Something like `unique(split(tags))`. That's also how I'd convert your database. You can also try `select tags from splitstring(',', [list])`

Wernight 2010-10-22 12:49:27

Answer 2

+1 A:

The model you describe (all tags in a single cell, separated by spaces) is not normalized, so you can't expect to find a simple, performant and reliable way to work with it on the database server (beyond reading the column). As it stands, PHP is your only chance to do the cleanup you are planning, and you'll have to retrieve every row.

Is it too late to make a little change in the database design? If you store each tag in a separate row in a tag table, you'd be able to do lots of stuff from plain SQL.
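
As a rough sketch of what that conversion could look like on the PHP side: the snippet below turns each existing tag string into (item id, tag) pairs, one pair per distinct tag, ready to be inserted into a separate tag table. The `$rows` array and the `tagPairs` name are made up for illustration; in practice the rows would come from your existing table, and the actual INSERT is left out because it depends on your schema.

```php
<?php
// Hypothetical input: item id => space-separated tag string,
// as it would be read from the current column.
$rows = [
    1 => 'php mysql php tags',
    2 => 'mysql  sql mysql',
];

/**
 * Turn each tag string into [itemId, tag] pairs, one pair per distinct tag.
 * (Illustrative helper, not part of any library.)
 */
function tagPairs(array $rows): array
{
    $pairs = [];
    foreach ($rows as $itemId => $tagString) {
        // PREG_SPLIT_NO_EMPTY drops the empty pieces caused by repeated,
        // leading or trailing whitespace.
        $tags = preg_split('/\s+/', $tagString, -1, PREG_SPLIT_NO_EMPTY);
        foreach (array_unique($tags) as $tag) {
            $pairs[] = [$itemId, $tag];
        }
    }
    return $pairs;
}

// Each pair would become one row in the new tag table.
$pairs = tagPairs($rows);
```

Once the tags live one per row, duplicates can be prevented or removed with plain SQL instead of string handling.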

Álvaro G. Vicario 2010-10-22 12:49:01

Answer 3

A:

If it's a real option,

Change your database design. I don't know about your time constraints so it may really not be an option, but consider which of these two paths you'd rather go down:

1. Spend a couple of hours now redesigning the database, then writing, debugging and verifying a script that takes all the values from the existing layout and puts them in the new one.
2. Spend hours and hours later coming up with obscure queries for otherwise simple tasks, each of which would take ten minutes to write a query for if the database were designed the way a relational database should be.

If it's really not an option though...

Let Sentence = the string of words.
Split Sentence up on every space and build an array out of it*. Store this as Words.
Let UniqueWords = an empty array (it will end up holding the words with no duplicates).
For each Word in Words:
     If the Word is not in UniqueWords, put it in.

*a la PHP explode
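
In PHP, the steps above might look roughly like this. It is only a sketch: `$sentence` holds a made-up example value, and in practice it would be the content of the tags column for one row.

```php
<?php
// Hypothetical example value; in practice this comes from the database column.
$sentence = 'red blue red green blue';

// Split the sentence on every space to get the array of words.
$words = explode(' ', $sentence);

// Start with an empty array and add each word the first time it is seen.
$uniqueWords = [];
foreach ($words as $word) {
    // Strict comparison, so words are only matched as identical strings.
    if (!in_array($word, $uniqueWords, true)) {
        $uniqueWords[] = $word;
    }
}

// Join the unique words back into one space-separated string.
$result = implode(' ', $uniqueWords);
```

The first occurrence of each word keeps its position, so the order of the remaining words is unchanged.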

You could also process it as a raw string (stopping to check at spaces or EOL), which may be faster, but if speed is important, your current database design should be far more concerning than this loop.

EDIT: I didn't see that you wanted it in a SQL query. I'm not sure it'd be possible using a query; perhaps a stored procedure will do. I don't know how to use those though.

Axidos 2010-10-22 13:02:07

Yes, to be honest I was very surprised to see the design of the database. I think I will go the PHP route. Although the database wouldn't take long to redesign, I think it could take weeks to pick through someone else's poorly commented code to change the search and tagging system.

Alex Scott 2010-10-22 13:28:52

Answer 4

A:

+1 redesign, but if redesign is not an option now...

How many distinct tags are there? You might be able to do this using CASE and substring functions.

http://dev.mysql.com/doc/refman/5.0/en/case-statement.html

Tim 2010-10-22 14:00:37

Answer 5

A:

IMO, you're best off handling this in PHP:

$uniqueTags = array_unique(explode(' ', $tagsFromDbColumn));
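
If the cleaned-up value needs to go back into the column as a string, something along these lines may help. It is a sketch under the assumption that tags are separated by single spaces but that stray double spaces can occur; `dedupeTags` is a made-up helper name. Note that `array_unique` compares tags case-sensitively, so `PHP` and `php` would both be kept.

```php
<?php
/**
 * Hypothetical helper: returns the tag string with duplicate tags removed,
 * keeping the first occurrence of each tag in its original order.
 */
function dedupeTags(string $tags): string
{
    // explode() on a single space yields empty strings wherever spaces
    // are repeated, so filter those out before de-duplicating.
    $parts = array_filter(
        explode(' ', $tags),
        static function (string $tag): bool {
            return $tag !== '';
        }
    );

    // implode() ignores the array keys, so the gaps left by
    // array_filter() and array_unique() do not matter here.
    return implode(' ', array_unique($parts));
}

$clean = dedupeTags('php mysql  php sql mysql');
```

The returned string can then be written back to the row with whatever update code the existing script already uses.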

PMV 2010-10-22 17:17:40


Remove duplicate text from field
