views:

1568

answers:

7

I have a MySQL table where all the data in one column was entered in UPPERCASE, but I need to convert in to Title Case, with recognition of "small words" akin to the Daring Fireball Title Case script.

I found this excellent solution for transforming strings to lowercase, but the Title Case function seems to have been left out of my version of MySQL. Is there an elegant way to do this?

A: 

The definitive case to look for such a function is in the documentation.

Unfortunately, you've got LOWER() and UPPER() functions, but no Title Case. Your best bet would be to declare your own function that splits on spaces, ignores your small words, and does an UPPER on the first character of each remaining word.

rooskie
A: 

The answer is going to depend on what type of data is in the column, i.e. is it people names, place names, book titles, etc. If you are looking for a SQL solution, I would do it in multiple updates, for example:

  1. Initial string is: "TO KILL A MOCKINGBIRD"
  2. Convert to initial caps: "To Kill A Mockingbird"
  3. Convert small words to lower case if they do not start the string: "To Kill a Mockingbird"
Jamie Ide
You forgot steps 4. and 5.:4. ???5. Profit.Jokes aside; it's a sensible algorithm, but what built-in MySQL functions can accomplish step 2?
rooskie
None, the OP is going to have to write some SQL. The points I was trying to make were 1) "Title case" is vague without knowing the meaning of the data; and 2) This is best done in multiple steps. I'm not about to spend time writing SQL without knowing the answer to point 1. hobodave's answer looks right to me but won't give the desired results if the column is full of names such as "FLANNERY O'CONNOR".
Jamie Ide
@jamie: You'll just have to make special exceptions for the Irish. :-)
hobodave
A: 

you could do this with concat(), substring(), and length() but I can only see it working for one word. Is there a specific reason why you can't do this in your application's code, instead of mysql?

Mark
+4  A: 

Edit

Eureka! Literally my first SQL function. No warranty offered. Back up your data before using. :)

First, define the following function:

DROP FUNCTION IF EXISTS lowerword;
SET GLOBAL  log_bin_trust_function_creators=TRUE; 
DELIMITER |
CREATE FUNCTION lowerword( str VARCHAR(128), word VARCHAR(5) )
RETURNS VARCHAR(128)
BEGIN
  DECLARE i INT DEFAULT 1;
  DECLARE loc INT;

  SET loc = LOCATE(CONCAT(word,' '), str, 2);
  IF loc > 1 THEN
    WHILE i <= LENGTH (str) AND loc <> 0 DO
      SET str = INSERT(str,loc,LENGTH(word),LCASE(word));
      SET i = loc+LENGTH(word);
      SET loc = LOCATE(CONCAT(word,' '), str, i);
    END WHILE;
  END IF;
  RETURN str;
END;
|
DELIMITER ;

This will lower all occurrences of word in str.

Then define this modified proper function:

DROP FUNCTION IF EXISTS tcase; 
SET GLOBAL  log_bin_trust_function_creators=TRUE; 
DELIMITER | 
CREATE FUNCTION tcase( str VARCHAR(128) ) 
RETURNS VARCHAR(128) 
BEGIN 
  DECLARE c CHAR(1); 
  DECLARE s VARCHAR(128); 
  DECLARE i INT DEFAULT 1; 
  DECLARE bool INT DEFAULT 1; 
  DECLARE punct CHAR(17) DEFAULT ' ()[]{},.-_!@;:?/'; 
  SET s = LCASE( str ); 
  WHILE i <= LENGTH( str ) DO
    BEGIN 
      SET c = SUBSTRING( s, i, 1 ); 
      IF LOCATE( c, punct ) > 0 THEN 
        SET bool = 1; 
      ELSEIF bool=1 THEN  
        BEGIN 
          IF c >= 'a' AND c <= 'z' THEN  
            BEGIN 
              SET s = CONCAT(LEFT(s,i-1),UCASE(c),SUBSTRING(s,i+1)); 
              SET bool = 0; 
            END; 
          ELSEIF c >= '0' AND c <= '9' THEN 
            SET bool = 0; 
          END IF; 
        END; 
      END IF; 
      SET i = i+1; 
    END; 
  END WHILE;

  SET s = lowerword(s, 'A');
  SET s = lowerword(s, 'An');
  SET s = lowerword(s, 'And');
  SET s = lowerword(s, 'As');
  SET s = lowerword(s, 'At');
  SET s = lowerword(s, 'But');
  SET s = lowerword(s, 'By');
  SET s = lowerword(s, 'For');
  SET s = lowerword(s, 'If');
  SET s = lowerword(s, 'In');
  SET s = lowerword(s, 'Of');
  SET s = lowerword(s, 'On');
  SET s = lowerword(s, 'Or');
  SET s = lowerword(s, 'The');
  SET s = lowerword(s, 'To');
  SET s = lowerword(s, 'Via');

  RETURN s; 
END; 
| 
DELIMITER ;

Usage

Verify it works as expected:

SELECT tcase(title) FROM table;

Use it:

UPDATE table SET title = tcase(title);

Source: http://www.artfulsoftware.com/infotree/queries.php?&amp;bw=1070#122

hobodave
Thank you for this. Saved my day.
nikc
Awesome work - I needed something that would handle user defined acronyms so I've generalised what you've got here.
CurtainDog
When using this I got this error: #1305 - FUNCTION table_name.LENGTH does not exist. Am I missing something?
Kieran Andrews
A: 

Woo! I'm not handy with SQL at all; Here's the method that worked for me:

  1. Export the table as a text file in .sql format.
  2. Open the file in Textmate (which I already had handy).
  3. Select the rows with UPPERCASE data.
  4. Choose "Convert" from the "Text" menu, and select "to Titlecase".
  5. Find and replace each instance of:

    INSERT INTO `Table` (`Col1`, `Col2`, `Etc`, ...) VALUES
    

    with the correct lowercase values.

  6. Import the table back into the database.
  7. Use UPDATE table SET colname=LOWER(colname); to reset lowercase values for the columns that should be lowercase.

The reason I didn't try using Textmate before was that I couldn't figure out how to convert a single column to Title Case without ruining the other data, but this method seems to work. Thanks for your guidance and support!

John Stephens
A: 

Could anyone help and tell me why crummy Navicat keeps telling me that there is no RETURN when I try to input this code. Of course, Navicat converts it to the below.

CREATE DEFINER = CURRENT_USER FUNCTION NewProc(str varchar(128),word varchar(8)) RETURNS varchar(128) SET GLOBAL log_bin_trust_function_creators=TRUE; BEGIN DECLARE i INT DEFAULT 1; DECLARE loc INT;

SET loc = LOCATE(CONCAT(word,' '), str, 2); IF loc > 1 THEN WHILE i <= LENGTH (str) AND loc <> 0 DO SET str = INSERT(str,loc,LENGTH(word),LCASE(word)); SET i = loc+LENGTH(word); SET loc = LOCATE(CONCAT(word,' '), str, i); END WHILE; END IF; RETURN 'str'; END;;

Virgil Varvel
A: 

If you need to throw custom acronyms and other custom capitalisation patterns into the mix I've generalised hobodave's answer:

DELIMITER |
CREATE FUNCTION replaceword( str VARCHAR(128), word VARCHAR(128) )
RETURNS VARCHAR(128)
BEGIN
  DECLARE loc INT;
  DECLARE punct CHAR(27) DEFAULT ' ()[]{},.-_!@;:?/''"#$%^&*<>'; 
  DECLARE lowerWord VARCHAR(128);
  DECLARE lowerStr VARCHAR(128);

  IF LENGTH(word) = 0 THEN
    RETURN str;
  END IF;
  SET lowerWord = LOWER(word);
  SET lowerStr = LOWER(str);
  SET loc = LOCATE(lowerWord, lowerStr, 1);
  WHILE loc > 0 DO
    IF loc = 1 OR LOCATE(SUBSTRING(str, loc-1, 1), punct) > 0 THEN
      IF loc+LENGTH(word) > LENGTH(str) OR LOCATE(SUBSTRING(str, loc+LENGTH(word), 1), punct) > 0 THEN
        SET str = INSERT(str,loc,LENGTH(word),word);
      END IF;
    END IF;
    SET loc = LOCATE(lowerWord, lowerStr, loc+LENGTH(word));
  END WHILE;
  RETURN str;
END;
|
DELIMITER ;

DELIMITER | 
CREATE FUNCTION tcase( str VARCHAR(128) ) 
RETURNS VARCHAR(128) 
BEGIN 
  DECLARE c CHAR(1); 
  DECLARE s VARCHAR(128); 
  DECLARE i INT DEFAULT 1; 
  DECLARE bool INT DEFAULT 1; 
  DECLARE punct CHAR(27) DEFAULT ' ()[]{},.-_!@;:?/''"#$%^&*<>'; 

  SET s = LCASE( str ); 
  WHILE i <= LENGTH( str ) DO
    BEGIN 
      SET c = SUBSTRING( s, i, 1 ); 
      IF LOCATE( c, punct ) > 0 THEN 
        SET bool = 1; 
      ELSEIF bool=1 THEN  
        BEGIN 
          IF c >= 'a' AND c <= 'z' THEN  
            BEGIN 
              SET s = CONCAT(LEFT(s,i-1),UCASE(c),SUBSTRING(s,i+1)); 
              SET bool = 0; 
            END; 
          ELSEIF c >= '0' AND c <= '9' THEN 
            SET bool = 0; 
          END IF; 
        END; 
      END IF; 
      SET i = i+1; 
    END; 
  END WHILE;

  SET s = replaceword(s, 'a');
  SET s = replaceword(s, 'an');
  SET s = replaceword(s, 'and');
  SET s = replaceword(s, 'as');
  SET s = replaceword(s, 'at');
  SET s = replaceword(s, 'but');
  SET s = replaceword(s, 'by');
  SET s = replaceword(s, 'for');
  SET s = replaceword(s, 'if');
  SET s = replaceword(s, 'in');
  SET s = replaceword(s, 'n');
  SET s = replaceword(s, 'of');
  SET s = replaceword(s, 'on');
  SET s = replaceword(s, 'or');
  SET s = replaceword(s, 'the');
  SET s = replaceword(s, 'to');
  SET s = replaceword(s, 'via');

  SET s = replaceword(s, 'RSS');
  SET s = replaceword(s, 'URL');
  SET s = replaceword(s, 'PHP');
  SET s = replaceword(s, 'SQL');
  SET s = replaceword(s, 'OPML');
  SET s = replaceword(s, 'DHTML');
  SET s = replaceword(s, 'CSV');
  SET s = replaceword(s, 'iCal');
  SET s = replaceword(s, 'XML');
  SET s = replaceword(s, 'PDF');

  SET c = SUBSTRING( s, 1, 1 ); 
  IF c >= 'a' AND c <= 'z' THEN  
      SET s = CONCAT(UCASE(c),SUBSTRING(s,2)); 
  END IF; 

  RETURN s; 
END; 
| 
DELIMITER ;

Essentially it consists of a case-insensitive word replace function and a function to capitalise the first letter of every word and perform some transforms for specific words.

Hope its helpful to someone.

CurtainDog