views:

491

answers:

11

I want to suggest login to user if their first choice was already taken. Suppose, user want to register as "Superman". There is already some Supermans on site. Logins are suggested in form "Superman01", "Superman02" and so on. So, script must:

  • check for 'Superman' login in db
  • if already used, append '01' to login and check for it in DB
  • if already used, increment counter ('02'), append to login and check again
  • when non-claimed login is found, return it to user

What I don't like in this schema right now is that it takes multiple requests to MySQL database. Is there any way to get first unclaimed login in one go? Maybe with stored procedure or clever SQL query?

UPD: offered a bounty

+10  A: 

Why not just select where login like 'superman%' and iterate over the resultset in your code?

klausbyskov
until somebody on your 1.000.000 user site wants to registrate the username 'a' (and even if you have a minimum username length, it can become a quite big resultset)
Peter Smit
Just verify length (and other restrictions) using JavaScript on the client side, to avoid such a a case. And DEFINITELY clear the string from any SQL strings, to avoid SQL injection attacks.
Traveling Tech Guy
@Traveling Tech Guy: I think it would be safer to check it on the server - javascript validation can easily be bypassed.
Tom Haigh
@Peter Smit, +1. I modified my response slightly
Mike
@Tom - I would check on BOTH client and server side. 1. You can never be too safe 2. You never know when your server code will be re-used or turned into a web-service 3. You can never be too safe :)
Traveling Tech Guy
Why iterate? Check my answer.
Alix Axel
+3  A: 

You can, assuming the login field is properly indexed (which it should be), do:

select login from usertable where login = 'Superman';

If no rows are returned, you're done. Otherwise, you will have to check for other possibilities:

select login from usertable where login like 'Superman%' order by login;

Now, just find the variation with the highest numerical suffix and add one.

EDIT:
One query to the db to check only the actual name is fast, but one query to check all possibilities in a big database will be slow (not because of the like match - it's fast if you are indexed - but rather downloading all of those rows and processing them).

You would be better off doing 1 query to check the name, then only doing the query to check all names when the desired name doesn't work.

You can also cache the results of that query so that they can be reused without you having to go back to the DB the next time someone picks a supermanesque name. Just be sure to clear the results whenever you add a similar login name to the db.

Mike
+1  A: 

As per the comments on the question, a fixed range of 00 - 99 is desired. You could consider to do a SELECT MAX() on the last two parts of the name.

SELECT max(convert(substring(name, char_length(username)-1, 2), signed)) AS max
    FROM user 
    WHERE name LIKE 'superman%'

This is however not free of maintenance. What if there are 99 supermans?

This is also not free of potential collisions/clashes with usernames which already ends with digits like 01010101 and h4xx0r1337. What if there are already superman01 and superman02 and a new (and ignorant) user decides to register as superman88 because s/he is born at 1988; any next superman would get superman89 suggested, leaving a hole between superman02 and superman88.

It's hard to give a "best" answer on this particular question. The safest way would be something like:

if (find_user($username) != null) {
    for ($i = 0; $user != null; $i++) {
        $username = $username . $i;
        $user = find_user($username);
    }
}
// Now suggest $username.

There is of course a cost, but it's not shocking. Also think again, how often would this occur? Once a day maybe? Or once a year if your forum get on average only 1 new member per day?

BalusC
+2  A: 

If you're willing to store some state in the database...

When someone registers a username, stick it in the "available" table, which has two columns, "base_name" (string) and "next_available" (integer). If someone registers a username which ends in two digits, look for the base (the part preceding the final two digits), and either insert it into "available" or update "next_available".

When someone enters a username that isn't available, you can just look it up in the "available" table and give out the base and the next_available suffix. This can be done in one query.

Caveat: If someone registers "superman93" then you only get 6 more usernames, even if numbers 01 to 92 are available.

ZoFreX
+2  A: 

Here's my mildly daft solution: add one varchar column (called e.g. username_string_part) to your user table to store the string parts of the username, and a second int column (e.g. username_number_part) to store the numeric part. So superman1 is split into "superman" in the username_string_part column and "1" in username_number_part. Also create an index, possibly over both columns or just over the username_string_part if you're not expecting large numbers of duplicate username_string_part entries. So, in MySQL, your create table is something like this):

CREATE TABLE `users` (
  `id` int(11) NOT NULL auto_increment,
  `username` varchar(25) NOT NULL default '',
  `username_string_part` varchar(25) NOT NULL default '',
  `username_number_part` int(11) NOT NULL default 0,
  PRIMARY KEY  (`id`),
  KEY `ix_username_string_part` (`username_string_part`)
) TYPE=MyISAM AUTO_INCREMENT=1;

(Note that the username "superman" has a default username_number_part of zero - this is important.)

Once you have a few entries, your data would look something like this:

+----+-----------+----------------------+----------------------+
| id | username  | username_string_part | username_number_part |
+----+-----------+----------------------+----------------------+
|  1 | superman  | superman             |                    0 |
|  2 | superman1 | superman             |                    1 |
|  3 | superman3 | superman             |                    3 |
+----+-----------+----------------------+----------------------+

Then it's a case of selecting minimum value of username_number_part that doesn't have a username_number_part value of "itself plus one" in the database. So for the username "superman":

select min(username_number_part) + 1 as min_number_available from users
    where username_string_part = 'superman' and username_number_part not in
    (select username_number_part - 1 from users where
        username_string_part = 'superman');

The return value, min_number_available, is NULL if this is the first instance of that username - so they can have it - or an integer for the next free slot otherwise. You then build the recommended username as "superman" + min_number_available. You could do the concat in the query or not as you like. With the example data above you'll get the value "2" returned.

Downsides: it's going to add storage (column and index), and slow down inserts very slightly. It also doesn't naturally distinguish between "superman001" and "superman01". (Although it could if you treated leading zeroes as part of the username_string_part, so "superman001" would be split as "superman00" and "1".)

Upsides: it's a single query on indexed columns.

After all this, I'd be surprised if a site had so many username duplicates that doing a for loop with multiple database queries was really all that bad.

Karl B
+6  A: 

Ask for a hint phrase like this:

Please additionally provide a hint phrase which you would like to be part of your username in case the one you choose is already taken by somebody else.
For example, if your name is Joseph, then Joseph, josef or joe would be taken already. So you can provide a hint phrase which could be one of:

  1. your last name - eg. Smithson - which would suggest "joe.smithson"
  2. your city of residence - eg. Bay Area - which would suggest "joseph_bayarea"
  3. the purpose of the account - eg. developer - which would suggest "joseph-devel"
  4. a color - eg. blue - which would suggest "bluejoe"
  5. a number - which would be suffixed like "joe99"

Another way to get this hint information would be to combine other data entered in the sign-up form. I cannot immediately think of any other convenient and general-purpose scheme to guess what the user would like suggested as his username.

Especially since the service your site provides is not specified.

Another way of approaching this problem is to see the code behind those "intelligent" captchas that sites like Slashdot generate. Some witty devel ;-) has a bunch of words semantically linked to the topic at hand and uses those phrases for a captcha.

This smart/intelligent captcha thing is a bit like Google Sets.

Coding Horror also showed these smart catpchas occasionally.

Play around with those types of services or get hold of a good database of semantically linked terms. Then link those terms with the hint phrase you ask the user to supply.

Google does this easily because "All your searches are belong to Google"(TM).
You have a much easier task - you dont have to crawl the web and you dont have to provide search engine results or links. All you need is a semantically database.

You could get one is you looked hard enough online.
You could start with synonyms/antonyms etc.
IIRC, one such is wordnet, but i don't know the license. So do look it up.


Additional (optional, but dont implement partially):
I suggest that if you make such a good thing, make it opensource.
It will be pretty helpful to others and get you a great rep.
And make sure to also publish code against automated logins for the inevitable situation where some coder with no ethics and lots of spare time will use the semantically linked open word database to generate registration requests against your app and every other !
Bots keep getting smarter and smarter.
Email verification is one protection against this - but that's only if that email service cannot be subverted - which it can, if it is a new email service - which keep coming up all the time.

So it's a sizeable task if you're going to implement this idea and release it as opensource. Then you have to protect it also.

Or you can just keep it your own site.

namespaceform
I actually worked with wordnet, and I can use it. But now the task is to implement already chosen scheme - thhe one with numbers. I was asking for a clever ways to do exactly this.Though you answer is overall good and clear, and I certainly will follow your suggestions when developing some other app :)
Kuroki Kaze
+2  A: 

If you can change the database schema the solution is trivial.

Split the username into two columns: username and username_suffix (INTEGER).

If the username_suffix is 0, it isn't displayed. i.e. 'superman' and 'superman0' are equivalent.

You can then simply

SELECT MAX(username_suffix)+1 WHERE username = 'superman'

to get the next available suffix.

Alternatively if you can't change the database schema, try working probabilistically. Append a random 2 digit number; if that collides with an existing user, append a random 3 digit number instead; if that collides ...

If you don't mind mildly annoying a small fraction of potential users, just suggesting a username which is the user's proposed username with any trailing digits stripped off and extra random digits appended and not checking the database first will probably work well enough:

eg.

superman not available, try superman39...  (Try 2 extra digits first)
superman39 not available, try superman491... (now try 1 extra digit each time)
superman491 not available, try superman8972... (up to (say) 4 digits)
superman9872 not available, try superman2758

A potential user would have to be really unlucky to have to retry more than once or twice.


For some reason I didn't see @Karl's solution before I wrote this. If the extra db column is the best solution, then he should probably get the credit - although I think this is easier. However, the probabilistic approach makes much more sense to me.

MZB
I was going to go for select max(something)+1, but I read the OP's requirement of the "first unclaimed login" as meaning that he wanted e.g. superman02 returned if superman1 and superman3 were taken, whereas select max would return superman4. And of course, it might break if somebody did choose a maximum int value as the suffix... (And both would break if there were a couple of billion supermans...)
Karl B
+4  A: 

User regexp to find the required matches:

SELECT .. FROM users WHERE username REGEXP '^superman[0-9]{1,2}'

This will return all usernames in form of 'supermanX' or 'supermanXX' (one or two digits).

After you get your results, you can easily find the next-in-line number or the missing ones.

For more information, read the following:

http://dev.mysql.com/doc/refman/5.0/en/pattern-matching.html

http://dev.mysql.com/doc/refman/5.0/en/regexp.html


Edit

Supposing the table is called 'users' and the field in question is called 'username', a possible code snipped is the following:

/**
 * Checks a given name exists at the users table
 * and returns possible alternatives
 * or an empty string if no alternatives can be found
 */
function CheckUsername($name);
    // sanitize
    $query = sprintf("SELECT username FROM users
            REGEXP '%s[0-9]{0,2}' ORDER BY username",
            mysql_real_escape_string($name));

    $result = mysql_query($query);

    // get all possible matches
    $rows = array();
    while (list($match) = mysql_fetch_row($result)) {
        $rows[] = $match;
    }

    if (count($rows) == 0) {
        // no rows found, return the original name
        return $name;

    } else {
        // found multiple rows

        if ($rows[0] != $name) {
            // first check if the original name exists
            return $name;

        } else {
            // else go through each number until we find a good username
            $count = 1;
            while ($counter < count($rows) {
                $test = sprintf("%s%02d", $name, $counter);
                if ($rows[$counter] != $test) return $test;
                $counter++;
            }
        }
    }

    // nothing found
    return '';
}

I hope it helps.

Anax
Regexes are your friend. +1
Elizabeth Buckwalter
+4  A: 

Here is my go at this:

SELECT `login`
  FROM `usertable`
WHERE `login` LIKE 'Superman%'
ORDER BY `login` DESC
LIMIT 1;

If the query doesn't returns results $username = 'Superman', otherwise:

$username = 'Superman' . (strrev(intval(strrev($result['username']))) + 1);

This should do the trick, however I must say I'm not a big fan of your username picking scheme.


The revised SQL query, in light of klausbyskov's first comment:

SELECT `login`
  FROM `usertable`
WHERE `login` RLIKE '^Superman[0-9]*$'
ORDER BY `login` DESC
LIMIT 1;
Alix Axel
So what if there is already a user called "SupermanWoman49" but no user called "Superman" and a user tries to create himself with "Superman". Wouldn't his username become "SupermanWoman410" ???
klausbyskov
@klausbyskov: It would become SupermanWoman50, but nice point. I've updated my answer.
Alix Axel
I'm not meaning to nitpick here, but imagine that the only user in the database is called "Superman42"...
klausbyskov
Yeah, I know... But it's hard to distinguish between a user generated username and a automatically picked username... I guess the most reliable solution would be to have a table with two fields: `username` and `username_count`.
Alix Axel
A: 

Most of these answers are correct but hard-code the requested username in the SQL statement.

SELECT MAX(SUBSTR(user,LENGTH('{$request}')+1))+1
FROM users
WHERE username LIKE '{$request}%'

Will return a suitable suffix (null if the username is not used already)

C.

symcbean
+1  A: 

The query below uses an auxiliary table with 10 records (digits '0' to '9') and a cross join to create a list of strings '00' to '99'. These values are concatenated with the user chosen login ('superman') and the result tested to be NOT IN your table of current users. The final result is a list of possible login names ('superman00' to 'superman99') which are not currently in use. You could show the user a few of these to choose. I tested in TSQL, should be easy to translate to MySQL (I think you have to replace 'superman'+T.i+U.i with CONCAT('superman',T.i,U.i) ):

--- prepare a digits table
 create table digits (i char(1));
 insert into digits (i) values ('0')
 insert into digits (i) values ('1')
 insert into digits (i) values ('2')
 insert into digits (i) values ('3')
 insert into digits (i) values ('4')
 insert into digits (i) values ('5')
 insert into digits (i) values ('6')
 insert into digits (i) values ('7')
 insert into digits (i) values ('8')
 insert into digits (i) values ('9')

--- This query returns all 'superman00' to 'superman99' records currently not used

SELECT 'superman'+T.i+U.i AS suggestedlogin
  FROM digits T cross join digits U
  WHERE 'superman'+T.i+U.i NOT IN (
    SELECT login FROM usertable
  )

(Cross join idea from http://www.tek-tips.com/viewthread.cfm?qid=755853)

Carlos Gutiérrez
Correct me if I'm wrong, but I believe storing a table of digits is quite a lot of overkill. I know storage is cheap now a days, but still
Elizabeth Buckwalter
@Elizabeth Buckwalter - I don't like auxiliary tables either, I never have had the need to use them in production, but it's an accepted solution. Check this article by Joe Celko: http://intelligent-enterprise.informationweek.com/showArticle.jhtml?articleID=202802386. Also, a SP could use a temp table.
Carlos Gutiérrez