I want a Rails model to reject certain patterns: runs of two or more spaces.

User.name = "Harry Junior Potter" is valid, but User.name = "Harry  Junior Potter" is not (two spaces between Harry and Junior). This is to avoid identity theft, where those two names are displayed the same (HTML collapses runs of whitespace).

In other words, allowed are [0-9A-z_-] and '\s only in series of one'.

My regular-expression skills are too poor to craft such a regexp. This is what I have (with a negative lookahead), but it does not match correctly:

 /([0-9A-z_\-])(\s(?!\s))?/

Note: a before_validation hook already strip()s all the elements, so spaces at begin or end of the string are not a problem.

+1  A: 

Isn't it a lot easier just to replace "__" with "_"? (I'm using underscores to show the spaces.) My Ruby isn't that fluent, but it should be something like

User.name.gsub!("  ", " ") while User.name.include?("  ")

Then you could use this regex to check for the rest

([\w\-]+\s?)+
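
The collapsing step can also be done in one pass with a regex instead of a loop. This is only a minimal sketch in plain Ruby; `collapse_whitespace` is a name made up for illustration, not something Rails provides.

```ruby
# Hypothetical helper: trims the name and collapses every run of
# ASCII whitespace (spaces, tabs, newlines) into a single space.
def collapse_whitespace(name)
  name.strip.gsub(/\s+/, ' ')
end

puts collapse_whitespace("Harry  Junior   Potter")  # prints "Harry Junior Potter"
puts collapse_whitespace("Harry\tJunior\nPotter")   # prints "Harry Junior Potter"
```

If only literal spaces matter, `String#squeeze(' ')` does the same collapsing without a regex, but it leaves tabs and newlines alone.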
Sjuul Janssen
A good route, indeed: compress all spaces into one. Then the validation handler can look for duplicates: "Harry__Potter" would then become "Harry_Potter", which may already exist. Small detail: I cannot simply look for a literal space, since in UTF-8 there are many forms of whitespace possible (including tabs, line breaks and so on).
berkes
You can use `gsub` instead: `User.name.gsub!(/\s+/, ' ')`. That will convert other whitespace to spaces in addition to collapsing it. By the way, TAB and linefeed are ASCII characters, so `\s` will match them. There *are* a lot of non-ASCII whitespace characters, but they would have no business being in a name; I would just disallow them.
Alan Moore
Nice to see that you're getting some more good ideas here.
Sjuul Janssen
A: 

I would suggest an alternate route, a rather unusual one in fact. If you are on a time crunch and don't mind having an extra column in your existing table, then I would suggest creating a slug of the user name. I know it's overkill for your problem, but I prefer doing my checks that way. Instead of checking the new user name against the already stored one (and breaking my head over all those messy regexes), I just check the new slug against the stored slug (the slug library, by the way, handles all those messy regexes for you). You can check out to_slug: http://github.com/ludo/to_slug

Slugs are used primarily to filter out dangerous characters from URLs. Why not use the same thing for checking user names? It handles Unicode characters too.

This is not a direct answer to your problem. But I faced the same situation as you did, and since I was on a time crunch I decided to use slugs.

A simple check in my console yields:

>> "Harry Junior Potter".to_slug
=> "harry-junior-potter"
>> "Harry  Junior  Potter".to_slug
=> "harry-junior-potter"
>> "Harry         Junior           Potter".to_slug
=> "harry-junior-potter"
>> "Harry(junior(potter))".to_slug
=> "harry-junior-potter"
>> "Harry_Junior_Potter".to_slug
=> "harry_junior_potter"

So now the user is allowed to store his name if and only if the slug validates.
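
The same idea can be sketched without the gem. The `simple_slug` method below is a hypothetical stand-in, not the `to_slug` implementation, and it only handles ASCII; the `taken` set stands in for the slug column in the database.

```ruby
require 'set'

# Hypothetical stand-in for a slug library (ASCII only): lowercases the name,
# turns every run of characters other than a-z, 0-9 and _ into one hyphen,
# then drops leading and trailing hyphens.
def simple_slug(name)
  name.downcase.gsub(/[^a-z0-9_]+/, '-').gsub(/\A-+|-+\z/, '')
end

# Stand-in for the slugs already stored in the extra column.
taken = Set.new([simple_slug("Harry Junior Potter")])

candidate = "Harry    Junior Potter"
if taken.include?(simple_slug(candidate))
  puts "rejected: looks the same as an existing name"
else
  taken << simple_slug(candidate)
  puts "accepted"
end
```

The original input can still be stored untouched; only the slug is compared.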

Shripad K
I like this idea. It allows me to store whatever the user inserted, but use the validated value I wish to use. This way, you can change validation in future and have it backwards compatible.
berkes
The only disadvantage of doing this is the extra column in the database. It is just a trade-off (time vs. storage). If you have enough time on hand, use regex checking. If not, use slugs. I personally stay away from regexes as much as possible. Still, if you decide not to use the above solution, I suggest checking out http://rubular.com. It makes it easier to test your regexes.
Shripad K
A: 

First off, [A-z] is an error. It's equivalent to

[A-Z\[\\\]^_`a-z]

...and I'm pretty sure that's not what you had in mind. You have to spell out the two ranges separately: [A-Za-z]. But in this case you're also matching digits and underscore, so you can use \w like @Sjuul did: [\w-]+. That makes your regex

/^[\w-]+(?: [\w-]+)*$/

Of course, that will match silly things like -- - ---, and it won't match a lot of real names. I'm just answering your question about allowing only a single space between names.
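
A quick way to check the single-space rule in plain Ruby is sketched below. It uses `\A` and `\z` rather than `^` and `$`, because in Ruby `^` and `$` match at every line boundary, so a name containing a newline could otherwise slip through. `NAME_FORMAT` is just a name chosen for illustration.

```ruby
# Assumed constant name; words of [A-Za-z0-9_-] separated by exactly one space.
NAME_FORMAT = /\A[\w-]+(?: [\w-]+)*\z/

puts NAME_FORMAT.match?("Harry Junior Potter")   # prints "true"
puts NAME_FORMAT.match?("Harry  Junior Potter")  # prints "false" (two spaces)
puts NAME_FORMAT.match?("Harry\tPotter")         # prints "false" (tab is not allowed)
puts NAME_FORMAT.match?("-- - ---")              # prints "true" (silly, but permitted)

# The [A-z] range also covers the punctuation between "Z" and "a" in ASCII.
puts(/\A[A-z]+\z/.match?("^_`"))                 # prints "true"
```

In a Rails model, a pattern like this could be passed to the format validator, for example `validates :name, format: { with: NAME_FORMAT }`.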

Alan Moore
I don't really understand what you mean by "and won't match a lot of real names". I deliberately left out accents and non-ASCII characters for now, to focus on the double-spaces issue. (The first thing my app will have to do, after this, is accept my own name, Bèr :)
berkes
That's exactly what I meant. I just wanted to head off any comments (from anyone) about problems not related to whitespace.
Alan Moore