views:

480

answers:

+2 Q:

Regex for keyboard mashing...

When signing up for new accounts, web apps often ask for the answer to a 'security question', e.g. dog's name, etc.

I'd like to go through our database and look for instances where users just mashed the keyboard instead of providing a legitimate answer - this is a strong indicator of an abusive/fraudulent account.

"Mother's maiden name?" lakdsjflkaj

Any suggestions as to how I should go about doing this?

Note: I'm not ONLY using regular expressions on these 'security question answers'

The 'answers' can be:

Selected from a db using a few basic sql regexes
Analyzed as many times as necessary using python regexes
Compared/pruned/scored as needed

This is a technical question, not a philosophical one ;-)

Thanks!

+33 A:

I would not do this - in my opinion these questions weaken security, so as a user I always try to provide another semi-password as an answer - to you it would look mashed. Well, it is mashed, but that is exactly what I want to do.

By the way, I am uneasy that you can query the answers at all. Since they bypass your password protection, they should be handled like passwords, i.e. stored as a hash!

tanascius 2009-07-21 14:55:51

Each paragraph in that answer deserves a separate upvote.

innaM 2009-07-21 14:59:39

+1, I use a separate password for my secret question answers; also, they definitely should be stored as hashes.

Petey B 2009-07-21 15:01:36

The app's db already has this info stored. I'm looking for slick ways of finding the people that mashed the keyboard. A semi-password would not look 'mashed', since there is some thought put into it... mashed has lots of home-row letters like 'asdf' and so on. Hence the challenge. ;-)

Marcel Chastain 2009-07-21 15:06:12

tanascius 2009-07-21 15:15:03

If you really want to force security on your users, letting them choose their own password isn't the right way ...

THC4k 2009-07-21 15:17:53

This isn't an issue over passwords guys, but about the answer to a 'security question'.

Marcel Chastain 2009-07-21 15:21:25

And the answer to the security question should also be encrypted.

Nick Lewis 2009-07-21 15:25:39

Gah. Threadjacked. Looking for **technical solutions** here guys. An answer with the word 'should' in it (i.e. "you should do this instead") is not answering the technical question.

Marcel Chastain 2009-07-21 15:53:31

I always mash the keyboard for these; "security questions" are insecure. I'd be quite irate if a site then told me "no, you *have* to give me your mother's maiden name", or if I was accused of abuse because I understand basic security. (I don't care if you're looking for technical solutions; when you ask for something that sounds like an inherently bad idea piled on top of an even worse idea, you'll just have to put up with people telling you so.)

Glenn Maynard 2009-07-21 18:13:36

I mash the keyboard too... but that doesn't help me write a regex. Some of the less popular comments have great advice that applies to the challenge.

Marcel Chastain 2009-07-21 19:47:40

+5 A:

There's no way to do this with a regex. Actually, I can't think of a reasonable way to do this at all -- where would you draw the line between suspicious and unsuspicious? I, for one, often answer the security questions with an obfuscated answer. After all, my mother's maiden name isn't the hardest thing to find out.

balpha 2009-07-21 14:56:27

obfuscated != mashed ... mashed is a fairly distinct distribution of letter frequency and spacing, esp w/lots of home row or adjacent keys. I'm not looking for 100% accuracy here, of course. I have close to a million of these 'security answers' stored, and I want to find the really suspicious ones.
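One rough way to put numbers on "lots of home row or adjacent keys" is sketched below. It assumes a US QWERTY layout, and the names `QWERTY_ROWS`, `home_row_fraction` and `adjacent_pair_fraction` are made up for illustration. The adjacency test works on a simple row/column grid and ignores the physical stagger between rows, so treat it as an approximation.

```python
# Assumption: US QWERTY. Dvorak, AZERTY etc. would need their own table.
QWERTY_ROWS = ["qwertyuiop", "asdfghjkl", "zxcvbnm"]
HOME_ROW = set(QWERTY_ROWS[1])

# Map each letter to (row, column) grid coordinates.
KEY_POS = {
    ch: (r, c)
    for r, row in enumerate(QWERTY_ROWS)
    for c, ch in enumerate(row)
}


def _letters(text):
    """Lower-cased letters of text that exist on the layout table."""
    return [ch for ch in text.lower() if ch in KEY_POS]


def home_row_fraction(text):
    """Share of letters that sit on the home row (0.0 for empty input)."""
    letters = _letters(text)
    if not letters:
        return 0.0
    return sum(ch in HOME_ROW for ch in letters) / len(letters)


def adjacent_pair_fraction(text):
    """Share of consecutive letter pairs that are grid neighbours."""
    letters = _letters(text)
    pairs = list(zip(letters, letters[1:]))
    if not pairs:
        return 0.0

    def adjacent(a, b):
        (r1, c1), (r2, c2) = KEY_POS[a], KEY_POS[b]
        return a != b and abs(r1 - r2) <= 1 and abs(c1 - c2) <= 1

    return sum(adjacent(a, b) for a, b in pairs) / len(pairs)
```

Every letter of `lakdsjflkaj` is on the home row, so `home_row_fraction` returns 1.0 for it, against 0.4 for `Smith`. Any cut-off between the two would have to be tuned on real data.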

Marcel Chastain 2009-07-21 15:09:46

You could look for patterns that don't make sense phonetically. Such as:

'q' not followed by a 'u'.

asdf

qwer

zxcv

asdlasd

Basically, try mashing on your own keyboard, see what you get, and plug that in your filter. Also plug in various grammatical rules. However, since it's names you're dealing with, you'll always get 'that guy' with the weird name who will cause a false positive.
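As a minimal sketch of such a filter in Python: the pattern list below contains only the examples named above, and `MASH_PATTERNS` and `count_mash_hits` are hypothetical names. The list would need to be extended with whatever your own mashing produces.

```python
import re

# Illustrative patterns only, taken from the examples above.
MASH_PATTERNS = [
    # 'q' not followed by 'u' (negative lookahead)
    re.compile(r"q(?!u)", re.IGNORECASE),
    # runs typed along a keyboard row
    re.compile(r"asdf|qwer|zxcv|asdlasd", re.IGNORECASE),
]


def count_mash_hits(answer):
    """Number of patterns in MASH_PATTERNS that match somewhere in answer."""
    return sum(1 for pattern in MASH_PATTERNS if pattern.search(answer))
```

`count_mash_hits("asdfqwer")` gives 2 and `count_mash_hits("Quinn")` gives 0, but `count_mash_hits("Iraq")` gives 1, which is exactly the kind of false positive mentioned above. Note that `lakdsjflkaj` from the question scores 0 against this short list, so fixed patterns alone will miss a lot.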

samoz 2009-07-21 14:57:02

As for users of the Dvorak keyboard layout, or French users with an AZERTY keyboard, or Russian users typing in Cyrillic...

NickFitz 2009-07-21 15:03:30

Thanks for your input. I'll incorporate this into the final version.

Marcel Chastain 2009-07-21 15:12:16

+2 A:

If you can find a list of letter-pair probabilities in English, you could construct an approximate probability for the word not being a "real" English word, using the least probable pairs and pairs that are not in the list. Unfortunately, if you have names or other "non-words" then you can't force them to be English words.
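A sketch of how such a score might be computed, assuming you build the pair counts yourself from a word or surname list rather than from a published table. `train_bigrams` and `avg_log_prob` are hypothetical names, and the add-one smoothing over a 26-letter alphabet is my assumption, used so that unseen pairs get a small non-zero probability.

```python
import math
from collections import Counter


def train_bigrams(words):
    """Count adjacent letter pairs over a list of reference words."""
    counts = Counter()
    for word in words:
        word = word.lower()
        counts.update(zip(word, word[1:]))
    return counts


def avg_log_prob(word, counts, alphabet_size=26):
    """Average add-one-smoothed log-probability of the word's letter pairs.

    Lower (more negative) values mean the pairs are rarer in the reference
    list. Words shorter than two letters score 0.0.
    """
    word = word.lower()
    pairs = list(zip(word, word[1:]))
    if not pairs:
        return 0.0
    total = sum(counts.values())
    possible_pairs = alphabet_size ** 2
    return sum(
        math.log((counts[pair] + 1) / (total + possible_pairs))
        for pair in pairs
    ) / len(pairs)
```

For example, after `counts = train_bigrams(["smith", "johnson", "williams", "brown", "jones", "miller"])`, the score for `"smith"` is higher than the score for `"lakdsjflkaj"`. A real reference list would need to be far larger, and the cut-off is something to tune, not a known constant.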

Tim Sylvester 2009-07-21 14:57:53

Hmm, I like this. I'll check up on this one. Thanks for your feedback.

Marcel Chastain 2009-07-21 15:10:28

This is similar to the comment about 'analyzing n-gram distribution'. Great stuff, thanks again -- mC

Marcel Chastain 2009-07-21 15:47:52

Not all users are native English speakers. People could very easily choose to put their mother's maiden name in its native Chinese, or to put "ワンコ" as their first pet's name.

Glenn Maynard 2009-07-21 18:17:32

+1 A:

You could check for a capital letter at the start.... that will get you some false positives for sure.

A quick google gave me this, you could compare each against a name in that list.

Obviously only works for the security question you stated.

Have you also seen this:

Anatomy of the twitter attack

I'm going to think hard next time I implement a security question.

Question Mark 2009-07-21 14:59:17

Wow, that's a great article. Thanks for that! Yeah, if this was my app, I'd rethink using this feature. On the other hand, for the purposes of detecting fraudulent accounts, it might help me, given that the rest of the info (name, CC#, address, IP country, etc) is all legit. Just making lemonade over here ;-)

Marcel Chastain 2009-07-21 15:18:57

+1 A:

You're probably better off analyzing n-gram distribution, similar to language detection.

This code is an example of language detection using trigrams. My guess is the keyboard smashing trigrams are pretty unique and don't appear in normal language.
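This is not the linked code, just a small sketch of the same idea under my own assumptions: collect the trigrams seen in a reference list of plausible answers, then measure what share of an answer's trigrams never occur there. `trigrams`, `build_known_trigrams` and `unseen_trigram_fraction` are illustrative names.

```python
def trigrams(text):
    """All overlapping three-character slices of text, lower-cased."""
    text = text.lower()
    return [text[i:i + 3] for i in range(len(text) - 2)]


def build_known_trigrams(reference_words):
    """Set of every trigram occurring in the reference words."""
    known = set()
    for word in reference_words:
        known.update(trigrams(word))
    return known


def unseen_trigram_fraction(answer, known):
    """Share of the answer's trigrams absent from the reference set.

    Answers shorter than three characters have no trigrams and score 0.0.
    """
    grams = trigrams(answer)
    if not grams:
        return 0.0
    return sum(gram not in known for gram in grams) / len(grams)
```

With `known = build_known_trigrams(["smith", "smithers", "johnson"])`, the answer `"smith"` scores 0.0 and `"lakdsjflkaj"` scores 1.0. With a reference list this small, most legitimate names would also score high, so the list has to be large and representative of your user base.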

itsadok 2009-07-21 14:59:57

Thanks for your input. This is a step in the right direction for me. More ideas like this, please..! -- mC

Marcel Chastain 2009-07-21 15:32:06

Wow, this is fantastic..! -- mC

Marcel Chastain 2009-07-21 15:40:41

+1 A:

If your question is ever something related to a real, human name, this is impossible. Consider Asian names typed with roman characters; they may very well trip whatever filter you come up with, but are still perfectly legitimate.

Shadow 2009-07-21 15:02:24

Huh? I don't understand how Gupta, Singh, Zhang, Nguyen, Tran, Watanabe etc are going to trip up any reasonable filter, especially if the n-gram statistics are based on surname lists that relate to the customer base -- if you have enough customers, use your customers' surnames to get the statistics! In any case, you have to be prepared for false positives, and you don't send out the armed police on the basis of 1 indicator and no human review.

John Machin 2009-07-25 03:20:35

+1 A:

First, check if it consists only of the letters qwrtpsdfghjklzxcvbnm, then check if it consists only of the letters eyuioa, then check if it contains special characters like -+=()*&^%$#@!,.<>?/. When the question contains 'Birthday' or 'Date', or otherwise calls for an answer that would legitimately trip these letter/special-character/number checks, skip the affected questions. I hope you understand me; I'm Dutch and I don't speak English very well.
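These three checks map fairly directly onto Python regexes. The sketch below uses the character lists exactly as given above (so 'y' counts as a vowel). The function name `character_flags`, the flag labels, and the simple substring test for 'birthday'/'date' are assumptions for illustration.

```python
import re

CONSONANTS_ONLY = re.compile(r"^[qwrtpsdfghjklzxcvbnm]+$", re.IGNORECASE)
VOWELS_ONLY = re.compile(r"^[eyuioa]+$", re.IGNORECASE)
# Leading '-' and non-leading '^' are literal inside a character class.
SPECIAL_CHARS = re.compile(r"[-+=()*&^%$#@!,.<>?/]")

# Questions whose legitimate answers would trip the checks.
SKIP_WORDS = ("birthday", "date")


def character_flags(answer, question=""):
    """Return the names of the checks that the answer trips."""
    if any(word in question.lower() for word in SKIP_WORDS):
        return []
    flags = []
    if CONSONANTS_ONLY.match(answer):
        flags.append("consonants_only")
    if VOWELS_ONLY.match(answer):
        flags.append("vowels_only")
    if SPECIAL_CHARS.search(answer):
        flags.append("special_chars")
    return flags
```

For instance, `character_flags("lkjsdf")` returns `["consonants_only"]`, while `character_flags("01/02/1980", question="Your birthday?")` returns `[]` because the question is skipped.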

Time Machine 2009-07-21 15:07:30

Thanks for your input (and especially for answering the actual question..!) I'll be sure to incorporate this in the final solution..! -- mC

Marcel Chastain 2009-07-21 15:46:18

+8 A:

The whole approach of security questions is quite flawed.

I have always found people put security answers weaker than the passwords they use.
Security questions are just one more link in a security chain -- the weaker link!

IMO, a better way to go would be to let the user request that a new password be sent to their registered e-mail id. This has two advantages.

the brute-force attempt has to locate and break the e-mail service first (and, you will never help them there -- keep the registration e-mail id very protected)
the user of your service will always get an indication when someone tries a brute-force (they get a mail saying they tried to regenerate their password)

If you MUST have secret questions, let a correct answer trigger the dispatch of a regenerated password to the e-mail id they registered with (never send the user's existing password; generate a temporary one, preferably forcing a change on first use) -- and do not show it on screen at all.

Another trick is to make the secret question ITSELF their registered e-mail id.
If they put it right, you send a re-generated temporary password to that e-mail id.

nik 2009-07-21 15:16:30

Well yeah, I haven't discussed what exactly happens after they press submit. Your ideas are sound. In our app, they have to answer a security question in order for a new password to be sent to their registered email id, exactly as you said. This challenge is all about *detecting mashing patterns with regex + code*, but I think we started a debate about security questions as a whole ;-) Thanks again for your input.

Marcel Chastain 2009-07-21 15:30:43

Well, guess that means no html in comments huh.

Marcel Chastain 2009-07-21 15:31:25

That's worse yet. You're just making security-conscious users, who never input real answers to security questions, unable to recover their password.

Glenn Maynard 2009-07-21 18:15:51

I always just put the password in when I *have* to put something in.

Brad Gilbert 2009-07-22 03:32:54

+1 A:

Maybe you could check for an abundance of consonants. For instance, in your example lakdsjflkaj there are 2 vowels (both 'a') and 9 consonants. Usually the probability of hitting a vowel when randomly pressing keys is much lower than that of hitting a consonant.
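A quick sketch of that check, counting only ASCII letters and treating a, e, i, o, u as the vowels (both are my assumptions; `vowel_ratio` is an illustrative name):

```python
import string

VOWELS = set("aeiou")
LETTERS = set(string.ascii_lowercase)


def vowel_ratio(text):
    """Share of ASCII letters in text that are vowels (0.0 if no letters)."""
    letters = [ch for ch in text.lower() if ch in LETTERS]
    if not letters:
        return 0.0
    return sum(ch in VOWELS for ch in letters) / len(letters)
```

`vowel_ratio("lakdsjflkaj")` is 2/11, about 0.18. Be aware that `vowel_ratio("Smith")` is 0.2, so this ratio cannot separate mashing from real names on its own; it is probably best used as one score among several.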

Geo 2009-07-21 15:22:22

Interesting approach. I think this would work well with some of the other tests I have in store. Thanks! -- mC

Marcel Chastain 2009-07-21 15:32:54

Instead of regular expressions, why not just compare with a list of known good values? For example, compare Mother's maiden name with census data, or pet name with any of the pet name lists you can find online. For a much simpler version of this, just do a Google search for whatever is entered. Legitimate names should have plenty of results, while keyboard mashing should result in very few if any.

As with any other method, you will still need to handle false positives.
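A minimal sketch of the list comparison, using the standard library's `difflib.get_close_matches` to tolerate small spelling variations. `KNOWN_NAMES` here is a tiny placeholder, not real census data, and `looks_known` and the 0.8 cut-off are assumptions to be tuned.

```python
import difflib

# Placeholder list; in practice load census surnames, pet-name lists, etc.
KNOWN_NAMES = ["smith", "johnson", "williams", "brown", "jones", "miller"]


def looks_known(answer, known=KNOWN_NAMES, cutoff=0.8):
    """True if the answer equals, or closely resembles, a known-good value."""
    candidate = answer.strip().lower()
    if candidate in known:
        return True
    # get_close_matches returns the best matches scoring at least cutoff.
    return bool(difflib.get_close_matches(candidate, known, n=1, cutoff=cutoff))
```

`looks_known("Smith")` is True and `looks_known("lakdsjflkaj")` is False. Against a list of realistic size, the fuzzy step becomes slow across a million answers, so an exact set lookup first and fuzzy matching only for the leftovers may be the more practical order.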

Iceman 2009-07-21 15:52:45

That's an interesting approach, thanks for the input. We have several different security questions, and honestly, I'm just looking for a few hundred highly suspicious accounts that all have mashed 'security question' answers. Thanks again -- mC

Marcel Chastain 2009-07-21 15:55:43

This is all completely ridiculous. If people want to mash the keyboard, let them - you can't be a cop all the time.

Mailslut 2010-03-03 08:18:30
