ansaurus

Question

How to reject names (people and companies) using whitelists with C# regexes?

Answer 1

+2 A:

Company names might have almost any kind of symbol in them, so I don't know how well this is going to work for you. I'd concentrate on shielding yourself directly from various attacks, not hoping that your strings are "naturally" safe.

(Certainly they can have ampersands, colons, semicolons, exclamation points, hyphens, percent signs, and all kinds of other things that could be "unsafe" in a host of contexts.)

mquander 2009-05-14 21:23:17

Answer 2

+3 A:

Do not try to sanitize names, especially with regex!

Just make sure that you are properly escaping the values and saving them safely in your DB, and then escaping them again when presenting them in HTML.

duckyflip 2009-05-14 21:34:14

Answer 3

A:

I think writing your own regexp is not a good idea: it would be very hard. Try leveraging the existing functions of your web framework; there are lots of resources on the net. Since you mention C#, I assume you are using ASP.NET, so try the following article: How To: Protect From Injection Attacks in ASP.NET

bbmud 2009-05-14 21:37:01

Answer 4

A:

This SO thread seems similar -- it might help.

JP Alioto 2009-05-14 21:40:45

Answer 5

+3 A:

This SO thread has a lot of good discussion on protecting yourself from injection attacks.

In short:

Filter your input as best as you can
Escape your strings using framework based methods
Parameterize your SQL statements

In your case, you can limit the name field to a small character set. The company field will be more difficult, and you need to balance your users' need for freedom of entry against your need for site security. As others have said, trying to write your own custom sanitization methods is tricky and risky. Keep it simple and protect yourself through your architecture - don't simply rely on strings being "safe", even after sanitization.

EDIT:

To clarify - if you're trying to develop a whitelist, it's not something that the community can hand out, since it's entirely dependent on the data you want. But let's look at an example of a regex whitelist, perhaps for names. Say I've whitelisted A-Z and a-z and space.

Regex reWhiteList = new Regex("^[A-Za-z ]+$");

That checks to see if the entire string is composed of those characters. Note that a string with a number, a period, a quote, or anything else would NOT match this regex and thus would fail the whitelist.

if (reWhiteList.IsMatch(strInput))
{
   // it's ok, proceed to step 2
}
else
{
   // it's not ok, inform user they've entered invalid characters and try again
}
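
The check above can be wrapped in a small helper. This is only a sketch under a few assumptions: the class and method names (NameWhitelist, IsAllowed) are made up for illustration, and the anchors are changed from ^ and $ to \A and \z, because in .NET $ also matches just before a trailing newline, so "Mary\n" would pass the original pattern.

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper, not part of any framework.
public static class NameWhitelist
{
    // \A and \z anchor to the absolute start and end of the input,
    // so a trailing newline is rejected (unlike with $).
    private static readonly Regex Allowed = new Regex(@"\A[A-Za-z ]+\z");

    // Returns true only if every character is A-Z, a-z, or a space.
    // Null and empty input are rejected.
    public static bool IsAllowed(string input)
    {
        return input != null && Allowed.IsMatch(input);
    }
}
```

With this sketch, NameWhitelist.IsAllowed("Mary Ann") is true, while "O'Brien" and "Mary\n" are both rejected. The first rejection shows how quickly a narrow character set starts turning away real names.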

Hopefully this helps some more! With names and company names you'll have a tough-to-impossible time developing a rigorous pattern to check against, but you can do a simple allowable character list, as I showed here.

patjbs 2009-05-14 21:47:39

Step 1 is what I am trying to figure out. The referenced article mentions white lists.

jm 2009-05-15 00:46:27

>> "trying to develop a whitelist, it's not something that the community can hand out": I think it is something the community can help with. I'm trying to whitelist people's names. Most people have them :) It's not some outlandish, uncommon thing. I agree with your approach. I just need to figure out the "reWhiteList".

jm 2009-05-16 18:56:03

Answer 6

+1 A:

Why filter or regex the data at all, or even escape it? You should be using bind variables to access the database.

This way, the customer could enter something like: anything' OR 'x'='x

And your application doesn't care, because the database never parses the value as SQL: the value is not yet set when the statement is prepared. For example:

'SELECT count(username) FROM usertable WHERE username = ? and password = ?'

then you execute that code with those variables set.

This works in PHP, Perl, J2EE applications, and so on.
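
Since the question is about C#, the same idea can be sketched with ADO.NET parameters. This is a minimal sketch, not a complete login routine: the LoginQuery class and its Build method are hypothetical names, the table and column names are taken from the query above, and the @name placeholder syntax is an assumption that fits SQL Server's provider (other providers use ? or :name).

```csharp
using System.Data;
using System.Data.Common;

// Hypothetical helper for illustration only.
public static class LoginQuery
{
    // The SQL text is a constant; user input is never concatenated into it.
    public const string Sql =
        "SELECT COUNT(username) FROM usertable " +
        "WHERE username = @username AND password = @password";

    // Builds a command whose values travel as parameters, separate from the SQL text.
    public static DbCommand Build(DbConnection connection, string username, string password)
    {
        DbCommand command = connection.CreateCommand();
        command.CommandText = Sql;
        AddParameter(command, "@username", username);
        AddParameter(command, "@password", password);
        return command;
    }

    private static void AddParameter(DbCommand command, string name, string value)
    {
        DbParameter parameter = command.CreateParameter();
        parameter.ParameterName = name;
        parameter.DbType = DbType.String;
        parameter.Value = value;
        command.Parameters.Add(parameter);
    }
}
```

An input such as anything' OR 'x'='x is then compared as a plain string value and cannot change the shape of the query.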

krypt0 2009-05-15 03:25:35

Can't they still enter JavaScript and do an XSS attack?

jm 2009-05-15 04:49:29

You also need to HTML-encode the data when you send it to the browser.
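
As a rough sketch of what that encoding step might look like in C#: the example below uses System.Net.WebUtility.HtmlEncode, which is an assumption about the framework version (older ASP.NET code would typically call HttpUtility.HtmlEncode instead). The HtmlOutput class, the RenderCompanyName method, and the span markup are hypothetical.

```csharp
using System.Net;

// Hypothetical rendering helper for illustration only.
public static class HtmlOutput
{
    // Encodes the stored value at output time, so characters such as
    // <, >, & and quotes reach the browser as text rather than markup.
    public static string RenderCompanyName(string companyName)
    {
        return "<span class=\"company\">" + WebUtility.HtmlEncode(companyName) + "</span>";
    }
}
```

The value stays unmodified in the database and is encoded only when it is written into a page, so a name like Smith & Sons is stored as entered and displayed correctly.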

Dave Hinton 2009-05-15 16:22:29

Answer 7

A:

This is my current regex WHITELIST for a company name. Any input outside of these characters is rejected:

"^[0-9\p{L} '\-.,\/\&]{0,50}$"

The \p{L} matches any Unicode "letter". So, accented and Asian characters are whitelisted.

The \& is a bit problematic because it potentially allows JavaScript special characters.

The ' is problematic if you are not using parameterized queries, because of SQL injection.

The - could allow "--", which is also a potential vector for SQL injection if you are not using parameterized queries.

Also, \p{L} won't work client-side, so you can't use it in the ASP.NET regular expression validator without disabling client-side validation: EnableClientScript="False"
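
A detail worth checking in any character class like this one: a hyphen written between two other characters, such as '-., is read as a range (here U+0027 through U+002E, which also admits the characters ( ) * +) unless it is escaped or moved to the end of the class. The sketch below compares the two forms. The class and field names are hypothetical, and the patterns are written as C# verbatim strings rather than validator markup.

```csharp
using System.Text.RegularExpressions;

// Hypothetical comparison for illustration only.
public static class CompanyNameWhitelist
{
    // '-. is a range here, so ( ) * + are accepted as a side effect.
    public static readonly Regex Ranged =
        new Regex(@"^[0-9\p{L} '-.,/&]{0,50}$");

    // Hyphen placed last in the class, so it is a literal hyphen and nothing more.
    public static readonly Regex Literal =
        new Regex(@"^[0-9\p{L} '.,/&-]{0,50}$");
}
```

With these two patterns, "Acme (UK)" matches Ranged but not Literal, while "Smith-Jones & Co." and "Caf\u00e9 Zo\u00eb" match both.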

jm 2009-05-16 16:46:48
