In my web application, I created a framework that binds model data to controls on a page. Each model property has rules such as string length, not null, and a regular expression. Before the page is submitted, the framework validates every bound control against its defined rules.

So, I want to detect which characters are allowed by each regular expression rule, as in the following examples.

"^[0-9]+$" allows only digit characters such as 1, 2, 3.
"^[a-zA-Z_][a-zA-Z_\-0-9]+$" allows only letters (a-z, A-Z), digits, - and _.

However, this function should not care about grouping or the position of the allowed characters. It should only report which characters are possible.

Do you have any ideas for creating this function?

PS. I know it is easy to create a specific function, such as a numeric-only one that allows only digit characters. But I need to share and reuse the same piece of code in both the data tier (which contains all the model validators) and the UI tier without modifying anything.

Thanks

A: 

I must admit that I'm struggling to parse your question.

If you are looking for a regular expression that will match only if a string consists entirely of a certain collection of characters, regardless of their order, then your examples of character classes were quite close already.

For instance, ^[A-Za-z0-9]+$ will only allow strings that consist of letters A through Z (upper and lower case) and numbers, in any order, and of any length.
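
As a minimal sketch of that check in JavaScript (the names `ALNUM_ONLY` and `isAlnumOnly` are made up for illustration, not taken from the question):

```javascript
// Anchored character class: the whole string must consist of
// one or more letters or digits, in any order.
const ALNUM_ONLY = /^[A-Za-z0-9]+$/;

// Hypothetical helper name, used only for this example.
function isAlnumOnly(text) {
  return ALNUM_ONLY.test(text);
}
```

With this, `isAlnumOnly("abc123")` and `isAlnumOnly("321cba")` both pass, while `isAlnumOnly("abc-123")` and the empty string are rejected.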

Tim Pietzcker
I'm so sorry. I did not explain what a "two words near" regular expression is. You can find more detail at http://www.regular-expressions.info/near.html
Soul_Master
I know what the "two words near" regex means; I just don't understand what you want to do with it. You write "By the way", so it has nothing to do with your question, or has it? EDIT: I see you have removed that bit, so I have removed mine, too :)
Tim Pietzcker
I was just giving an example of a complex regular expression that should not be processed.
Soul_Master
A: 

You can't solve this for the general case. Regexps don't generally ‘fail’ at a particular character, they just get to a point where they can't match any more, and have to backtrack to try another method of matching.

One could make a regex implementation that remembered the farthest point it managed to match before backtracking, but most implementations don't do that, including JavaScript's.

A possible way forward would be to match first against ^pattern$, and if that fails, match against ^pattern without the end-anchor. This would be more likely to give you some sort of match of the left-hand part of the string, so you could count how many characters were in the match and say the following character was ‘invalid’. For more complicated regexps this would be misleading, but it would certainly work for simple cases like [a-zA-Z0-9_]+.
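
A rough sketch of that two-step match, assuming the rule is supplied as an unanchored pattern string (the function name `firstInvalidIndex` and the -1 convention are invented for this example):

```javascript
// Hypothetical helper: returns -1 when the whole string matches,
// otherwise a best-guess index of the first offending character.
// Only meaningful for simple patterns such as [a-zA-Z0-9_]+.
function firstInvalidIndex(patternSource, text) {
  // Step 1: try the fully anchored pattern.
  const full = new RegExp("^(?:" + patternSource + ")$");
  if (full.test(text)) {
    return -1;
  }
  // Step 2: drop the end anchor and see how far a prefix match gets.
  const prefix = new RegExp("^(?:" + patternSource + ")");
  const match = prefix.exec(text);
  // No prefix match at all: blame the first character.
  return match ? match[0].length : 0;
}
```

For example, `firstInvalidIndex("[a-zA-Z0-9_]+", "abc-123")` returns 3, the position of the `-`. For a pattern like `[a-z]{5}` tested against `"abc"`, it returns 0 even though no character is actually wrong, which is the misleading case mentioned above.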

bobince
I only want to handle the special case you describe in your answer. The rule must look like ^[some regular expression]$ and must not contain anything complex such as backtracking.
Soul_Master
Yeah, try matching without the `$` on failure then. If you get a match at all, `match[0].length` will give you the index of the first character that didn't match.
bobince
I don't want to run the regular expression every time the user presses a key. That is quite an expensive computation for me. I just want something that processes a regular expression to find which characters are possible, and then checks the key that is pressed. If the key is not valid, JavaScript will cancel the current event.
Soul_Master
It shouldn't be expensive. Checking a short string against a pre-constructed, non-backtracky regex should be very fast. Note that you can only really do this by validating the whole string. Even if you did manage to find a way to guess valid next characters, you can't reliably read the next character being inserted into an input.
bobince
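
A sketch of the whole-string approach from the last comment, under a few assumptions: the regex is built once and reused, an empty field is always allowed so the user can clear it, and the names `makeWholeValueFilter` and `digitsOnly` are hypothetical. The DOM wiring is shown only as a comment because it depends on the page.

```javascript
// Builds a filter around a pre-constructed regex. The returned function
// takes the input's current value and returns the value to keep:
// the new value if it is valid, otherwise the last valid one.
function makeWholeValueFilter(rule) {
  let lastValid = "";
  return function filter(currentValue) {
    // Assumption: an empty field is acceptable while typing.
    if (currentValue === "" || rule.test(currentValue)) {
      lastValid = currentValue;
    }
    return lastValid;
  };
}

const digitsOnly = makeWholeValueFilter(/^[0-9]+$/);

// Possible browser wiring (assumes an <input> element in `input`):
// input.addEventListener("input", () => {
//   input.value = digitsOnly(input.value);
// });
```

One caveat: a rule such as `^[a-zA-Z_][a-zA-Z_\-0-9]+$` needs at least two characters, so checking it on every change would reject the first keystroke. Rules like that are better checked on submit, or with a looser pattern while typing.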