tags:

views:

74

answers:

6

Hi, I need to ensure that a input string follows these rules:

  • It should contain upper case characters only.
  • NO character should be repeated in the string. eg. ABCA is not valid because 'A' is being repeated.

For the upper case thing, [A-Z] should be fine. But i am lost at how to ensure no repeating characters.

Can someone suggest some method using regular expressions ?

+2  A: 

This cannot be done via regular expressions, because they are context-free. You need at least context-sensitive grammar language, so only way how to achieve this is by writing the function by hand.

See formal grammar for background theory.

Yossarian
At least not with a single pattern.
ApoY2k
It can, with an infinitely long regular expression :)
BoltClock
yes since the length cannot be >26, the resulting regex that way would be awfully long i guess assuming I will be backreferencing all previous groups :P
mishal153
@Mark, then they are not regular expressions.
Yossarian
@BoltClock with finite alphabet (as upper case characters are)it's finite. For example: we have 26 uppercase characters, and each one can appear only once. Then we have 26! possible 26-character-long words, a bit more 25-char-long words etc. At the end we can just construct long (but finite) regexp by union of all words.So it can be done with regexp, but not practical in this case.
phadej
@phadej: yeah, I wrote that in an exaggerative way ;)
BoltClock
+2  A: 

Why not check for a character which is repeated or not in uppercase instead ? With something like ([A-Z])?.*?([^A-Z]|\1)

Colin Hebert
I guess that would also serve the purpose. How do i do that using regex ?
mishal153
Check with this regex. If it doesn't match, then your condition is true.
Colin Hebert
ok. I tried it and its working fine. But i have used a non-regex approach eventually.
mishal153
You're right, it will be faster.
Colin Hebert
+4  A: 

You can do this with .NET regular expressions although I would advise against it:

string s = "ABCD";
bool result = Regex.IsMatch(s, @"^(?:([A-Z])(?!.*\1))*$");

Instead I'd advise checking that the length of the string is the same as the number of distinct characters, and checking the A-Z requirement separately:

bool result = s.Cast<char>().Distinct().Count() == s.Length;

Alteranatively, if performance is a critical issue, iterate over the characters one by one and keep a record of which you have seen.

Mark Byers
Marking this a right answer since this is what i have used eventually :)
mishal153
+1  A: 

Use negative lookahead and backreference.

  string pattern = @"^(?!.*(.).*\1)[A-Z]+$";
  string s1 = "ABCDEF";
  string s2 = "ABCDAEF";
  string s3 = "ABCDEBF";
  Console.WriteLine(Regex.IsMatch(s1, pattern));//True
  Console.WriteLine(Regex.IsMatch(s2, pattern));//False
  Console.WriteLine(Regex.IsMatch(s3, pattern));//False

\1 matches the first captured group. Thus the negative lookahead fails if any character is repeated.

Amarghosh
A: 

This isn't regex, and would be slow, but You could create an array of the contents of the string, and then iterate through the array comparing n to n++

=Waldo

gWaldo
A: 

It can be done using what is call backreference.

I am a Java program so I will show you how it is done in Java (for C#, see here).

final Pattern aPattern = Pattern.compile("([A-Z]).*\\1");
final Matcher aMatcher1 = aPattern.matcher("ABCDA");
System.out.println(aMatcher1.find());
final Matcher aMatcher2 = aPattern.matcher("ABCDA");
System.out.println(aMatcher2.find());

The regular express is ([A-Z]).*\\1 which translate to anything between 'A' to 'Z' as group 1 ('([A-Z])') anything else (.*) and group 1.

Use $1 for C#.

Hope this helps.

NawaMan