Given an input pattern such as 1 2 3 3 2 4 2 1, I want to go through a dictionary and find the words that fit it. In my code, I tried converting the pattern string to a regular expression like so:
(?<1>.)(?<2>.)(?<3>.)(\k<3>)(\k<2>)(?<4>.)(\k<2>)(\k<1>)
(Before anyone starts bashing use of the dot here, since my input is a dictionary file with only real words, I left the dots to have a cleaner looking expression rather than specifying ranges of characters.)
This expression manages to find the word correctly, but there's a flaw in it. The problem becomes very apparent with a pattern such as 1 2 3 4 5 6. My algorithm generates the following regular expression:
(?<1>.)(?<2>.)(?<3>.)(?<4>.)(?<5>.)(?<6>.)
This is wrong because it will match any 6-character string, without taking into account that each group should NOT match a character that has already been matched by a previous group. In other words, it ignores the requirement that every letter be distinct, with no repeats.
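To make the flaw concrete, here is a minimal sketch that runs the generated expression against a few words. The `^` and `$` anchors and the three sample words are my own additions for illustration; they are not taken from the real dictionary.

```csharp
using System;
using System.Text.RegularExpressions;

class DistinctnessDemo
{
    static void Main()
    {
        // The regex generated for the pattern 1 2 3 4 5 6, anchored to whole words.
        var naive = new Regex(@"^(?<1>.)(?<2>.)(?<3>.)(?<4>.)(?<5>.)(?<6>.)$");

        // Hypothetical sample words, chosen only to show the problem.
        string[] words = { "planet", "letter", "banana" };

        foreach (string word in words)
        {
            // Every word prints True, although only "planet" has six distinct letters.
            Console.WriteLine(word + ": " + naive.IsMatch(word));
        }
    }
}
```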
So I tried looking all over the internet for syntax to exclude a named group within a character class, i.e.
[^\1] (doesn't work), [^(\k<1>)] (doesn't work), [^${1}] (doesn't work)...etc.
In the .NET documentation, it shows that \p{name} is valid syntax in a character class, but I tried [^\p{1}] and it didn't work either.
So the question remains: is it possible to exclude a named group from further matching? Or, how else would I solve this?
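One direction that stays within ordinary regex syntax is a negative lookahead in front of each new group, instead of a negated character class. The sketch below hand-writes this for the short pattern 1 2 3; the test words are hypothetical, and this is only an illustration of the idea, not necessarily the best way to do it.

```csharp
using System;
using System.Text.RegularExpressions;

class LookaheadDemo
{
    static void Main()
    {
        // Each new group is preceded by a negative lookahead that rejects
        // any character already captured by an earlier group.
        var distinct = new Regex(@"^(?<1>.)(?!\k<1>)(?<2>.)(?!\k<1>|\k<2>)(?<3>.)$");

        Console.WriteLine(distinct.IsMatch("cat")); // True: three different letters
        Console.WriteLine(distinct.IsMatch("dad")); // False: third letter repeats the first
        Console.WriteLine(distinct.IsMatch("egg")); // False: third letter repeats the second
    }
}
```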
UPDATE
Posting my final solution based on the responses I got here. This method takes a string specifying the pattern one is looking for and converts it into a regular expression that I then apply to a dictionary and find all words that fit the pattern.
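    string pattern = "12332421";
    // reg (Regex) and dictionary (List<string>) are fields declared elsewhere in the class.

    private void CreateRegEx()
    {
        // Note: "\\" + c is used as a numbered backreference, so this assumes the
        // digits first appear in the order 1, 2, 3, ... (digit == group number).
        string regex = "^";
        for (int i = 0; i < pattern.Length; i++)
        {
            char c = pattern[i];
            if (char.IsDigit(c))
            {
                if (isUnique(c))
                {
                    // Capture one character, then require that it appears
                    // neither later in the word nor earlier in it.
                    regex += "(.)(?!.*\\" + c + ")(?<!\\" + c + ".+)";
                }
                else
                {
                    // Repeated digits are not compared with each other, so two
                    // different repeated digits may still match the same letter.
                    if (isFirstOccurrence(c, i))
                        regex += "(.)";          // first time: capture
                    else
                        regex += "\\" + c;       // later: backreference
                }
            }
            else if (char.IsLetter(c))
                regex += c + "";                 // literal letter
            else if (c == '?')
                regex += ".";                    // wildcard: any single character
        }
        regex += "$";
        reg = new Regex(regex, RegexOptions.IgnoreCase);
    }

    // True if the digit occurs exactly once in the pattern.
    private bool isUnique(char c)
    {
        return pattern.IndexOf(c) == pattern.LastIndexOf(c);
    }

    // True if position i is the first place the digit occurs.
    private bool isFirstOccurrence(char c, int i)
    {
        return pattern.IndexOf(c) == i;
    }

    public List<string> GetMatches()
    {
        return dictionary.FindAll(x => reg.IsMatch(x));
    }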
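For comparison, here is a minimal sketch of a variant that also keeps repeated digits distinct from one another and does not depend on the order in which digits first appear. It is not the method above: `PatternRegex`, `Build`, and the `d`-prefixed group names are hypothetical, it handles digits only (no literal letters or `?`), and the two test strings are made up rather than real dictionary words.

```csharp
using System;
using System.Collections.Generic;
using System.Text;
using System.Text.RegularExpressions;

static class PatternRegex
{
    // Hypothetical helper: equal digits must match equal characters,
    // different digits must match different characters.
    public static Regex Build(string pattern)
    {
        var seen = new List<char>();      // digits that already have a group
        var sb = new StringBuilder("^");

        foreach (char c in pattern)
        {
            if (!char.IsDigit(c))
                throw new ArgumentException("Only digits are handled in this sketch.");

            if (seen.Contains(c))
            {
                // Repeat of an earlier digit: refer back to its named group.
                sb.Append(@"\k<d" + c + ">");
            }
            else
            {
                // Reject every character captured so far, then capture a new one.
                foreach (char prev in seen)
                    sb.Append(@"(?!\k<d" + prev + ">)");
                sb.Append("(?<d" + c + ">.)");
                seen.Add(c);
            }
        }

        sb.Append('$');
        return new Regex(sb.ToString(), RegexOptions.IgnoreCase);
    }

    static void Main()
    {
        Regex reg = Build("12332421");
        Console.WriteLine(reg);                      // prints the generated expression
        Console.WriteLine(reg.IsMatch("abccbdba")); // True
        Console.WriteLine(reg.IsMatch("abccbaba")); // False: digit 4 would reuse 'a'
    }
}
```

Named groups are used here so that a backreference always points at the intended digit, whatever order the digits appear in. The trade-off is a longer expression, since each new digit adds one lookahead per digit seen before it.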
Thanks again for the awesome responses.