tags:

views:

569

answers:

3

I would like to write a script that takes one argument that might look like this:

abc(ag)de*

a, b, c are literal characters.

(ag) means "an 'a' or a 'g'".

* means any one letter or number.

I want the script to create an Array of all the possible strings the input could represent. (The purpose is to check if they're available domain names.)

The input could also be something like abc(ag)de(mnlop) where there are more than on character class.

Seems like the first task is to split it into an Array or Arrays, so the first example would be...

[
  ['a'],
  ['b'],
  ['c'],
  ['a', 'g'],
  ['d'],
  ['e'],
  [
    'a', 'b', 'c', 'd', 'e', 'f', 'g',
    # etc...
  ]
]

This is where I get stuck. I don't know how to split it up into pieces like that.

Any suggestions on how to approach it?

+1  A: 

If your * means only a single character, then I guess this is at least solvable. If it means "zero or more of any character", then it feels as if your solution space is bordering on the infinite, and thus will be kind of hard to return as an actual concrete value.

I think I would approach this by somehow splitting out the variable parts, figuring out how many variants each supports, then (conceptually) looping over all the variables in a nested way, forming one output string for each iteration of the innermost loop.

For the example string of "abc(ag)de*", this would boil down to this (Python-ish pseudo-code, my Ruby is not up for public use):

results = []
for x in "ag":
  for y in "abcdefghijklmnopqrstuvwxyz":
    results.append("abc%sde%s" % (x, y))

The %s in the string on the final line is a format specifier, s simply means "string", and will cause the corresponding value from tuple to the right of the % operator after the string to be interpolated at that position.

unwind
The star means just one character.
Ethan
+5  A: 

Here's a pretty compact solution. It's in no way optimized for performance which puts some constraints on the patterns you supply e.g. too many wildcards is probably not the best idea.

Here's the code

input1 = "abc(ag)de*"
input2 = "abc(ag)de(mnlop)"

class Array
  def append_suffixes!(suffixes)
    self.replace suffixes.map { |a| self.map { |p| p + a }}.flatten
  end
end

def generate_combinations(pattern)
  combinations = [""]
  pattern.scan(/\(([^)]+)\)|(\*)|(\w+)/) do |group,wildcard,other|
    new_suffixes = case
      when group    : group.split('')
      when wildcard : [*'a'..'z']
      when other    : other
      else raise "Unknown match!"
    end
    combinations.append_suffixes! new_suffixes
  end
  combinations
end

p generate_combinations(input1)
p generate_combinations(input2)
p generate_combinations("**").size

The output from running the code above is (slightly edited):

["abcadea", "abcgdea", "abcadeb", "abcgdeb", "abcadec", 
 "abcgdec", "abcaded", "abcgded", "abcadee", "abcgdee", 
 "abcadef", "abcgdef", "abcadeg", "abcgdeg", "abcadeh", 
 "abcgdeh", "abcadei", "abcgdei", "abcadej", "abcgdej", 
 "abcadek", "abcgdek", "abcadel", "abcgdel", "abcadem", 
 "abcgdem", "abcaden", "abcgden", "abcadeo", "abcgdeo", 
 "abcadep", "abcgdep", "abcadeq", "abcgdeq", "abcader", 
 "abcgder", "abcades", "abcgdes", "abcadet", "abcgdet", 
 "abcadeu", "abcgdeu", "abcadev", "abcgdev", "abcadew", 
 "abcgdew", "abcadex", "abcgdex", "abcadey", "abcgdey", 
 "abcadez", "abcgdez"]

["abcadem", "abcgdem", "abcaden", "abcgden", "abcadel", 
 "abcgdel", "abcadeo", "abcgdeo", "abcadep", "abcgdep"]

676 # The number of two letter words i.e. 26*26

Please feel free to ask if you have any questions about the code above.

sris
+1  A: 

What you're essentially asking for is to take a regexp and generate all the strings it matches.

That was Ruby Quiz #143. Have a look at the solutions on the left hand side.

Sarah Mei