Background: I'm developing a custom regex-like syntax for URL filenames. It will work like this:

  • User writes a pattern, something like "[a-z][0-9]{0,2}", and passes it as input
  • It is parsed by the program and translated into the set of permutations it represents, i.e.
    'a', 'a0', 'a00' ... 'z99'

These patterns will vary in complexity; basically, anything that could appear in a URL filename must be accommodated. The language is either Java or PHP, but examples in any language, or abstract/conceptual help, are more than welcome.

My questions are:

  1. Where do I start with the implementation of a "parser" for the above?

and less importantly,

  2. How do I translate parsed complex patterns into strings programmatically?
A: 

There is a good answer for this here: SO: /generate-all-permutations-of-text-from-a-regex-pattern-in-c
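
If you want something concrete to start from, here is a minimal sketch of my own (it is not taken from the linked answer). It assumes the pattern language is limited to single literals, bracketed classes such as [a-z], and {n} or {m,n} repeat counts; anything else is rejected. The names TOKEN, expand_class, parse_pattern and expand are made up for illustration. It is in Python for brevity, but the same structure should carry over to Java or PHP.

```python
import itertools
import re

# One pattern element: a bracketed class like [a-z0-9_] or a single literal,
# optionally followed by a {n} or {min,max} repeat count.
TOKEN = re.compile(r"(\[[^\]]+\]|[^\[\]{}])(?:\{(\d+)(?:,(\d+))?\})?")


def expand_class(body):
    """Turn the inside of [...] into a sorted list of characters."""
    chars = set()
    i = 0
    while i < len(body):
        if i + 2 < len(body) and body[i + 1] == "-":
            # A range such as a-z: add every character between the two ends.
            chars.update(chr(c) for c in range(ord(body[i]), ord(body[i + 2]) + 1))
            i += 3
        else:
            chars.add(body[i])
            i += 1
    return sorted(chars)


def parse_pattern(pattern):
    """Return a list of (alphabet, min_repeat, max_repeat) tuples."""
    elements = []
    pos = 0
    while pos < len(pattern):
        m = TOKEN.match(pattern, pos)
        if m is None:
            raise ValueError(f"unsupported syntax at position {pos}")
        atom, lo, hi = m.groups()
        alphabet = expand_class(atom[1:-1]) if atom.startswith("[") else [atom]
        lo = int(lo) if lo is not None else 1   # no quantifier means exactly once
        hi = int(hi) if hi is not None else lo  # {n} means exactly n times
        elements.append((alphabet, lo, hi))
        pos = m.end()
    return elements


def expand(pattern):
    """Yield every string the pattern stands for.

    The choices for each element are built in memory; only the final
    cross product over the elements is streamed.
    """
    per_element = []
    for alphabet, lo, hi in parse_pattern(pattern):
        per_element.append([
            "".join(combo)
            for n in range(lo, hi + 1)
            for combo in itertools.product(alphabet, repeat=n)
        ])
    for parts in itertools.product(*per_element):
        yield "".join(parts)
```

For "[a-z][0-9]{0,2}" this yields 'a', 'a0', ... 'a9', 'a00', ... and ends at 'z99', 2886 strings in total. Separating parse_pattern from expand means you can inspect or count what a pattern describes before generating anything.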

The crux is this: define precisely what you need, figure out a way to halt once you have it, and narrow your search range as much as possible, because you are flirting with a quickly exploding number of permutations. "anything that could appear in a URL filename must be accommodated." is not going to cut it. For example, if you limit yourself to English letters and digits (36 symbols, ignoring case), a string 6 characters long already has over 2 billion combinations (36^6). For each additional character, multiply by 36.
If you go with ISO 8859 you get over 274 trillion combinations for the same 6-character length, and with Unicode over 745 trillion-trillion.
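
To make the arithmetic and the "halt early" advice concrete, here is a rough sketch in Python. The helper names count_strings and all_strings and the constant ALPHANUMERIC are hypothetical, and the alphabet is assumed to be the 36 lowercase letters and digits. The idea is to compute the size of the search space up front, then stream candidates and stop as soon as you have enough rather than building the whole set.

```python
import itertools

# Assumed alphabet: 26 lowercase English letters plus 10 digits.
ALPHANUMERIC = "abcdefghijklmnopqrstuvwxyz0123456789"


def count_strings(alphabet_size, length):
    """Number of distinct strings of exactly `length` characters."""
    return alphabet_size ** length


def all_strings(alphabet, length):
    """Stream every string of exactly `length` characters, one at a time."""
    for combo in itertools.product(alphabet, repeat=length):
        yield "".join(combo)


six = count_strings(len(ALPHANUMERIC), 6)    # 36**6 = 2,176,782,336
seven = count_strings(len(ALPHANUMERIC), 7)  # 36 times larger again

# Halt early: take only the first five candidates instead of all of them.
first_five = list(itertools.islice(all_strings(ALPHANUMERIC, 6), 5))
```

Checking the count before generating lets you refuse, or ask the user to narrow, any pattern whose result set is too large to be useful.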

Ichorus