views:

50

answers:

2

Say for example I want to take this phrase:

{{Hello|What's Up|Howdy} {world|planet} | {Goodbye|Later} {people|citizens|inhabitants}}

and randomly make it into one of the following:

Hello world
Goodbye people
What's Up word
What's Up planet
Later citizens
etc.

The basic idea is that enclosed within every pair of braces will be an unlimited number of choices separated by "|". The program needs to go through and randomly choose one choice for each set of braces. Keep in mind that braces can be nested endlessly within each other. I found a thread about this and tried to convert it to Java, but it did not work. Here is the python code that supposedly worked:

import re
from random import randint

def select(m):
    choices = m.group(1).split('|')
    return choices[randint(0, len(choices)-1)]

def spinner(s):
    r = re.compile('{([^{}]*)}')
    while True:
        s, n = r.subn(select, s)
        if n == 0: break
    return s.strip()

Here is my attempt to convert that Python code to Java.

public String generateSpun(String text){
    String spun = new String(text);
    Pattern reg = Pattern.compile("{([^{}]*)}");
    Matcher matcher = reg.matcher(spun);
    while (matcher.find()){
       spun = matcher.replaceFirst(select(matcher.group()));
    }
    return spun;
}

private String select(String m){
    String[] choices = m.split("|");
    Random random = new Random();
    int index = random.nextInt(choices.length - 1);
    return choices[index];
}

Unfortunately, when I try to test this by calling

generateAd("{{Hello|What's Up|Howdy} {world|planet} | {Goodbye|Later} {people|citizens|inhabitants}}");

In the main of my program, it gives me an error in the line in generateSpun where Pattern reg is declared, giving me a PatternSyntaxException.

java.util.regex.PatternSyntaxException: Illegal repetition
{([^{}]*)}

Can someone try to create a Java method that will do what I am trying to do?

A: 

To fix the regex, add backslashes before the outer { and }. These are meta-characters in Java regexes. However, I don't think that will result in a working program. You are modifying the variable spun after it has been bound to the regex, and I do not think the returned Matcher will reflect the updated value.

I also don't think the python code will work for nested choices. Have you actually tried the python code? You say it "supposedly works", but it would be wise to verify that before you spend a lot of time porting it to Java.

Jim Garrison
Yes, I tested the python code and it works as it is supposed to.
Dylan
but you are right, my version of the code does not work
Dylan
A: 

Here are some of the problems with your current code:

  • You should reuse your compiled Pattern, instead of Pattern.compile every time
  • You should reuse your Random, instead of new Random every time
  • Be aware that String.split is regex-based, so you must split("\\|")
  • Be aware that curly braces in Java regex must be escaped to match literally, so Pattern.compile("\\{([^{}]*)\\}");
  • You should query group(1), not group() which defaults to group 0
  • You're using replaceFirst wrong, look up Matcher.appendReplacement/Tail instead
  • Random.nextInt(int n) has exclusive upper bound (like many such methods in Java)
  • The algorithm itself actually does not handle arbitrarily nested braces properly

Note that escaping is done by preceding with \, and as a Java string literal it needs to be doubled (i.e. "\\" contains a single character, the backslash).

Attachment

polygenelubricants
Oh there was also `String spun = new String(text);` which is just silly (see http://stackoverflow.com/questions/3370184/java-for-string-objects-ceased-to-work/3370245#3370245)
polygenelubricants
Thanks for all the help. Yeah, I need to fix that algorithm though. It's weird because it is the same regex as the python code, and the python code worked.
Dylan
@Dylan: Different flavors of regex have different specifications, hence why you should specify flavor when asking regex question. The `{` and `}` are repetition metacharacters in regex (e.g. `x{3}` matches `www`), but apparently Python lets them appear unescaped depending on context. Java regex engine differs from Python at least in this aspect, among many others.
polygenelubricants
I couldn't figure out the algorithm to spin the text in one go, so I just made the method recursive until it becomes a string with no curly braces or |'s
Dylan
@Dylan: Recursion is natural since nesting braces imply recursive structure. Regex can be tricky here since it can't match balanced parantheses (without some special supplemental features, e.g. Perl/C#).
polygenelubricants