views:

196

answers:

2

I have some random string with unknown content, what is known is that the content is alphanumeric and in lower case.

I am looking for a simple method to upper case a random number of the alpha characters in that string. The higher the randomness the better.

I can think of a few ways to do this, but none of them seem very optimal.

alright first solution:

public String randomizeCase(String myString){
  Random rand = new Random();
  StringBuilder build = new StringBuilder();
  for(char c: myString.toCharArray()){
     String s = new String(c);
     if(Character.isLetter(c) && rand.nextBoolean()){
        s = s.toUpperCase();
     } 
     build.append(s);
  }
  return build.toString();
}

I dont like this solution because:

  • 50% chance that every char is uppercased does not equal 50% chance that 50% of the chars are uppercased
  • There is a chance that nothing is upped cased
  • char to string conversion is ugly
+3  A: 

The solution depends on the probabilistic model you choose. If for example you decide on binomial distribution, then you can traverse the chars, and switch every char to uppercase with a fixed probability p. The expected number of uppercase letters will be p * str.length():

public static String randomUpper(String str, double p) {
    StringBuilder sb = new StringBuilder(str.length());
    for (int i = 0; i < str.length(); i++) {
        char c = str.charAt(i);
        if (Character.isLetter(c) && Math.random() < p)
            c = Character.toUpperCase(c);
        sb.append(c);
    }
    return sb.toString();
}

If on the other hand you want to decide on the exact number of upercase letters for a given string, then the problem becomes a random sample problem (i.e. choose M positions to switch out of N positions in the string). This can be much faster than the first approach, when M is much smaller than N (though with Java's immutable strings the difference becomes minor because you have to copy the whole string anyway).

-- edit --

Now that you clarified the requirements, consider the following:

public static String randomUpper2(String str, double p) {
    int letters = 0;
    for (int i = 0; i < str.length(); i++) {
        if (Character.isLetter(str.charAt(i)))
            letters++;
    }

    int toChoose = (int) (p * letters);
    StringBuilder sb = new StringBuilder(str.length());
    for (int i = 0; i < str.length(); i++) {
        char c = str.charAt(i);
        if (Character.isLetter(c)) {
            if (Math.random() < (toChoose/(double)letters)) {
                c = Character.toUpperCase(c);
                toChoose--;
            }
            letters--;
        }           
        sb.append(c);
    }
    return sb.toString();
}

This code performs a random sample "on the fly", considering only alpha chars, as required. Use p=0.5 to switch exactly half of the letters.

Eyal Schneider
+1. You beat me to suggesting `Character.toUpperCase()` as well.
Mark Peters
As you traverse the string from front to end and the Match.random() compare value gets smaller (toChoose gets gets smaller), this results in later characters have a lower probability to be changed. Thus it is not a uniform distribution.
Nils Schmidt
@Nils: this may seem so in the first sight, but this is a known algorithm, and it produces a uniform distriution on all possible subsets. Check out the proof in my blog post: http://eyalsch.wordpress.com/2010/04/01/random-sample/. See the section called "Full scan".
Eyal Schneider
You are right, again :-)Thanks for the detailed explanation. Good blog post!
Nils Schmidt
+2  A: 

Here is the code snippet for random sample problem (thanks Eyal for naming it). Not sure if that is what you are looking for.

Be aware, that this solution would run into an infinete loop if not enough lowercase letters are in the string. So you would need to tackle that as well, but I guess it is a starting point. ;-)

String myString = "9aie3ra3nr23rr5r21t";
System.out.println(upperCaseRandom(myString, 10));


public static String upperCaseRandom(String input, int n) {
 StringBuilder output = new StringBuilder(input);
 Random r = new Random();

 for (int i = 0; i < n; i++) {
  // Pick a place
  int position = r.nextInt(input.length());

  // Check if lowercase alpha
  if (Character.isLowerCase(output.charAt(position))) {
   output.setCharAt(position, Character.toUpperCase(output.charAt(position)));
  } else {
   i--;
  } 
 } 
 return output.toString();
}

Edit: Here is an improved version. It does change exactly n lowercase letters into uppercase letters (if there are enough, otherwise it changes all of them). The programm does not run into infinite loops, but still running time is a problem though.

public static String upperCaseRandom(String input, int n) {
    final int length = input.length();
    final StringBuilder output = new StringBuilder(input);
    final boolean[] alreadyChecked = new boolean[length];
    final Random r = new Random();

    for (int i = 0, checks = 0; i < n && checks < length; i++) {
        // Pick a place
        int position = r.nextInt(length);

        // Check if lowercase alpha
        if (!alreadyChecked[position]) {
            if (Character.isLowerCase(output.charAt(position))) {
                output.setCharAt(position, Character.toUpperCase(output.charAt(position)));
            } else {
                i--;
            }
            checks++;
            alreadyChecked[position] = true;
        } else {
            i--;
        }
    }
    return output.toString();
}
Nils Schmidt
This doesn't work correctly in all cases (try it). You may "switch" numeric chars, resulting in no effect. If you add an isLetter() condition the problem will be fixed, but this solution will have a poor performance on large strings, specially with many digits, due to the unbounded nature of the trial and error approach.
Eyal Schneider
isLowerCase() will return false for a numeric char, thus it wont "switch" them. I cant recreate your wrong cases, but maybe I just did not understand so far. Could you recheck and/or maybe reexplain?The performance problems are indeed a problem, an improved version should not double check the same positions (just add an boolean[] to identify those. And still the infinite loop has to be avoided.
Nils Schmidt
@Nils: Sorry, you are 100% right regarding the correctness of the code, my mistake. It is also shorter than my last solution. The only disadvantage is performance, in the asymptotic meaning. For converting all N chars, the expected number of trials is N/N+N/(N-1)+...+N/2+N/1 which behaves like N * log N. In other words, the worst case complexity is O(N log N) instead of O(N). This is a general problem with this particular random sample algorithm.
Eyal Schneider
@Nils: I'm trying to cancel my downvote with no sucess... it says that the answer should be edited first. So I'll try to do a pseudo-edit first :)
Eyal Schneider