ansaurus

Question

uppercase random number of characters from a random alphanumeric string

Answer 1

+3 A:

The solution depends on the probabilistic model you choose. If for example you decide on binomial distribution, then you can traverse the chars, and switch every char to uppercase with a fixed probability p. The expected number of uppercase letters will be p * str.length():

public static String randomUpper(String str, double p) {
    StringBuilder sb = new StringBuilder(str.length());
    for (int i = 0; i < str.length(); i++) {
        char c = str.charAt(i);
        if (Character.isLetter(c) && Math.random() < p)
            c = Character.toUpperCase(c);
        sb.append(c);
    }
    return sb.toString();
}

If on the other hand you want to decide on the exact number of upercase letters for a given string, then the problem becomes a random sample problem (i.e. choose M positions to switch out of N positions in the string). This can be much faster than the first approach, when M is much smaller than N (though with Java's immutable strings the difference becomes minor because you have to copy the whole string anyway).

-- edit --

Now that you clarified the requirements, consider the following:

public static String randomUpper2(String str, double p) {
    int letters = 0;
    for (int i = 0; i < str.length(); i++) {
        if (Character.isLetter(str.charAt(i)))
            letters++;
    }

    int toChoose = (int) (p * letters);
    StringBuilder sb = new StringBuilder(str.length());
    for (int i = 0; i < str.length(); i++) {
        char c = str.charAt(i);
        if (Character.isLetter(c)) {
            if (Math.random() < (toChoose/(double)letters)) {
                c = Character.toUpperCase(c);
                toChoose--;
            }
            letters--;
        }           
        sb.append(c);
    }
    return sb.toString();
}

This code performs a random sample "on the fly", considering only alpha chars, as required. Use p=0.5 to switch exactly half of the letters.

Eyal Schneider 2010-07-20 18:20:45

+1. You beat me to suggesting `Character.toUpperCase()` as well.

Mark Peters 2010-07-20 18:31:47

As you traverse the string from front to end and the Match.random() compare value gets smaller (toChoose gets gets smaller), this results in later characters have a lower probability to be changed. Thus it is not a uniform distribution.

Nils Schmidt 2010-07-21 12:34:22

@Nils: this may seem so in the first sight, but this is a known algorithm, and it produces a uniform distriution on all possible subsets. Check out the proof in my blog post: http://eyalsch.wordpress.com/2010/04/01/random-sample/. See the section called "Full scan".

Eyal Schneider 2010-07-21 12:51:11

You are right, again :-)Thanks for the detailed explanation. Good blog post!

Nils Schmidt 2010-07-21 13:38:44

Answer 2

+2 A:

Here is the code snippet for random sample problem (thanks Eyal for naming it). Not sure if that is what you are looking for.

Be aware, that this solution would run into an infinete loop if not enough lowercase letters are in the string. So you would need to tackle that as well, but I guess it is a starting point. ;-)

String myString = "9aie3ra3nr23rr5r21t";
System.out.println(upperCaseRandom(myString, 10));


public static String upperCaseRandom(String input, int n) {
 StringBuilder output = new StringBuilder(input);
 Random r = new Random();

 for (int i = 0; i < n; i++) {
  // Pick a place
  int position = r.nextInt(input.length());

  // Check if lowercase alpha
  if (Character.isLowerCase(output.charAt(position))) {
   output.setCharAt(position, Character.toUpperCase(output.charAt(position)));
  } else {
   i--;
  } 
 } 
 return output.toString();
}

Edit: Here is an improved version. It does change exactly n lowercase letters into uppercase letters (if there are enough, otherwise it changes all of them). The programm does not run into infinite loops, but still running time is a problem though.

public static String upperCaseRandom(String input, int n) {
    final int length = input.length();
    final StringBuilder output = new StringBuilder(input);
    final boolean[] alreadyChecked = new boolean[length];
    final Random r = new Random();

    for (int i = 0, checks = 0; i < n && checks < length; i++) {
        // Pick a place
        int position = r.nextInt(length);

        // Check if lowercase alpha
        if (!alreadyChecked[position]) {
            if (Character.isLowerCase(output.charAt(position))) {
                output.setCharAt(position, Character.toUpperCase(output.charAt(position)));
            } else {
                i--;
            }
            checks++;
            alreadyChecked[position] = true;
        } else {
            i--;
        }
    }
    return output.toString();
}

Nils Schmidt 2010-07-20 18:30:03

This doesn't work correctly in all cases (try it). You may "switch" numeric chars, resulting in no effect. If you add an isLetter() condition the problem will be fixed, but this solution will have a poor performance on large strings, specially with many digits, due to the unbounded nature of the trial and error approach.

Eyal Schneider 2010-07-20 19:24:31

isLowerCase() will return false for a numeric char, thus it wont "switch" them. I cant recreate your wrong cases, but maybe I just did not understand so far. Could you recheck and/or maybe reexplain?The performance problems are indeed a problem, an improved version should not double check the same positions (just add an boolean[] to identify those. And still the infinite loop has to be avoided.

Nils Schmidt 2010-07-20 20:25:30

@Nils: Sorry, you are 100% right regarding the correctness of the code, my mistake. It is also shorter than my last solution. The only disadvantage is performance, in the asymptotic meaning. For converting all N chars, the expected number of trials is N/N+N/(N-1)+...+N/2+N/1 which behaves like N * log N. In other words, the worst case complexity is O(N log N) instead of O(N). This is a general problem with this particular random sample algorithm.

Eyal Schneider 2010-07-20 21:47:11

@Nils: I'm trying to cancel my downvote with no sucess... it says that the answer should be edited first. So I'll try to do a pseudo-edit first :)

Eyal Schneider 2010-07-20 21:49:59

ansaurus

tags:

views:

answers:

uppercase random number of characters from a random alphanumeric string

related questions