How can I identify strings containing more digits than non-digits using a regular expression (Pattern) in Java? Thank you.

+12  A: 

That's not a regular language, and thus it cannot be captured by a vanilla regex. It may be possible anyway, but it will almost certainly be easier not to use a regex:

public static boolean moreDigitsThanNonDigits(String s) {
    int diff = 0;
    for(int i = 0; i < s.length(); ++i) {
        if(Character.isDigit(s.charAt(i))) ++diff;
        else --diff;
    }
    return diff > 0;
}
Dave
A: 

I'm not sure that using regular expressions would be the best solution here.

therefromhere
I do not insist on using a regular expression; I just need to identify those strings somehow.
+9  A: 

You won't be able to write a regexp that does this. But you already said you're using Java, so why not mix in a little code?

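public boolean moreDigitsThanNonDigits(String input) {
    // replaceAll treats its first argument as a regex; replace would look for the literal text "[0-9]".
    String nonDigits = input.replaceAll("[0-9]", "");
    // More digits than non-digits means the total length exceeds twice the non-digit count.
    return input.length() > (nonDigits.length() * 2);
}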
waxwing
Hi, can you please clarify something for me: by using the java.util.regex package, will I be able to search for any kind of pattern in text files, or in files of any format?
harigm
A: 

Regexes alone can't do this (they don't count anything), but if you want to use them, just use two replacements: one that strips out all the digits and one that keeps only the digits. Then compare the lengths of the two resulting strings.

Of course, I'd rather use Dave's answer.
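
The two-replacement idea could be sketched roughly as follows. This is only an illustration: the class name `DigitRegexCount` is made up for the example, and it assumes that "digit" means the ASCII digits 0-9, which is what `\d` matches in Java by default.

```java
import java.util.regex.Pattern;

// Hypothetical helper class; the name is illustrative only.
class DigitRegexCount {
    // By default, \d matches the ASCII digits 0-9 and \D matches everything else.
    private static final Pattern DIGIT = Pattern.compile("\\d");
    private static final Pattern NON_DIGIT = Pattern.compile("\\D");

    static boolean moreDigitsThanNonDigits(String s) {
        // Strip the non-digits, so only the digits remain.
        int digits = NON_DIGIT.matcher(s).replaceAll("").length();
        // Strip the digits, so only the non-digits remain.
        int nonDigits = DIGIT.matcher(s).replaceAll("").length();
        return digits > nonDigits;
    }

    public static void main(String[] args) {
        System.out.println(moreDigitsThanNonDigits("a1b22")); // 3 digits vs. 2 non-digits
        System.out.println(moreDigitsThanNonDigits("abc1"));  // 1 digit vs. 3 non-digits
    }
}
```

Note that the regexes only do the stripping here; the actual comparison still happens in plain Java code.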

Javier
Hi, can you please clarify something for me: by using the java.util.regex package, will I be able to search for any kind of pattern in text files, or in files of any format?
harigm
Since regular expressions are used for matching patterns in a string, my question is whether Google uses this kind of pattern matching to search in all files?
harigm
+2  A: 

Regular expressions are conceptually not able to perform such a task. They are equivalent to regular languages, or finite automata. They have no notion of memory (or a stack), so they cannot count the occurrences of symbols. The next step up in expressiveness is pushdown automata (or stack machines), which correspond to context-free grammars. Rather than writing such a grammar for this task, using a method like the moreDigitsThanNonDigits above would be appropriate.

The MYYN
Perl- (and Java-) style regular expressions are actually more powerful than regular languages, because of the "\number" syntax for backreferences to a captured group. They can recognize languages that are not regular. For example, the language of any string repeated twice (which is not regular, nor even context-free) can be recognized by "(.*)\1".
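
A quick way to check the "(.*)\1" claim in Java might look like the sketch below. The class and method names (`BackreferenceDemo`, `isDoubled`) are invented for illustration, and the match is done against the whole string via `Matcher.matches()`.

```java
import java.util.regex.Pattern;

// Hypothetical demo class; the names are illustrative only.
class BackreferenceDemo {
    // \1 must match exactly the text captured by group 1.
    private static final Pattern DOUBLED = Pattern.compile("(.*)\\1");

    static boolean isDoubled(String s) {
        // matches() requires the entire input to fit the pattern.
        return DOUBLED.matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(isDoubled("abcabc")); // "abc" followed by "abc"
        System.out.println(isDoubled("abcab"));  // cannot be split into two equal halves
    }
}
```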
newacct
Thanks for pointing this out! Your example would be "(.*)\1\1", right? But length comparisons are still not possible, I would assume.
The MYYN