ansaurus

Question

Finding numerical substrings mathematically, without string comparison.

Answer 1

A:

Umm, I'm probably totally misunderstanding the question, but.....

// Check if A is inside B lol
bool Contains (int a, int b)
{
    return (a <= b);
}

Unless you want to know if a particular sequence of numbers is within another sequence of numbers.

In that case, converting it to a string WILL be faster than doing the math to figure it out.

FlySwat 2008-10-23 23:24:18

Answer 2

A:

This in no way answers your question, whatsoever, but it's advice anyway :-)

The method name findMatch is not very descriptive. In this case, I'd have a static method ContainerBuilder.number(int), which returned a ContainerBuilder, which has the method contains on it. In this way your code becomes:

boolean b = number(12345).contains(234);

Juts some advice for the long run!

Oh yes, I meant to say also, you should define what you mean by "contains"

oxbow_lakes 2008-10-23 23:28:54

yes this code isn't something going into production, just something i whipped up real quick

Alex Beardsley 2008-10-23 23:31:06

Don't forget ContainerBuilderFactory and IBuiltContainer

FlySwat 2008-10-23 23:31:24

@Jonathan: I was about to write the same thing :)

abahgat 2008-10-23 23:34:57

Yes, well done. LOL! Just because, in my day job, I endlessly come across huge methods full of undocumented code called things like `findMatch` I'm totally unqualified to post a little (friendly) general advice. Oh and thanks for putting some suggestions in my mouth too!

oxbow_lakes 2008-10-23 23:41:03

Nalandial - no, this looks like a homework assignment. Hence the general advice for the future.

oxbow_lakes 2008-10-23 23:41:55

@Chris Marshall: it actually isn't. We ended up making this scenario not occur at all because it was a pain to deal with; however I thought about it more and more I actually just wanted to figure out if there was a good way to do it, just for the sake of interest.

Alex Beardsley 2008-10-23 23:48:12

Answer 3

+3 A:

The only optimization that I can think of is to do the conversion to string on your own and compare digits (right to left) as you do the conversion. First convert all the digits of b, then convert from the right on a until you find a match on the first digit of b (from right). Compare until all of b matches or you hit a mismatch. If you hit a mismatch, backtrack to the point where you starting matching the first digit of b, advance in a and start over.

IndexOf will have to do basically the same back tracking algorithm, except from the left. Depending on the actual numbers this may be faster. I think if the numbers are random, it should be since there should be many times when it doesn't have to convert all of a.

tvanfosson 2008-10-23 23:36:37

I was actually looking for a way to make the mathematical way of doing it faster than the String comparison, if such a way existed.As I said this has become more of a personal challenge on my own time and has gone away from what the project actually requires.

Alex Beardsley 2008-10-23 23:40:27

I like the idea, it saves some time, especially when a hit is probable. Plus, while accepted solution limits a and b to 16 digits, this one only limits smaller b in this way.

buti-oxa 2008-10-25 16:21:18

Answer 4

+10 A:

It should be faster string way, because your problem is textual, not mathematical. Notice that the your "contains" relationship says nothing about the numbers, it only says something about their decimal representations.

Notice also that the function you want to write will be unreadable - another developer will never understand what you are doing. (See what trouble you had with that here.) The string version, on the other hand, is perfectly clear.

buti-oxa 2008-10-23 23:50:42

then the challenge i'm proposing just for curiosity's sake: make the mathematical one faster! :D

Alex Beardsley 2008-10-24 00:00:09

oh i'm well aware that it's completely unreadable. again as i mentioned this is just for curiosity's sake and won't be going in any production code. just giving a chance for some people to geek out :P

Alex Beardsley 2008-10-24 16:52:50

I understand, I only pointed on one aspect, not answering the main quesion.As to the main question, tvanfosson pointed out the only hope I see to get extra speed: convert a to decimal only as far as you need. If match *is* there, you save some time.

buti-oxa 2008-10-25 16:18:01

Answer 5

A:

Is there any way to calculate this in binary? Obviously the binary value of an integer containing the binary integer of another character doesn't mean that the decical does the same. However, is there some kind of binary trickary that could be used? Maybe convert a numer like 12345 to 0001 0010 0011 0100 0101, and then do some bit shifting to figure out if 23 (0010 0011) is contained in there. Because your character set is only 10 characters, you could cut down the computation time by store 2 characters values in a single byte.

EDIT

Expanding on this idea a bit. if you have 2 integers, A and B, and want to know if A contains B, you check 2 things first. if A is less than B, then A cannot contain B. If A = B then A contains B. At this point you can convert them to strings*. If A contains the same number of character numbers as B, then A does not contain B, unless they are equal, but we wouldn't be here if they are equal, so if both strings are the same length, a does not contain b. At this point, the length of A will be longer than B. So, now you can convert the strings to their packed binary values as I noted in the first part of this post. Store these values in an array of integers. Now you do a bitwise AND Of the integer values in your array, and if the result is A, then A contains B. Now you shift the array of integers for B, to the left 4 bits, and do the conparison again. Do this until you start popping bits off the left of B.

*That * in the previous paragraph means you may be able to skip this step. There may be a way to do this without using strings at all. There might be some fancy binary trick you can do to get the packed binary representation I discussed in the first paragraph. There should be some binary trick you can use, or some quick math which will convert an integer to the decimal value I discussed before.

Kibbee 2008-10-23 23:55:56

i was thinking about this, but i honestly couldn't think of a way to do it.

Alex Beardsley 2008-10-24 00:13:23

Isn't this almost exactly what the "convert to string" plan does, albeit with a small factor memory savings since you make BCD strings instead of ASCII/UTF-8/UTF-16/whatever strings?

Doug McClean 2008-10-24 04:03:52

Answer 6

+2 A:

Looks like your function is actually doing pretty well, but an small improvement:

private static boolean findMatch(int a, int b) {
        if (b > a) return false;

        if (a == b) return true;

        int needleLength = getLength(b);

        int exponent = exponents[needleLength];
        int subNum;
        while (a > b) {
                subNum = a % exponent;

                if (subNum == b)
                        return true;

                a /= 10;
        }
        return false;
}

Just because once that a is smaller than b, is not worthy keeps looking, isnt it? Good luck and post if you find the solution!

David Santamaria 2008-10-23 23:59:16

that's actually a really good point, and it does optimize it quite a bit. nice catch!

Alex Beardsley 2008-10-24 00:02:40

running more tests, this is great for cases where b is fairly large. when b is large, the string method takes the same amount of time whereas the math one beats it by a landslide!

Alex Beardsley 2008-10-24 00:11:21

Answer 7

A:

FYI

http://refactormycode.com/

Could work for you.

David Santamaria 2008-10-24 00:00:59

Ugh! My personal guess is that the negative point is for SPAM? Anyway, I only recomended another tool (as Stackoverflow) that usually works, is that bad?

David Santamaria 2008-10-24 08:44:48

because it looks like SPAM. you didn't give enough information about what the link is. comments of the form "Hi [link] could work" look like spam to just about everyone.

John Gardner 2008-10-24 18:16:19

Answer 8

+2 A:

This is an interesting problem. Many of String.class's functions are actually native making beating String a difficult proposition. But here's some helpers:

TIP 1: Different simple integer operations have different speeds.

By quick calculations in sample programs showed:

% ~ T
* ~ 4T
/ ~ 7T

So you want to use as little division as possible in favor of multiplication or modulo. Not shown are subtraction, addition, and comparison operators cause they blow all of these out of the water. Also, using "final" as much as possible allows the JVM to do certain optimizations. Speeding up you "getLength" function:

private static int getLength(final int b) {        
   int len = 0;
   while (b > exponents[len]) {
       len++;
   }
   return len + 1
}

That gives about a 7x improvement in the function. You get an indexOutOfBounds exception if b > your max in exponents. To solve that, you can have:

private static int getLength(final int b) {        
   int len = 0;
   final int maxLen = exponents.length;
   while (len < maxLen && b > exponents[len]) {
       len++;
   }
   return len + 1;
}

That's slightly slower and gives you an incorrect length if b is too big, but it doesn't throw an exception.

TIP 2: Unnecessary object/primitive creation and method calls add to run time.

I'm guessing that "getLength" isn't called anywhere else, so while it might be nice to have a separate function, from a optimization standpoint its an unnecessary method call and creation of the object "len". We can put that code right where we use it.

private static boolean findMatch(int a, final int b) {
        if (b > a) return false;
        if (a == b) return true;
        int needleLength = 0;
        while (b > exponents[len]) {
            needleLength ++;
        }
        needleLength++;

        final int exponent = exponents[needleLength];
        int subNum;
        while (a >= 1 && a <= b) {
                subNum = a % exponent;
                if (subNum == b)
                        return true;
                a /= 10;
        }
        return false;
}

Also, note I changed the bottom while loop to also include "a <= b". I haven't tested that and not sure if the per-iteration penalty beats the fact that you don't waste any iterations. I'm sure there's a way to get rid of the division using clever math, but I can't think of it right now.

Laplie 2008-10-24 03:23:16

Answer 9

+4 A:

This is along Kibbee's line, but I got a little intrigued by this before he posted and worked this out:

long mask ( long n ) { 
    long m   = n % 10;
    long n_d = n;
    long div = 10;
    int  shl = 0;
    while ( n_d >= 10 ) { 
        n_d /= 10;
        long t = n_d % 10;
        m |= ( t << ( shl += 4 ));
    }
    return m;
}

boolean findMatch( int a, int b ) { 
    if ( b < a  ) return false;
    if ( a == b ) return true;

    long m_a = mask( a );    // set up mask O(n)
    long m_b = mask( b );    // set up mask O(m)

    while ( m_a < m_b ) {
        if (( m_a & m_b ) == m_a ) return true;
        m_a <<= 4;  // shift - fast!
        if ( m_a == m_b ) return true;
    }  // O(p)
    return false;
}       

void testContains( int a, int b ) { 
    print( "findMatch( " + a + ", " + b + " )=" + findMatch( a, b ));
}

testContains( 12, 120 );
testContains( 12, 125 );
testContains( 123, 551241238 );
testContains( 131, 1214124 );
testContains( 131, 1314124 );

Since 300 characters is far too little to make an argument in, I'm editing this main post to respond to Pyrolistical.

Unlike the OP, I wasn't that surprised that a native compiled indexOf was faster than Java code with primitives. So my goal was not to find something I thought was faster than a native method called zillions of times all over Java code.

The OP made it clear that this was not a production problem and more along the lines of an idle curiosity, so my answer solves that curiosity. My guess was that speed was an issue, when he was trying to solve it in production, but as an idle curiosity, "This method will be called millions and millions of times" no longer applies. As he had to explain to one poster, it's no longer pursued as production code, so the complexity no longer matters.

Plus it provides the only implementation on the page that manages to find the "123" in "551241238", so unless correctness is an extraneous concern, it provides that. Also the solution space of "an algorithm that solves the problem mathematically using Java primitives but beats optimized native code" might be EMPTY.

Plus, it's not clear from your comment whether or not you compared apples to apples. The functional spec is f( int, int )-> boolean, not f( String, String )-> boolean (which is kind of the domain of indexOf) . So unless you tested something like this (which could still beat mine, and I wouldn't be awfully surprised.) the additional overhead might eat up some of that excess 40%.

boolean findMatch( int a, int b ) { 
    String s_a = "" + a;
    String s_b = "" + b;
    return s_a.indexOf( s_b ) > -1;
}

It does the same basic steps. log₁₀( a ) encoding + log₁₀( b ) encoding + actually finding the match, which is as well O(n) where n is the largest logarithm.

Axeman 2008-10-24 05:28:00

when changed to ints, this is on par with the string function, and i think thats as good as its going to get.

Alex Beardsley 2008-10-24 15:05:03

Yeah, I thought about this method later on, but didn't take the time to actually program it all out.

Kibbee 2008-10-24 16:27:58

Did you profile it? My quick test showed that this method is about 40% slower than the string.contains method.

Pyrolistical 2008-10-28 21:23:11

Pyrolistical, it qualifies for "Finding numerical substrings mathematically, without string comparison", whether it's faster than native code is another matter altogether.

Axeman 2008-10-29 06:19:56

If you read the question and Nalandial's comments, you'll see he is looking for a faster replacement. So yes, the correct answer needs to be faster not just be correct.

Pyrolistical 2008-10-29 17:30:09

I also read: "I'm really interested to see or hear anything anyone has to offer on this." See the edited post.

Axeman 2008-10-31 04:44:52

Answer 10

A:

Can I ask where you're using this function in your code? Maybe there's another way to solve the problem it is currently solving which would be much faster. This could be like when my friend asked me to completely re-tune his guitar, and I did it before realizing I could have just lowered the bottom string by a whole step and gotten an equivalent result.

Claudiu 2008-10-24 08:30:52

Answer 11

A:

Just curious: why in the world would your want to know whether the decimal string representation of a number contains the decimal string representation of another?

Hemal Pandya 2008-10-24 14:33:23

originally it was because the client wants to be able to search on a unique id, but they want to have a "contains" searchability like the one described here. however it also needs to have > and < functionality, and > on a string != > on a number.

Alex Beardsley 2008-10-24 15:01:56

Thank you for the explanation, makes perfect sense.

Hemal Pandya 2008-10-25 01:42:03

ansaurus

tags:

views:

answers:

Finding numerical substrings mathematically, without string comparison.

related questions