views:

1079

answers:

12

It takes about minute to achieve 3000 in my comp but I need to know the millionth number in the series. The definition is recursive so I cannot see any shortcuts except to calculate everything before the millionth number. How can you fast calculate millionth number in the series?

Series Def

n_{i+1} = \floor{ 3/2 * n_{i} } and n_{0}=2.

Interestingly, only one site list the series according to Google: this one.

Too slow Bash code

#!/bin/bash

function series 
{
        n=$( echo "3/2*$n" | bc -l | tr '\n' ' ' | sed -e 's@\\@@g' -e 's@ @@g' );
                                        # bc gives \ at very large numbers, sed-tr for it
        n=$( echo $n/1 | bc )           #DUMMY FLOOR func
}

n=2
nth=1

while [ true ]; #$nth -lt 500 ];
do
        series $n                        # n gets new value in the function through global value
        echo $nth $n
        nth=$( echo $nth + 1 | bc )     #n++
done
+3  A: 

Hmm, bash is not what I'd be using for high speed numerical processing. Get yourself a copy of GMP and put together a C program to do it.

There may well be a mathematical formula to give it to you quickly but, in the time it takes you to figure it out, GMP could probably throw you the result :-)

paxdiablo
I _highly_ doubts he _needs_ C... C is for ultra high performance massive scale computations. Java or even Python should be fine for OP's problem. No need to waste time playing with pointers and undefined behaviour.
Longpoke
@Longpoke, you wouldn't be playing with either of those things. With GMP, you're looking at about 10 lines of code to figure it out. I chose C because that's what I use for GMP work, I have no idea if it has Python or Java bindings.
paxdiablo
@Longpoke: this is a very easy formula - it should be no problem to program it in C ... even without *undefined behaviour* ...
tanascius
@Longpoke The program for this calculation will be short (can probably fit in 10 lines or so), so C/C++ will be fine, as long as OP uses good bignum library. WIthout bignums there will be precision loss.
SigTerm
@SigTerm: GMP is such a library.
Billy ONeal
@Longpoke: Java is faster than C, for real programs. It sometimes performs worse for microbenchmarks though. It used to be slower, though.
Vuntic
There's no way that the JVM and garbage collection and heap allocation could possibly perform better than straight stack-only C, as is necessary for the algorithm in question.However, there's really no reason to use C either over Java in this case, the minutae of performance is not going to be a big deal compared to the miles he'll get moving from Bash.
DeadMG
@paxdiablo @tanascius @SigTerm, he will have to include the bignum headers and link to bignum. This alone will take hours to learn if you have no experience with binary linkage. The after that, he can declare a pointer to a bignum object, which will mean he has to learn pointers; a few more hours gone. Then finally, he will have to find a way to print the bignum, and wonder why it's not just working with `printf`. **Every** popular programming language has builtin bignums or a library for bignums.
Longpoke
@DeadMG: **NO**. The stack is not a magic godsend of performance. In x86, to allocate objects on the stack, you have to subtract the total stack allocation from `esp`, _then_, fill each slot of the stack with what you want to put. This is the same as keeping a chunk of heap around and subtracting the head pointer to it and then filling each slot. Either way you are going to be using `REG+offset = value` to set each heap item, and the occasional `rep movs`. The JVM will _adaptively_ decide when it's a good idea to do this. @Vuntic: Indeed.
Longpoke
@Longpoke, I'm not sure I understand your arguments. Hours of learning is probably what it will take to learn _any_ language. And you don't learn pointers for GMP, you simply define and init the variable as you would any other/ Yes, GMP uses pointers under the hood, but to the user, it's just like "int a; a = 7;" But in all honesty, it makes little difference whether you use Java or C or any other language (with bignum support), just don't use bash - it's not meant for this sort of stuff.
paxdiablo
+8  A: 

You almost found it. Next time, check out the Online Encyclopedia of Integer Series.

Here's the entry:

http://www.research.att.com/~njas/sequences/A061418

     FORMULA    

a(n) = ceiling[K*(3/2)^n] where K=1.08151366859...

The constant K is 2/3*K(3) (see A083286). - Ralf Stephan, May 29, 2003 

That said:

>>> def f(n):
...     K = 1.08151366859
...     return int(ceil(K*(1.5)**n))

Sanity test:

>>> for x in range(1, 10):
...     print f(x)
...     
2
3
4
6
9
13
19
28
42

Awesome! Now how about 1000000:

>>> f(1000000)
Traceback (most recent call last):
  File "<input>", line 1, in <module>
  File "<input>", line 3, in f
OverflowError: (34, 'Result too large')

Well, I tried. :] But you get the idea.

Edit again: Solution is found! See Timo or Lasse V. Karlsen's answers.

Edit: Using Timo's bit-shifting idea:

import gmpy
n=gmpy.mpz(2)
for _ in xrange(10**6-1):
    n+=n>>1
print(n)

yields

1963756763...226123087 (176092 digits)

% time test.py > test.out

real    0m21.735s
user    0m21.709s
sys 0m0.024s
Xavier Ho
This is not accurate enough. Try my Java program. With `N=100K`, it gives `144523203...2773508` in about 5 seconds.
polygenelubricants
Yep, that K is far too approximate (you'll need many, many more digits of it).
Alex Martelli
It's A061418 rather than A061419
Dennis Williamson
The formula you show is for the wrong series number, too.
Dennis Williamson
Thanks Dennis. I actually edited it before, looks like I edited it the wrong way. || Also, I didn't do the GMPY part; can the original author continue on that endeavour?
Xavier Ho
@Dennis: Hm, I looked again, it **is** the right formula. I'll edit out the unnecessary relation to the other series, I suppose.
Xavier Ho
Sorry, I misread they way they were referring to each other at that link.
Dennis Williamson
A: 

This is almost a first order recurrence relation, except for the floor, which messes things up. If you didn't want the floor,

http://en.wikipedia.org/wiki/Recurrence_relation

Also, don't use bash.

Joe
I don't know who is doing all these downvoting, but these posts don't deserve them. They are not incorrect.
Xavier Ho
+2  A: 

This is identified as sequence A061418 in the sequences site (AKA "The On-Line Encyclopedia of Integer Sequences"); per the relevant page,

FORMULA a(n) =A061419(n)+1 = ceiling[K*(3/2)^n] where K=1.08151366859... The constant K is 2/3*K(3) (see A083286).

and with a suitable high-precision library (GMP as already suggested, or MPIR, and maybe a wrapper on top like my baby gmpy is for Python) you can used the closed-form formula for much speedier computation of "the millionth item in the series" and the like.

It's often possible to put recursively specified recurrences into closed formulas. For an extensive beginner's introduction to the subject, Concrete Mathematics (by Graham, Knuth and Patashnik) is really hard to beat.

Alex Martelli
This reformulation doesn't help, because now you'd have to compute K or K(3).
polygenelubricants
+5  A: 

I used the following Java program:

import java.math.*;

public class Series {
    public static void main(String[] args) {
        BigInteger num = BigInteger.valueOf(2);
        final int N = 1000000;

        long t = System.currentTimeMillis();
        for (int i = 1; i < N; i++) {
            num = num.shiftLeft(1).add(num).shiftRight(1);
        }
        System.out.println(System.currentTimeMillis() - t);
        System.out.println(num);
    }
}

The output, cropped: (full output on pastebin)

516380 (milliseconds)
196375676351034182442....29226123087

So it took about 8.5 minutes on my modest machine. I used -Xmx128M, but not sure if that was really necessary.

There are probably better algorithms out there, but this took a total of 10 minutes, including writing the naive implementation and running the program.

Sample runs

polygenelubricants
You don't need much memory, one number is at most slightly larger than 64KB (584964 bits). Shifting and adding is faster by a factor of six compared to multiplying by three then dividing by two.
starblue
++ I did the same in Mathematica (one liner :p) and it took 16 seconds. Also I verified your result, it's the same as the one I get.
Timo
A: 

The recursive formulation is going to take quite a long time under most curcumstances because it has to maintain the machine stack. Why not use dynamic programming instead?

i.e. (pseudocode)

bignum x = 2
for (int i = 1; i < 1000000; i++) {
    x = floor(3.0/2*x)
}

Of course, for a meaningful result you'll need a high precision number library.

Billy ONeal
+9  A: 

The reason your script is so slow is that it's spawning bc three times, tr once and sed once in a loop.

Rewrite the whole thing in bc and do the sed at the end. My test shows that the bc-only version is over 600 times faster. It took just under 16 minutes on an old slow system for the bc version to find the 100,000th value (only printing the last).

Also, note that your "floor" function is actually "int".

#!/usr/bin/bc -lq
define int(n) {
    auto oscale
    oscale = scale
    scale = 0
    n = n/1
    scale = oscale
    return n
}

define series(n) {
    return int(3/2*n)
}

n = 2
end = 1000
for (count = 1; count < end; count++ ) {
    n = series(n)
}
print count, "\t", n, "\n"
quit

Note that print is an extension and some versions of bc may not have it. If so just reference the variable by itself and its value should be output.

Now you can do chmod +x series.bc and call it like this:

./series.bc | tr -d '\n' | sed 's.\\..g'
Dennis Williamson
+1  A: 

You can probably get a bit closer by using a more suitable language, e.g., Scheme:

(define (series n) (if (= n 0) 2
                       (quotient (* 3 (series (- n 1))) 2)))

This computes the 17610 digits of (series 100000) in about 8 seconds on my machine. Unfortunately, (series 1000000) still takes far too long for even a much newer/faster machine to have any hope of finishing in a minute.

Switching to C++ with a big-integer library (NTL, in this case):

#include <NTL/zz.h>
#include <fstream>
#include <time.h>
#include <iostream>

int main(int argc, char **argv) {
    NTL::ZZ sum;

    clock_t start = clock();
    for (int i=0; i<1000000; i++) 
        sum = (sum*3)/2;
    clock_t finish = clock();
    std::cout << "computation took: " << double(finish-start)/CLOCKS_PER_SEC << " seconds.\n";

    std::ofstream out("series_out.txt");
    out << sum;

    return 0;
}

This computes the series for 1000000 in 4 minutes, 35 seconds on my machine. That's fast enough that I can almost believe a really fast, new machine might at least come close to finishing in a minute (and yes, I checked what happened when I used shifts instead of multiplication/division -- it was slower).

Unfortunately, the closed-form computation others have suggested seems to be of little help. To use it you need to compute the constant K to sufficient precision. I don't see a closed-form computation of K, so this really just shifts the iteration to computing K, and it looks like computing K to sufficient precision is little (if any) faster that computing the original series.

Jerry Coffin
There are some inconsistencies in your timings and count values. For example, "This computes the series for 100000 in 4 minutes, 35 seconds on my machine." and "`for (int i=0; i<1000000; i++)`" versus "This [scheme version] computes the 17610 digits of (series 100000) in about 8 seconds on my machine.", etc. Perhaps it's merely a dropped zero in "100000 in 4 minutes, ..."
Dennis Williamson
Can you get the command-not-run-in-Emacs to run in Scheme? http://stackoverflow.com/questions/2840593/how-do-i-find-the-millionth-number-in-the-series-2-3-4-6-9-13-19-28-42-63/2844766#2844766
HH
@Dennis: Yes, it was a typo, which I've fixed. Thanks for pointing it out.
Jerry Coffin
@HH: It could be translated into Scheme, but the result wouldn't look very similar -- virtually none of what you have would work as-is in Scheme (though I suppose you make it work by writing other code to define what you've used).
Jerry Coffin
+4  A: 

Here's a Python version that on my 10-year old laptop takes about 220 seconds to run:

import math;
import timeit;

def code():
  n = 2
  nth = 1

  while nth < 1000000:
    n = (n * 3) >> 1
    nth = nth + 1

  print(n);

t = timeit.Timer(setup="from __main__ import code", stmt="code()");
print(t.timeit(1));

It produces the same result that this answer has on the pastebin (that is, I verified the start and end of it, not everything.)

Lasse V. Karlsen
+10  A: 

You can easily solve this by thinking about the problem in binary. Floor(3/2*i) is basically shift right, truncate and add. In pseudo code:

0b_n[0]   = 10              // the start is 2
0b_n[1]   = 10 + 1(0) = 11  // shift right, chop one bit off and add 
0b_n[i+1] = 0b_n[i] + Truncate(ShiftRight(0b_n[i]))

This should be quite fast to implement in any form.

I just made an implementation of this in Mathematica and it seems that the BitShiftRight operation also chops off the bit past the unit position, so that gets taken care of automatically. Here is the one liner:

In[1] := Timing[num = Nest[(BitShiftRight[#] + #) &, 2, 999999];]
Out[2] = {16.6022, Null}

16 seconds, the number prints just fine, it is quite long though:

In[2] := IntegerLength[num]
Out[2] = 176092

In[3] := num
Out[3] = 1963756...123087

Full result here.

Timo
@Timo: Very interesting. Thanks. +1
Xavier Ho
This is a good problem for illustrating the difference between iterative and recursive solutions (iterative == faster), http://en.wikipedia.org/wiki/Recursion_(computer_science)#Recursion_versus_iteration . Also for showing the superiority of direct bit operations vs base 10 arithmetic.
Timo
@Timo, can you give me a link or some pointers to learn more about bit operations? I would have never associated `floor()` with them.
Xavier Ho
I suggest looking at wikipedia for binary numbers. The trick here is to know that shifting the whole binary number one place to the left or right equals multiplication by two or division by two http://en.wikipedia.org/wiki/Bitwise_operation#Arithmetic_shift . E.g., binary 3 is 11, shift left (pad with a zero) -> 110 which is binary 6, shift right (drop the right most digit) -> 1 which is binary 1. Now the `floor()` part becomes obvious. Binary numbers are inherently integers so the rounding from floor() is automatic. The way normal Real numbers are presented in binary is then another story.
Timo
So it follows from this that normal multiplication can be constructed from these more elementary ones plus addition and subsraction. E.g. num x 7 == num x 8 - num, two operations (x8 is shift left by three places and usually can be done in one operation) instead of six additions (num + num + ...).
Timo
@Timo: I don't have Mathematica licence. Any other way of doing such BigInteger calculations?
Lazer
There ia a very powerful Open alternative for Mathematica called Sage http://www.sagemath.org/ which I'm pretty sure could handle this calculation in a similar way.
Timo
@Timo : thanks!
Lazer
A: 

I converted Timo's ideas to elisp. It fails with 100, giving negative number. FAIL, please, see no BigNums!

(progn
  (let ((a 2)
        (dummy 0))
    (while (< dummy 100)
      (setq a (+ a (lsh a -1)))
      (setq dummy (1+ dummy)))
    (message "%d" a)))
-211190189 #WRONG evalution
HH
A: 

It's very easy to do in Pari:

n=2;for(k=1,10^6,n+=n>>1)

This takes 14 seconds on my machine. There are faster ways, of course -- GMP comes to mind -- but why bother? You won't be able to shave more than 10 seconds off the runtime, and development time would be on the order of minutes. :)

Minor point: It's ambiguous in the original formulation whether the millionth term n999999 is desired or n1000000, the number with index one million; I give the latter since I see the former is already calculated above.

Charles