Given a large N, I need to iterate through all phi(k) such that 1 < k < N quickly. Since the values of N will be around 1012, it is important that the memory complexity is sub O(n).
Is it possible? If so, how?
Given a large N, I need to iterate through all phi(k) such that 1 < k < N quickly. Since the values of N will be around 1012, it is important that the memory complexity is sub O(n).
Is it possible? If so, how?
The computation of phi(k) has to be done using the prime factorization of k, which is the only sensible way of doing it. If you need a refresher on that, wikipedia carries the formula.
If you now have to compute all prime divisors of every number between 1 and a large N, you'll die of old age before seeing any result, so I'd go the other way around, i.e. build all numbers below N, using their possible prime factors, i.e. all primes less than or equal to N.
Your problem is therefore going to be similar to computing all divisors of a number, only you do not know what is the maximum number of times you may find a certain prime in the factorization beforehand. Tweaking an iterator originally written by Tim Peters on the python list (something I've blogged about...) to include the totient function, a possible implementation in python that yields k, phi(k) pairs could be as follows:
def composites(factors, N) :
"""
Generates all number-totient pairs below N, unordered, from the prime factors.
"""
ps = sorted(set(factors))
omega = len(ps)
def rec_gen(n = 0) :
if n == omega :
yield (1,1)
else :
pows = [(1,1)]
val = ps[n]
while val <= N :
pows += [(val, val - pows[-1][0])]
val *= ps[n]
for q, phi_q in rec_gen(n + 1) :
for p, phi_p in pows :
if p * q > N :
break
else :
yield p * q, phi_p * phi_q
for p in rec_gen() :
yield p
If you need help on computing all prime factors below N, I've also blogged about it... Keep in mind, though that computing all primes below 1012 is in itself quite a remarkable feat...
Is this from Project Euler 245? I remember that question, and I have given up on it.
The fastest way I found for calculating totient was to multiply the prime factors (p-1) together, given that k has no repeated factors (which was never the case if I remember the problem correctly).
So for calculating factors, it would probably be best to use gmpy.next_prime or pyecm (elliptic curve factorization).
You could also sieve the prime factors as Jaime suggests. For numbers up to 1012, the maximum prime factor is below 1 million which should be reasonable.
If you memoize factorizations, it could speed up your phi function even more.
No one has found a faster way to calculate phi(k) (aka, Euler's totient function) than by first finding the prime factors of k. The world's best mathematicians have thrown many CPU cycles at the problem since 1977, since finding a faster way to solve this problem would create a weakness in the RSA public-key algorithm. (Both the public and the private key in RSA are calculated based on phi(n), where n is the product of two large primes.)
For these kind of problems I'm using an iterator that returns for each integer m < N the list of primes < sqrt(N) that divide m. To implement such an iterator I'm using an array A of length R where R > sqrt(N). At each point the array A contains list of primes that divide integers m .. m+R-1. I.e. A[m % R] contains primes dividing m. Each prime p is in exactly one list, i.e. in A[m % R] for the smallest integer in the range m .. m+R-1 that is divisible by p. When generating the next element of the iterator simply the list in A[m % R] is returned. Then the list of primes are removed from A[m % R] and each prime p is appended to A[(m+p) % R].
With a list of primes < sqrt(N) dividing m it is easy to find the factorization of m, since there is at most one prime larger than sqrt(N).
This method has complexity O(N log(log(N))) under the assumption that all operations including list operations take O(1). The memory requirement is O(sqrt(N)).
There is unfortunately, some constant overhead here, hence I was looking for a more elegant way to generate the values phi(n), but so for I've not been successful.
I think you can go the other way around. Instead of factorizing an integer k to get phi(k), you can attempt to generate all integers from 1 to N from primes and get phi(k) quickly. For example, if Pn is the maximum prime that is less than N, you can generate all integers less than N by
P1 i 1 * P2 i 2 * ... * Pn i n where each ij run from 0 to [log (N) / log (Pj)] ([] is the floor function).
That way, you can get phi(k) quickly wihout expensive prime factorization and still iterate through all k between 1 and N (not in order but I think you don't care about order).
This can be done with Memory complexity O(Sqrt(N)) and CPU complexity O(N * Log(Log(N))) with an optimized windowed Sieve of Eratosthenes, as implemented in the code example below.
As no language was specified and as I do not know Python, I have implemented it in VB.net, however I can convert it to C# if you need that.
Imports System.Math
Public Class TotientSerialCalculator
'Implements an extremely efficient Serial Totient(phi) calculator '
' This implements an optimized windowed Sieve of Eratosthenes. The'
' window size is set at Sqrt(N) both to optimize collecting and '
' applying all of the Primes below Sqrt(N), and to minimize '
' window-turning overhead. '
' '
' CPU complexity is O( N * Log(Log(N)) ), which is virtually linear.'
' '
' MEM Complexity is O( Sqrt(N) ). '
' '
' This is probalby the ideal combination, as any attempt to further '
'reduce memory will almost certainly result in disproportionate increases'
'in CPU complexity, and vice-versa. '
Structure NumberFactors
Dim UnFactored As Long 'the part of the number that still needs to be factored'
Dim Phi As Long 'the totient value progressively calculated'
' (equals total numbers less than N that are CoPrime to N)'
'MEM = 8 bytes each'
End Structure
Private ReportInterval As Long
Private PrevLast As Long 'the last value in the previous window'
Private FirstValue As Long 'the first value in this windows range'
Private WindowSize As Long
Private LastValue As Long 'the last value in this windows range'
Private NextFirst As Long 'the first value in the next window'
'Array that stores all of the NumberFactors in the current window.'
' this is the primary memory consumption for the class and it'
' is 16 * Sqrt(N) Bytes, which is O(Sqrt(N)).'
Public Numbers() As NumberFactors
' For N=10^12 (1 trilion), this will be 16MB, which should be bearable anywhere.'
'(note that the Primes() array is a secondary memory consumer'
' at O(pi(Sqrt(N)), which will be within 10x of O(Sqrt(N)))'
Public Event EmitTotientPair(ByVal k As Long, ByVal Phi As Long)
'===== The Routine To Call: ========================'
Public Sub EmitTotientPairsToN(ByVal N As Long)
'Routine to Emit Totient pairs {k, Phi(k)} for k = 1 to N'
' 2009-07-14, RBarryYoung, Created.'
Dim i As Long
Dim k As Long 'the current number being factored'
Dim p As Long 'the current prime factor'
'Establish the Window frame:'
' note: WindowSize is the critical value that controls both memory'
' usage and CPU consumption and must be SQRT(N) for it to work optimally.'
WindowSize = Ceiling(Sqrt(CDbl(N)))
ReDim Numbers(0 To WindowSize - 1)
'Initialize the first window:'
MapWindow(1)
Dim IsFirstWindow As Boolean = True
'adjust this to control how often results are show'
ReportInterval = N / 100
'Allocate the primes array to hold the primes list:'
' Only primes <= SQRT(N) are needed for factoring'
' PiMax(X) is a Max estimate of the number of primes <= X'
Dim Primes() As Long, PrimeIndex As Long, NextPrime As Long
'init the primes list and its pointers'
ReDim Primes(0 To PiMax(WindowSize) - 1)
Primes(0) = 2 '"prime" the primes list with the first prime'
NextPrime = 1
'Map (and Remap) the window with Sqrt(N) numbers, Sqrt(N) times to'
' sequentially map all of the numbers <= N.'
Do
'Sieve the primes across the current window'
PrimeIndex = 0
'note: cant use enumerator for the loop below because NextPrime'
' changes during the first window as new primes <= SQRT(N) are accumulated'
Do While PrimeIndex < NextPrime
'get the next prime in the list'
p = Primes(PrimeIndex)
'find the first multiple of (p) in the current window range'
k = PrevLast + p - (PrevLast Mod p)
Do
With Numbers(k - FirstValue)
.UnFactored = .UnFactored \ p 'always works the first time'
.Phi = .Phi * (p - 1) 'Phi = PRODUCT( (Pi-1)*Pi^(Ei-1) )'
'The loop test that follows is probably the central CPU overhead'
' I believe that it is O(N*Log(Log(N)), which is virtually O(N)'
' ( for instance at N = 10^12, Log(Log(N)) = 3.3 )'
Do While (.UnFactored Mod p) = 0
.UnFactored = .UnFactored \ p
.Phi = .Phi * p
Loop
End With
'skip ahead to the next multiple of p: '
'(this is what makes it so fast, never have to try prime factors that dont apply)'
k += p
'repeat until we step out of the current window:'
Loop While k < NextFirst
'if this is the first window, then scan ahead for primes'
If IsFirstWindow Then
For i = Primes(NextPrime - 1) + 1 To p ^ 2 - 1 'the range of possible new primes'
'Dont go beyond the first window'
If i >= WindowSize Then Exit For
If Numbers(i - FirstValue).UnFactored = i Then
'this is a prime less than SQRT(N), so add it to the list.'
Primes(NextPrime) = i
NextPrime += 1
End If
Next
End If
PrimeIndex += 1 'move to the next prime'
Loop
'Now Finish & Emit each one'
For k = FirstValue To LastValue
With Numbers(k - FirstValue)
'Primes larger than Sqrt(N) will not be finished: '
If .UnFactored > 1 Then
'Not done factoring, must be an large prime factor remaining: '
.Phi = .Phi * (.UnFactored - 1)
.UnFactored = 1
End If
'Emit the value pair: (k, Phi(k)) '
EmitPhi(k, .Phi)
End With
Next
're-Map to the next window '
IsFirstWindow = False
MapWindow(NextFirst)
Loop While FirstValue <= N
End Sub
Sub EmitPhi(ByVal k As Long, ByVal Phi As Long)
'just a placeholder for now, that raises an event to the display form'
' periodically for reporting purposes. Change this to do the actual'
' emitting.'
If (k Mod ReportInterval) = 0 Then
RaiseEvent EmitTotientPair(k, Phi)
End If
End Sub
Public Sub MapWindow(ByVal FirstVal As Long)
'Efficiently reset the window so that we do not have to re-allocate it.'
'init all of the boundary values'
FirstValue = FirstVal
PrevLast = FirstValue - 1
NextFirst = FirstValue + WindowSize
LastValue = NextFirst - 1
'Initialize the Numbers prime factor arrays'
Dim i As Long
For i = 0 To WindowSize - 1
With Numbers(i)
.UnFactored = i + FirstValue 'initially equal to the number itself'
.Phi = 1 'starts at mulplicative identity(1)'
End With
Next
End Sub
Function PiMax(ByVal x As Long) As Long
'estimate of pi(n) == {primes <= (n)} that is never less'
' than the actual number of primes. (from P. Dusart, 1999)'
Return (x / Log(x)) * (1.0 + 1.2762 / Log(x))
End Function
End Class
Note that at O(N * Log(Log(N))), this routine is factoring each number at O(Log(Log(N))) on average which is much, much faster than the fastest single N factorization algorithms sited by some of the replies here. In fact, at N = 10^12 it is 2400 times faster!
I have tested this routine on my 2Ghz Intel Core 2 laptop and it computes over 3,000,000 Phi() values per second. At this speed, it would take you about 4 days to compute 10^12 values. I have also tested it for correctness up to 100,000,000 without any errors. It is based in 64-bit integers, so anything up to 2^63 (10^21 or 1 Sextillion) should be accurate (though too slow for anyone).
I also have a Visual Studio WinForm (also VB.net) for running/testing it, that I can provide if you want it.
Let me know if you have any questions.