Your test for prime-ness works, but it is very inefficient. The code in the original post at least only tested whether a number was divisible by a prime, using the fact that every number > 1 can be written as a product of primes. Since there are only 148933 primes in the range 2..2000000, that is a huge reduction in the number of divisions required to find all the primes, and since division is an expensive operation, reducing the number of divisions matters.
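That primes-only trial division might be sketched like this (my reconstruction, not the original poster's code; `primes_trial` is a name I made up):

```python
def primes_trial(n):
    """Return primes <= n, trial-dividing only by primes already found.

    Any composite p has a prime factor <= sqrt(p), so we can stop
    as soon as q * q > p.
    """
    found = []
    for p in range(2, n + 1):
        is_prime = True
        for q in found:
            if q * q > p:        # no prime factor <= sqrt(p) exists
                break
            if p % q == 0:       # q divides p, so p is composite
                is_prime = False
                break
        if is_prime:
            found.append(p)
    return found

print(sum(primes_trial(100)))    # 1060
```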
But what if you could do this with zero divisions? The Sieve of Eratosthenes checks whether a number is a multiple of any smaller prime; if it is not, the number is prime. The algorithm generates the multiples of each prime on the fly. See a description of the algorithm here:
https://en.wikipedia.org/wiki/Sieve_of_Eratosthenes.
I wrote both algorithms as prime number generators. I added a counter to keep track of how many "costly" operations were performed by each. I also recorded how long it took to generate all the prime numbers in a given range. This is the code:
import time

def primes_m(n):
    """Generate prime numbers <= n. Use modulo to test for prime-ness."""
    global count
    count = 0
    for p in range(2, n + 1):
        for t in range(2, p):
            count += 1
            if p % t == 0:
                break
        else:
            yield p

def primes_m_optimize(n):
    """primes_m with some simple optimizations."""
    global count
    count = 0
    if n > 1:
        yield 2                                 # Only even prime
    for p in range(3, n + 1, 2):                # Only test odd numbers
        count += 1
        for t in range(3, int(p ** 0.5) + 1, 2):  # Divide by odd numbers up to sqrt(p)
            count += 1
            if p % t == 0:
                break
        else:
            yield p

def primes_e(n):
    """Generate prime numbers <= n using the Sieve of Eratosthenes algorithm."""
    global count
    count = 0
    if n > 1:
        yield 2                                 # Only even prime
    not_prime = set()
    for p in range(3, n + 1, 2):                # Only test odd numbers
        count += 1
        if p not in not_prime:
            yield p
            for np in range(3 * p, n + 1, 2 * p):  # Add odd multiples of p to not_prime
                not_prime.add(np)
                count += 1

def timeit(label, func, n):
    t = time.time()
    x = sum(func(n))
    t = time.time() - t
    print(f"{label:15} sum = {x:>10} count = {count:>10} time = {t:3.5f}")

for n in (100, 1000, 100000):
    print("\nn =", n)
    timeit("Modulo", primes_m, n)
    timeit("Optimized Mod", primes_m_optimize, n)
    timeit("Eratosthenes", primes_e, n)
Output:

n = 100
Modulo          sum =       1060 count =       1133 time = 0.00000
Optimized Mod   sum =       1060 count =        136 time = 0.00000
Eratosthenes    sum =       1060 count =        185 time = 0.00000

n = 1000
Modulo          sum =      76127 count =      78022 time = 0.00698
Optimized Mod   sum =      76127 count =       2850 time = 0.00000
Eratosthenes    sum =      76127 count =       3349 time = 0.00000

n = 100000
Modulo          sum =  454396537 count =  455189150 time = 43.78123
Optimized Mod   sum =  454396537 count =    1395441 time = 0.12371
Eratosthenes    sum =  454396537 count =    1445440 time = 0.01496
For small numbers all three algorithms are very quick, but as n grows, the number of "costly" operations grows, and it grows very quickly for the unoptimized modulo algorithm. I did not time primes_m(2000000), but extrapolating the quadratic growth (about 400 times the work measured at n = 100000) gives an estimated 17,512 seconds: 4 hours, 51 minutes, 52 seconds.
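The measured counts for primes_m can also be predicted without running the quadratic loop: the inner loop performs p - 2 divisions when p is prime and spf(p) - 1 divisions when p is composite, where spf(p) is the smallest prime factor at which it breaks. A sketch of that cross-check (my own, not from the original post):

```python
def count_primes_m_ops(n):
    """Predict primes_m's count: p - 2 divisions for a prime p,
    spf(p) - 1 divisions for a composite p."""
    spf = list(range(n + 1))              # spf[k] = smallest prime factor of k
    for p in range(2, int(n ** 0.5) + 1):
        if spf[p] == p:                   # p is prime
            for m in range(p * p, n + 1, p):
                if spf[m] == m:           # not yet claimed by a smaller prime
                    spf[m] = p
    total = 0
    for p in range(2, n + 1):
        total += (p - 2) if spf[p] == p else (spf[p] - 1)
    return total

print(count_primes_m_ops(100))    # 1133, matching the measured count above
```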
The sieve algorithm is the fastest. I thought this would mostly be due to reducing the number of "costly" operations, but by that measure it is actually slightly less efficient than the optimized modulo algorithm. The sieve is about ten times faster than the optimized modulo version because its costly operations are set lookups and inserts, which are roughly ten times cheaper than a division.
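That relative-cost claim can be sanity-checked with a rough micro-benchmark (my sketch; timings are machine-dependent and include Python loop overhead, so treat the ratio as indicative only):

```python
from time import perf_counter

N = 1_000_000
a, b = 999983, 997                 # arbitrary operands for the modulo test
s = set(range(3, 1000, 2))         # a set of odd numbers for the membership test

t0 = perf_counter()
for _ in range(N):
    a % b                          # one division-like operation
t_mod = perf_counter() - t0

t0 = perf_counter()
for _ in range(N):
    501 in s                       # one set-membership operation
t_set = perf_counter() - t0

print(f"modulo: {t_mod:.3f}s  set membership: {t_set:.3f}s")
```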
I just had to know the answer: what is the sum of the prime numbers <= 2000000? I removed the counting code, which also allowed a slight optimization to the sieve algorithm.
def primes_e(n):
    """Generate prime numbers <= n using the Sieve of Eratosthenes algorithm."""
    if n > 1:
        yield 2                                 # Only even prime
    not_prime = set()
    for p in range(3, n + 1, 2):                # Only test odd numbers
        if p not in not_prime:
            yield p
            not_prime.update(range(3 * p, n + 1, 2 * p))  # Add odd multiples of p
Output:

n = 2000000
Optimized Mod   sum = 142913828922 time = 4.23710
Eratosthenes    sum = 142913828922 time = 0.35306
I think this is a great lesson in optimization. The optimized modulo algorithm solves the sum of primes under 2000000 over 4000 times faster than the unoptimized version, and that is entirely due to doing less: the unoptimized version does about 4000 times as much work to get the same result. The sieve version is only about 10 times faster than the optimized modulo version; it does roughly the same amount of work, but it does that work in a faster way. Doing less will almost always beat doing the same work faster, yet so often you see programmers trying to speed up their programs by doing the same amount of work with faster operations.
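One last variant drives the point home about "the same work, done faster": keeping exactly the sieve algorithm but swapping the set for a bytearray (a hypothetical rewrite, not in the original post) replaces hashing with flat array indexing and lets slice assignment mark composites at C speed:

```python
def primes_ba(n):
    """Sieve of Eratosthenes over a bytearray: same algorithm as primes_e,
    but composites are marked in bulk by slice assignment."""
    if n < 2:
        return
    sieve = bytearray([1]) * (n + 1)   # sieve[k] == 1 means k might be prime
    sieve[0:2] = b"\x00\x00"           # 0 and 1 are not prime
    for p in range(2, int(n ** 0.5) + 1):
        if sieve[p]:
            # Zero out p*p, p*p + p, ... in one C-level operation.
            sieve[p * p :: p] = bytes(len(range(p * p, n + 1, p)))
    yield from (p for p in range(2, n + 1) if sieve[p])

print(sum(primes_ba(2000000)))    # 142913828922, the same answer as above
```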