Python Forum

I am testing the first code sample from https://www.geeksforgeeks.org/vectorization-in-python/

When I run the code as is (array size 100000), the result looks good: it matches the output on the webpage.

But when I increase the array size to 100000000 (100 million), the classic computation and numpy.dot give different results.

My environment:
OS: MacOS 11.6.2
Python: 3.8.9

The code:

# Dot product
import time
import numpy
import array

# 8 bytes size int
a = array.array('q')
N=100000000
for i in range(N):
        a.append(i)

b = array.array('q')
for i in range(N, 2*N):
        b.append(i)

# classic dot product of vectors implementation
tic = time.process_time()
dot = 0.0

for i in range(len(a)):
        dot += a[i] * b[i]

toc = time.process_time()

print("dot_product = "+ str(dot))
print("Classic Computation time = " + str(1000*(toc - tic )) + "ms")

n_tic = time.process_time()
n_dot_product = numpy.dot(a, b)
n_toc = time.process_time()

print("\nn_dot_product = "+str(n_dot_product))
print("Vec Computation time = "+str(1000*(n_toc - n_tic ))+"ms")

The result is:

dot_product = 8.33333323333355e+23
Classic Computation time = 22549.694999999996ms

n_dot_product = 1659803504355747200
Vec Computation time = 131.33700000000204ms

I don't know why dot_product and n_dot_product are different. When I tried the code with N=100000, dot_product and n_dot_product were the same.
Thanks in advance.

I guess it is because a 64-bit signed integer can only store values between -2**63 and 2**63-1, a magnitude of about 10**19, whereas the result here has a magnitude of 10**23. It follows that the high bits are silently lost to overflow when the computation is done with 64-bit integers. Here is an example:

>>> import array
>>> a = array.array('q')
>>> x = 2**63-1000
>>> x
9223372036854774808
>>> a.append(x)
>>> a
array('q', [9223372036854774808])
>>> import numpy as np
>>> np.dot(a, a)
1000000
>>> x * x
85070591730234597419099578148391436864
>>>
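To check this guess against the numbers in the question, here is a small sketch that evaluates the dot product exactly with Python's arbitrary-precision integers and then reduces it to the signed 64-bit range. The closed form and the helper name wrap_int64 are my own additions for illustration, not something from the original code.

```python
N = 100_000_000

# Closed form of sum(i * (N + i) for i in range(N)), evaluated with
# Python ints, which never overflow.
exact = N * (N * (N - 1) // 2) + (N - 1) * N * (2 * N - 1) // 6

def wrap_int64(value):
    """Reduce an integer to the signed 64-bit range (two's complement)."""
    return (value + 2**63) % 2**64 - 2**63

print(exact)              # exact dot product, about 8.33e23
print(wrap_int64(exact))  # what survives in a 64-bit signed integer
```

The wrapped value is 1659803504355747200, the same number that numpy.dot printed in the question, which is consistent with the overflow explanation.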

The quick answer is that numpy.dot performs the calculation using the element type given when the array.array was constructed. "q" represents a 64-bit signed integer, whose maximum value is 2**63-1. The running sum eventually exceeds that limit and wraps around, which causes the different result.
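As a rough illustration of that point, the sketch below converts a small array.array('q') to a numpy array and inspects the resulting element type. The variable names are only for this example.

```python
import array
import numpy

# Same typecode as in the question: 'q' is a signed 8-byte integer.
a = array.array('q', [1, 2, 3])

# numpy reads the buffer and keeps the element type.
arr = numpy.asarray(a)
print(arr.dtype)            # a signed 64-bit integer dtype
print(arr.dtype.itemsize)   # size of one element in bytes

# Largest and smallest values that this element type can represent.
limits = numpy.iinfo(arr.dtype)
print(limits.min, limits.max)
```

Any result outside the range reported by numpy.iinfo cannot be represented in that dtype, so numpy.dot has no way to return the true value here.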

The example uses coding practices that complicate a detailed explanation. I don't have time tonight for a better example but I'll respond again if I have time.

casevh

I was not fast enough ;)

Try this, for example (have a look at how the "print" calls are structured with f-strings):

import time
import numpy
import array
 
N=100000000

# # 8 bytes size int
# a = array.array('q')

# for i in range(N):
#         a.append(i);
 
# b = array.array('q')
# for i in range(N, 2*N):
#         b.append(i)

    
a = numpy.arange(N)
b = numpy.arange(N, 2*N)
# classic dot product of vectors implementation
tic = time.process_time()
dot = 0
 
for i in range(N):
        dot += a[i] * b[i]
 
toc = time.process_time()
 
print(f"dot_product = {dot}")
print(f"Classic Computation time = {1000*(toc - tic )}ms")
 
n_tic = time.process_time()
n_dot_product = numpy.dot(a, b)     ## or n_dot_product = a @ b
n_toc = time.process_time()
 
print(f"\nn_dot_product = {(n_dot_product)}")
print(f"Vec Computation time = {1000*(n_toc - n_tic )} ms")

print(f"Difference on the dot product results = {numpy.abs(n_dot_product - dot)}")
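One thing to keep in mind with this version: indexing a numpy array returns numpy scalars, so the accumulation in the loop is also done in fixed-width integers and can wrap around just like numpy.dot. Below is a minimal sketch with two deliberately large values (chosen by me, not taken from the thread) that compares a numpy-scalar accumulation with one that converts each element to a Python int first.

```python
import warnings
import numpy

a = numpy.array([2**62, 2**62], dtype=numpy.int64)
b = numpy.array([4, 4], dtype=numpy.int64)

# Accumulate with numpy scalars: the arithmetic stays in 64 bits.
wrapped = 0
with warnings.catch_warnings():
    # numpy may emit a RuntimeWarning for scalar overflow; silence it here.
    warnings.simplefilter("ignore")
    for i in range(len(a)):
        wrapped += a[i] * b[i]

# Accumulate with Python ints: arbitrary precision, no overflow.
exact = 0
for i in range(len(a)):
    exact += int(a[i]) * int(b[i])

print(type(wrapped), wrapped)
print(type(exact), exact)
```

The second loop gives the mathematically exact result, at the cost of doing all the arithmetic in pure Python.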

Thanks to Gribouillis, casevh, paul18fr.

Is there a good way to raise an overflow exception in Python? Since the code silently uses the overflowed values, sometimes we are not even aware of what is going on.

I haven't done much Python programming, so I'm not sure what is feasible.

Thanks.

I think it is an issue in the numpy package: https://github.com/numpy/numpy/issues/8987. Ordinary Python integers have arbitrary precision, so plain Python doesn't fail silently like this. We'd need a numpy expert.
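In the meantime, one possible workaround is to do the check yourself. The sketch below is only an illustration: checked_dot is a hypothetical helper of my own, not a numpy function. It computes the dot product with Python ints and raises OverflowError when the exact result would not fit in a 64-bit signed integer.

```python
import numpy

# Representable range of a 64-bit signed integer.
INT64 = numpy.iinfo(numpy.int64)

def checked_dot(a, b):
    """Hypothetical helper: exact dot product, raising if int64 cannot hold it."""
    # int() converts numpy scalars to Python ints, which cannot overflow.
    exact = sum(int(x) * int(y) for x, y in zip(a, b))
    if not INT64.min <= exact <= INT64.max:
        raise OverflowError(f"dot product {exact} does not fit in int64")
    return exact

print(checked_dot(numpy.arange(5), numpy.arange(5, 10)))

try:
    checked_dot(numpy.array([2**62]), numpy.array([4]))
except OverflowError as exc:
    print("overflow detected:", exc)
```

This runs at pure-Python speed, so it gives up the benefit of vectorization; it is probably more useful as a sanity check on a sample of the data than as a replacement for numpy.dot.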
