numpy.dot() result different with classic computation for large-size arrays

geekgeek · (This post was last modified: Jan-25-2022, 05:35 AM by geekgeek.)

Basically, I am testing the first sample code at https://www.geeksforgeeks.org/vectorization-in-python/

When I run the code as is (array size 100000), the result looks good: it matches the output on the webpage.

But when I increase the array size to 100000000 (100 million), the classic computation and numpy.dot give different results.

My environment:
OS: MacOS 11.6.2
Python: 3.8.9

The code:

# Dot product
import time
import numpy
import array

# 8 bytes size int
a = array.array('q')
N = 100000000
for i in range(N):
    a.append(i)

b = array.array('q')
for i in range(N, 2*N):
    b.append(i)

# classic dot product of vectors implementation
tic = time.process_time()
dot = 0.0  # the sum is accumulated in a Python float

for i in range(len(a)):
    dot += a[i] * b[i]

toc = time.process_time()

print("dot_product = " + str(dot))
print("Classic Computation time = " + str(1000*(toc - tic)) + "ms")

n_tic = time.process_time()
n_dot_product = numpy.dot(a, b)
n_toc = time.process_time()

print("\nn_dot_product = " + str(n_dot_product))
print("Vec Computation time = " + str(1000*(n_toc - n_tic)) + "ms")

The result is:

dot_product = 8.33333323333355e+23
Classic Computation time = 22549.694999999996ms

n_dot_product = 1659803504355747200
Vec Computation time = 131.33700000000204ms

I don't know why dot_product and n_dot_product are different. When I tried the code with N=100000, dot_product and n_dot_product were the same.
Thanks in advance.

**Gribouillis** · (This post was last modified: Jan-25-2022, 08:10 AM by Gribouillis.)

I guess it is because a 64-bit signed integer can only store values between -2**63 and 2**63-1, which is on the order of 10**19, whereas the result here has a magnitude of 10**23. It follows that the high-order bits are silently lost to overflow when the computation is done with 64-bit integers. Here is an example:

>>> import array
>>> a = array.array('q')
>>> x = 2**63-1000
>>> x
9223372036854774808
>>> a.append(x)
>>> a
array('q', [9223372036854774808])
>>> import numpy as np
>>> np.dot(a, a)
1000000
>>> x * x
85070591730234597419099578148391436864
>>>
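
A minimal sketch of the wraparound arithmetic behind the example above, assuming the usual two's-complement behaviour of 64-bit integers: the stored value is the exact result reduced modulo 2**64 and read back as a signed number. The helper name wrap_int64 is made up for illustration, and the closed form for the dot product in the question is derived here rather than taken from the thread.

```python
def wrap_int64(value):
    """Reduce an exact Python int to what a signed 64-bit integer would hold."""
    return (value + 2**63) % 2**64 - 2**63

x = 2**63 - 1000
exact = x * x                    # arbitrary-precision Python int
print(wrap_int64(exact))         # → 1000000, same as np.dot(a, a) above

# Exact dot product for the arrays in the question, using the closed form
# sum(i * (i + N) for i in range(N)) == N * (N - 1) * (5 * N - 1) // 6
N = 100000000
exact_dot = N * (N - 1) * (5 * N - 1) // 6
print(exact_dot)                 # → 833333323333333350000000
print(wrap_int64(exact_dot))     # → 1659803504355747200
```

The last value is the n_dot_product printed in the question, which supports the overflow explanation.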

casevh · Jan-25-2022, 08:09 AM

The quick answer is that numpy.dot performs the calculation using the type given when the array.array was constructed. "q" represents a 64-bit signed integer, whose maximum value is 2**63-1. The running sum eventually exceeds that limit, and the overflow causes the different result.

The example uses coding practices that complicate a detailed explanation. I don't have time tonight for a better example but I'll respond again if I have time.

casevh
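
A small sketch of the point above, using two hypothetical values chosen so that the overflow shows up immediately: NumPy takes the element type from the array.array buffer, and converting to the object or float64 dtype changes how the sum is accumulated.

```python
import array
import numpy as np

a = array.array('q', [2**62, 2**62])    # 'q' = signed 64-bit integers
arr = np.asarray(a)                     # NumPy keeps the 64-bit integer type
print(arr.dtype)

# Accumulated in 64-bit integers: wraps around without any warning.
wrapped = np.dot(arr, arr)

# Object dtype: every element is a Python int, so the result is exact (but slow).
exact = np.dot(arr.astype(object), arr.astype(object))

# float64: no wraparound, but the result is rounded to about 16 significant digits.
approx = np.dot(arr.astype(np.float64), arr.astype(np.float64))

print(wrapped, exact, approx)
```

Which conversion is appropriate depends on whether an exact result or speed matters more. For 100 million elements the object version will be much slower and use far more memory.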

paul18fr · Jan-25-2022, 08:16 AM

I was not fast enough ;)

Try this, for example (have a look at the "print" structure):

import time
import numpy
import array
 
N=100000000

# # 8 bytes size int
# a = array.array('q')

# for i in range(N):
#         a.append(i);
 
# b = array.array('q')
# for i in range(N, 2*N):
#         b.append(i)

    
a = numpy.arange(N)
b = numpy.arange(N, 2*N)
# classic dot product of vectors implementation
tic = time.process_time()
dot = 0

for i in range(N):
    dot += a[i] * b[i]
 
toc = time.process_time()
 
print(f"dot_product = {dot}")
print(f"Classic Computation time = {1000*(toc - tic )}ms")
 
n_tic = time.process_time()
n_dot_product = numpy.dot(a, b)     ## or n_dot_product = a @ b
n_toc = time.process_time()
 
print(f"\nn_dot_product = {(n_dot_product)}")
print(f"Vec Computation time = {1000*(n_toc - n_tic )} ms")

print(f"Difference on the dot product results = {numpy.abs(n_dot_product - dot)}")
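
One thing to keep in mind with the code above: on platforms where numpy.arange defaults to 64-bit integers, a[i] and b[i] are NumPy scalars, so the loop accumulates in a fixed-width integer and wraps in the same way as numpy.dot. The following is only a sketch with two hypothetical values picked to overflow right away. It compares the fixed-width loop with a loop that converts each element to a Python int first.

```python
import numpy as np

a = np.array([2**62, 2**62], dtype=np.int64)
b = np.array([2, 2], dtype=np.int64)

# Loop over NumPy scalars: arithmetic stays 64-bit and wraps around.
# errstate only silences the RuntimeWarning that scalar overflow emits.
with np.errstate(over="ignore"):
    wrapped = np.int64(0)
    for x, y in zip(a, b):
        wrapped += x * y

# Loop over Python ints: arbitrary precision, so the sum is exact.
exact = 0
for x, y in zip(a, b):
    exact += int(x) * int(y)

vectorized = np.dot(a, b)

print(f"NumPy scalar loop = {wrapped}")
print(f"Python int loop   = {exact}")
print(f"numpy.dot         = {vectorized}")
```

Here the NumPy scalar loop and numpy.dot agree with each other, and only the Python int loop gives the mathematically exact sum.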

geekgeek · Jan-25-2022, 09:23 PM

Thanks to Gribouillis, casevh, paul18fr.

Is there a good way to raise an overflow exception in Python? Since the code just silently uses the overflowed values, sometimes we are not even aware of what is going on.

I haven't done much Python programming, so I'm not sure what is feasible.

Thanks.

**Gribouillis** · (This post was last modified: Jan-25-2022, 09:46 PM by Gribouillis.)

I think it is an issue in the numpy package https://github.com/numpy/numpy/issues/8987 . Ordinary Python doesn't fail silently. We'd need a numpy expert.
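
Until someone with more numpy knowledge answers, one possible workaround is to do the check yourself. The sketch below is not a numpy feature: checked_dot is a hypothetical helper, and it assumes 1-D integer inputs. It first computes a worst-case bound with Python ints. If the bound fits in 64 bits, the fast numpy.dot result is safe. Otherwise it falls back to exact arithmetic and raises OverflowError when the true result does not fit.

```python
import numpy as np

INT64_MIN, INT64_MAX = -2**63, 2**63 - 1

def checked_dot(a, b):
    """Hypothetical helper: 1-D int64 dot product that raises instead of wrapping."""
    a = np.asarray(a, dtype=np.int64)
    b = np.asarray(b, dtype=np.int64)
    if a.size == 0:
        return 0
    # Worst-case magnitude, computed with Python ints so the check cannot overflow.
    max_a = max(abs(int(a.min())), abs(int(a.max())))
    max_b = max(abs(int(b.min())), abs(int(b.max())))
    if max_a * max_b * a.size <= INT64_MAX:
        return int(np.dot(a, b))        # fast path: no partial sum can overflow
    # Slow path: exact arithmetic on Python ints via object arrays.
    exact = int(np.dot(a.astype(object), b.astype(object)))
    if not INT64_MIN <= exact <= INT64_MAX:
        raise OverflowError(f"dot product {exact} does not fit in int64")
    return exact

print(checked_dot([1, 2, 3], [4, 5, 6]))    # → 32
```

For the 100-million-element arrays in this thread the bound is exceeded, so the slow path would run and then raise. That path is much slower and uses far more memory than numpy.dot.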
