Python Forum
numpy.dot() result different with classic computation for large-size arrays
#1
I am testing the first sample code from https://www.geeksforgeeks.org/vectorization-in-python/

When I run the code as it is (array size 100000), the result looks good: it matches the output on the webpage.

But when I increase the array size to 100000000 (100 million), the classic computation and numpy.dot give different results.

My environment:
OS: MacOS 11.6.2
Python: 3.8.9

The code:
# Dot product
import time
import numpy
import array

# 8 bytes size int
a = array.array('q')
N=100000000
for i in range(N):
        a.append(i);

b = array.array('q')
for i in range(N, 2*N):
        b.append(i)

# classic dot product of vectors implementation
tic = time.process_time()
dot = 0.0;

for i in range(len(a)):
        dot += a[i] * b[i]

toc = time.process_time()

print("dot_product = "+ str(dot));
print("Classic Computation time = " + str(1000*(toc - tic )) + "ms")

n_tic = time.process_time()
n_dot_product = numpy.dot(a, b)
n_toc = time.process_time()

print("\nn_dot_product = "+str(n_dot_product))
print("Vec Computation time = "+str(1000*(n_toc - n_tic ))+"ms")
The result is:

dot_product = 8.33333323333355e+23
Classic Computation time = 22549.694999999996ms

n_dot_product = 1659803504355747200
Vec Computation time = 131.33700000000204ms


I don't know why dot_product and n_dot_product are different. When I run the code with N=100000, dot_product and n_dot_product are the same.
Thanks in advance.
Reply
#2
I guess it is because a 64-bit signed integer can only store values between -2**63 and 2**63-1, which is on the order of 10**19, whereas the result here has a magnitude of 10**23. It follows that the high bits are silently lost to overflow when the computation is done with 64-bit integers. Here is an example:
>>> import array
>>> a = array.array('q')
>>> x = 2**63-1000
>>> x
9223372036854774808
>>> a.append(x)
>>> a
array('q', [9223372036854774808])
>>> import numpy as np
>>> np.dot(a, a)
1000000
>>> x * x
85070591730234597419099578148391436864
>>>
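The same reasoning can be checked against the numbers reported in post #1. The following is a minimal sketch in pure Python, not the code anyone in the thread ran: it assumes the arrays are a[i] = i and b[i] = N + i as in the original script, computes the exact dot product with a closed-form sum, and then reduces it the way a signed 64-bit integer would wrap around. The helper name wrap_int64 is made up for this illustration.

```python
# Exact dot product for a[i] = i, b[i] = N + i, i in 0..N-1, using Python's
# arbitrary-precision integers (no loop over 10**8 elements needed).
N = 100_000_000

# sum(i * (N + i)) = N * sum(i) + sum(i**2)
exact = N * (N * (N - 1) // 2) + (N - 1) * N * (2 * N - 1) // 6


def wrap_int64(value):
    """Hypothetical helper: value as a signed 64-bit integer after wraparound."""
    return (value + 2**63) % 2**64 - 2**63


print(exact)              # 833333323333333350000000 (about 8.33e23)
print(wrap_int64(exact))  # 1659803504355747200, the n_dot_product from post #1
```

The wrapped value matches the n_dot_product printed in post #1, which suggests that numpy.dot kept only the low 64 bits of the true result. The classic loop printed 8.33333323333355e+23 instead because it started from dot = 0.0 and therefore accumulated in floating point, which is close to the exact value but not exact.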
Reply
#3
The quick answer is that numpy.dot performs the calculation using the element type given when the array.array was constructed. "q" represents a 64-bit signed integer, whose maximum value is 2**63-1. The running sum eventually exceeds that limit and overflows, which causes the different result.

The example uses coding practices that complicate a detailed explanation. I don't have time tonight for a better example but I'll respond again if I have time.

casevh
Reply
#4
I was not fast enough ;)

Try this, for example (have a look at the structure of the "print" calls):

import time
import numpy
import array
 
N=100000000

# # 8 bytes size int
# a = array.array('q')

# for i in range(N):
#         a.append(i);
 
# b = array.array('q')
# for i in range(N, 2*N):
#         b.append(i)

    
a = numpy.arange(N)
b = numpy.arange(N, 2*N)
# classic dot product of vectors implementation
tic = time.process_time()
dot = 0;
 
for i in range(N):
        dot += a[i] * b[i]
 
toc = time.process_time()
 
print(f"dot_product = {dot}")
print(f"Classic Computation time = {1000*(toc - tic )}ms")
 
n_tic = time.process_time()
n_dot_product = numpy.dot(a, b)     ## or n_dot_product = a @ b
n_toc = time.process_time()
 
print(f"\nn_dot_product = {(n_dot_product)}")
print(f"Vec Computation time = {1000*(n_toc - n_tic )} ms")

print(f"Difference on the dot product results = {numpy.abs(n_dot_product - dot)}")
Reply
#5
Thanks to Gribouillis, casevh, paul18fr.

Is there a good way to raise an overflow exception in Python? The code just silently uses the overflowed values, so sometimes we are not even aware of what is going on.

I haven't done much Python programming, so I'm not sure what is feasible.

Thanks.
Reply
#6
I think it is an issue in the numpy package (see https://github.com/numpy/numpy/issues/8987). Ordinary Python doesn't fail silently, because its integers have arbitrary precision. We'd need a numpy expert.
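One possible workaround, sketched here under stated assumptions and not as a NumPy feature: do the arithmetic with ordinary Python integers, which never overflow, and raise an exception yourself if the result would not fit in 64 bits. The function name checked_dot and the range constants are invented for this example, and the approach is much slower than numpy.dot, so it is better suited to verification than to production use.

```python
import array

# Range of a signed 64-bit integer (array typecode 'q').
INT64_MIN, INT64_MAX = -(2**63), 2**63 - 1


def checked_dot(a, b):
    """Hypothetical helper: exact dot product that raises instead of wrapping."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    # int() makes sure every product is computed with Python integers.
    total = sum(int(x) * int(y) for x, y in zip(a, b))
    if not INT64_MIN <= total <= INT64_MAX:
        raise OverflowError(f"dot product {total} does not fit in 64 bits")
    return total


small = array.array('q', [1, 2, 3])
print(checked_dot(small, small))  # 14

big = array.array('q', [2**63 - 1000])
try:
    checked_dot(big, big)
except OverflowError as exc:
    print("overflow detected:", exc)
```

If only the exact value is needed and no exception, dropping the range check gives the true dot product as a Python int.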
Reply