Python Forum

Full Version: Make dual vector dot-product more efficient
I have a dot product that calculates weighted means. It is applied to two columns of a list of rows, with the measured values in columns 5 and 6:

# calculate weighted values in sequence
for i in range(len(temperatures)-len(weights)):
  temperatures[i].append(sum([weights[j]*temperatures[i+j][5] for j in range(len(weights))]))
  temperatures[i].append(sum([weights[j]*temperatures[i+j][6] for j in range(len(weights))]))
The calculation performs a running dot product and appends both results to each row of the list temperatures. The list of temperature samples is far longer than the list of weights, hence the len(weights) subtracted in the range of the main loop.
This traverses the list of weights twice for every row, which is inefficient and degrades performance. How could this be done in a more pythonic way?

I also have concerns about the main loop because of the correction with len(weights). Would the following be considered more pythonic?

# calculate weighted values in sequence
for i in range(len(temperatures)):
   try:
      # insert weighted calculation here
      pass
   except:
      # do nothing, because array out of bounds
      pass
Thank you for your response, but perhaps I should have mentioned that this is a newbie question. I am looking for a solution that helps me understand the syntax and idiom of Python better, and without having to resort to other libraries.
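For a library-free version, one option is to pair each weight with its row using zip and accumulate both sums in the same inner loop, so the weights are walked only once per window. What follows is only a minimal sketch: the helper name `append_weighted`, its parameters and the sample data are made up for illustration, and the `+ 1` in the range assumes the last full window should be included (the loop in the question stops one window short).

```python
def append_weighted(rows, weights, col_a=5, col_b=6):
    """Append two weighted sums to every row that starts a full window.

    Hypothetical helper, not taken from the original post.
    """
    n = len(weights)
    # len(rows) - n + 1 is the number of complete windows, so no index
    # can run past the end and no try/except is needed.
    for i in range(len(rows) - n + 1):
        sum_a = sum_b = 0.0
        # zip pairs weights[j] with rows[i + j]; both columns share one pass.
        for w, row in zip(weights, rows[i:i + n]):
            sum_a += w * row[col_a]
            sum_b += w * row[col_b]
        rows[i].extend((sum_a, sum_b))


# Made-up sample: seven columns per row, measured values in columns 5 and 6.
temperatures = [[0, 0, 0, 0, 0, float(k), 2.0 * k] for k in range(5)]
weights = [0.5, 0.25, 0.25]
append_weighted(temperatures, weights)
print(temperatures[0][7:])
```

Computing the number of windows up front also answers the try/except question: the bounds are explicit, and a bare except would hide unrelated errors as well as the out-of-range index.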
It seems clear to me that you are computing discrete convolutions of numerical arrays. The best way to do this is probably to use numpy. Consider the following code, which compares a computation on a list with the same computation on a numpy array:
import numpy as np

def main():
    T = [3., 1., 5., 7., 9., 2., -1.]
    W = [0.2, -0.1, 0.3]
    
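    # Sliding dot product with plain lists: one value per full window.
    R = [sum(W[j] * T[i + j] for j in range(len(W)))
         for i in range(len(T) - len(W) + 1)]
    print(R)

    # Same result with numpy: a convolution flips its kernel, so reverse w first.
    t = np.array(T)
    w = np.array(W)
    r = np.convolve(t, w[::-1], 'valid')
    print(r)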


if __name__ == '__main__':
    main()
Output:
[2.0, 1.8, 2.9999999999999996, 1.1, 1.3]
[ 2. 1.8 3. 1.1 1.3]
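The `w[::-1]` in the call is there because a convolution flips its kernel, whereas the running dot product does not. The plain-Python sketch below is written only to illustrate that point: the function name `convolve_valid` is made up, and this is not how numpy implements `np.convolve`. It checks that convolving with the reversed weights reproduces the sliding dot product.

```python
def convolve_valid(signal, kernel):
    """Discrete convolution, kept only where the kernel fully overlaps the signal.

    Illustrative helper; assumes len(signal) >= len(kernel).
    """
    n = len(kernel)
    # Output position i corresponds to convolution index m = i + n - 1,
    # and the convolution sums kernel[k] * signal[m - k].
    return [sum(kernel[k] * signal[i + n - 1 - k] for k in range(n))
            for i in range(len(signal) - n + 1)]


T = [3., 1., 5., 7., 9., 2., -1.]
W = [0.2, -0.1, 0.3]

# Sliding dot product, as in the list version above.
sliding = [sum(W[j] * T[i + j] for j in range(len(W)))
           for i in range(len(T) - len(W) + 1)]
# Convolution with the reversed weights.
flipped = convolve_valid(T, W[::-1])

# Compare with a tolerance, since the two sums add their terms in a different order.
print(all(abs(a - b) < 1e-12 for a, b in zip(sliding, flipped)))
```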
Concerning execution time, I compared the two with the timeit module: for an array T of size 21 and W of size 6, the numpy version is already more than 10 times faster than the list version. The gap should widen further with larger arrays.
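A comparison of this kind can be set up with timeit roughly as follows. This is a sketch, not the benchmark that was actually run: the array contents, the number of repetitions and the function names `with_lists` and `with_numpy` are assumptions, and the absolute timings depend on the machine and the numpy version.

```python
import random
import timeit

import numpy as np

random.seed(0)
# Sizes taken from the post; the values themselves are made up.
T = [random.random() for _ in range(21)]
W = [random.random() for _ in range(6)]
t = np.array(T)
w_rev = np.array(W)[::-1]


def with_lists():
    # Sliding dot product on plain Python lists.
    return [sum(W[j] * T[i + j] for j in range(len(W)))
            for i in range(len(T) - len(W) + 1)]


def with_numpy():
    # Same computation as a 'valid' convolution with the reversed weights.
    return np.convolve(t, w_rev, 'valid')


n = 2000  # number of calls per measurement (an arbitrary choice)
list_time = timeit.timeit(with_lists, number=n)
numpy_time = timeit.timeit(with_numpy, number=n)
print(f"lists: {list_time:.4f} s, numpy: {numpy_time:.4f} s for {n} calls")
```

The arrays are built outside the timed functions, so the cost of converting the lists is not counted. Whether that matches the measurement quoted above is not stated in the post.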

Let us not forget to conclude with this famous quote from Donald Knuth's 1974 Turing Award lecture, "Computer Programming as an Art":

Donald Knuth wrote: “The real problem is that programmers have spent far too much time worrying about efficiency in the wrong places and at the wrong times; premature optimization is the root of all evil (or at least most of it) in programming.”