Posts: 2
Threads: 1
Joined: Nov 2019
I'm working on a forward pass for a neural network. I have written loops within loops within loops. I know there's a way to do this in numpy that is much faster and simpler.
import numpy as np

def forward_p(x, w, b):
    """
    Inputs:
    - x: A numpy array of images of shape (N, H, W)
    - w: A numpy array of weights of shape (M, H, W)
    - b: A numpy vector of biases of size M
    Outputs:
    - cout: a numpy array of shape (N, M)
    """
    N, H, W = x.shape
    M, _, _ = w.shape
    cout = np.zeros((N, M))
    # for every image/filter pair, accumulate the element-wise products plus bias
    for ni in range(N):
        for mi in range(M):
            cout[ni, mi] = b[mi]
            for d1 in range(H):
                for d2 in range(W):
                    cout[ni, mi] += x[ni, d1, d2] * w[mi, d1, d2]
    return cout
Posts: 300
Threads: 72
Joined: Apr 2019
Hi
We should be able to take advantage of vectorization (using the Kronecker product - see an example here), but it strongly depends on the sizes (N, M, H, W); how many loop iterations are we talking about? Millions or billions? The main limitation remains the RAM, in my opinion.
I've never worked with 4 nested loops, but it might be interesting to test it.
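For instance, flattening each image and doing one big matrix product should give the same result (an untested sketch of the general idea, not the kron trick itself):
import numpy as np

def forward_flat(x, w, b):
    # x: (N, H, W), w: (M, H, W), b: (M,) -> cout: (N, M)
    # flatten H and W into one axis and let a single matrix product do all the work
    N, M = x.shape[0], w.shape[0]
    return x.reshape(N, -1) @ w.reshape(M, -1).T + b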
Paul
Posts: 360
Threads: 5
Joined: Jun 2019
This was my first solution. This will already give you a speed boost.
def forward_path_half_vectorized(x, w, b):
    """
    Inputs:
    - x: A numpy array of images of shape (N, H, W)
    - w: A numpy array of weights of shape (M, H, W)
    - b: A numpy vector of biases of size M
    Outputs:
    - cout: a numpy array of shape (N, M)
    """
    N, _, _ = x.shape
    M, _, _ = w.shape
    cout = np.zeros((N, M))
    # element-wise product of one image with one filter, summed -> one output value
    for ni in range(N):
        for mi in range(M):
            cout[ni, mi] = np.sum(x[ni] * w[mi])
    return cout + b
But I thought there must be a better way, and I found it looking through the numpy documentation.
https://docs.scipy.org/doc/numpy/referen...ordot.html
def forward_path_full_vectorized(x, w, b):
    """
    Inputs:
    - x: A numpy array of images of shape (N, H, W)
    - w: A numpy array of weights of shape (M, H, W)
    - b: A numpy vector of biases of size M
    Outputs:
    - cout: a numpy array of shape (N, M)
    """
    # contract over the H and W axes of both arrays in one call
    return np.tensordot(x, w, axes=([1, 2], [1, 2])) + b
The fully vectorized version is another ~90 times faster!
X = np.ones((100, 64, 64), dtype=np.float64) * 0.3
W = np.ones((200, 64, 64), dtype=np.float64) * 1.5
B = np.ones((200), dtype=np.float64) * 3.3
%timeit forward_path_half_vectorized(X, W, B)
-> 408 ms ± 2.49 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
%timeit forward_path_full_vectorized(X, W, B)
-> 4.63 ms ± 125 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
Life can be easy knowing where to look. :-)
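For completeness, the same contraction can also be written with np.einsum; it should be equivalent to the tensordot version above (I haven't benchmarked this one):
def forward_path_einsum(x, w, b):
    # sum over the H and W axes of both arrays, keep N and M -> shape (N, M)
    return np.einsum('nhw,mhw->nm', x, w) + b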
Posts: 4
Threads: 0
Joined: Nov 2019
(Nov-04-2019, 02:33 PM)ThomasL Wrote: This was my first solution. This will already give you a speed boost. [...] Life can be easy knowing where to look. :-)
What if cout was a float (single number) type?
Posts: 360
Threads: 5
Joined: Jun 2019
(Nov-04-2019, 02:57 PM)mrnapoli Wrote: What if cout was a float (single number) type? I don't understand your question.
Please provide some more details on your thoughts.
Posts: 4
Threads: 0
Joined: Nov 2019
In my case the expected inputs and outputs are as follows:
- b_l: A float (single number)
- cout: A float (single number)
Therefore I receive "ValueError: setting an array element with a sequence" when running the loop.
Posts: 360
Threads: 5
Joined: Jun 2019
Why would you use this function under these circumstances?
That makes no sense at all.
Do you understand the docstring?
Quote: """
Inputs:
- x: A numpy array of images of shape (N, H, W)
- w: A numpy array of weights of shape (M, H, W)
- b: A numpy vector of biases of size M
Outputs:
- cout: a numpy array of shape (N, M)
"""
Posts: 4
Threads: 0
Joined: Nov 2019
"""
Inputs:
- x_i: A numpy array of images of shape (H, W)
- w_l: A numpy array of weights of shape (H, W)
- b_l: A float (single number)
Returns:
- out: A float (single number)
"""
N, H, W = x.shape
M, _, _ = w.shape
out = np.zeros((N,M))
Posts: 360
Threads: 5
Joined: Jun 2019
I suggest looking through the documentation:
e.g. numpy.dot()
e.g. numpy.matmul()
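For the single-image shapes in your docstring this boils down to a single dot product, e.g. (a sketch using the names from your docstring):
def forward_single(x_i, w_l, b_l):
    # x_i, w_l: numpy arrays of shape (H, W); b_l: float -> out: float
    return float(np.dot(x_i.ravel(), w_l.ravel())) + b_l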
Posts: 4
Threads: 0
Joined: Nov 2019
I got it; I went back and read through the documentation. Thanks for the lead.