Please help make a loop faster
#1
I'm working on a forward pass for a neural network, and I have written loops within loops within loops. I know there's a way to do this in NumPy that is much faster and simpler.

import numpy as np

def forward_p(x, w, b):
    """
    Inputs:
    - x: A numpy array of images of shape (N, H, W)
    - w: A numpy array of weights of shape (M, H, W)
    - b: A numpy vector of biases of size M

    Outputs: 
    - cout: a numpy array of shape (N, M)
    """

    N, H, W = x.shape
    M, _, _ = w.shape
    cout = np.zeros((N, M))

    # Four nested loops: every image ni, every weight map mi, every pixel (d1, d2)
    for ni in range(N):
        for mi in range(M):
            cout[ni, mi] = b[mi]
            for d1 in range(H):
                for d2 in range(W):
                    cout[ni, mi] += x[ni, d1, d2] * w[mi, d1, d2]
    return cout
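In formula form, the four nested loops compute, for each image n and each weight map m, the bias plus the elementwise product summed over both pixel axes (the summation indices are named after the loop variables d1 and d2):

```latex
\mathrm{cout}_{n,m} = b_m + \sum_{d_1=0}^{H-1} \sum_{d_2=0}^{W-1} x_{n,d_1,d_2}\, w_{m,d_1,d_2},
\qquad 0 \le n < N,\ 0 \le m < M
```

Because both sums run over the same two axes of x and w, each entry is a dot product between a flattened image and a flattened weight map of length H*W, which is why a single matrix-style contraction can replace all four loops.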

#2
Hi

We should be able to take advantage of vectorization (using the Kronecker product; see an example here), but it strongly depends on the size of (N, M, H, W). How many loop iterations are we talking about: millions or billions? In my opinion, the main limitation remains RAM.

I've never worked with four nested loops, but it might be interesting to test it.

Paul
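The Kronecker product example linked above was not preserved, so the sketch below swaps in plain NumPy broadcasting to illustrate the RAM concern: a one-line vectorization that first materializes an (N, M, H, W) temporary array and only then sums it. The function names forward_broadcast and intermediate_bytes are hypothetical, chosen for this illustration.

```python
import numpy as np

def forward_broadcast(x, w, b):
    # x[:, None] has shape (N, 1, H, W) and w[None] has shape (1, M, H, W);
    # their product broadcasts to a full (N, M, H, W) temporary array.
    prod = x[:, None, :, :] * w[None, :, :, :]
    # Sum over the two pixel axes -> (N, M), then add one bias per weight map.
    return prod.sum(axis=(2, 3)) + b

def intermediate_bytes(N, M, H, W, itemsize=8):
    # Memory needed for the (N, M, H, W) temporary (float64 -> 8 bytes per item).
    return N * M * H * W * itemsize

# With N=100, M=200, H=W=64 the temporary alone is 655,360,000 bytes (about 0.61 GiB).
print(intermediate_bytes(100, 200, 64, 64) / 2**30)
```

This works, but its memory use grows with N*M*H*W, so for large sizes a contraction that never builds the full product array is the safer choice.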
#3
This was my first solution. This will already give you a speed boost.

import numpy as np

def forward_path_half_vectorized(x, w, b):
    """
    Inputs:
    - x: A numpy array of images of shape (N, H, W)
    - w: A numpy array of weights of shape (M, H, W)
    - b: A numpy vector of biases of size M
 
    Outputs: 
    - cout: a numpy array of shape (N, M)
    """

    N, _, _ = x.shape
    M, _, _ = w.shape
    cout = np.zeros((N, M))

    for ni in range(N):
        for mi in range(M):
            cout[ni, mi] = np.sum(x[ni] * w[mi])
    
    return cout + b
But I thought there must be a better way, and I found it by looking through the NumPy documentation:
https://docs.scipy.org/doc/numpy/referen...ordot.html

def forward_path_full_vectorized(x, w, b):
    """
    Inputs:
    - x: A numpy array of images of shape (N, H, W)
    - w: A numpy array of weights of shape (M, H, W)
    - b: A numpy vector of biases of size M
 
    Outputs: 
    - cout: a numpy array of shape (N, M)
    """
    
    return np.tensordot(x, w, axes=([1,2],[1,2])) + b
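np.tensordot with axes=([1,2],[1,2]) sums over the H and W axes of both arrays, leaving an (N, M) result. The sketch below (function names are hypothetical) checks on small random arrays that it agrees with two other common spellings of the same contraction: flattening to 2-D followed by one matrix product, and np.einsum. Which is fastest likely depends on the NumPy/BLAS build; np.einsum may be slower unless called with optimize=True.

```python
import numpy as np

def forward_tensordot(x, w, b):
    # Contract axes 1 and 2 (H, W) of x with axes 1 and 2 of w -> shape (N, M)
    return np.tensordot(x, w, axes=([1, 2], [1, 2])) + b

def forward_matmul(x, w, b):
    # Flatten each image and each weight map to length H*W, then one matrix product:
    # (N, H*W) @ (H*W, M) -> (N, M); b broadcasts across the rows.
    N, M = x.shape[0], w.shape[0]
    return x.reshape(N, -1) @ w.reshape(M, -1).T + b

def forward_einsum(x, w, b):
    # Same contraction in Einstein notation: sum over the shared h and v axes.
    return np.einsum("nhv,mhv->nm", x, w) + b

rng = np.random.default_rng(42)
x = rng.standard_normal((4, 8, 8))
w = rng.standard_normal((3, 8, 8))
b = rng.standard_normal(3)

ref = forward_tensordot(x, w, b)
print(np.allclose(ref, forward_matmul(x, w, b)), np.allclose(ref, forward_einsum(x, w, b)))
```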
The fully vectorized version is about 90 times faster!

X = np.ones((100, 64, 64), dtype=np.float64) * 0.3
W = np.ones((200, 64, 64), dtype=np.float64) * 1.5
B = np.ones((200), dtype=np.float64) * 3.3

%timeit forward_path_half_vectorized(X, W, B)
-> 408 ms ± 2.49 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

%timeit forward_path_full_vectorized(X, W, B)
-> 4.63 ms ± 125 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
Life can be easy knowing where to look. :-)
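The %timeit lines above are IPython/Jupyter magics and will not run in a plain Python script. A rough stand-alone equivalent using the standard timeit module is sketched below; it repeats the two functions from this post (docstrings dropped) so it runs on its own, and it uses fewer calls per run than %timeit chose. Timings depend on the machine, so no expected numbers are shown.

```python
import statistics
import timeit

import numpy as np

def forward_path_half_vectorized(x, w, b):
    # One np.sum per (image, weight) pair; still two Python-level loops.
    N, M = x.shape[0], w.shape[0]
    cout = np.zeros((N, M))
    for ni in range(N):
        for mi in range(M):
            cout[ni, mi] = np.sum(x[ni] * w[mi])
    return cout + b

def forward_path_full_vectorized(x, w, b):
    # Single tensordot over the H and W axes.
    return np.tensordot(x, w, axes=([1, 2], [1, 2])) + b

X = np.ones((100, 64, 64), dtype=np.float64) * 0.3
W = np.ones((200, 64, 64), dtype=np.float64) * 1.5
B = np.ones(200, dtype=np.float64) * 3.3

# Both versions must agree before their speeds are worth comparing.
assert np.allclose(forward_path_half_vectorized(X, W, B),
                   forward_path_full_vectorized(X, W, B))

# (function, calls per timing run): fewer calls for the slow version.
for fn, number in ((forward_path_half_vectorized, 1), (forward_path_full_vectorized, 10)):
    # 7 timing runs, like %timeit's default; each run is converted to time per call.
    runs = [t / number
            for t in timeit.repeat(lambda f=fn: f(X, W, B), number=number, repeat=7)]
    mean_ms = statistics.mean(runs) * 1e3
    std_ms = statistics.stdev(runs) * 1e3
    print(f"{fn.__name__}: {mean_ms:.2f} ms ± {std_ms:.2f} ms per call")
```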
#4
(Nov-04-2019, 02:33 PM)ThomasL Wrote: This was my first solution. This will already give you a speed boost.

def forward_path_half_vectorized(x, w, b):
    """
    Inputs:
    - x: A numpy array of images of shape (N, H, W)
    - w: A numpy array of weights of shape (M, H, W)
    - b: A numpy vector of biases of size M
 
    Outputs: 
    - cout: a numpy array of shape (N, M)
    """

    N, _, _ = x.shape
    M, _, _ = w.shape
    cout = np.zeros((N, M))

    for ni in range(N):
        for mi in range(M):
            cout[ni, mi] = np.sum(x[ni] * w[mi])
    
    return cout + b
But I thought there must be a better way and i found it looking through the numpy documentation.
https://docs.scipy.org/doc/numpy/referen...ordot.html

def forward_path_full_vectorized(x, w, b):
    """
    Inputs:
    - x: A numpy array of images of shape (N, H, W)
    - w: A numpy array of weights of shape (M, H, W)
    - b: A numpy vector of biases of size M
 
    Outputs: 
    - cout: a numpy array of shape (N, M)
    """
    
    return np.tensordot(x, w, axes=([1,2],[1,2])) + b
The full vectorized version is even 90 times faster !

X = np.ones((100, 64, 64), dtype=np.float64) * 0.3
W = np.ones((200, 64, 64), dtype=np.float64) * 1.5
B = np.ones((200), dtype=np.float64) * 3.3

%timeit forward_path_half_vectorized(X, W, B)
-> 408 ms ± 2.49 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

%timeit forward_path_full_vectorized(X, W, B)
-> 4.63 ms ± 125 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
Life can be easy knowing where to look. :-)
What if cout was a float (a single number)?
#5
(Nov-04-2019, 02:57 PM)mrnapoli Wrote: What if cout was a float (a single number)?
I don't understand your question.
Please provide some more details on your thoughts.
#6
In my case, the expected inputs and outputs are as follows:

- b_l : A float (single number)
- cout: A float (single number)

Therefore I get "ValueError: setting an array element with a sequence" when running the loop.
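That error message appears when a whole array is assigned to a slot that can hold only a single number. A minimal reproduction with made-up arrays (the names out and row are purely illustrative):

```python
import numpy as np

out = np.zeros((2, 3))           # every out[i, j] holds exactly one float
row = np.array([1.0, 2.0, 3.0])  # a sequence of three floats

try:
    out[0, 0] = row              # one slot, three values -> ValueError
    raised = False
except ValueError as err:
    raised = True
    print("ValueError:", err)

out[0] = row                     # assigning a whole row: shapes match
out[1, 2] = row.sum()            # reducing to one number first also works
print(out)
```

The usual fix is therefore either to make the left-hand side as large as the right-hand side, or to reduce the right-hand side (for example with np.sum) to a single value.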
#7
Why would you use this function under these circumstances?
That makes no sense at all.
Do you understand the docstring?
Quote: """
Inputs:
- x: A numpy array of images of shape (N, H, W)
- w: A numpy array of weights of shape (M, H, W)
- b: A numpy vector of biases of size M

Outputs:
- cout: a numpy array of shape (N, M)
"""
#8
"""
Inputs:
- x_i: A numpy array of images of shape (H, W)
- w_l: A numpy array of weights of shape (H, W)
- b_l: A float (single number)

Returns:
- out: A float (single number)
"""
N, H, W = x.shape
M, _, _ = w.shape
out = np.zeros((N,M))
#9
I suggest looking through the documentation, e.g. for numpy.dot() and numpy.matmul().
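For the single-image, single-weight shapes in post #8, the result is one dot product over all H*W pixels plus the scalar bias. A possible sketch along the lines of the numpy.dot() hint (the function name forward_single is hypothetical):

```python
import numpy as np

def forward_single(x_i, w_l, b_l):
    """x_i, w_l: arrays of shape (H, W); b_l: a float. Returns a Python float."""
    # ravel() flattens each (H, W) array to length H*W; np.dot of two 1-D arrays
    # is their inner product, i.e. the sum of the elementwise products.
    return float(np.dot(x_i.ravel(), w_l.ravel()) + b_l)

x_i = np.full((64, 64), 0.3)
w_l = np.full((64, 64), 1.5)
# Should be close to 0.3 * 1.5 * 64 * 64 + 3.3 = 1846.5 (up to rounding).
print(forward_single(x_i, w_l, 3.3))
```

Calling this for every (image, weight) pair would rebuild the (N, M) cout from the first post, though far more slowly than the fully vectorized version.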
#10
I got it; I went back and read through the documentation. Thanks for the lead.