Posts: 2
Threads: 1
Joined: Nov 2019
I'm working on a forward pass for a neural network. I have written loops within loops within loops. I know there's a way to do this in numpy that is much faster and simpler.
import numpy as np

def forward_p(x, w, b):
    """
    Inputs:
    - x: A numpy array of images of shape (N, H, W)
    - w: A numpy array of weights of shape (M, H, W)
    - b: A numpy vector of biases of size M
    Outputs:
    - cout: a numpy array of shape (N, M)
    """
    N, H, W = x.shape
    M, _, _ = w.shape
    cout = np.zeros((N, M))
    # for every image/filter pair, accumulate the element-wise products plus bias
    for ni in range(N):
        for mi in range(M):
            cout[ni, mi] = b[mi]
            for d1 in range(H):
                for d2 in range(W):
                    cout[ni, mi] += x[ni, d1, d2] * w[mi, d1, d2]
    return cout
Posts: 300
Threads: 72
Joined: Apr 2019
Hi
We should be able to take advantage of vectorization (using the Kronecker product - see an example here), but it strongly depends on the sizes (N, M, H, W); how many loop iterations are we talking about? Millions or billions? The main limitation remains the RAM, in my opinion.
I've never worked with 4 nested loops, but it might be interesting to test it.
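For instance, flattening each image and doing one big matrix product should give the same result (an untested sketch of the general idea, not the kron trick itself):
import numpy as np

def forward_flat(x, w, b):
    # x: (N, H, W), w: (M, H, W), b: (M,) -> cout: (N, M)
    # flatten H and W into one axis and let a single matrix product do all the work
    N, M = x.shape[0], w.shape[0]
    return x.reshape(N, -1) @ w.reshape(M, -1).T + b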
Paul
Posts: 360
Threads: 5
Joined: Jun 2019
This was my first solution. This will already give you a speed boost.
def forward_path_half_vectorized(x, w, b):
    """
    Inputs:
    - x: A numpy array of images of shape (N, H, W)
    - w: A numpy array of weights of shape (M, H, W)
    - b: A numpy vector of biases of size M
    Outputs:
    - cout: a numpy array of shape (N, M)
    """
    N, _, _ = x.shape
    M, _, _ = w.shape
    cout = np.zeros((N, M))
    # element-wise product of one image with one filter, summed -> one output value
    for ni in range(N):
        for mi in range(M):
            cout[ni, mi] = np.sum(x[ni] * w[mi])
    return cout + b
But I thought there must be a better way, and I found it looking through the numpy documentation.
https://docs.scipy.org/doc/numpy/referen...ordot.html
def forward_path_full_vectorized(x, w, b):
    """
    Inputs:
    - x: A numpy array of images of shape (N, H, W)
    - w: A numpy array of weights of shape (M, H, W)
    - b: A numpy vector of biases of size M
    Outputs:
    - cout: a numpy array of shape (N, M)
    """
    # contract over the H and W axes of both arrays in one call
    return np.tensordot(x, w, axes=([1, 2], [1, 2])) + b
The fully vectorized version is another ~90 times faster!
X = np.ones((100, 64, 64), dtype=np.float64) * 0.3
W = np.ones((200, 64, 64), dtype=np.float64) * 1.5
B = np.ones((200), dtype=np.float64) * 3.3
%timeit forward_path_half_vectorized(X, W, B)
-> 408 ms ± 2.49 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
%timeit forward_path_full_vectorized(X, W, B)
-> 4.63 ms ± 125 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
Life can be easy knowing where to look. :-)
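For completeness, the same contraction can also be written with np.einsum; it should be equivalent to the tensordot version above (I haven't benchmarked this one):
def forward_path_einsum(x, w, b):
    # sum over the H and W axes of both arrays, keep N and M -> shape (N, M)
    return np.einsum('nhw,mhw->nm', x, w) + b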
Posts: 4
Threads: 0
Joined: Nov 2019
(Nov-04-2019, 02:33 PM)ThomasL Wrote: This was my first solution. This will already give you a speed boost. [...] Life can be easy knowing where to look. :-)
What if cout was a float (single number) type?
Posts: 360
Threads: 5
Joined: Jun 2019
(Nov-04-2019, 02:57 PM)mrnapoli Wrote: What if cout was a float (single number) type? I don't understand your question.
Please provide some more details on your thoughts.
Posts: 4
Threads: 0
Joined: Nov 2019
In my case the expected inputs and outputs are as follows:
- b_l: A float (single number)
- cout: A float (single number)
Therefore I receive "ValueError: setting an array element with a sequence" when running the loop.
Posts: 360
Threads: 5
Joined: Jun 2019
Why would you use this function under these circumstances?
That makes no sense at all.
Do you understand the docstring?
Quote: """
Inputs:
- x: A numpy array of images of shape (N, H, W)
- w: A numpy array of weights of shape (M, H, W)
- b: A numpy vector of biases of size M
Outputs:
- cout: a numpy array of shape (N, M)
"""
Posts: 4
Threads: 0
Joined: Nov 2019
"""
Inputs:
- x_i: A numpy array of images of shape (H, W)
- w_l: A numpy array of weights of shape (H, W)
- b_l: A float (single number)
Returns:
- out: A float (single number)
"""
N, H, W = x.shape
M, _, _ = w.shape
out = np.zeros((N,M))
Posts: 360
Threads: 5
Joined: Jun 2019
I suggest looking through the documentation:
e.g. numpy.dot()
e.g. numpy.matmul()
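For the single-image shapes in your docstring this boils down to a single dot product, e.g. (a sketch using the names from your docstring):
def forward_single(x_i, w_l, b_l):
    # x_i, w_l: numpy arrays of shape (H, W); b_l: float -> out: float
    return float(np.dot(x_i.ravel(), w_l.ravel())) + b_l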
Posts: 4
Threads: 0
Joined: Nov 2019
I got it; I went back and read through the documentation. Thanks for the lead.