Python Forum
Trying to replace for loops with numpy
Hello all,

I just recently started working with Python, and I am currently working on a simulation that reads the distances between a ball and the floor through 11 time steps and finds the minimum. The problem is that the ball and the floor have different numbers of elements, so I am using three for loops to run the program, which takes a long time. The code works on a matrix with 11 rows (one per time step) and 3 columns (x, y, z axes). I am using the qd database to run this example. In total, the code performs around 20 million calculations.

for t in range(tsteps):  # Loop over all time steps

    for i in range(lp1):  # Loop over all the elements in part 1

        el1 = elpart1[i].get_coords()  # Returns the coordinates of the element as an np.ndarray (11 x 3)
        el1_id = elpart1[i].get_id()  # Helps with identifying the elements

        for j in range(lp2):  # Loop over all elements in part 2

            el2 = elpart2[j].get_coords()
            el2_id = elpart2[j].get_id()

            # Euclidean distance between the coordinates of the two parts at time step t:
            dist = ((el2[t, 0] - el1[t, 0])**2 + (el2[t, 1] - el1[t, 1])**2 + (el2[t, 2] - el1[t, 2])**2)**(1/2)

            # Adding the results to the empty lists that were created before the loops started
            element_pair_list.append([el1_id, el2_id])
How could I replace the for Loops to make the Code run more efficiently? Any help is much appreciated!
Can you provide an example, so we can give you some advice?

Maybe you can reduce the number of loops (typically only one per time step), and you'll be able to speed up your calculation for sure. As an example, using append is very costly (dynamic memory allocation: a new matrix is created for each append call to add a new row), and you can easily avoid it either by preallocating a matrix or by adding columns to an existing one; numpy is implicitly vectorized, and it's fast when used correctly.
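To make the cost of growing an array concrete, here is a minimal sketch (not the original code; the sizes are arbitrary) comparing np.append, which reallocates and copies the whole array on every call, with a preallocated array filled in place:

```python
import numpy as np
import time

n = 5_000

# Costly pattern: np.append creates a brand-new array on every call
t0 = time.time()
grown = np.empty((0, 2))
for i in range(n):
    grown = np.append(grown, [[i, i + 1]], axis=0)
t_append = time.time() - t0

# Cheap pattern: allocate once, then fill rows in place
t0 = time.time()
pre = np.empty((n, 2))
for i in range(n):
    pre[i] = (i, i + 1)
t_fill = time.time() - t0

print("np.append: {:.3f}s, preallocated: {:.3f}s".format(t_append, t_fill))
```

On a typical machine the preallocated version is orders of magnitude faster; appending to a plain Python list and converting with np.array at the end is also a reasonable middle ground.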

Share a bit more and the community will help you
(I once showed an intern that it's possible to reduce the runtime from more than an hour to 40 seconds; 3 nested loops were replaced with none in the end. It depends on the topic.)

Here is a basic example based on the numpy arrays/matrices you get:
#!/usr/bin/env python3
# -*- coding: utf-8 -*-

import numpy as np
import time

n = 1_000_000

# elt1 and elt2 are supposed to be 2 matrices; the dimensions are (n, 4)
# column 1 = element number
# columns 2/3/4 = X/Y/Z coordinates

## Matrix build-up; the first column holds the element numbers
elt1 = np.random.random((n, 4))
elt2 = np.random.random((n, 4))

i = np.arange(1, n + 1, dtype=np.float64)  # np.float is deprecated; use np.float64
elt1[:, 0] = i
elt2[:, 0] = i

# now the dimension is supposed to be unknown
t0 = time.time()
nr = np.shape(elt1)[0]  # nr = number of rows
dist = np.sqrt( (elt2[:, 1] - elt1[:, 1])**2 + (elt2[:, 2] - elt1[:, 2])**2 + (elt2[:, 3] - elt1[:, 3])**2 )
t1 = time.time()
loc_minimum = np.argmin(dist)  # simpler and faster than np.where(dist == np.min(dist))
minimum = dist[loc_minimum]
print("The minimum is {} and located at row number {}".format(minimum, loc_minimum))
print("Duration : {}".format(t1 - t0))
Thank you for your reply! I was trying to simplify everything while explaining the problem, but it lacks some important details.

So, as I mentioned in the beginning, the program computes the distances between a ball and the floor through 11 time steps and finds the minimum distance. It does so by comparing the distances between all elements of each part (the ball, which is the first part or el1, and the floor, the second part, el2), but the parts each have a different number of elements (2048 for the ball and 900 for the floor). These elements are basically the average of node coordinates in a certain area of the part. I'm using the loop "for i in range(lp1)" (part 1) so that it starts with the first element of that first part, and then a second loop that compares the first element of the first part to all elements of the second part, and then goes forward repeating the same steps for all elements of part 1. In that way, the code compares the distance between all the elements of each part and finds their minimum (which would be a case of "contact").

So, other than using an alternative to the append command, is there a way to replace the for loops that go over each individual element at a time? Basically, what I'm trying to do is compare 2048 11x3 matrices to 900 other 11x3 matrices, without having to use for loops to deal with the difference in the number of matrices.

Hope this helps clarify what I am trying to say! I'm sorry for my vocabulary regarding Python terms; I have just started using the language, so I'm still not familiar with them.
Are you speaking about the minimum distance between 2 meshes (nodes and elements)?
(If so, you can probably reduce the number of calculations by checking whether the element normal is well oriented, for example; just an idea.)

I guess the floor is not flat, and that's why you have to calculate the "mutual" distance between each element of each part (ball vs floor); I would say that we can certainly improve your code, but more information is needed: can you provide an extract of the ball elements and the floor ones (a few of each)?

Thanks again for your reply!

You are correct, there are nodes and elements for each part. The floor is actually flat at the start, but it deflects. Also, the code is supposed to be flexible so it will work for different scenarios; that's why I'm using all elements even for the floor. I went back to the code one more time and realized that the coordinates actually come in a 1x3 format, but are repeated 11 times, once for each time step:

(output from python)
[[20.15 -7.75 -5.69]]

[[-68.92 -73.01 -25.47]]

This is an example of the first coordinate of the first and second part for the first time step. Then I use the equation:

dist = ((el2[t, 0] - el1[t, 0]) ** 2 + (el2[t, 1] - el1[t, 1]) ** 2 + (el2[t, 2] - el1[t, 2]) ** 2) ** (1 / 2)
Which computes the distance between these two coordinates.
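As a side note, that formula is exactly the Euclidean norm of the difference, so np.linalg.norm gives the same value (a small check using the two example coordinates above):

```python
import numpy as np

el1 = np.array([20.15, -7.75, -5.69])     # first coordinate of part 1
el2 = np.array([-68.92, -73.01, -25.47])  # first coordinate of part 2

# Manual Euclidean distance, as in the equation above
dist_manual = ((el2[0] - el1[0])**2 + (el2[1] - el1[1])**2 + (el2[2] - el1[2])**2)**(1/2)

# Equivalent one-liner
dist_norm = np.linalg.norm(el2 - el1)

print(dist_manual, dist_norm)
```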
Well, I've been thinking that with a few thousand elements, some loops may be avoided using the Kronecker product (np.kron);

Just an example with 5 million calculations (with no loops):
import numpy as np
import time
n1 = 1_000  # number of nodes in the ball
n2 = 5_000  # number of nodes in the floor

## Matrix build-up; the first column holds the element numbers
elt1 = np.random.random((n1, 4))  # ball
elt2 = np.random.random((n2, 4))  # floor
i1 = np.arange(0, n1, dtype=np.float64)
i2 = np.arange(n1, n1 + n2, dtype=np.float64)  # just to have different numbers
i2b = np.arange(0, n2, dtype=np.float64)

elt1[:, 0] = i1  # the node number does not matter here
elt2[:, 0] = i2
# ball: vect1 is n1 blocks of n2 rows
# equivalent to: vect1 = [0 0 0 ... 0 0 0 1 1 1 ... 1 1 1 and so on]
# floor: vect2 is the block of n2 nodes (floor) repeated n1 times
# vect1 and vect2 are vectors of indexes into elt1 and elt2 respectively

t0 = time.time()
j1 = np.ones(n2, dtype=int)  # warning: indexes must be integers
j2 = np.ones(n1, dtype=int)
i1 = i1.astype(int)
i2b = i2b.astype(int)

vect1 = np.kron(i1, j1)
vect2 = np.kron(j2, i2b)
NumberOfRows = np.shape(vect1)[0]
k = np.arange(0, NumberOfRows, dtype=int)

distance_vector = np.sqrt( (elt2[vect2[k], 1] - elt1[vect1[k], 1])**2 + (elt2[vect2[k], 2] - elt1[vect1[k], 2])**2 + (elt2[vect2[k], 3] - elt1[vect1[k], 3])**2 )
minimum_distance = np.min(distance_vector)
t1 = time.time()

## check the distance between node 1 of the ball and node 1 of the floor
# versus distance_vector[0]
check = np.sqrt( (elt2[0, 1] - elt1[0, 1])**2 + (elt2[0, 2] - elt1[0, 2])**2 + (elt2[0, 3] - elt1[0, 3])**2 )
diff = check - distance_vector[0]
print("if diff is equivalent to 0. then ok; here diff = {}".format(diff))
print("the {} calculations took {} second".format(NumberOfRows, t1 - t0))
(Seems to be good - it took 0.7 second on my old laptop with 6 GB of RAM.)
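For what it's worth, the same pairwise minimum can also be obtained with NumPy broadcasting instead of the Kronecker index vectors; this is just a sketch with smaller random sizes, and it has the same memory caveat (the full (n1, n2) distance matrix is built):

```python
import numpy as np

rng = np.random.default_rng(0)
n1, n2 = 500, 2_000            # smaller sizes for illustration

elt1 = rng.random((n1, 4))     # ball: column 0 = node number, columns 1-3 = X/Y/Z
elt2 = rng.random((n2, 4))     # floor

# Broadcast (n1, 1, 3) against (1, n2, 3) -> (n1, n2, 3) differences
diff = elt1[:, None, 1:4] - elt2[None, :, 1:4]
dist = np.sqrt((diff**2).sum(axis=2))   # (n1, n2) pairwise distances

i, j = np.unravel_index(np.argmin(dist), dist.shape)
print("minimum {:.6f} between ball row {} and floor row {}".format(dist[i, j], i, j))
```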

Nonetheless, the amount of memory becomes too high with several million elements, i.e. for a more general model, and using loops cannot be avoided in my mind; I'm wondering if Numba can help (I've never used it so far, but it seems promising if the necessary numpy capabilities have been implemented).

Your solution may be a mix between loops (one per time step) and something close to the previous example.
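A sketch of that mix: loop over blocks of ball rows so the intermediate matrix stays small (the chunk size of 256 is an arbitrary choice, to be tuned against available RAM; the random coordinates stand in for one time step of real data):

```python
import numpy as np

rng = np.random.default_rng(1)
ball = rng.random((2_048, 3))   # X/Y/Z only, one time step
floor = rng.random((900, 3))

chunk = 256                     # arbitrary block size
best = np.inf
for start in range(0, ball.shape[0], chunk):
    block = ball[start:start + chunk]                            # (<=256, 3)
    # pairwise distances of this block against the whole floor
    d = np.linalg.norm(block[:, None, :] - floor[None, :, :], axis=2)
    best = min(best, d.min())

# Sanity check against the full (2048, 900) matrix
full = np.linalg.norm(ball[:, None, :] - floor[None, :, :], axis=2).min()
print(best, full)
```

The peak intermediate is chunk x 900 x 3 floats instead of 2048 x 900 x 3, and the loop count stays tiny (9 iterations), so almost all the work remains vectorized.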

Hope it helps you in a certain way


I took the opportunity to build a more realistic model, and to validate it using a professional tool; I got the same results (see pictures).

[Image: 1564649081-test-distance.png]
[Image: 1564649073-check.png]
[Image: 1564649502-spyder.png]

Unfortunately I don't know how to share the files on the forum, especially the original meshes (send me a PM); I can only share the Python file.

hope it helps a bit


import numpy as np
import os, time
import h5py

PATH = str(os.getcwd())
FileNameHDF5 = 'Distance_min.h5'

t0 = time.time()
with h5py.File(PATH + '/' + FileNameHDF5, 'r') as h5:
    data = h5.get('Saddle/Saddle_nodes')
    Floor = np.array(data); del data
    data = h5.get('Sphere/Sphere_nodes')
    Ball = np.array(data); del data

## get the number of rows/nodes per part
n1 = np.shape(Floor)[0]    # for the floor
n2 = np.shape(Ball)[0]     # for the ball

index1 = np.arange(0, n1, dtype=int)
index2 = np.arange(0, n2, dtype=int)
# floor: vect1 is n1 blocks of n2 rows
# vect1 = [0 0 0 ... 0 0 0 1 1 1 ... 1 1 1 and so on]
# ball: vect2 is the block of n2 nodes repeated n1 times
# vect1 and vect2 are vectors of indexes into Floor and Ball respectively

j1 = np.ones(n2, dtype=int)  # warning: indexes must be integers
j2 = np.ones(n1, dtype=int)

vect1 = np.kron(index1, j1)     # vect1 returns the indexes of the Floor rows
vect2 = np.kron(j2, index2)     # vect2 returns the same for the Ball rows

NumberOfRows = np.shape(vect1)[0]
k = np.arange(0, NumberOfRows, dtype=int)      # k indexes every (floor, ball) pair
distance_vector = np.sqrt( (Ball[vect2[k], 1] - Floor[vect1[k], 1])**2 + (Ball[vect2[k], 2] - Floor[vect1[k], 2])**2 + (Ball[vect2[k], 3] - Floor[vect1[k], 3])**2 )
minimum_distance = np.min(distance_vector)
del n1; del n2; del j1; del j2; del vect1; del vect2
t1 = time.time()

print("The minimum distance has been found to be {}".format(minimum_distance))
print("the {} calculations took {} second".format(NumberOfRows, t1 - t0))
That is amazing, thank you so much! I think I'll have to take some time to study your method as this is pretty new to me, but it definitely helps!

If you want to test the power of vectorization, try the following basic example:

import numpy as np
import time

def square_function(x):
    return x**2

n = 10_000_000  # 10 million iterations

# case 1: using loops
y1 = np.zeros(n, dtype=np.float64)  # initialization

t0 = time.time()
for i in range(n):
    y1[i] = 0.1*i**2

t1 = time.time()
duration1 = t1 - t0
print("Duration using loops = {} seconds".format(duration1))

# case 2: using vectorization
y2 = np.zeros(n, dtype=np.float64)      # initialization

t0 = time.time()
i = np.arange(n, dtype=np.float64)     # vector of indexes is created
y2 = 0.1*i**2
t1 = time.time()
duration2 = t1 - t0
print("Duration using vectorization = {} seconds".format(duration2))

## ratio calculation
print("Vectorization is {} times faster than using loops!!!".format(duration1 / duration2))
print("we check there's no difference: min(y2-y1) = {} and max(y2-y1) = {}".format(np.min(y2 - y1), np.max(y2 - y1)))
del y2; del y1

# case 3: using a function works as well
y3 = np.zeros(n, dtype=np.float64)      # initialization

t0 = time.time()
i = np.arange(n, dtype=np.float64)     # vector of indexes is created
y3 = square_function(i)
t1 = time.time()
duration3 = t1 - t0
print("Duration using vectorization + function = {} seconds".format(duration3))
del y3
Some basics on vectorization I'm currently writing down:
[Image: 1564738252-vectorization.png]

Concerning the 2 vectors resulting from Kronecker product, a more "visual" approach:
[Image: 1564738255-kronecker.png]
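For readers who prefer code to pictures, the two index vectors from np.kron can also be produced with np.repeat and np.tile, which may read more naturally; a tiny example with illustrative sizes 3 and 4 (not the real mesh sizes):

```python
import numpy as np

n1, n2 = 3, 4                  # tiny ball / floor sizes for illustration
i1 = np.arange(n1)
i2 = np.arange(n2)

vect1 = np.kron(i1, np.ones(n2, dtype=int))   # [0 0 0 0 1 1 1 1 2 2 2 2]
vect2 = np.kron(np.ones(n1, dtype=int), i2)   # [0 1 2 3 0 1 2 3 0 1 2 3]

# np.repeat / np.tile build the same index vectors directly
print(np.array_equal(vect1, np.repeat(i1, n2)))  # True
print(np.array_equal(vect2, np.tile(i2, n1)))    # True
```

Together, vect1 and vect2 enumerate every (ball, floor) index pair, which is exactly what the nested loops did one pair at a time.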

