May-15-2018, 07:35 PM
The loop is not necessary as long as you want to operate on a whole array. Normally, if you manage to vectorise an operation (that is, remove the for loops), you gain a lot of speed and the code becomes clearer.
In the code I posted, the loops are done by numpy behind the scenes, and they are much, much faster. In the first case numpy searches through all the elements, so internally it is doing:
    # this is the same as m = np.max(Quantity)
    m = Quantity[0, 0, 0]
    for i in range(Quantity.shape[0]):
        for j in range(Quantity.shape[1]):
            for k in range(Quantity.shape[2]):
                m = max(m, Quantity[i, j, k])

But at C code level...
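If you want to check that equivalence yourself, here is a minimal sketch on a small array. The random data and the 6 x 4 x 3 shape are assumptions for illustration only; replace them with your real `Quantity`.

```python
import numpy as np

# Hypothetical stand-in for the real data: 6 x 4 x 3 random values
rng = np.random.default_rng(0)
Quantity = rng.random((6, 4, 3))

# Explicit triple loop (slow, shown only for comparison)
m_loop = Quantity[0, 0, 0]
for i in range(Quantity.shape[0]):
    for j in range(Quantity.shape[1]):
        for k in range(Quantity.shape[2]):
            m_loop = max(m_loop, Quantity[i, j, k])

# Vectorised version: a single call, the loop runs in compiled code
m_vec = np.max(Quantity)

# Both approaches should pick exactly the same element
print(m_loop == m_vec)
```

On an array this small the speed difference is invisible; it only becomes noticeable with data of realistic size.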
In the second case, as I am asking for the max along axis 0, the equivalent loop would be:
    # this is the same as m = np.max(Quantity, axis=0)
    m = np.empty(Quantity.shape[1:3])
    for j in range(Quantity.shape[1]):
        for k in range(Quantity.shape[2]):
            tmp = Quantity[0, j, k]
            for i in range(Quantity.shape[0]):
                tmp = max(tmp, Quantity[i, j, k])
            m[j, k] = tmp

Do *NOT* use these loops; they are extremely inefficient and are only here to show how numpy works.
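To see what the axis argument does to the shape of the result, a small sketch like the following may help. Again, the random data and the (6, 4, 3) shape are assumptions made for the example, not your real case.

```python
import numpy as np

# Hypothetical data: 6 time steps on a 4 x 3 grid
rng = np.random.default_rng(1)
Quantity = rng.random((6, 4, 3))

# Loop version: for each (j, k) cell, scan along the first axis
m_loop = np.empty(Quantity.shape[1:3])
for j in range(Quantity.shape[1]):
    for k in range(Quantity.shape[2]):
        tmp = Quantity[0, j, k]
        for i in range(Quantity.shape[0]):
            tmp = max(tmp, Quantity[i, j, k])
        m_loop[j, k] = tmp

# Vectorised version: axis 0 is reduced away, the other two axes remain
m_vec = np.max(Quantity, axis=0)

print(m_vec.shape)                    # shape of the remaining axes
print(np.array_equal(m_loop, m_vec))  # element-by-element comparison
```

The first dimension disappears from the result, so you get one maximum per grid cell instead of a single number.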
But sometimes you do need to loop. For example, imagine that in your case you have 140 years of daily data, so the first dimension has 51100 records (140 x 365) and no day is missing (adjust this to your real case).
In that case, if I want to collect the max for every block of 365 days, I can do something like:
    # block index of each record: 0 for the first 365 days, 1 for the next 365, ...
    year = np.arange(Quantity.shape[0]) // 365
    for y in range(year.max() + 1):
        v = np.max(Quantity[year == y, ...])
        print(f"Max for year {y} = {v}")

There you can see how, using a masking array (an array of True/False with the same number of elements as the dimension), I can select blocks of 365 days. There is no easy way to do this in general without the loop, and if your input data is slightly more complex (for example, you have another variable holding the actual dates as datetime and you want to filter by calendar year, not every 365 days), avoiding the loop will be impossible (or the result will look like a hieroglyph, which might be worse).
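For the calendar-year case, the same masking idea could look roughly like this. It is only a sketch under assumptions: the `dates` array, the 2000 to 2002 date range and the random `Quantity` are all invented for the example, and your real dates may be stored in a different form.

```python
import numpy as np

# Hypothetical daily dates from 2000-01-01 to 2002-12-31 (2000 is a leap year)
dates = np.arange("2000-01-01", "2003-01-01", dtype="datetime64[D]")

# Hypothetical data: one 4 x 3 grid per day
rng = np.random.default_rng(2)
Quantity = rng.random((dates.size, 4, 3))

# Calendar year of each record: truncate to year precision,
# which counts years since 1970, then add the offset back
years = dates.astype("datetime64[Y]").astype(int) + 1970

yearly_max = {}
for y in np.unique(years):
    mask = years == y  # True for the records that fall in year y
    yearly_max[int(y)] = float(np.max(Quantity[mask, ...]))
    print(f"Max for year {y} = {yearly_max[int(y)]}")
```

The loop runs only once per year, so it costs very little; the heavy work inside each iteration is still done by numpy.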