Python Forum
Looping in a 3D matrix - desperate for assistance!
Thread Rating:
  • 1 Vote(s) - 4 Average
  • 1
  • 2
  • 3
  • 4
  • 5
Looping in a 3D matrix - desperate for assistance!
#1
Hi there,

I would like to compute all annual maximum values in a 3D matrix using a for loop. To be more specific, the variable "Quantity" is a 3D matrix that is composed of latitude and longitude (both make up latitude and longitude grid cells) and time (expressed in days).

What I would like to do is compute all maximum values by looping all latitudes, all longitudes, and then loop all years in the period. This is done to obtain all maximum values over every grid cell (i.e. composed of latitude and longitude) for every year available (the default time units of "days" must also be converted to "years"), beginning with year 0.

My question is how should I construct a loop to do this? I realize that I would need to use the max() function, but I am uncertain where to use it in the loop.

This is what I have so far:


Quantity=Q

        for lat in range(np.size(Quantity,axis=1)):
                for lon in range(np.size(Quantity,axis=2)):
                        Quantity[:,lat,lon]
How do I account for time in the loop, and convert days to years, so that I may obtain maximum "annual" values for every grid cell?

Thanks, and any assistance would be greatly appreciated!!!!
Reply
#2
You can use the max function from numpy, choosing the axis where you want to perform the operation:
# Create a dummy array of 5 x 3 x 3 where 1st dimension is time and the others are space position.
>>> import numpy as np
>>> q = np.array(range(5*3*3))
>>> q.shape = (5, 3, 3)
# Get the maximum for all the elements
>>> np.max(q)
44
# Get the max for each lat, lon
>>> np.max(q, axis=0)
array([[36, 37, 38],
       [39, 40, 41],
       [42, 43, 44]])
# Get the max for only the times 2 and 3:
>>> np.max(q[2:4, ...])
35
Try using array selection operations (slices, masks) and playing with the axis parameter to obtain the values you want.
Reply
#3
Hi Killerex (and anyone else who would like to provide suggestions),

Thank you so much for your kind reply and suggestion.

I noticed in your code that you did not use a for loop as I did previously. Is the loop necessary? If so, how would you write the for loop to include time, latitude and longitude? The way I did it above was incorrect to start?

Also, the "Quantity" variable's first demension is time (in days, by default), followed by latitude, and then longitude, so would the creation of a dummy array be necessary? I would also like to convert days into years (there are 140 years, but I sm unsure how to incorporate the conversion into the code from the standards days). Any ideas?

Thanks, once again,
Reply
#4
The loop is not necessary as long as you want to operate along a whole array. Normally if you manage to vectorise an operation (~ remove the for loops) you gain a lot of speed and the code is more clear.
In the code I posted the loops are done by numpy "in the backstage" and they are much, much faster. In the first case numpy search for all the elements, so internally is doing:
# this is the same as m = np.max(Quantity)
m = Quantity[0, 0, 0]
for i in range(Quantity.shape[0]):
    for j in range(Quantity.shape[1]):
        for k in range(Quantity.shape[2]):
            m = max(m, Quantity[i, j, k])
But at C code level...
In the second case, as I am asking for the max along axis 0, the loop would be equivalent to:
# this is the same as m = np.max(Quantity, axis=0)
m = np.empty(Quantity.shape[1:3])
for j in range(Quantity.shape[1]):
    for k in range(Quantity.shape[2]):
        tmp = Quantity[0, j, k]
        for i in range(Quantity.shape[0]):
            tmp = max(tmp, Quantity[i, j, k])
        m[j, k] = tmp
Do *NOT* use this loops, they are ultra inefficient, they just shows how numpy works.

But sometimes you need to loop... for example, imagine that in your case you have 140 years of daily data so the 1st dimension has 51200 records and no day is missing (adjust this to your real case)
In that case if I want to collect the max every 365 days I can do something like:
year = np.arange(Quantity.shape[0]) // 365
for y in range(max(year) + 1):
    v = np.max(Quantity[year == y, ...])
    print(f"Max for year {y} = {v}")
There you can see how using a masking array (an array of True/False with the same number of elements as the dimension) I can select blocks of 365 days. There is no easy way to do this without the loop, and if your input data is slightly complex (for example, you have other variable with the real days as datetime and you want to filter by natural year, not every 365 days) will be impossible (or will look like a hieroglyph, that might be worst)
Reply
#5
Hi Killerrex,

Once again, a very big thank you for this response.

In my case, I am working with a 3D matrix, with the first dimension being time, the second being latitude (64 lines of latitude), and the third being longitude (128 degrees of longitude). The units of time are, thankfully, very standard and are not overly complex. The final code that you listed above is actually very relevant, since it defines how I would like to convert the default time units of days to years, and then using that to have Python compute the maximum values for every year for every grid cell (made up of latitude and longitude) in a loop.

Just a few additional questions surfaced:

1. If you wanted to use your last code to include every grid cell (made up of 64 degrees of latitude and 128 degrees of longitude), would that provide annual maximum values for each grid cell? If so, could you combine your last code with my existing code above:

Quantity=Q
 
        for lat in range(np.size(Quantity,axis=1)):
                for lon in range(np.size(Quantity,axis=2)):
                        Quantity[:,lat,lon]
With this:

year = np.arange(Quantity.shape[0]) // 365
for y in range(max(year) + 1):
    v = np.max(Quantity[year == y, ...])
    print(f"Max for year {y} = {v}")
For example, would that combination produce:

Grid cell #1 for year 0: Max value derived
Grid cell #2 for year 0: Max value derived
Grid cell #3 for year 0: Max value derived
....
Grid cell #1000 for year 0: Max value derived

Then, move on to year 1....

etc....all the way to year 139?

Does that make sense?

2. What does the "[0]" mean in the first line of your code?
3. What does the "+1" suggest? Does that allow the loop to move from one year to the next, all the way to year 139?
4. What does the "..." imply in line 3?
5. Could you elaborate a little more on what you have inserted in your print function, in line 4?

Many thanks, once again, and I apologize for the list of questions - I am fairly new to programming with loops!
Reply
#6
If you want to produce results for each element of the grid, let numpy do that for you. You shall not need to do an explicit loop in latitude or longitude except for printing the data or things really special.
For example, a modification of my code:
# Quantity.shape = [N_time, N_lat, N_lon]
year = np.arange(Quantity.shape[0]) // 365
# Remember that there is a year 0
n_year = max(year) + 1
grid = np.empty((n_year, ) + Quantity.shape[1:3])
for y in range(n_year):
    # grid [N_lat x N_lon]
    grid[y, ...] = np.max(Quantity[year == y, ...], axis=0)

# Now you can report as you prefer, for example per cell for all years:
for j in range(grid.shape[1]):
    for k in range(grid.shape[2]):
        print('({}, {}) = {}'.format(j, k, grid[:, j, k]))
The .shape of a numpy array is just a tuple with the dimensions so Quantity.shape[1] and np.size(Quantity,axis=1) are equal.
The +1 is to take into account that the range operation never reach the maximum value, so if the maximum number of years is 120, range(120) is going to go from 0 to 119.
If you prefer, you can think that if the biggest year is 120, that means I have 121 (from 0 to 120) years.
The use of ... in numpy is a handy element to represent "all the other dimensions", so in this case Quantity[k, ...] and Quantity[k, :, :] are equivalent. The advantage is that if tomorrow the Quantity array gains a 4th dimension (i.e. height) using the Ellipsis my code will continue working, while the version with 2 colons only work for arrays of dimension 3.
About the print function, I am using a really nice feature of python 3.6, the interpolated strings, it is completely equivalent to print("Max for year {} = {}".format(y, v))
Reply
#7
Thank you so, so much, killerrex - conceptually, things are becoming much clearer after reading your explanations. I will try as you suggested and let you know how things go!
Reply
#8
Hi killerrex,

I just wanted to inform you that I tried what you had kindly suggested above, and I believe that it worked successfully! Indeed, Python returned the maximum values for each grid cell for each of the 140 years!

One question for the above code, though, is why is the Quantity.shape expressed as "[1:3]"? Shouldn't that read [0:2] to take the first, second and third elements in the matrix "Quantity"? I tried [0:2], anyway, but received an error before it could finish. It was fine with [1:3], however, as you had stated. Similarly, what does the "empty" function do? Does it affect the original data?


Also, let's say that I wanted to store that enormous dataset of maximum values into a NetCDF file. The idea would be to work with this data in R, so storing it in a NetCDF file would be handy. Is it possible to do this with that amount of data, and preserving the latitude and longitude properties?

Thank you, once again!!!
Reply
#9
It is nice that is working, but remember that it is really limited, you will need to adapt it to your case.

The np.empty function creates a numpy array without initialising the memory. I do it in this way because in the loop I will write every element, so filling it with 0 will be just a waste of cycles.
In that line I am creating an array of N_years x N_lat x N_lon dimensions, and to do this I need to pass np.empty the tuple (n_year, n_lat, n_lon) Remember that Quantity has dimension (n_days, n_lat, n_lon), so I just need to use the 2nd and 3rd dimensions of that array... and that is what Quantity.shape[1:3] does (remember that in python slices, the upper value is not reached, so 1:4 means 1, 2, 3)

About the NetCDF, yes, for sure you can store this amount of data. I personally used HDF5 to store several Gb of data without any issue and there are nice bindings in python.
If you need to perform this kind of analysis you might like to see pandas, that is designed to work with these kind of massive data arrays.
Reply
#10
Hi killerrex,

Thank you for the clarifications!

For the NetCDF file creation in Python, I came across this but was not sure how relevant it was for my case:

http://www.ceda.ac.uk/static/media/uploa...python.pdf

They appear to be using "NetCDF4", so I am not sure if this is valid for my case. Should I follow this, or are you aware of a better/alternative method to store data (like the dataset that was just created) in NetCDF files? From what I see, I should begin with the Dataset() command, in accordance with that tutorial.

In my case, would I need to create dimensions, variables, etc. in order to successfully store my data in a NetCDF file?

Thanks, once again!
Reply


Possibly Related Threads…
Thread Author Replies Views Last Post
  Need desperate help needHelpp 2 2,582 Aug-06-2017, 06:02 PM
Last Post: eyn2k

Forum Jump:

User Panel Messages

Announcements
Announcement #1 8/1/2020
Announcement #2 8/2/2020
Announcement #3 8/6/2020