Python Forum
Looping in a 3D matrix - desperate for assistance!
Thread Rating:
  • 1 Vote(s) - 4 Average
  • 1
  • 2
  • 3
  • 4
  • 5
Looping in a 3D matrix - desperate for assistance!
#11
The netCDF latest standard is 4.6.1, so you can use it.
Nevertheless as it is an hdf5 restricted you can also read it with pandas or h5py. Check also the page of the netCDF with the links to recommended interfaces and how to create a basic dataframe.
Reply
#12
Hi killerrex,

Sounds good, thanks!

One last question that came to mind, suppose that, adhering to what was done previously, I wanted to compute 5-day cumulative values (using np.sum) over groups of 5 consecutive days per year (for each of the 140 years for each grid cell). The idea would be to begin with day 0 to day 4 (so the first 5 days in year 0), and then move to the next group of 5 consecutive days, from day 1 to day 5, and then day 2 to day 6, and then day 3 to day 7, and then day 4 to day 8.....etc....all the way to the end of the 51,000 day period (140 years).

The end result would be the same as before, as the previously generated loop that contains the "np.max" function would derive the "annual" maximum 5 consecutive day values, so nothing to the previous code would need to be deleted, just expanded on a little.

Doing this would result in 361 groups of 5 consecutive days per year for 140 years. What would be the best way to group the days like that and determine 5-day maximum values for each group? What I did was create a new variable, called Quantity2, and defined it as:

Quantity2=np.zeros(Quantity.shape) #so there are all zeros in the output. 
I would have to create another loop in order to define the first set of 5 consecutive days (in year 0). One way to do it is use a for loop, such as:

for time in Quantity2
        Quantity2[time,...]
        fiveconsecutivedays=np.sum(Quantity[time-2:time+2,...], axis=0)
Quantity is the original 3D matrix. The "time-2" and "time+2" gives a range of days relative to a specified day. For example, at Day 2, the range would be Day 0 to Day 4. Using that approach, though, I cannot use Day 0 and Day 1, as well as the final two days in the period, since that would lead to a range that contains days that do not exist. For example, if you subtract 2 from Day 0 or Day 1, you will end up with a negative day. Likewise, you will end up with days that exceed 51100 for the final two days due to time+2. I'm just uncertain how to specify the conditions in the for loop and wanted to know of a way to use them for this case?

Does the above coding make sense to you?

Thanks, again.
Reply
#13
What you are looking for is a moving average. I use normally variations of this answer in stackoverflow
def moving_avg(a, n):
    # Notice that in the original here forces dtype=float...
    # I prefer to keep dtype of a, as not always work with float
    acc = np.cumsum(a)
    acc[n:] = acc[n:] - acc[:-n]
    return acc[n-1:] / n
Notice that this is fast but can be problematic when the array has many elements or outliers and the accumulated sum is too big and produces a degradation in the accuracy of the result.

when I have to deal with that type of arrays I prefer this version:
def ha_moving_avg(a, n):
    sz = len(a) - n + 1
    # acc = a[:sz].copy()
    acc = 0
    for k in range(n):
        acc += a[k:sz+k]
    return acc / n
Is slower, but if your data has big values or can contain outliers you can have disgusting surprises... consider the following:
def demo():

    n = 3
    # Just to print nice the arrays
    fmt = {'all': lambda x: '%.1g' % x}
    
    print("Nice inputs")
    # Let's try something nice...
    # m[k] = k so the average of 3 elements is:
    # (k + k+1 + k+2) / 3 = k + 1
    m = np.arange(10)
    ma = moving_avg(m, n)
    print("\tNormal:   " + np.array2string(ma, formatter=fmt))
    ma = ha_moving_avg(m, n)
    print("\tHigh Acc: " + np.array2string(ma, formatter=fmt))
    
    print("Broken Inputs")
    m = np.ones(10)
    m[4] = 1E20
    ma = moving_avg(m, n)
    print("\tNormal:   " + np.array2string(ma, formatter=fmt))
    ma = ha_moving_avg(m, n)
    print("\tHigh Acc: " + np.array2string(ma, formatter=fmt))


if __name__ == '__main__':
    demo()
Output:
Nice inputs Normal: [1 2 3 4 5 6 7 8] High Acc: [1 2 3 4 5 6 7 8] Broken Inputs Normal: [1 1 3e+19 3e+19 3e+19 0 0 0] High Acc: [1 1 3e+19 3e+19 3e+19 1 1 1]
See the surprise? Big Grin
Reply
#14
Hi killerrex,

Thank you so much for this response!

Would this also apply to calculating the "cumulative" values for 5 consecutive day periods? For example, Day 0+Day 1+Day 2+Day 3+Day 4. For my particular case, what I was envisioning would be to begin deriving the cumulative value over Day 0 to Day 4 (so, the first 5 days in year 0), and then move to the next group of 5 consecutive days, from Day 1 to Day 5 (i.e. Day 1+Day 2+Day 3+Day 4+Day 5), and then Day 2 to Day 6 (i.e. Day 2+Day 3+Day 4+Day 5+Day 6), and then Day 3 to Day 7, and then Day 4 to Day 8.....etc....all the way to the end of the 51,100 day period (140 years). In this case, would you use the np.sum function to derive cumulative values, and then have the previously generated code (see below) to generate the annual maximum cumulative values?

The output would be similar to before, as the previously generated loop that contains the "np.max" function (as shown in the following code) would derive the "annual" maximum cumulative 5 consecutive day values, so nothing to the previous code would need to be deleted, just expanded on. Based on the existing code (just modified a little, as shown below), but using the np.sum function in a separate loop:

# Quantity.shape = [N_time, N_lat, N_lon]
year = np.arange(Quantity.shape[0]) // 365
n_year = max(year) + 1
grid = np.empty((n_year, ) + Quantity.shape[1:3])
grid2 = np.empty((n_year, ) + Quantity2.shape[1:3])
for y in range(n_year):
    # grid [N_lat x N_lon]
    grid[y, ...] = np.max(Quantity[year == y, ...], axis=0)
    grid2[y, ...] = np.max(Quantity2[year == y, ...], axis=0)

And incorporating this before the previous for loop to define "Quantity2":

Quantity2=np.zeros(Quantity.shape)#contains all zeros
And this:

for time in Quantity2
        Quantity2[time,...]
        fiveconsecutivedays=np.sum(Quantity[time-2:time+2,...], axis=0)
"Quantity" is the original 3D matrix that was used previously. The "time-2" and "time+2" gives a range of days relative to a specified day. For example, at Day 2, the range would be Day 0 to Day 4 (because 2-2=0 and 2+2=4, so Day 0 to Day 4, relative to Day 2). Using that approach, though, I cannot use Day 0 and Day 1, as well as the final two days in the period, since that would lead to a range that contains days that do not exist. For instance, if you subtract 2 from Day 0 or Day 1, you will end up with a negative day (-2 and -1, respectively). Likewise, you will end up with days that exceed 51100 for the final two days due to time+2 (Day 51,101 and Day 51,102). Thus, some kind of condition would need to be used to tell the loop to ignore those particular days (the very first two days and the very last two days), but I'm just uncertain how to specify the conditions in the for loop and wanted to know of a way to use them for this case?

Ultimately, the goal is still to obtain maximum values at an annual time scale, as before (which the existing code will do already), but here I would try to specifically obtain the maximum "cumulative" values for every year for each 5 consecutive day periods - there would be about 50,000-51,000 5-consecutive day periods over the 140 years, but the existing np.max function would allow the output to return the maximum cumulative values for each year for each grid cell.

What do you think? Does that make sense?

I apologize for the inconvenience, but I really do appreciate your above response!
Reply
#15
Hi killerrex,

Expanding on the above post (post #14), would an if/else statement be relevant with the specified codes? If so, how could it be implemented?

Thank you, once again,
Reply
#16
Hi killerrex,

As per post #14, here is what I have so far:

Quantity=Q
Quantity2=np.zeros(Quantity.shape)
Year=np.arange(Quantity.shape[0])//365
n_year = max(Year)+1
onedaymax=np.empty((n_year, )+Quantity.shape[1:3])
fivedaymax=np.empty((n_year, )+Quantity2.shape[1:3])
        for time in range(Quantity.shape[0]):
                Prec5[time,...]
                fiveconsecutivedays=np.sum(Quantity[time-2:time+2,...], axis=0) 
                        if time==0, 1, 51099, 51,100
                                continue
                                        for y in range(n_year):
                                                onedaymax[y, ...]=np.max(Quantity[Year==y, ...], axis=0)
                                                fivedaymax[y, ...]=np.max(Quantity2[Year==y, ...], axis=0)
I'm not sure if that first for loop is the right approach (again, as per the goal specified in post #14). The "if time" portion is intended to omit the first two days and last two days of the period, but I'm unsure if that it specified correctly for my purposes. What would you modify there?

Many thanks, and I would appreciate any feedback!
Reply


Possibly Related Threads…
Thread Author Replies Views Last Post
  Need desperate help needHelpp 2 2,611 Aug-06-2017, 06:02 PM
Last Post: eyn2k

Forum Jump:

User Panel Messages

Announcements
Announcement #1 8/1/2020
Announcement #2 8/2/2020
Announcement #3 8/6/2020