Jun-16-2019, 11:26 AM

I am trying to solve a problem, but I am new to really complicated coding. Below is the problem summary and code I tried. I will be very thankful if someone can provide some advice on this.

I have 2 variables 'Root zone' and 'Tree cover' both are geolocated (NetCDF) (which are basically grids with each grid having a specific value). The values in TC varies from 0 to 100. Each grid size is 0.25 degrees (might be helpful in understanding the distance).

My problem is "I want to calculate the distance of each TC value ranging between 70-100 and 30-70 from the points where nearest TC ranges between 0-30 (less than 30)." What I want to do is create a 2-dimensional scatter plot with X-axis denoting the 'distance in km of 70-100 TC (and 30-70 TC) from 0-30 values', Y-axis denoting 'Rootzone' of those 70-100 TC points (and 30-70 TC)'.

The code I tried:

Secondly, is it possible to define a 20 km buffer region of 'low = (tc[i,:,:] <= 30)'. What I mean is that the 'low' is defined only when a cluster of Tree cover values are below 30 and not by an individual pixel. I am completely oblivious how to start with the second part.

Here are the NetCDF files which might be helpful. Link to NetCDF's

The graph I am looking for is something like this.

I have 2 variables 'Root zone' and 'Tree cover' both are geolocated (NetCDF) (which are basically grids with each grid having a specific value). The values in TC varies from 0 to 100. Each grid size is 0.25 degrees (might be helpful in understanding the distance).

My problem is "I want to calculate the distance of each TC value ranging between 70-100 and 30-70 from the points where nearest TC ranges between 0-30 (less than 30)." What I want to do is create a 2-dimensional scatter plot with X-axis denoting the 'distance in km of 70-100 TC (and 30-70 TC) from 0-30 values', Y-axis denoting 'Rootzone' of those 70-100 TC points (and 30-70 TC)'.

The code I tried:

#I read the files using xarray deficit_annual = xr.open_dataset('Rootzone_CHIRPS_era5_2000-2015_annual_SA_masked.nc') tc = xr.open_dataset('Treecover_MODIS_2000-2015_annual_SA_masked.nc') fig, ax = plt.subplots(figsize = (8,8)) ## year I am interested in year = 2000 i = year - 2000 # Select the indices of the low- and high-valued points # This will results in warnings here because of NaNs; # the NaNs should be filtered out in the indices, since they will # compare to False in all the comparisons, and thus not be # indexed by 'low' and 'high' low = (tc[i,:,:] <= 30) # Savanna moderate = (tc[i,:,:] > 30) & (tc[i,:,:] < 70) #Transitional forest high = (tc[i,:,:] >= 70) #Forest # Get the coordinates for the low- and high-valued points, # combine and transpose them to be in the correct format y, x = np.where(low) low_coords = np.array([x, y]).T y, x = np.where(high) high_coords = np.array([x, y]).T y, x = np.where(moderate) moderate_coords = np.array([x, y]).T # We now calculate the distances between *all* low-valued points, and *all* high-valued points. # This calculation scales as O^2, as does the memory cost (of the output), # so be wary when using it with large input sizes. from scipy.spatial.distance import cdist, pdist distances = cdist(low_coords, moderate_coords, 'euclidean') # Now find the minimum distance along the axis of the high-valued coords, # which here is the second axis. # Since we also want to find values corresponding to those minimum distances, # we should use the `argmin` function instead of a normal `min` function. indices = distances.argmin(axis=1) mindistances = distances[np.arange(distances.shape[0]), indices] minrzs = np.array(deficit_annual[i,:,:]).flatten()[indices] plt.scatter(mindistances*25, minrzs, s = 60, alpha = 0.5, color = 'goldenrod', label = 'Trasitional Forest') distances = cdist(low_coords, high_coords, 'euclidean') # Now find the minimum distance along the axis of the high-valued coords, # which here is the second axis. # Since we also want to find values corresponding to those minimum distances, # we should use the `argmin` function instead of a normal `min` function. indices = distances.argmin(axis=1) mindistances = distances[np.arange(distances.shape[0]), indices] minrzs = np.array(deficit_annual[i,:,:]).flatten()[indices] plt.scatter(mindistances*25, minrzs, s = 60, alpha = 1, color = 'green', label = 'Forest') plt.xlabel('Distance from Savanna (km)', fontsize = '14') plt.xticks(fontsize = '14') plt.yticks(fontsize = '14') plt.ylabel('Rootzone storage capacity (mm/year)', fontsize = '14') plt.legend(fontsize = '14') #plt.ylim((-10, 1100)) #plt.xlim((0, 30))I seem to have some problems with the code and I am not able to troubleshoot it (as it is working now, but doesn't seem to work when I increase the 'high = (tc[i,:,:] >= 70 ` to 80 for year 2000). This makes me wonder if the code is correct or not.

Secondly, is it possible to define a 20 km buffer region of 'low = (tc[i,:,:] <= 30)'. What I mean is that the 'low' is defined only when a cluster of Tree cover values are below 30 and not by an individual pixel. I am completely oblivious how to start with the second part.

Here are the NetCDF files which might be helpful. Link to NetCDF's

The graph I am looking for is something like this.