Dec-06-2019, 06:02 AM
Hello. I have data in the following format. I am looking to groupby the location, and then find the maximum value for each compound where the corresponding flag is 'U'.
I have a half-way working solution, but I think it could be more readable, and hopefully there's a better way to get it into tabular format.
The following will print out the desired data, but I'm guessing there's a clearer method rather than nested for loops or the method of finding the values associated with 'U' flags.
Quote: location compound1 compound1_flag compound2 compound2_flag compound3 compound3_flag
A 0.36 NaN 0.4 U 0.2 U
B 5 U 2.2 NaN 8 U
B 9.8 J 3.2 U 3300 NaN
B 7.2 J 1800 NaN 280 NaN
C 9 J 3200 NaN 700 NaN
C 4.5 NaN 6.1 NaN 1000 NaN
C 7.8 NaN 0.16 U 0.47 J
D 0.89 NaN 0.30 U 6.1 NaN
D 14 NaN 0.16 U 0.59 NaN
E 2 U 5.6 NaN 200 NaN
I have a half-way working solution, but I think it could be more readable, and hopefully there's a better way to get it into tabular format.
The following will print out the desired data, but I'm guessing there's a clearer method rather than nested for loops or the method of finding the values associated with 'U' flags.
analytes = ['compound1', 'compound2', 'compound3'] flags = [''.join([x, '_flag']) for x in analytes] nondetect_dict = {k:v for (k,v) in zip(flags, analytes)} by_location = data.groupby('location') for key, group in by_location: for x in flags: print(key, nondetect_dict[x], by_location.get_group(key).loc[data[x].str.contains('U', regex=False, na=False), nondetect_dict[x]].max())I tried to convert this into a dictionary with a list comprehension for the second for loop. I thought I could turn into a dataframe, but this throws a syntax error:
max_nondetect = {} for key, group in by_location: max_nondetect[key] = [nondetect_dict[x], by_location.get_group(key).loc[data[x].str.contains('U', regex=False, na=False), nondetect_dict[x]].max() for x in flags] pd.DataFrame.from_dict(max_nondetect, orient='index')Any thoughts on a more efficient approach? Thank you in advance.