Excel - Python

bckdw3tci6 · Jun-24-2023, 04:19 PM

Greetings all,
I am one of those trying to learn Python on my own. I have pretty extensive excel function and I am trying to convert a lot of unstructured files into structured ones for downstream analysis. Python has been working well to do this, but I am having trouble producing the results of calculations. Specifically around generating mean values with standard deviations associated with cells containing a target string of letters.

Defining the Problem: I have a single compiled excel file with numerous sheets. One sheet contains multiple columns with data values and very long sample name designations. Within the sample name designations column, there is always a 5 character alphanumeric string within a >20 character sample name designation. There are 8 unique 5 character alphanumeric strings that I would like obtain the average values from experiments for each unique alphanumeric string. A mock example is below (you can see the unique 5 character string in each row[std01, std02, etc]):
Name Data Value 1
this_sample_name_is_really_long_and_inconvenient_std01_date_name_information 300
this_sample_name_is_really_long_and_inconvenient_std02_date_name_information 600
this_sample_name_is_really_long_and_inconvenient_std03_date_name_information 1200
this_sample_name_is_really_long_and_inconvenient_std04_date_name_information 2400
this_sample_name_is_really_long_and_inconvenient_std05_date_name_information 4800
this_sample_name_is_really_long_and_inconvenient_std06_date_name_information 9600
this_sample_name_is_really_long_and_inconvenient_std07_date_name_information 19200
this_sample_name_is_really_long_and_inconvenient_std08_date_name_information 38400
this_sample_name_is_really_long_and_inconvenient_std01_date_name_information_A 302
this_sample_name_is_really_long_and_inconvenient_std02_date_name_information_B 602
this_sample_name_is_really_long_and_inconvenient_std03_date_name_information_C 1202
this_sample_name_is_really_long_and_inconvenient_std04_date_name_information_D 2402
this_sample_name_is_really_long_and_inconvenient_std05_date_name_information_E 4802
this_sample_name_is_really_long_and_inconvenient_std06_date_name_information_F 9602
this_sample_name_is_really_long_and_inconvenient_std07_date_name_information_G 19202
this_sample_name_is_really_long_and_inconvenient_std08_date_name_information_H 38402
this_sample_name_is_really_long_and_inconvenient_std01_date_name_information_A_I 315
this_sample_name_is_really_long_and_inconvenient_std02_date_name_information_B_J 615
this_sample_name_is_really_long_and_inconvenient_std03_date_name_information_C_K 1215
this_sample_name_is_really_long_and_inconvenient_std04_date_name_information_D_L 2415
this_sample_name_is_really_long_and_inconvenient_std05_date_name_information_E_M 4815
this_sample_name_is_really_long_and_inconvenient_std06_date_name_information_F_N 9615
this_sample_name_is_really_long_and_inconvenient_std07_date_name_information_G_O 19215
this_sample_name_is_really_long_and_inconvenient_std08_date_name_information_H_P 38415

Objective: I would like to obtain the average and standard deviation of Data Value 1 from all std01 through std08 experiments. I would also like to deposit the results in a specific worksheet within the same excel file along with specifying the cell locations to generate a results table. The results table would show only the unique 5 character name for each sample along with the mean and stdev values on a row.

Approaches to This Point: I have tried several unsuccessful approaches so far. I have tried using target_letters (1-7 for each unique 5 character string; std01, stdo02, etc) along with filtered_df (also 1-7) and average_value (also 1-7) to obtain the data. This was followed by specifying where to report the results using result_cell (also 1-7) and sheet[result_cell].value = average_value for each std01-08. Using this approach is quite tedious and generates ~ 100 lines of code. Below are the relevant lines of it:

# Specify the text to filter and calculate the average for
target_letters = re.findall(r"[std01]", column_name)
target_letters1 = re.findall(r"[std02]", column_name)
target_letters2 = re.findall(r"[std03]", column_name)
target_letters3 = re.findall(r"[std04]", column_name)
target_letters4 = re.findall(r"[std05]", column_name)
target_letters5 = re.findall(r"[std06]", column_name)
target_letters6 = re.findall(r"[std07]", column_name)
target_letters7 = re.findall(r"[std08]", column_name)

# Filter the DataFrame based on the specific text
filtered_df = df[df[excel_file].contains(target_letters)]
filtered_df1 = df[df["Name"].str.contains(target_letters1)]
filtered_df2 = df[df["Name"].str.contains(target_letters2)]
filtered_df3 = df[df["Name"].str.contains(target_letters3)]
filtered_df4 = df[df["Name"].str.contains(target_letters4)]
filtered_df5 = df[df["Name"].str.contains(target_letters5)]
filtered_df6 = df[df["Name"].str.contains(target_letters6)]
filtered_df7 = df[df["Name"].str.contains(target_letters7)]

# Calculate the average of the filtered values
average_value = filtered_df["Area"].mean()
average_value1 = filtered_df1["Area"].mean()
average_value2 = filtered_df2["Area"].mean()
average_value3 = filtered_df3["Area"].mean()
average_value4 = filtered_df4["Area"].mean()
average_value5 = filtered_df5["Area"].mean()
average_value6 = filtered_df6["Area"].mean()
average_value7 = filtered_df7["Area"].mean()

# Open the workbook
wb = load_workbook(excel_file)

# Get the specific sheet in the workbook
sheet = wb["Standard"]

# Specify the cell where the count will be reported
result_cell = "H2" # Adjust the cell reference as needed
result_cell1 = "H3" # Adjust the cell reference as needed
result_cell2 = "H4" # Adjust the cell reference as needed
result_cell3 = "H5" # Adjust the cell reference as needed
result_cell4 = "H6" # Adjust the cell reference as needed
result_cell5 = "H7" # Adjust the cell reference as needed
result_cell6 = "H8" # Adjust the cell reference as needed
result_cell7 = "H9" # Adjust the cell reference as needed

# Write the word count to the specified cell
sheet[result_cell].value = average_value
sheet[result_cell1].value = average_value1
sheet[result_cell2].value = average_value2
sheet[result_cell3].value = average_value3
sheet[result_cell4].value = average_value4
sheet[result_cell5].value = average_value5
sheet[result_cell6].value = average_value6
sheet[result_cell7].value = average_value7

I am using PyCharm 2023.1.3. Any help is appreciate.

Excel - Python

User Panel Messages

Announcements