May-19-2023, 07:24 AM
Hello everyone,
I have a bunch of CSV files (one per corpus) containing annotated data in multiple columns. I annotated aspects and sentiment. The only relevant columns are "Aspect category", "Polarity lore" and "Irony lore". The first column contains the category labels that have been assigned to the aspects, such as "Text". There is a fixed set of categories, but not all of them are necessarily used in a given corpus. The second column assigns a sentiment label to the aspect, namely "Positive" or "Negative" (sometimes there is a failed label). The last column tells you whether the sentiment is meant ironically ("true" or "false").
I want to know how often each aspect category is mentioned in combination with positive or negative sentiment, and how many of these positive or negative sentiments were meant ironically. Calculating all this for each file with a filter would be very time-consuming, however. A friend recommended using a script, but my coding knowledge is extremely basic. I know just enough to roughly follow what a script does when I look at it, but not enough to write one myself. So I tried creating one with ChatGPT, as I'd heard you could use it for coding help. The script did not work (no real surprise there). Nevertheless, I will include it below and add some sample data. I would be very grateful if someone could help, but I understand if this is not a priority compared to other posts.
This is the script:
import os
import pandas as pd

# Set the path to the directory containing the CSV files
csv_dir = 'path/to/csv/files'

# Iterate through each CSV file in the directory
for file_name in os.listdir(csv_dir):
    if file_name.endswith('.csv'):
        # Read the CSV file
        file_path = os.path.join(csv_dir, file_name)
        df = pd.read_csv(file_path)

        # Calculate frequency counts for Aspect category
        aspect_counts = df['Aspect category'].value_counts().reset_index()
        aspect_counts.columns = ['Aspect category', 'Aspect Frequency']

        # Calculate frequency counts for Aspect category and Polarity lore combination
        polarity_counts = df.groupby(['Aspect category', 'Polarity lore']).size().unstack().reset_index()
        polarity_counts.columns = ['Aspect category', 'Negative', 'Positive']
        polarity_counts.fillna(0, inplace=True)
        polarity_counts['Negative'] = polarity_counts['Negative'].astype(int)
        polarity_counts['Positive'] = polarity_counts['Positive'].astype(int)

        # Create an "NA" column for any values other than "Positive" or "Negative" in Polarity lore
        polarity_counts['NA'] = df['Polarity lore'].apply(lambda x: 1 if x not in ['Positive', 'Negative'] else 0)

        # Save the calculations to a new CSV file
        calculation_file_path = os.path.join(csv_dir, f'calculation_{file_name}')
        with pd.ExcelWriter(calculation_file_path, engine='xlsxwriter') as writer:
            aspect_counts.to_excel(writer, sheet_name='Aspect Frequency', index=False)
            polarity_counts.to_excel(writer, sheet_name='Aspect and Polarity', index=False)
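For comparison, here is a minimal sketch of the counting described above, not a tested solution for the real files. It rests on several assumptions: the column headers are exactly "Aspect category", "Polarity lore" and "Irony lore"; any polarity value other than "Positive" or "Negative" (including empty cells) can be grouped under a made-up label "Other"; and the results can be written as plain CSV rather than Excel. The function names `summarize` and `summarize_folder`, the `summary_` file prefix, and the sample rows (including the category "Plot") are all invented for illustration.

```python
import io
from pathlib import Path

import pandas as pd


def summarize(df):
    """Per aspect category and polarity: total mentions and ironic mentions."""
    data = df[["Aspect category", "Polarity lore", "Irony lore"]].copy()
    # Assumption: every label other than Positive/Negative (or an empty cell)
    # is collected under the placeholder label "Other".
    data["Polarity lore"] = data["Polarity lore"].where(
        data["Polarity lore"].isin(["Positive", "Negative"]), "Other"
    )
    # The irony column may be read as booleans or as the strings
    # "true"/"false", so normalize it to booleans first.
    data["Ironic"] = (
        data["Irony lore"].astype(str).str.strip().str.lower().eq("true")
    )
    return (
        data.groupby(["Aspect category", "Polarity lore"])
        .agg(Count=("Ironic", "size"), Ironic=("Ironic", "sum"))
        .reset_index()
    )


def summarize_folder(csv_dir):
    """Write one summary_<name>.csv next to each CSV file in csv_dir."""
    for path in sorted(Path(csv_dir).glob("*.csv")):
        if path.name.startswith("summary_"):
            continue  # skip results written by an earlier run
        result = summarize(pd.read_csv(path))
        result.to_csv(path.with_name(f"summary_{path.name}"), index=False)


# Invented sample rows, only to show the shape of the result.
sample = io.StringIO(
    "Aspect category,Polarity lore,Irony lore\n"
    "Text,Positive,false\n"
    "Text,Positive,true\n"
    "Text,Negative,false\n"
    "Plot,Negative,true\n"
    "Plot,,false\n"
)
summary = summarize(pd.read_csv(sample))
print(summary)
```

Each output row is one category and polarity combination, with "Count" giving the number of mentions and "Ironic" giving how many of those were marked ironic. Calling `summarize_folder('path/to/csv/files')` would apply the same counting to every file in that folder. Writing with `to_csv` also sidesteps one likely problem in the script above, which passes a file name ending in `.csv` to `pd.ExcelWriter`.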