script to calculate data in csv-files

ledgreve · May-19-2023, 07:24 AM

Hello everyone,

I have a bunch of CSV files (one per corpus) containing annotated data in multiple columns. I annotated aspects and sentiment. The only relevant columns are "Aspect category", "Polarity lore" and "Irony lore". The first column contains the category labels that have been assigned to the aspects, such as "Text". There is a fixed set of categories, but of course not all of them are necessarily used in a given corpus. The second column assigns a sentiment label to the aspect, namely "Positive" or "Negative" (sometimes there is a failed label). The last column tells you whether the sentiment is meant ironically ("true" or "false").

I want to know how often each aspect category is mentioned in combination with either positive or negative sentiment, and how many of these positive or negative sentiments were meant ironically. Calculating all this for each file with a filter would be very time-consuming, however.

A friend recommended using a script, but my coding knowledge is extremely basic. I know just enough to roughly follow a script when I look at one, but not enough to write one myself. So I tried creating one via ChatGPT, as I'd heard you could use it for coding help. The script did not work (no real surprise there). Nevertheless, I will include it below and add some sample data. I would be very grateful if someone could help, but I understand if this is not a priority compared to other posts.

This is the script:

import os
import pandas as pd

# Set the path to the directory containing the CSV files
csv_dir = 'path/to/csv/files'

# Iterate through each CSV file in the directory
for file_name in os.listdir(csv_dir):
    if file_name.endswith('.csv'):
        # Read the CSV file
        file_path = os.path.join(csv_dir, file_name)
        df = pd.read_csv(file_path)
        
        # Calculate frequency counts for Aspect category
        aspect_counts = df['Aspect category'].value_counts().reset_index()
        aspect_counts.columns = ['Aspect category', 'Aspect Frequency']
        
        # Calculate frequency counts for Aspect category and Polarity lore combination
        polarity_counts = df.groupby(['Aspect category', 'Polarity lore']).size().unstack().reset_index()
        polarity_counts.columns = ['Aspect category', 'Negative', 'Positive']
        polarity_counts.fillna(0, inplace=True)
        polarity_counts['Negative'] = polarity_counts['Negative'].astype(int)
        polarity_counts['Positive'] = polarity_counts['Positive'].astype(int)
        
        # Create an "NA" column for any values other than "Positive" or "Negative" in Polarity lore
        polarity_counts['NA'] = df['Polarity lore'].apply(lambda x: 1 if x not in ['Positive', 'Negative'] else 0)
        
        # Save the calculations to a new CSV file
        calculation_file_path = os.path.join(csv_dir, f'calculation_{file_name}')
        with pd.ExcelWriter(calculation_file_path, engine='xlsxwriter') as writer:
            aspect_counts.to_excel(writer, sheet_name='Aspect Frequency', index=False)
            polarity_counts.to_excel(writer, sheet_name='Aspect and Polarity', index=False)
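
For comparison, here is a minimal sketch of one way the counts described above could be computed. It is not a drop-in fix and makes several assumptions: the three column names are exactly as quoted in the post, anything other than "Positive" or "Negative" should be counted as "NA", and the irony column holds "true"/"false". The function names summarise and summarise_directory, the output column names, the out_dir argument and the sample rows are all invented for illustration and are not the poster's data. Three things in the script above look likely to cause trouble: assigning exactly three column names after unstack() fails whenever a file does not contain exactly two polarity values, the "NA" line works on the rows of the original table rather than on the per-category summary, and the Excel writer is pointed at a file name ending in .csv. The irony column is also never used.

```python
import io
from pathlib import Path

import pandas as pd

# Invented sample rows for illustration only (not the poster's data).
SAMPLE = """Aspect category,Polarity lore,Irony lore
Text,Positive,false
Text,Positive,true
Text,Negative,false
Plot,Negative,true
Plot,,false
"""


def summarise(df):
    """Count aspect/polarity combinations and how many of each are ironic."""
    # Assumption: any label other than "Positive"/"Negative" (including an
    # empty cell) is treated as a failed label and counted under "NA".
    valid = df["Polarity lore"].isin(["Positive", "Negative"])
    polarity = df["Polarity lore"].where(valid, "NA")

    # Accept both real booleans and the strings "true"/"True".
    ironic = df["Irony lore"].astype(str).str.strip().str.lower().eq("true")

    tidy = pd.DataFrame(
        {
            "Aspect category": df["Aspect category"],
            "Polarity": polarity,
            "Ironic": ironic,
        }
    )
    # One row per (category, polarity) pair: total mentions and ironic mentions.
    return (
        tidy.groupby(["Aspect category", "Polarity"])
        .agg(Count=("Ironic", "size"), Ironic=("Ironic", "sum"))
        .reset_index()
    )


def summarise_directory(csv_dir, out_dir):
    """Write one summary CSV per input CSV into a separate output folder."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    for path in sorted(Path(csv_dir).glob("*.csv")):
        summary = summarise(pd.read_csv(path))
        summary.to_csv(out_dir / f"calculation_{path.name}", index=False)


# Try the counting step on the in-memory sample.
print(summarise(pd.read_csv(io.StringIO(SAMPLE))))
```

With the invented sample, the "Text"/"Positive" row has a count of 2, of which 1 is ironic, and the row with the empty polarity cell ends up under "NA". Writing the results to a separate folder is a deliberate choice in this sketch, so that a second run does not pick up its own output files as input. Rows with an empty "Aspect category" are dropped by groupby by default.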
