Python Forum
script to calculate data in csv-files
Thread Rating:
  • 0 Vote(s) - 0 Average
  • 1
  • 2
  • 3
  • 4
  • 5
script to calculate data in csv-files
#1
Hello everyone,

I have a bunch of csv-files (one per corpus) containing annotated data in multiple columns. I annotated aspects and sentiment. The only relevant columns are "Aspect category", "Polarity lore" and "Irony lore". The first column contain the category labels which have been assigned to the aspects, such as "Text". There is a set number of categories, but of course not necessarily all of them have been used in a certain corpus. The second column assign a sentiment label to the aspect, namely "Positive" or "Negative" (sometimes there is failed label). The last column tells you wether the sentiment is meant ironically ("true" or "false"). I wish to know how often each aspect category is mentioned in combination with either positive or negative sentiment and how many of these positive or negative sentiments were meant ironically. Calculating all this for each file via a filter would be very time consuming however. A friend recommended using a script, but my coding knowledge is extremely basic. I know just enough to kind of know what it's about when I look at a script, but not enough to write one myself. So I tried creating one via chatGPT as I'd heard you could use it for coding help. The script did not work (no true surprise there). Neverthelesss, I will include it below and add some sample data. I would be very grateful if someone could help, but understand if this is not a priority in comparison to other posts.

This is the script:

import os
import pandas as pd

# Set the path to the directory containing the CSV files
csv_dir = 'path/to/csv/files'

# Iterate through each CSV file in the directory
for file_name in os.listdir(csv_dir):
    if file_name.endswith('.csv'):
        # Read the CSV file
        file_path = os.path.join(csv_dir, file_name)
        df = pd.read_csv(file_path)
        
        # Calculate frequency counts for Aspect category
        aspect_counts = df['Aspect category'].value_counts().reset_index()
        aspect_counts.columns = ['Aspect category', 'Aspect Frequency']
        
        # Calculate frequency counts for Aspect category and Polarity lore combination
        polarity_counts = df.groupby(['Aspect category', 'Polarity lore']).size().unstack().reset_index()
        polarity_counts.columns = ['Aspect category', 'Negative', 'Positive']
        polarity_counts.fillna(0, inplace=True)
        polarity_counts['Negative'] = polarity_counts['Negative'].astype(int)
        polarity_counts['Positive'] = polarity_counts['Positive'].astype(int)
        
        # Create an "NA" column for any values other than "Positive" or "Negative" in Polarity lore
        polarity_counts['NA'] = df['Polarity lore'].apply(lambda x: 1 if x not in ['Positive', 'Negative'] else 0)
        
        # Save the calculations to a new CSV file
        calculation_file_path = os.path.join(csv_dir, f'calculation_{file_name}')
        with pd.ExcelWriter(calculation_file_path, engine='xlsxwriter') as writer:
            aspect_counts.to_excel(writer, sheet_name='Aspect Frequency', index=False)
            polarity_counts.to_excel(writer, sheet_name='Aspect and Polarity', index=False)
Reply


Possibly Related Threads…
Thread Author Replies Views Last Post
  Is it possible to extract 1 or 2 bits of data from MS project files? cubangt 8 1,070 Feb-16-2024, 12:02 AM
Last Post: deanhystad
Question Need help for a python script to extract information from a list of files lephunghien 6 1,109 Jun-12-2023, 05:40 PM
Last Post: snippsat
  SQL Alchemy help to extract sql data into csv files mg24 1 1,792 Sep-30-2022, 04:43 PM
Last Post: Larz60+
  Apply textual data cleaning to several CSV files ErcoleL99 0 845 Jul-09-2022, 03:01 PM
Last Post: ErcoleL99
  How can I add certain elements in this 2d data structure and calculate a mean TheOddCircle 3 1,563 May-27-2022, 09:09 AM
Last Post: paul18fr
  calculate data using 1 byte checksum korenron 2 2,966 Nov-23-2021, 07:17 AM
Last Post: korenron
  Including data files in a package ChrisOfBristol 4 2,559 Oct-27-2021, 04:14 PM
Last Post: ChrisOfBristol
  Plotting sum of data files using simple code Laplace12 3 3,062 Jun-16-2021, 02:06 PM
Last Post: BashBedlam
  Running script on multiple files Afrodizzyjack 1 2,521 May-14-2021, 10:49 PM
Last Post: Yoriz
  How do use data from csv files as variables? JUSS1K 1 2,160 Oct-25-2020, 08:31 PM
Last Post: GOTO10

Forum Jump:

User Panel Messages

Announcements
Announcement #1 8/1/2020
Announcement #2 8/2/2020
Announcement #3 8/6/2020