Python Forum

Full Version: I'm trying to merge 2 .csv files with no joy!
You're currently viewing a stripped down version of our content. View the full version with proper formatting.
I have two Excel .csv workbooks;
"C:\Users\Sick\Activities.csv" which holds years of data
"C:\Users\Sick\report.csv" which is a daily download of the last 6 months data
Is it possible to compare the two and update "C:\Users\Sick\Activities.csv" with any changes and/or additions?
The relevant data in "C:\Users\Sick\Activities.csv" is located in range A1:AC40000
Activities also has additional columns AD:AO which need to be left alone.
The relevant data in "C:\Users\Sick\report.csv" is located in range A1:AC300

Is this even possible with code? Any help is greatly appreciated,
Sick
You need to define how to compare rows in the two spreadsheets. If you were doing this manually, what columns do you use to compare rows in the spreadsheets. If you find a match in those columns, what information would you copy from report to Activities?

How do you determine if there is a row in report that doesn't exist in Activities? If you find such a row, how would you copy the report data into the activity spreadsheet? How do you decide where to insert the row?

If you can define those two processes, you can probably automate the process using a tool like Pandas.
Take a look at this post:
Seems like this is the brunt of what you wish to do.
import pandas as pd

# Load the CSV files
activities = pd.read_csv("C:\\Users\\Sick\\Activities.csv")
report = pd.read_csv("C:\\Users\\Sick\\report.csv")

# Relevant data from both CSV files
activities_relevant = activities.iloc[:, :29]
report_relevant = report.iloc[:, :29]

# Merge the dataframes based on all columns to find new or updated rows
merged = report_relevant.merge(activities_relevant, how='left', indicator=True)

# New or updated rows are those that do not match the existing 'Activities.csv' entries
new_or_updated = merged[merged['_merge'] == 'left_only'].drop(columns=['_merge'])

# Append new or updated rows to the original 'Activities.csv'
updated_activities = pd.concat([activities, new_or_updated], ignore_index=Tr
I hope this will help also in case of any help for a better computer and link removed