Python Forum

best option for comparing two csv files
Hi All,

I have the code below, which compares two CSV files and appends the differences found in new_data.csv to file_with_all_data.csv.
The aim is to append new data to a historic file that holds all the data.

I want to change the code so that it compares only the first column of both files and, if a value differs, writes the entire row to the difference file. Is this the best way to do it, or would you recommend using the diff function?

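# Compare new_data.csv with file_with_all_data.csv line by line and
# append the lines that are not yet in the historic file.

with open('file_with_all_data.csv', 'r') as t1, open('new_data.csv', 'r') as t2:
    fileone = t1.readlines()
    filetwo = t2.readlines()

# A set makes each lookup O(1) instead of scanning the whole list;
# stripping '\n' lets a last line without a newline still match.
existing = {line.rstrip('\n') for line in fileone}

matches = []

with open('additions.csv', 'w') as outFile:
    for line in filetwo:
        row = line.rstrip('\n')
        if row not in existing:
            matches.append(row + '\n')
            outFile.write(row + '\n')

# Append only the new lines instead of rewriting the whole historic file.
with open('file_with_all_data.csv', 'a') as outFile:
    if fileone and not fileone[-1].endswith('\n'):
        outFile.write('\n')  # keep the last existing row on its own line
    outFile.writelines(matches)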
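To compare only the first column, one option is to parse both files with the standard csv module and keep the first-column values of the historic file in a set. The sketch below is a minimal illustration under stated assumptions: the function name append_new_rows is hypothetical, the first column is treated as a unique key, a header row is skipped automatically when both files share it, and a row whose key already exists but whose other columns changed is NOT reported. Rows are written with '\n' line endings.

```python
import csv
import io


def append_new_rows(history_path, new_path, additions_path):
    """Hypothetical helper: copy rows of new_path whose first column is not
    yet a key in history_path to additions_path, and append them to
    history_path. Returns the list of rows that were added.
    """
    # Read the historic file once; keep the raw text to check how it ends.
    with open(history_path, newline='') as f:
        text = f.read()
    # Only the first column is used as the comparison key (assumption).
    known_keys = {row[0] for row in csv.reader(io.StringIO(text, newline='')) if row}

    new_rows = []
    with open(new_path, newline='') as f:
        for row in csv.reader(f):
            if row and row[0] not in known_keys:
                new_rows.append(row)    # keep the entire row
                known_keys.add(row[0])  # also skip repeated keys within new_path

    # The "difference file" gets only the new rows.
    with open(additions_path, 'w', newline='') as f:
        csv.writer(f, lineterminator='\n').writerows(new_rows)

    if new_rows:
        with open(history_path, 'a', newline='') as f:
            # Make sure the first appended row starts on its own line.
            if text and not text.endswith('\n'):
                f.write('\n')
            csv.writer(f, lineterminator='\n').writerows(new_rows)

    return new_rows
```

With the file names from the question, the call would be append_new_rows('file_with_all_data.csv', 'new_data.csv', 'additions.csv'). A set lookup per row keeps this linear in the size of both files, which is usually enough for an append-only history.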

On Linux, use diff from the command line.
If you need to do it programmatically, look through https://pypi.org/search/?q=%27file+diff%27
That search lists file diff packages in 'last updated' order; you will have to examine them to see which may be applicable.
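Before reaching for a third-party package, Python's standard library already includes difflib, which can produce diff-style output. The sketch below is illustrative only: the helper names unified_file_diff and inserted_lines are hypothetical. Note that a line diff is order-sensitive, so a row that merely moved is reported as removed and re-added; for "rows whose key is new", a set lookup on the first column is usually simpler.

```python
import difflib


def _read_lines(path):
    with open(path) as f:
        return f.readlines()


def unified_file_diff(old_path, new_path):
    """Hypothetical helper: unified diff of two text files as one string,
    similar in layout to `diff -u old new`."""
    return ''.join(difflib.unified_diff(_read_lines(old_path), _read_lines(new_path),
                                        fromfile=old_path, tofile=new_path))


def inserted_lines(old_path, new_path):
    """Hypothetical helper: lines the diff marks as inserted or replaced in new_path."""
    old_lines = _read_lines(old_path)
    new_lines = _read_lines(new_path)
    # autojunk=False disables the "popular line" heuristic used on long inputs.
    matcher = difflib.SequenceMatcher(None, old_lines, new_lines, autojunk=False)
    return [line
            for tag, i1, i2, j1, j2 in matcher.get_opcodes()
            if tag in ('insert', 'replace')
            for line in new_lines[j1:j2]]
```

For two files where new_data.csv only adds rows at the end, inserted_lines returns exactly those extra rows.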