Match CSV files for difference
Cuz (Dec-17-2018, 06:19 PM):

Hi guys! I have a real-life problem and wanted to know if there is a more efficient way to solve it. I have two CSV files that I need to compare to see if there are any differences. Let's say we have this table:

| file1.csv | file2.csv |
|-----------|-----------|
| A         | A         |
| B         | C         |
| C         | E         |
| D         | F         |
| E         | G         |
| R         | Z         |
| Z         | H         |

My outcome will be: [B,D,F,G,R,H], because those values are in either file1 or file2, but not in both.

The way I tackled this is that I iterated through each row in file1 and file2, created lists from them, and got the differences using:

`diff = set(list1) - set(list2)`

The problem is that both files contain almost 100k records each, and it takes an awful lot of time to iterate through them. Is there a better way to work on big sets of data like this? I'm using the csv library and Python 3.5.

jeanMichelBain (Dec-17-2018, 10:31 PM):

Hello, I tried quickly with 100k records, and I got a very quick result, under one second. Probably my test is wrong, so can you show your code and a sample of data?

Cuz (Dec-18-2018, 01:12 PM):

OK, so maybe the reason is elsewhere. I'm using code that looks like this:

```
import csv, os

os.chdir(r"C:\Users\me\Desktop\compare files")

file1_list = []
file1 = open(r"file 1.csv")
file1_reader_obj = csv.reader(file1)
file1_data = list(file1_reader_obj)
for row in file1_data:
    x = file1_data.index(row)
    file1_list.append(file1_data[x][1])
```

And I just figured out I'm a moron, since I had already read the file into a list.
So, I can use something like this to compare both files:

```
for row in file1_list:
    x = file1_list.index(row)
    if file1_list[x][1] in file2_list:
        continue
    else:
        print(file1_list[x][1])
```

Now the issue is that both files are structured as a list of lists:

Output:

```
[['1', 'a', 'a', 'a'], ['2', 'b', 'b', 'b'], ['3', 'c', 'c', 'c'], ['4', 'd', 'd', 'd'], ['5', 'e', 'e', 'e'], ['6', 'f', 'f', 'f']]
```

And I just have to check whether the first item in each inner list of file1 (e.g. 1, 2, 3) appears as the first item of some inner list in file2. Let me know if I'm not making any sense. I'm still learning the art of expressing my thoughts when it comes to programming issues :)

ichabod801 (Dec-18-2018, 01:25 PM):

This is a problem:

```
for row in file1_data:
    x = file1_data.index(row)
    file1_list.append(file1_data[x][1])
```

The second line is constantly searching through the list. My first thought was that it's much better to use enumerate:

```
for x, row in enumerate(file1_data):
    file1_list.append(file1_data[x][1])
```

But then I read the third line. file1_data[x] is row. They're the same thing. Why go to all that trouble?

```
for row in file1_data:
    file1_list.append(row[1])
```

Which is so simple it might as well be a list comprehension:

`file1_list = [row[1] for row in file1_data]`

Craig "Ichabod" O'Brien - xenomind.com

Cuz (Dec-18-2018, 02:16 PM):

Thanks! This really helps, and I have a fully working script now :) I think I completely misunderstood the index() method.
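
For readers landing on this thread, here is a minimal sketch of the comparison described in the first post. It is not the script Cuz ended up with (that was never posted); the function names `column_values` and `unmatched_values` are made up for illustration, and `io.StringIO` stands in for the real files. Note that `set(list1) - set(list2)` only returns values missing from the second file, while the expected outcome in the first post (values in either file but not both) corresponds to the symmetric difference, written `^` for sets.

```python
import csv
import io


def column_values(csv_file, column=0):
    """Collect the values of one column from an open CSV file into a set."""
    # Rows that are too short (e.g. blank lines) are skipped.
    return {row[column] for row in csv.reader(csv_file) if len(row) > column}


def unmatched_values(csv_file1, csv_file2, column=0):
    """Return the values that appear in exactly one of the two files, sorted."""
    values1 = column_values(csv_file1, column)
    values2 = column_values(csv_file2, column)
    # ^ is the symmetric difference: items in either set but not in both.
    return sorted(values1 ^ values2)


# Sample data from the first post; io.StringIO stands in for real files.
file1 = io.StringIO("A\nB\nC\nD\nE\nR\nZ\n")
file2 = io.StringIO("A\nC\nE\nF\nG\nZ\nH\n")

result = unmatched_values(file1, file2)
print(result)  # ['B', 'D', 'F', 'G', 'H', 'R']
```

With files on disk, the same functions should work on handles opened with `open(path, newline="")`, and `column` selects which field to compare. Building the sets is a single pass over each file, so no `index()` lookups are needed.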
