Hi guys!
I have a real-life problem and want to know if there is a more efficient way to solve it. I have two CSV files that I need to compare to see if there are any differences. Let's say we have a table:
file1.csv    file2.csv
A            A
B            C
C            E
D            F
E            G
R            Z
Z            H
My outcome should be [B, D, F, G, R, H], because those values appear in either file1 or file2, but not in both. The way I tackled this was to iterate through each row of file1 and file2, build a list from each, and then compute the difference with:
diff = set(list1) ^ set(list2)
The problem is that both files contain almost 100k records each, and it takes an awful lot of time to iterate through them. Is there a better way to work with big data sets like this? I'm using the csv library and Python 3.5.