Jun-08-2017, 02:58 PM
Dear all,
First of all, thanks for your help. I am very new at this, so I don't even know if I'll be able to do it.
I have two .csv files containing a LOT of data, which are built like:
row1, row2, row3, row4
My problem is that one of those files contains extra lines that are not in the other, and I wish to remove them.
For instance, the first looks like:
1, 1, 1, xx,
2, 2, 2, yy,
3, 3, 3, ab,
4, 4, 4, cd,
and the second looks like:
1, 1, 1, xx,
2, 2, 2, yy,
3, 3, 3, ab,
3.5, 3.5 ,3.5, fg
4, 4, 4, cd,
I want to find an easy way to remove that line by comparing the two files. I tried with Excel, but it takes a seriously long time and eats up all the memory on my computer. After all, we're talking about roughly 700 thousand lines.
This is what I tried:
I merged the two files and wrote the script below, based on what I found going through the forums, but it's giving lots of errors. For starters, it does not recognise the " signs.
import sys, io
import gzip, zipfile
import csv, sqlite3
from sys import argv

_, input, output = argv
inFile = csv.reader(open(input, “r”))
outFile = csv.writer(open(output, “w”))
listLines = set()
for row in inFile:
    key = (row[0])
    if key in listLines:
        continue
    else:
        outFile.writerow(row)
        listLines.append(row)
outFile.close()
inFile.close()

Can anyone please help?
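A few things in the script above would stop it from running. Python only accepts straight quotes (" or '), not the curly “ ” ones that often come along when copying from a web page. A set has add(), not append(). The objects returned by csv.reader and csv.writer have no close() method, so it is the underlying files that need closing. Note also that de-duplicating the merged file would still keep the 3.5 row, because that row appears only once. A minimal sketch of one alternative is below: it reads the two files separately and keeps only the rows of the second file whose first column also appears in the first file. The function name remove_extra_rows is made up for illustration, and matching on the first column alone is an assumption taken from the key = (row[0]) line in the original script.

```python
import csv
import io


def remove_extra_rows(reference, target, output):
    """Copy rows from `target` to `output` only when their first column
    also appears as a first column in `reference`.

    All three arguments are open text file objects. Matching on the first
    column only is an assumption borrowed from the original script.
    """
    # Collect the first column of every non-empty reference row.
    keys = {row[0].strip()
            for row in csv.reader(reference, skipinitialspace=True)
            if row}

    writer = csv.writer(output, lineterminator="\n")
    kept = 0
    for row in csv.reader(target, skipinitialspace=True):
        # Skip blank lines and rows whose key is unknown to the reference.
        if row and row[0].strip() in keys:
            writer.writerow(row)
            kept += 1
    return kept


# Small in-memory demo using the sample data from the post.
first = io.StringIO("1, 1, 1, xx,\n2, 2, 2, yy,\n3, 3, 3, ab,\n4, 4, 4, cd,\n")
second = io.StringIO(
    "1, 1, 1, xx,\n2, 2, 2, yy,\n3, 3, 3, ab,\n3.5, 3.5 ,3.5, fg\n4, 4, 4, cd,\n"
)
result = io.StringIO()
kept = remove_extra_rows(first, second, result)
print(result.getvalue())
```

With real files, the same function can be given files opened with open(path, newline="") inside a with block, so they are closed automatically. Only the first column of the reference file is held in memory, which should be manageable for roughly 700 thousand lines. The output is rewritten by csv.writer, so the spaces after the commas are not preserved.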