Jun-08-2017, 04:09 PM
The easy way to do it, with smallish files, is to do what you're doing. Load one of the files completely into memory so you know what to ignore, and then process the other file line by line. As long as the file you hold in memory fits in however much RAM you have, that should be fine (though it may take a while to run).
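A rough sketch of that, assuming two CSVs called ignore.csv and data.csv (names made up) and that whole rows should match exactly:

import csv

with open("ignore.csv", newline="") as f:
    # Load every row of the smaller file into a set of tuples for fast lookups.
    ignore = {tuple(row) for row in csv.reader(f)}

with open("data.csv", newline="") as src, open("output.csv", "w", newline="") as dst:
    writer = csv.writer(dst)
    for row in csv.reader(src):
        # Keep only rows that never appear in the ignore file.
        if tuple(row) not in ignore:
            writer.writerow(row)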
Another way to do it, which would take an exceptionally long time but would work if you don't have enough ram to load the entire file, would be to go line-by-line through the file containing things you want to ignore, for EACH line in the other file. You'd only ever have two lines in memory at a time, but also it'd take hours to finish. You almost definitely don't want to do this.
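For completeness, that nested approach would look roughly like this (same placeholder file names). Only two rows are ever in memory, but the ignore file gets re-read once per row of the other file, which is why it's so slow:

import csv

with open("data.csv", newline="") as src, open("output.csv", "w", newline="") as dst:
    writer = csv.writer(dst)
    for row in csv.reader(src):
        # Re-scan the entire ignore file for every single row. Very slow.
        with open("ignore.csv", newline="") as ignore_file:
            if not any(row == ignore_row for ignore_row in csv.reader(ignore_file)):
                writer.writerow(row)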
ANOTHER way to do it would be to create a small SQLite database, insert all the rows of each csv into separate tables, and then run a single query to let the database engine handle pruning the duplicates. Something like:
insert into output_table
select * from infile2 as in2
where not exists (
    select 1 from infile1 as in1
    where in1.col1 = in2.col1
      and in1.col2 = in2.col2
      and in1.col3 = in2.col3
      and in1.col4 = in2.col4
)
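If you wanted to drive that from Python, a rough sketch with the standard sqlite3 and csv modules might look like this. It assumes both CSVs have the same four columns and no header row; the table, column, and file names are all made up, so adjust to match your data:

import csv
import sqlite3

con = sqlite3.connect(":memory:")  # or a file path, if the data is big
con.execute("create table infile1 (col1, col2, col3, col4)")
con.execute("create table infile2 (col1, col2, col3, col4)")
con.execute("create table output_table (col1, col2, col3, col4)")

# Bulk-load each csv into its own table.
for table, path in (("infile1", "ignore.csv"), ("infile2", "data.csv")):
    with open(path, newline="") as f:
        con.executemany(f"insert into {table} values (?, ?, ?, ?)", csv.reader(f))

# Let the database engine do the pruning.
con.execute("""
    insert into output_table
    select * from infile2 as in2
    where not exists (
        select 1 from infile1 as in1
        where in1.col1 = in2.col1 and in1.col2 = in2.col2
          and in1.col3 = in2.col3 and in1.col4 = in2.col4
    )
""")
con.commit()

From there you can either query output_table directly or dump it back out to a csv.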