Python Forum
Remove extra lines from .csv file in comparison with another
#2
The easy way to do it, with smallish files, is to do what you're already doing: load one of the files completely into memory so you know what to ignore, then process the other file line by line.  As long as the file you load fits in RAM, that should be fine (though it may take a while to run).
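
A minimal sketch of that approach, assuming both files have the same column layout and that a row only counts as a match when every field is identical. The function name and the three path parameters are made up for illustration, and a header row (if there is one) is treated like any other row:

```python
import csv

def remove_matching_rows(ignore_path, source_path, output_path):
    """Copy rows from source_path to output_path, skipping any row
    that also appears in ignore_path. Returns the number of rows kept."""
    # Load the rows to ignore completely into memory.
    # Tuples are hashable, so set membership checks are fast.
    with open(ignore_path, newline="") as f:
        ignore = {tuple(row) for row in csv.reader(f)}

    kept = 0
    # Stream the other file one row at a time.
    with open(source_path, newline="") as src, \
         open(output_path, "w", newline="") as dst:
        writer = csv.writer(dst)
        for row in csv.reader(src):
            if tuple(row) not in ignore:
                writer.writerow(row)
                kept += 1
    return kept
```

Only the ignore file has to fit in memory here; the other file is never held in full.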

Another way to do it, which would be extremely slow but would work if you don't have enough RAM to load an entire file, would be to go line by line through the file containing the things you want to ignore, for EACH line in the other file.  You'd only ever have two lines in memory at a time, but every line of one file gets compared against every line of the other, so with large files it could take hours to finish. You almost certainly don't want to do this.
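
For completeness, a rough sketch of that slow version, under the same assumptions as before (identical column layout, whole-row matches, hypothetical function and parameter names). It re-reads the ignore file from the top for every single row of the other file:

```python
import csv

def remove_matching_rows_low_memory(ignore_path, source_path, output_path):
    """Same result as loading the ignore file into memory, but only one
    row from each file is held at a time. Returns the number of rows kept."""
    kept = 0
    with open(source_path, newline="") as src, \
         open(output_path, "w", newline="") as dst:
        writer = csv.writer(dst)
        for row in csv.reader(src):
            # Reopen and rescan the ignore file for EACH source row.
            # any() stops at the first match, but a miss reads the whole file.
            with open(ignore_path, newline="") as ign:
                found = any(row == other for other in csv.reader(ign))
            if not found:
                writer.writerow(row)
                kept += 1
    return kept
```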

ANOTHER way to do it would be to create a small sqlite database, insert all the rows of each csv into its own table, and then run a single query to let the database engine handle pruning the duplicates.  Something like:
insert into output_table
select * from infile2 as in2
where not exists (
    select 1
    from infile1 as in1
    where in1.col1 = in2.col1
        and in1.col2 = in2.col2
        and in1.col3 = in2.col3
        and in1.col4 = in2.col4
)
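
Here's a sketch of how that could be wired up from Python with the standard library's sqlite3 and csv modules. It assumes each csv has exactly four columns and no header row; the column names just mirror the query above, and the helper names, the index, and the db_path parameter are my own additions for illustration:

```python
import csv
import sqlite3

# Assumed column names, taken from the query above.
COLUMNS = ("col1", "col2", "col3", "col4")

def load_csv(conn, table, path):
    """Create `table` and bulk-insert every row of the csv at `path`."""
    conn.execute(f"create table {table} ({', '.join(COLUMNS)})")
    placeholders = ", ".join("?" for _ in COLUMNS)
    with open(path, newline="") as f:
        conn.executemany(
            f"insert into {table} values ({placeholders})", csv.reader(f)
        )

def prune_with_sqlite(infile1_path, infile2_path, db_path=":memory:"):
    """Return the rows of infile2 that have no exact match in infile1."""
    # ":memory:" keeps everything in RAM; pass a file path for large inputs.
    conn = sqlite3.connect(db_path)
    try:
        load_csv(conn, "infile1", infile1_path)
        load_csv(conn, "infile2", infile2_path)
        # An index on the lookup table should help the "not exists" probe.
        conn.execute(
            f"create index idx_infile1 on infile1 ({', '.join(COLUMNS)})"
        )
        conn.execute(f"create table output_table ({', '.join(COLUMNS)})")
        conn.execute("""
            insert into output_table
            select * from infile2 as in2
            where not exists (
                select 1
                from infile1 as in1
                where in1.col1 = in2.col1
                    and in1.col2 = in2.col2
                    and in1.col3 = in2.col3
                    and in1.col4 = in2.col4
            )
        """)
        conn.commit()
        return conn.execute("select * from output_table").fetchall()
    finally:
        conn.close()
```

With a real file path for db_path the tables live on disk, so neither csv has to fit in memory.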


Messages In This Thread
RE: Remove extra lines from .csv file in comparison with another - by nilamo - Jun-08-2017, 04:09 PM
