Python Forum
finding dupes (with a twist)
#10
Your code skips the first occurrence of each Division/Email combination.
I assume your CSV file has a header, e.g.:
Output:
Division,field2,field3,field4,Email
Electrical,field2,field3,field4,[email protected]
Automotive,field2,field3,field4,[email protected]
Fuses,field2,field3,field4,[email protected]
Electrical,field2,field3,field4,[email protected]
Automotive,field2,field3,field4,[email protected]
Fuses,field2,field3,field4,[email protected]
Electrical,field2,field3,field4,[email protected]
Automotive,field2,field3,field4,[email protected]
Fuses,field2,field3,field4,[email protected]
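The earlier code isn't quoted in this post, so this is an assumption: a common pattern that produces this symptom keeps a `seen` set and writes a row only when its key is already in the set, so the first row of every group is never written. The sketch below contrasts that pattern with grouping all rows per key first. The function names and sample rows are hypothetical.

```python
from collections import defaultdict


def dupes_seen_set(rows):
    """Hypothetical 'seen set' pattern: only repeats are kept, so the first occurrence is lost."""
    seen = set()
    out = []
    for row in rows:
        key = (row['Email'].lower(), row['Division'])
        if key in seen:
            out.append(row)  # second, third, ... occurrence
        else:
            seen.add(key)    # first occurrence is remembered but never written
    return out


def dupes_grouped(rows):
    """Group every row by key first, then keep all rows of any key that occurs more than once."""
    groups = defaultdict(list)
    for row in rows:
        groups[(row['Email'].lower(), row['Division'])].append(row)
    return [row for group in groups.values() if len(group) > 1 for row in group]


rows = [
    {'Division': 'Electrical', 'Email': 'a@example.com'},
    {'Division': 'Automotive', 'Email': 'b@example.com'},
    {'Division': 'Electrical', 'Email': 'A@example.com'},
]
print(len(dupes_seen_set(rows)))  # 1: only the second Electrical row
print(len(dupes_grouped(rows)))   # 2: both Electrical rows
```

Grouping first costs a little memory (all rows are held in the dict), but it is the only way to know, before writing, that a first occurrence will later turn out to be a duplicate.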
Using the csv module:
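import csv
from collections import defaultdict

# Map each (lower-cased email, division) key to every row that shares it
dupes = defaultdict(list)
# newline='' is what the csv module docs recommend for files it reads or writes
with open('Tdf contacts.csv', 'r', newline='') as read_obj:
    csv_reader = csv.DictReader(read_obj)
    fieldnames = csv_reader.fieldnames  # header row, reused for the output file
    for line in csv_reader:
        key = (line['Email'].lower(), line['Division'])
        dupes[key].append(line)

# Write every group with more than one row, so the first occurrence is included too
with open('new_dupes_file.csv', 'w', newline='') as f:
    wrtr = csv.DictWriter(f, fieldnames=fieldnames)
    wrtr.writeheader()
    for values in dupes.values():
        if len(values) > 1:
            wrtr.writerows(values)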
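To try the csv approach without real files, the same logic can be wrapped in a function that takes file-like objects and fed `io.StringIO` buffers. The function name `write_dupes` and the sample rows are hypothetical; the columns follow the header assumed above.

```python
import csv
import io
from collections import defaultdict


def write_dupes(src, dst):
    """Copy every row whose (lower-cased Email, Division) key occurs more than once."""
    reader = csv.DictReader(src)
    groups = defaultdict(list)
    for row in reader:
        groups[(row['Email'].lower(), row['Division'])].append(row)
    writer = csv.DictWriter(dst, fieldnames=reader.fieldnames)
    writer.writeheader()
    for rows in groups.values():
        if len(rows) > 1:
            writer.writerows(rows)  # all rows of the group, first occurrence included


sample = (
    "Division,field2,field3,field4,Email\n"
    "Electrical,f2,f3,f4,a@example.com\n"
    "Fuses,f2,f3,f4,c@example.com\n"
    "Electrical,f2,f3,f4,A@Example.com\n"
)
out = io.StringIO()
write_dupes(io.StringIO(sample), out)
print(out.getvalue())
```

Rows come out grouped by key, in the order each key was first seen, and keep their original email spelling because only the lookup key is lower-cased.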
Using pandas:
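import pandas as pd

df = pd.read_csv('Tdf contacts.csv')
# Normalise case so addresses that differ only in capitalisation match
df['Email'] = df['Email'].str.lower()
# keep=False flags every row of a duplicated group, including the first occurrence
df['Duplicated'] = df.duplicated(subset=['Division', 'Email'], keep=False)
df_dupes = df[df['Duplicated']].sort_values(by=['Division', 'Email'])
# Drop the helper column before saving
df_dupes.drop(['Duplicated'], axis=1).to_csv('dupes.csv', index=False)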
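The `keep=False` argument is what makes the pandas version include the first occurrence: with the default `keep='first'`, `duplicated` marks only the later repeats, which reproduces the original problem. A small hypothetical DataFrame shows the difference.

```python
import pandas as pd

df = pd.DataFrame({
    'Division': ['Electrical', 'Fuses', 'Electrical'],
    'Email': ['a@example.com', 'c@example.com', 'A@example.com'],
})
df['Email'] = df['Email'].str.lower()

# Default keep='first': the first row of each group is NOT flagged
first = df.duplicated(subset=['Division', 'Email'])
# keep=False: every row of a duplicated group is flagged
every = df.duplicated(subset=['Division', 'Email'], keep=False)

print(first.tolist())  # [False, False, True]
print(every.tolist())  # [True, False, True]
```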



Messages In This Thread
finding dupes (with a twist) - by Lyle - Apr-28-2020, 09:51 PM
RE: finding dupes (with a twist) - by stullis - Apr-28-2020, 10:02 PM
RE: finding dupes (with a twist) - by Lyle - Apr-28-2020, 10:11 PM
RE: finding dupes (with a twist) - by buran - Apr-29-2020, 04:16 AM
RE: finding dupes (with a twist) - by Lyle - Apr-29-2020, 11:28 PM
RE: finding dupes (with a twist) - by buran - Apr-30-2020, 03:25 AM
RE: finding dupes (with a twist) - by Lyle - Apr-30-2020, 07:36 PM
RE: finding dupes (with a twist) - by Lyle - May-09-2020, 12:29 AM
RE: finding dupes (with a twist) - by Lyle - May-11-2020, 10:48 PM
RE: finding dupes (with a twist) - by buran - May-12-2020, 04:51 AM
RE: finding dupes (with a twist) - by Lyle - May-13-2020, 12:07 AM
RE: finding dupes (with a twist) - by Lyle - May-24-2020, 08:10 PM
RE: finding dupes (with a twist) - by buran - May-24-2020, 08:33 PM
RE: finding dupes (with a twist) - by Lyle - May-25-2020, 12:18 AM
RE: finding dupes (with a twist) - by buran - May-25-2020, 05:44 AM
RE: finding dupes (with a twist) - by Lyle - May-25-2020, 03:42 PM
RE: finding dupes (with a twist) - by buran - May-25-2020, 04:29 PM
RE: finding dupes (with a twist) - by Lyle - May-25-2020, 05:01 PM
RE: finding dupes (with a twist) - by buran - May-25-2020, 05:09 PM
RE: finding dupes (with a twist) - by Lyle - May-28-2020, 06:00 PM
RE: finding dupes (with a twist) - by buran - May-28-2020, 06:19 PM