May-12-2020, 04:51 AM
Your code skips the first occurrence of each Division/Email combination.
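Your original code isn't quoted here, so this is only a guess at a common pattern that produces this symptom. A hypothetical `seen` set records a row only when its key has already been seen, which silently drops the first member of every group. Grouping all rows by key first, and then keeping every group with more than one member, includes the first occurrence. Both function names and the sample rows are hypothetical.

```python
from collections import defaultdict

def dupes_skipping_first(rows):
    """Hypothetical buggy pattern: only the 2nd, 3rd, ... rows of a group are kept."""
    seen = set()
    result = []
    for row in rows:
        key = (row['Email'].lower(), row['Division'])
        if key in seen:
            result.append(row)
        else:
            # the first occurrence only goes into `seen`, never into `result`
            seen.add(key)
    return result

def dupes_all_occurrences(rows):
    """Group by key first, then keep every row of each group with 2+ members."""
    groups = defaultdict(list)
    for row in rows:
        groups[(row['Email'].lower(), row['Division'])].append(row)
    return [row for group in groups.values() if len(group) > 1 for row in group]

rows = [
    {'Division': 'Fuses', 'Email': 'a@example.com'},
    {'Division': 'Fuses', 'Email': 'A@example.com'},
    {'Division': 'Automotive', 'Email': 'a@example.com'},
]
print(len(dupes_skipping_first(rows)))   # 1
print(len(dupes_all_occurrences(rows)))  # 2
```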
I assume your CSV file has a header, e.g.:
```
Division,field2,field3,field4,Email
Electrical,field2,field3,field4,[email protected]
Automotive,field2,field3,field4,[email protected]
Fuses,field2,field3,field4,[email protected]
Electrical,field2,field3,field4,[email protected]
Automotive,field2,field3,field4,[email protected]
Fuses,field2,field3,field4,[email protected]
Electrical,field2,field3,field4,[email protected]
Automotive,field2,field3,field4,[email protected]
Fuses,field2,field3,field4,[email protected]
```
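If your real file has no header row, `csv.DictReader` would treat the first data row as the field names. A minimal sketch, assuming the column order shown above, is to pass `fieldnames` explicitly; `io.StringIO` stands in for the file and the email addresses are made up.

```python
import csv
import io

# Stand-in for a headerless file; column order assumed to match the sample above.
data = io.StringIO(
    "Electrical,field2,field3,field4,x@example.com\n"
    "Fuses,field2,field3,field4,y@example.com\n"
)
# With explicit fieldnames, every line (including the first) is read as data.
reader = csv.DictReader(
    data, fieldnames=['Division', 'field2', 'field3', 'field4', 'Email']
)
rows = list(reader)
print(rows[0]['Division'], rows[0]['Email'])  # Electrical x@example.com
```

Note that `DictWriter.writeheader()` would then add a header line to the output file that the input did not have.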
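Using the csv module:

```python
import csv
from collections import defaultdict

# Group all rows by (lower-cased email, division) so every occurrence is kept.
dupes = defaultdict(list)
with open('Tdf contacts.csv', 'r', newline='') as read_obj:
    csv_reader = csv.DictReader(read_obj)
    fieldnames = csv_reader.fieldnames
    for line in csv_reader:
        key = (line['Email'].lower(), line['Division'])
        dupes[key].append(line)

# Write every row of each group that has more than one member,
# including the first occurrence. newline='' avoids blank lines on Windows.
with open('new_dupes_file.csv', 'w', newline='') as f:
    wrtr = csv.DictWriter(f, fieldnames=fieldnames)
    wrtr.writeheader()
    for values in dupes.values():
        if len(values) > 1:
            wrtr.writerows(values)
```

Using pandas: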
```python
import pandas as pd

df = pd.read_csv('Tdf contacts.csv')
# Lower-case emails so the comparison is case-insensitive.
df['Email'] = df['Email'].str.lower()
# keep=False flags every row of a duplicated (Division, Email) pair, the first one included.
df['Duplicated'] = df.duplicated(subset=['Division', 'Email'], keep=False)
df_dupes = df[df['Duplicated']].sort_values(by=['Division', 'Email'])
df_dupes.drop(columns=['Duplicated']).to_csv('dupes.csv', index=False)
```
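For comparison, here is a rough standard-library equivalent of the pandas approach, working on a list of dicts rather than files. `collections.Counter` counts each (Division, lower-cased Email) key, which mirrors what `duplicated(..., keep=False)` flags, and the result is sorted by Division then Email like `sort_values`. The function name and sample rows are hypothetical.

```python
from collections import Counter

def flag_and_sort_dupes(rows):
    """Return rows whose (Division, Email) key occurs more than once,
    sorted by Division then Email. Emails are compared case-insensitively."""
    # Lower-case the email first, as the pandas version does.
    normalised = [dict(row, Email=row['Email'].lower()) for row in rows]
    counts = Counter((row['Division'], row['Email']) for row in normalised)
    # Like keep=False: every member of a duplicated key is kept, first included.
    dupes = [row for row in normalised
             if counts[(row['Division'], row['Email'])] > 1]
    return sorted(dupes, key=lambda row: (row['Division'], row['Email']))

rows = [
    {'Division': 'Fuses', 'Email': 'B@example.com'},
    {'Division': 'Electrical', 'Email': 'a@example.com'},
    {'Division': 'Fuses', 'Email': 'b@example.com'},
    {'Division': 'Automotive', 'Email': 'b@example.com'},
    {'Division': 'Electrical', 'Email': 'A@example.com'},
]
result = flag_and_sort_dupes(rows)
print([(r['Division'], r['Email']) for r in result])
# [('Electrical', 'a@example.com'), ('Electrical', 'a@example.com'),
#  ('Fuses', 'b@example.com'), ('Fuses', 'b@example.com')]
```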
If you can't explain it to a six year old, you don't understand it yourself, Albert Einstein
How to Ask Questions The Smart Way: link and another link
Create MCV example
Debug small programs