May-24-2020, 08:10 PM
I'm still at it folks! So I completed the second step by comparing an exceptions file to the dupes file and deleting those exceptions from the main dupes list to create dupes_cleaned.
It's taken two weeks of intense study, and multiple, multiple iterations, to finally figure out that it took only a few lines of code!!! BUT, that intense study means that 1) these lines are MINE and MINE alone! and 2) I understand the theory behind it much better than if I'd just copied someone else's code.
Please let me know if I missed anything or if you would have taken a different approach.
exceptions = pd.read_csv('exceptions.csv')
exceptions['EmailAddress'] = exceptions['EmailAddress'].str.lower()
except_emails = exceptions['EmailAddress']
for e in except_emails:
    df_dupes = df_dupes[df_dupes['EmailAddress'] != e]
df_dupes.to_excel('dupes cleaned.xlsx', index=False)
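One alternative approach you might consider: instead of looping over each exception email and re-filtering the DataFrame every pass, pandas can drop all of them in one vectorized step with `Series.isin`. Here's a minimal, self-contained sketch using made-up sample data in place of your real `df_dupes` and `exceptions.csv` (the column name `EmailAddress` is taken from your snippet; everything else is hypothetical):

```python
import pandas as pd

# Hypothetical stand-ins for the real df_dupes and exceptions.csv
df_dupes = pd.DataFrame({'EmailAddress': ['a@x.com', 'b@x.com', 'c@x.com']})
exceptions = pd.DataFrame({'EmailAddress': ['B@x.com']})

# Normalize case, then keep only rows whose email is NOT in the exceptions list.
# ~isin(...) inverts the boolean mask, so one line replaces the whole loop.
except_emails = exceptions['EmailAddress'].str.lower()
df_cleaned = df_dupes[~df_dupes['EmailAddress'].isin(except_emails)]

print(df_cleaned['EmailAddress'].tolist())  # ['a@x.com', 'c@x.com']
```

The result is the same as the loop, but it scans the DataFrame once instead of once per exception, which matters if the exceptions file grows large.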