Python Forum
duplicates - Printable Version




duplicates - nsx200 - Nov-11-2019

Hello,

I have an assignment with what seems like a simple question, but it is driving me nuts. I would like a point in the right direction (not an answer) if possible.

I have been given two separate dataframes that contain sales data. Both have a 'country' column that gives the names of the various countries where each sale originated.

My task is simply to return a list (or set, dataframe, series) of the values that are found in the 'country' column of both dataframes.

So if I had for example the following two sets of values in these columns....
df1['column1'] = ['England','Germany','Netherlands','Scotland','Spain']
df2['column2'] = ['England','Scotland','Wales','Spain','Italy','France']
I would want to return a list that contains just England, Scotland and Spain as they are the only values that appear in both dataframes.

So far I have extracted the unique values of both columns and placed them in their own dataframe - so a new dataframe with just the two columns (the unique values from the first df and then the unique values from the 2nd df).

My problem is that none of the duplicate(), merge() or intersection methods is working for me. The duplicate() method does not work at all, and the merge and intersection methods do not return enough matches.

I think it is because the two columns of my new dataframe have different numbers of rows. The first column has 300 rows whilst the 2nd column only has 210. So I believe the merge and intersection methods are trying to match on the values, but also on the indexes.

I need to return a list regardless of indexes, containing just the values that are the same in both columns.

Any pointers please without giving away answers?

Thanks


RE: duplicates - jefsummers - Nov-11-2019

You can do something like this with the values from your dataframes, using lists:
lst1 = ['England', 'Germany', 'Netherlands', 'Scotland', 'Spain']
lst2 = ['England', 'Scotland', 'Wales', 'Spain', 'Italy', 'France']
lst3 = []
# Keep every country from the first list that also appears in the second
for country in lst1:
    if country in lst2:
        lst3.append(country)
print(lst3)
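
A more compact variant of the same loop, as a rough sketch: it assumes the same two example lists, and the names `lookup` and `common` are made up for illustration. Building a set from the second list first makes each membership test cheaper on long lists, and the result keeps the order of the first list.

```python
lst1 = ['England', 'Germany', 'Netherlands', 'Scotland', 'Spain']
lst2 = ['England', 'Scotland', 'Wales', 'Spain', 'Italy', 'France']

# A set gives average constant-time membership tests
lookup = set(lst2)

# Keep the countries from lst1 that also appear in lst2, in lst1's order
common = [country for country in lst1 if country in lookup]
print(common)  # ['England', 'Scotland', 'Spain']
```

If lst1 itself contains repeated names, they will be repeated in the result too, so this is only equivalent to a set intersection when the first list is already unique.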



RE: duplicates - ThomasL - Nov-11-2019

Convert both lists of country names into sets and get the intersection.
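
A minimal sketch of that idea, using the example values from the first post (the names `countries1`, `countries2`, `common` and `result` are just placeholders). Sets compare values only, so positions and lengths of the original columns do not matter. With a dataframe column, something like `set(df1['country'])` should build the set from the column's values in the same way.

```python
countries1 = ['England', 'Germany', 'Netherlands', 'Scotland', 'Spain']
countries2 = ['England', 'Scotland', 'Wales', 'Spain', 'Italy', 'France']

# Converting to sets drops duplicates and ignores position entirely
common = set(countries1) & set(countries2)  # same as set(countries1).intersection(countries2)

# Sets are unordered, so sort if a stable list is wanted
result = sorted(common)
print(result)  # ['England', 'Scotland', 'Spain']
```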


RE: duplicates - nsx200 - Nov-12-2019

Thanks both.

I have the intersection working once cast to sets. All is now fine and returning a list of the correct matches.

Thanks