Python Forum
Need help getting unique values across two columns of a dataframe

Need help getting unique values across two columns of a dataframe - a_real_phoenix - Jun-29-2019

Hi there, my question is pretty much what the title says. I have a pandas DataFrame with four columns, and I want to get the unique values across pairs of columns. Here's my code:

import pandas as pd

# df1..df4 hold the four columns of my data
df_concat = pd.concat([df1, df2, df3, df4], axis=1)  # put them side by side

len(df_concat['K-mers A'].unique().tolist())  # number of unique values in 'K-mers A'

This code works well for counting the unique values in one column, but I need the number of unique values across two columns combined. Since the columns share some values, I can't just count each column separately and add the results. I'd really appreciate any help, as I'm struggling to figure this out :)
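A minimal sketch of the combined count in pandas: stack the two columns into one Series with pd.concat and call nunique(), so a value shared by both columns is counted once. Only 'K-mers A' comes from the post; the column names 'K-mers B' to 'K-mers D', the sample values, and the helper unique_across are hypothetical. Looping over itertools.combinations assumes you want the count for every pair of the four columns.

```python
from itertools import combinations

import pandas as pd

# Hypothetical stand-in for df_concat: only 'K-mers A' comes from the post,
# the other column names and all values are made up for illustration.
df_concat = pd.DataFrame({
    'K-mers A': ['AAT', 'GCT', 'TTA'],
    'K-mers B': ['GCT', 'CCA', 'AAT'],
    'K-mers C': ['TTA', 'TTA', 'GGG'],
    'K-mers D': ['CCA', 'AAT', 'GCT'],
})


def unique_across(df, col_x, col_y):
    """Count distinct values that appear in either of two columns (hypothetical helper)."""
    # Stack both columns into a single Series so shared values are counted once.
    # nunique() ignores NaN by default; pass dropna=False to count NaN as a value.
    return pd.concat([df[col_x], df[col_y]]).nunique()


# Count for every pair of the four columns: (A, B), (A, C), ..., (C, D).
pair_counts = {
    (a, b): unique_across(df_concat, a, b)
    for a, b in combinations(df_concat.columns, 2)
}
for pair, n in pair_counts.items():
    print(pair, n)
```

With real data, drop the sample frame and call unique_across(df_concat, 'K-mers A', ...) with your second column name. Note that len(series.unique()) counts NaN as a value while nunique() does not, so the two can differ if the data has missing entries.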

Bonus question: How would I go about finding not the number of unique values, but the number of values that appear in both columns? :) Once I get this, my work will be done :D


RE: Need help getting unique values across two columns of a dataframe - scidam - Jun-30-2019

(Jun-29-2019, 03:06 PM) a_real_phoenix Wrote: Since columns will have some of the same values, I can't just find them separately and add them together. I'd really appreciate any help, as I'm struggling to figure this out :)
You can convert each column to a list, concatenate the lists, and apply numpy.unique, e.g.:
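import numpy as np
import pandas as pd
df = pd.DataFrame({'col1': ['a', 'b', 'c'], 'col2': ['c', 'b', 'd']})
# Join both columns into one list, then keep each distinct value once (sorted)
uniq = np.unique(df['col1'].tolist() + df['col2'].tolist())
print(uniq, len(uniq))  # ['a' 'b' 'c' 'd'] 4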
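A pandas-only variant of the same idea, offered as a sketch rather than the one right way: select both columns, flatten them into a single array, and pass it to pd.unique. Here ravel(order='F') walks the array column by column, and unlike numpy.unique, pd.unique keeps values in order of first appearance instead of sorting them.

```python
import pandas as pd

df = pd.DataFrame({'col1': ['a', 'b', 'c'], 'col2': ['c', 'b', 'd']})

# Flatten the two columns column by column (order='F'), then let pd.unique
# drop repeats; the result keeps order of first appearance rather than sorting.
values = pd.unique(df[['col1', 'col2']].to_numpy().ravel(order='F'))
print(list(values))  # ['a', 'b', 'c', 'd']
print(len(values))   # 4
```

In this small example the appearance order happens to match sorted order; on other data the two functions can return the same values in a different order.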
Hopefully you can solve the bonus question yourself.
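For the bonus question, one possible approach is a set intersection. This sketch assumes "values that reoccur across both columns" means values that appear at least once in each column, and reuses the small col1/col2 example frame; a Series.isin check gives the same answer while staying in pandas.

```python
import pandas as pd

df = pd.DataFrame({'col1': ['a', 'b', 'c'], 'col2': ['c', 'b', 'd']})

# Assumption: "reoccur across both columns" = present in col1 AND in col2.
shared = set(df['col1']) & set(df['col2'])
print(sorted(shared), len(shared))  # ['b', 'c'] 2

# Same idea in pandas: keep col1 values that also occur somewhere in col2.
shared_pd = df.loc[df['col1'].isin(df['col2']), 'col1'].unique()
print(len(shared_pd))  # 2
```

If the intended meaning is instead any value that shows up more than once in the combined data, pd.concat([df['col1'], df['col2']]).value_counts() filtered to counts greater than 1 would be a starting point.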