Sep-10-2019, 08:22 AM
Good morning, this is my first message in the forum, so hi.
I have a question and I'm not sure how to proceed: I'm trying to compare two columns in two different DataFrames.
The problem is that "is_in_column" currently returns True, and what I want it to return is the index at which the match is found.

import pandas as pd
import numpy as np
from sklearn import linear_model
from sklearn import model_selection
from sklearn.metrics import classification_report
from sklearn.metrics import confusion_matrix
from sklearn.metrics import accuracy_score
import matplotlib.pyplot as plt
import seaborn as sb

df = pd.read_csv(r"Prueba1CSV.csv", sep=";")
df1 = pd.read_csv(r"Prueba2CSV.csv", sep=";")

def is_in_column(data, values):
    if data in values:
        return True
    else:
        return False

df['Present_in_other'] = df[Cabeza].apply(is_in_column, values=df1[Cabeza2].tolist())
df3 = df[df['Present_in_other'] == False]
Maybe it's very easy but I don't know how to do it.
Thanks,
Iván
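
For reference, here is a minimal sketch of one way to get the matching index instead of True. It is only an illustration under stated assumptions: the two small DataFrames are made-up stand-ins for the CSV files, "Cabeza" and "Cabeza2" are assumed to be plain string column names, "Index_in_other" is a hypothetical column name, and "the index" is taken to mean the index label of the first matching row in the second DataFrame.

```python
import pandas as pd

# Made-up stand-ins for Prueba1CSV.csv and Prueba2CSV.csv (assumption:
# "Cabeza" and "Cabeza2" are ordinary string column names).
df = pd.DataFrame({"Cabeza": ["a", "b", "c", "d"]})
df1 = pd.DataFrame({"Cabeza2": ["c", "x", "a", "c"]})

# Keep only the first occurrence of each value in df1["Cabeza2"], then
# build a lookup from value to the index label where it first appears.
first_seen = df1["Cabeza2"].drop_duplicates(keep="first")
value_to_index = {value: idx for idx, value in first_seen.items()}

# Look up every value of df["Cabeza"]; values without a match become NaN.
# "Index_in_other" is a hypothetical column name chosen for this sketch.
df["Index_in_other"] = df["Cabeza"].map(value_to_index)

# Rows of df with no match in df1, playing the role of df3 in the post.
df3 = df[df["Index_in_other"].isna()]

print(df)
print(df3)
```

With this made-up data, "a" maps to 2 and "c" maps to 0 (its first occurrence in df1), while "b" and "d" have no match and come out as NaN, so they are the rows that end up in df3. Because of the NaN values the new column is stored as floats; if every match is needed rather than just the first, a merge on the two columns would probably be a better fit than a lookup.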