Python Forum

Full Version: Inner Join merging bug?
Hi!

I have the following very simple code:

import pandas as pd

dfTycho = pd.read_excel('TychoList.xlsx')
dfCodes = pd.read_excel('CompaniesCodes.xlsx')
dfcomphousecodes = pd.merge(dfTycho, dfCodes, on='CompanyName', how='inner')
TychoList.xlsx has 172 rows in Excel and around 30 columns, most of which we will not need. Column 3 is 'CompanyName', which contains all the names we need at the moment.
CompaniesCodes.xlsx has about 1400 rows and 2 columns. Column 1 is 'CompanyName', which contains every company that could appear in TychoList.xlsx, and column 2 holds the code for that company.

I would like to add the codes from CompaniesCodes.xlsx to the companies that are in TychoList.xlsx, so that the result has the same 172 rows plus a new column with the matching codes (like a VLOOKUP in Excel). The problem is that every time I run this, or try a variation of it, the merge returns a DataFrame that is about 1400 rows long, and I have no idea why. It looks as if it is doing an outer join instead of an inner join. I have used pandas before and this is very strange to me; I am probably missing some tiny detail. Does anyone have any ideas?
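To make the question concrete, here is a small sketch of the VLOOKUP-style lookup I am after, using tiny made-up frames in place of the two Excel files. The data, the 'Other' and 'Code' column names, and the idea that repeated 'CompanyName' values might be what inflates the row count are all assumptions for illustration; I have not confirmed that this is what happens with the real files.

```python
import pandas as pd

# Hypothetical stand-ins for TychoList.xlsx and CompaniesCodes.xlsx.
dfTycho = pd.DataFrame({
    "CompanyName": ["Acme", "Beta", "Acme"],
    "Other": [1, 2, 3],
})
dfCodes = pd.DataFrame({
    "CompanyName": ["Acme", "Beta", "Beta", "Gamma"],
    "Code": ["A1", "B1", "B2", "G1"],
})

# Count lookup rows whose CompanyName already appeared earlier.
dup_keys = dfCodes["CompanyName"].duplicated().sum()

# An inner join emits one row per matching (left, right) pair,
# so repeated keys on either side multiply the output rows.
inner = pd.merge(dfTycho, dfCodes, on="CompanyName", how="inner")

# VLOOKUP-style: keep one code per company, then left-merge so
# every row of dfTycho survives exactly once.
lookup = dfCodes.drop_duplicates(subset="CompanyName", keep="first")
result = pd.merge(
    dfTycho, lookup, on="CompanyName", how="left",
    validate="many_to_one",  # raises if lookup keys are not unique
)

print(dup_keys, len(inner), len(result))
```

In this toy case the inner join comes out longer than dfTycho because 'Beta' appears twice in the lookup table, while the left merge against the de-duplicated lookup keeps the original row count. Keeping the first code per company is an arbitrary choice made only for the sketch.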