Two dataframes merged

Ecniv · Jun-04-2019, 06:46 PM

Situation:
Learning Pandas, using Nederlands OV CHipkaart data (mine) for a month to test pandas and practice
Currently can read the file into a dataframe.

Quote:Date, Check-in, Departure, Check-out, Destination, Amount, Transaction, Class, Product, Comments, Name, Card number

I am splitting the data into two dataframes, one for checking in, one for checking out.

Quote:Date, Time, Departure, Type
Date, Time, Destination, Cost, Type

My code...

import os
import re
import datetime as dt
import numpy
import pandas as pd

data = None
try:
    df = pd.read_csv('privatefiles/t.csv',';')
    #df = pd.read_csv('...','\t')
    #xlsx = Excel_File('t.xlsx')
    #df = pd.read_excel(xlsx,0)
    #print(df.head(3))
    #print(len(df))
except Exception as err:
    print('Problem with reading the file.<br/> '+err)

checkin = df[df['Transaction']=='Check-in'].rename(columns={'Check-in':'Time'})
checkin = checkin[['Date','Time','Departure']].reset_index()

checkout = df[df['Transaction']=='Check-out']
checkout = checkout[['Date','Check-out','Destination','Amount']].rename(columns={'Check-out':'Time','Amount':'Cost'}).reset_index()
#checkout['Type','Departure','Duration']=['Check-out','',0.]

cleaned = pd.DataFrame(columns=['Type','Date','Time','Departure','Destination','Cost','Duration','IsWork'])

cleaned += checkin

Complexity:
I'm still learning so working through this step by step.
Just as I get to understand I lose or forget how.

Problem:
I cannot seem to get the two dataframes to merge.
Can you suggest a way, or several ways to do this better?
or point to a reference to do this better?

chisox721 · Jun-05-2019, 06:51 PM

At first glance it seems as though you can't get the dataframes to merge because you don't have a line in your script that calls for them to be merged. Read this article and see if it helps you out. https://chrisalbon.com/python/data_wrang...dataframe/

Ecniv · Jun-06-2019, 06:41 PM

updated python coding... still doesnt work.... :)

import os
import re
import datetime as dt
import numpy
import pandas as pd

data = None
try:
    df = pd.read_csv('privatefiles/t.csv',';')
    #df = pd.read_csv('...','\t')
    #xlsx = Excel_File('t.xlsx')
    #df = pd.read_excel(xlsx,0)
    print(df.head(3))
    print(len(df))
except Exception as err:
    print('Problem with reading the file.<br/> '+err)


checkin = df[df['Transaction']=='Check-in']
checkin = checkin.rename(columns={'Check-in':'time','Date':'date','Departure':'departure'}).reset_index()
checkin = checkin[['date','time','departure']]
checkin['type']='checkin'
#checkin.head(5)

checkout = df[df['Transaction']=='Check-out']
checkout = checkout.rename(columns={'Check-out':'time','Amount':'cost','Destination':'arrival','Date':'date'}).reset_index()
checkout = checkout[['date','time','arrival','cost']]
checkout['type']='checkout'
#checkout.head(5)

cleaned = pd.DataFrame(columns=['type','date','time','departure','arrival','cost','duration','iswork'])
#cleaned

cleaned = pd.concat( [cleaned,checkin] , ignore_index=False, sort=False)
#cleaned.head(5)

cleaned.sort_values(by=['date','time'],ascending=[True,False],inplace=True)
checkout.sort_values(by=['date','time'],ascending=[True,True],inplace=True)
for i,r in checkout.iterrows():
    print(i)
    print(r)
    print('r - date - '+r['date'])
    print(cleaned.loc(['date']==r['date'] and ['time']<r['time']))
    break
#cleaned.head(4)
#cleaned.loc(['date']==r['date'])

chisox721 · Jun-06-2019, 07:02 PM

Is your goal to combine the data in the "checkin" and "checkout" frames? If that is the case, first off use the "merge" function - not "concat". Also, when using "merge" you need to specify which column you want to merge on (identify the column with identical values in the "checkin" and "checkout" frame).

If you attach the file you're working with I'll write out the script for you but it would be more helpful long term if you understand the concepts yourself.

Ecniv · (This post was last modified: Jun-06-2019, 08:57 PM by Ecniv.)

Ahh almost got it... but not quite...

import os
import re
import datetime as dt
import numpy
import pandas as pd

data = None
try:
    df = pd.read_csv('privatefiles/t.csv',';')
    #df = pd.read_csv('...','\t')
    #xlsx = Excel_File('t.xlsx')
    #df = pd.read_excel(xlsx,0)
    print(df.head(3))
    print(len(df))
except Exception as err:
    print('Problem with reading the file.<br/> '+err)


checkin = df[df['Transaction']=='Check-in']
checkin = checkin.rename(columns={'Check-in':'dtime','Date':'date','Departure':'departure'}).reset_index()
checkin = checkin[['date','dtime','departure']]
#checkin.head(5)

checkout = df[df['Transaction']=='Check-out']
checkout = checkout.rename(columns={'Check-out':'atime','Amount':'cost','Destination':'arrival','Date':'date'}).reset_index()
checkout = checkout[['date','atime','arrival','cost']]
#checkout.head(5)

cleaned = pd.DataFrame(columns=['date','dtime','departure','atime','arrival','cost','duration','iswork'])
#cleaned

cleaned = pd.concat( [cleaned,checkin] , ignore_index=False, sort=False)
#cleaned.head(5)

cleaned.sort_values(by=['date','dtime'],ascending=[True,True],inplace=True)
checkout.sort_values(by=['date','atime'],ascending=[True,True],inplace=True)
#print(cleaned.head(5))
#print(checkout.head(3))

for i,r in checkout.iterrows():
    print(cleaned['date'][i])
    print(r['date'])
    print(cleaned['dtime'][i])
    print(r['atime'])
    f = cleaned[(cleaned['date']==r['date']) & (cleaned['dtime']<cleaned['atime'])]
    print(f)
    break
#cleaned.head(4)

Quote:04-05-2019
04-05-2019
09:53
10:02
Empty DataFrame
Columns: [date, dtime, departure, atime, arrival, cost, duration, iswork]
Index: []

I am obviously doing something wrong in the filter on the pandas. but the result is an empty dataframe... :/
Oh, and if it does find one (with no data already in) would it be updatable ?
Or do I need to update and send to another output dataframe ?

(Jun-06-2019, 07:02 PM)chisox721 Wrote: Is your goal to combine the data in the "checkin" and "checkout" frames? If that is the case, first off use the "merge" function - not "concat". Also, when using "merge" you need to specify which column you want to merge on (identify the column with identical values in the "checkin" and "checkout" frame).

If you attach the file you're working with I'll write out the script for you but it would be more helpful long term if you understand the concepts yourself.

Hi

I cannot use merged as you require a unique field. I do not have one.
I am trying to get the check out for each check in to produce a trip/journey, duration, cost etc...
But I need the check out that is immediately after the checkin (assuming I have checked in and out properly). There is a small chance I checked in and forgot to check out... Unlikely but possible.

Any ideas on why its not bringing back at least one record when filtered on date and time ?

Ecniv · Jun-11-2019, 07:14 PM

Weirdly if I use one or the other to filter the dataframe, it runs. But both together and it errors.

Ecniv · Jun-11-2019, 08:25 PM

for i,r in checkout.iterrows():
    ind = -1
    f = cleaned[cleaned['atime'].isnull()]
    print('1',len(f))
    #print(f.head(4))
    if f is not None:
        f = f[f['date']==r['date']]
    print('2',len(f))
    #print(f.head(4))
    if f is not None:
        f = f[f['dtime']<r['atime']]
    print('3',len(f))
    #print(f.head(4))
    if f is not None:
        ind = iloc(f)
    print('4',f.head(4))
    if ind>=0:
        print(ind)
        cleaned[ind]['atime','arrival','cost']=r['atime','arrival','cost']
    break

Still testing - this almost worked but errors as it cannot find the index of the cleaned to update the values...

Ecniv · Jun-12-2019, 08:12 PM

Ok. Managed to get the index back and find the original row via the index. However it is still not doing it (still erroring) so I must still be doing it wrong.... any ideas (I have a headache now)
Final loop is now :

for i,r in checkout.iterrows():
    ind = -1
    f = cleaned[cleaned['atime'].isnull()]
    if f is not None:
        f = f[f['date']==r['date']]
    if f is not None:
        f = f[f['dtime']<r['atime']]
    if f is not None:
        cleaned.iloc[f.index[0]]['atime','arrival','cost']=r['atime','arrival','cost']

Ecniv · Jun-13-2019, 06:25 PM

for i,r in checkout.iterrows():
    ind = -1
    f = cleaned[cleaned['atime'].isnull()]
    if f is not None:
        f = f[f['date']==r['date']]
    if f is not None:
        f = f[f['dtime']<r['atime']]
    if (f is not None) & (len(f)>0):
        ind = f.index[0]
    if ind is not None:
        cleaned.iloc[ind][['atime','arrival','cost']] = r[['atime','arrival','cost']]
        #lower three lines combined into one above
        #cleaned.iloc[ind]['atime']   = r['atime']
        #cleaned.iloc[ind]['arrival'] = r['arrival']
        #cleaned.iloc[ind]['cost']    = r['cost']

Updated the dataframe - although with iterations...

chisox721 · Jun-15-2019, 08:58 PM

(Jun-06-2019, 08:52 PM)Ecniv Wrote: Hi

I cannot use merged as you require a unique field. I do not have one.
I am trying to get the check out for each check in to produce a trip/journey, duration, cost etc...
But I need the check out that is immediately after the checkin (assuming I have checked in and out properly). There is a small chance I checked in and forgot to check out... Unlikely but possible.

Any ideas on why its not bringing back at least one record when filtered on date and time ?

I'm still not really understanding what you're looking to accomplish and your codes are very confusing. One question I have is how do you not have a unique column to merge on? I would think that if you are checking in/checking out of a hotel that would mean you have a corresponding transaction number or something similar that could be used to properly identify the trip. Even a separate column that includes a reference to the specific trip like "SpainVacation" would work. That is unless I'm completely wrong in my understanding of what trying to accomplish.

Like I said before it would be much easier for me to help you if I could see the contents of the file you're working with - or at least a file with a similar structure.

Possibly Related Threads…
Thread		Author	Replies	Views	Last Post
	Merging two DataFrames based on indexes from two other DataFrames	lucinda_rigeitti	0	1,764	Jan-16-2020, 08:36 PM Last Post: lucinda_rigeitti
	Read exel with merged cells and write to another excel	SriMekala	0	2,999	Aug-10-2019, 07:14 AM Last Post: SriMekala
	how read and write merged cells in excel	SriMekala	1	15,169	Aug-07-2019, 11:27 PM Last Post: scidam

Two dataframes merged

User Panel Messages

Announcements