Posts: 44
Threads: 11
Joined: Mar 2019
Situation:
Learning Pandas, using Nederlands OV CHipkaart data (mine) for a month to test pandas and practice
Currently can read the file into a dataframe.
Quote:Date, Check-in, Departure, Check-out, Destination, Amount, Transaction, Class, Product, Comments, Name, Card number
I am splitting the data into two dataframes, one for checking in, one for checking out.
Quote:Date, Time, Departure, Type
Date, Time, Destination, Cost, Type
My code...
import os
import re
import datetime as dt
import numpy
import pandas as pd
data = None
try:
df = pd.read_csv('privatefiles/t.csv',';')
#df = pd.read_csv('...','\t')
#xlsx = Excel_File('t.xlsx')
#df = pd.read_excel(xlsx,0)
#print(df.head(3))
#print(len(df))
except Exception as err:
print('Problem with reading the file.<br/> '+err)
checkin = df[df['Transaction']=='Check-in'].rename(columns={'Check-in':'Time'})
checkin = checkin[['Date','Time','Departure']].reset_index()
checkout = df[df['Transaction']=='Check-out']
checkout = checkout[['Date','Check-out','Destination','Amount']].rename(columns={'Check-out':'Time','Amount':'Cost'}).reset_index()
#checkout['Type','Departure','Duration']=['Check-out','',0.]
cleaned = pd.DataFrame(columns=['Type','Date','Time','Departure','Destination','Cost','Duration','IsWork'])
cleaned += checkin Complexity:
I'm still learning so working through this step by step.
Just as I get to understand I lose or forget how.
Problem:
I cannot seem to get the two dataframes to merge.
Can you suggest a way, or several ways to do this better?
or point to a reference to do this better?
Posts: 28
Threads: 8
Joined: Oct 2017
At first glance it seems as though you can't get the dataframes to merge because you don't have a line in your script that calls for them to be merged. Read this article and see if it helps you out. https://chrisalbon.com/python/data_wrang...dataframe/
Posts: 44
Threads: 11
Joined: Mar 2019
updated python coding... still doesnt work.... :)
import os
import re
import datetime as dt
import numpy
import pandas as pd
data = None
try:
df = pd.read_csv('privatefiles/t.csv',';')
#df = pd.read_csv('...','\t')
#xlsx = Excel_File('t.xlsx')
#df = pd.read_excel(xlsx,0)
print(df.head(3))
print(len(df))
except Exception as err:
print('Problem with reading the file.<br/> '+err)
checkin = df[df['Transaction']=='Check-in']
checkin = checkin.rename(columns={'Check-in':'time','Date':'date','Departure':'departure'}).reset_index()
checkin = checkin[['date','time','departure']]
checkin['type']='checkin'
#checkin.head(5)
checkout = df[df['Transaction']=='Check-out']
checkout = checkout.rename(columns={'Check-out':'time','Amount':'cost','Destination':'arrival','Date':'date'}).reset_index()
checkout = checkout[['date','time','arrival','cost']]
checkout['type']='checkout'
#checkout.head(5)
cleaned = pd.DataFrame(columns=['type','date','time','departure','arrival','cost','duration','iswork'])
#cleaned
cleaned = pd.concat( [cleaned,checkin] , ignore_index=False, sort=False)
#cleaned.head(5)
cleaned.sort_values(by=['date','time'],ascending=[True,False],inplace=True)
checkout.sort_values(by=['date','time'],ascending=[True,True],inplace=True)
for i,r in checkout.iterrows():
print(i)
print(r)
print('r - date - '+r['date'])
print(cleaned.loc(['date']==r['date'] and ['time']<r['time']))
break
#cleaned.head(4)
#cleaned.loc(['date']==r['date'])
Posts: 28
Threads: 8
Joined: Oct 2017
Is your goal to combine the data in the "checkin" and "checkout" frames? If that is the case, first off use the "merge" function - not "concat". Also, when using "merge" you need to specify which column you want to merge on (identify the column with identical values in the "checkin" and "checkout" frame).
If you attach the file you're working with I'll write out the script for you but it would be more helpful long term if you understand the concepts yourself.
Posts: 44
Threads: 11
Joined: Mar 2019
Jun-06-2019, 08:52 PM
(This post was last modified: Jun-06-2019, 08:57 PM by Ecniv.)
Ahh almost got it... but not quite...
import os
import re
import datetime as dt
import numpy
import pandas as pd
data = None
try:
df = pd.read_csv('privatefiles/t.csv',';')
#df = pd.read_csv('...','\t')
#xlsx = Excel_File('t.xlsx')
#df = pd.read_excel(xlsx,0)
print(df.head(3))
print(len(df))
except Exception as err:
print('Problem with reading the file.<br/> '+err)
checkin = df[df['Transaction']=='Check-in']
checkin = checkin.rename(columns={'Check-in':'dtime','Date':'date','Departure':'departure'}).reset_index()
checkin = checkin[['date','dtime','departure']]
#checkin.head(5)
checkout = df[df['Transaction']=='Check-out']
checkout = checkout.rename(columns={'Check-out':'atime','Amount':'cost','Destination':'arrival','Date':'date'}).reset_index()
checkout = checkout[['date','atime','arrival','cost']]
#checkout.head(5)
cleaned = pd.DataFrame(columns=['date','dtime','departure','atime','arrival','cost','duration','iswork'])
#cleaned
cleaned = pd.concat( [cleaned,checkin] , ignore_index=False, sort=False)
#cleaned.head(5)
cleaned.sort_values(by=['date','dtime'],ascending=[True,True],inplace=True)
checkout.sort_values(by=['date','atime'],ascending=[True,True],inplace=True)
#print(cleaned.head(5))
#print(checkout.head(3))
for i,r in checkout.iterrows():
print(cleaned['date'][i])
print(r['date'])
print(cleaned['dtime'][i])
print(r['atime'])
f = cleaned[(cleaned['date']==r['date']) & (cleaned['dtime']<cleaned['atime'])]
print(f)
break
#cleaned.head(4) Quote:04-05-2019
04-05-2019
09:53
10:02
Empty DataFrame
Columns: [date, dtime, departure, atime, arrival, cost, duration, iswork]
Index: []
I am obviously doing something wrong in the filter on the pandas. but the result is an empty dataframe... :/
Oh, and if it does find one (with no data already in) would it be updatable ?
Or do I need to update and send to another output dataframe ?
(Jun-06-2019, 07:02 PM)chisox721 Wrote: Is your goal to combine the data in the "checkin" and "checkout" frames? If that is the case, first off use the "merge" function - not "concat". Also, when using "merge" you need to specify which column you want to merge on (identify the column with identical values in the "checkin" and "checkout" frame).
If you attach the file you're working with I'll write out the script for you but it would be more helpful long term if you understand the concepts yourself. Hi
I cannot use merged as you require a unique field. I do not have one.
I am trying to get the check out for each check in to produce a trip/journey, duration, cost etc...
But I need the check out that is immediately after the checkin (assuming I have checked in and out properly). There is a small chance I checked in and forgot to check out... Unlikely but possible.
Any ideas on why its not bringing back at least one record when filtered on date and time ?
Posts: 44
Threads: 11
Joined: Mar 2019
Weirdly if I use one or the other to filter the dataframe, it runs. But both together and it errors.
Posts: 44
Threads: 11
Joined: Mar 2019
for i,r in checkout.iterrows():
ind = -1
f = cleaned[cleaned['atime'].isnull()]
print('1',len(f))
#print(f.head(4))
if f is not None:
f = f[f['date']==r['date']]
print('2',len(f))
#print(f.head(4))
if f is not None:
f = f[f['dtime']<r['atime']]
print('3',len(f))
#print(f.head(4))
if f is not None:
ind = iloc(f)
print('4',f.head(4))
if ind>=0:
print(ind)
cleaned[ind]['atime','arrival','cost']=r['atime','arrival','cost']
break Still testing - this almost worked but errors as it cannot find the index of the cleaned to update the values...
Posts: 44
Threads: 11
Joined: Mar 2019
Ok. Managed to get the index back and find the original row via the index. However it is still not doing it (still erroring) so I must still be doing it wrong.... any ideas (I have a headache now)
Final loop is now :
for i,r in checkout.iterrows():
ind = -1
f = cleaned[cleaned['atime'].isnull()]
if f is not None:
f = f[f['date']==r['date']]
if f is not None:
f = f[f['dtime']<r['atime']]
if f is not None:
cleaned.iloc[f.index[0]]['atime','arrival','cost']=r['atime','arrival','cost']
Posts: 44
Threads: 11
Joined: Mar 2019
for i,r in checkout.iterrows():
ind = -1
f = cleaned[cleaned['atime'].isnull()]
if f is not None:
f = f[f['date']==r['date']]
if f is not None:
f = f[f['dtime']<r['atime']]
if (f is not None) & (len(f)>0):
ind = f.index[0]
if ind is not None:
cleaned.iloc[ind][['atime','arrival','cost']] = r[['atime','arrival','cost']]
#lower three lines combined into one above
#cleaned.iloc[ind]['atime'] = r['atime']
#cleaned.iloc[ind]['arrival'] = r['arrival']
#cleaned.iloc[ind]['cost'] = r['cost'] Updated the dataframe - although with iterations...
Posts: 28
Threads: 8
Joined: Oct 2017
(Jun-06-2019, 08:52 PM)Ecniv Wrote: Hi
I cannot use merged as you require a unique field. I do not have one.
I am trying to get the check out for each check in to produce a trip/journey, duration, cost etc...
But I need the check out that is immediately after the checkin (assuming I have checked in and out properly). There is a small chance I checked in and forgot to check out... Unlikely but possible.
Any ideas on why its not bringing back at least one record when filtered on date and time ?
I'm still not really understanding what you're looking to accomplish and your codes are very confusing. One question I have is how do you not have a unique column to merge on? I would think that if you are checking in/checking out of a hotel that would mean you have a corresponding transaction number or something similar that could be used to properly identify the trip. Even a separate column that includes a reference to the specific trip like "SpainVacation" would work. That is unless I'm completely wrong in my understanding of what trying to accomplish.
Like I said before it would be much easier for me to help you if I could see the contents of the file you're working with - or at least a file with a similar structure.
|