Python Forum

Full Version: Help setting up dataframes
You're currently viewing a stripped down version of our content. View the full version with proper formatting.
Hello everyone,


I have setup dataframes before but I always struggle at the early stages. There is some key piece of information that I am missing and I am hoping someone here can point out what I always do wrong.

I have imported data (from my brokerage...my stocks transactions from last 8 years). There are 15 columns and 1312 rows.

At this point, I am just playing around with the data. I am trying to use the iloc function to pull some numbers but it isn't working. Does anyone know why the iloc function wouldn't work.
I am using the PROCESS_DATE to find the next column figure.

Is it cause I have wrong variables (all columns should be string). Is it cause I have not designated a single column as index.


#Section 1 -------------------Data Cleanup-------------------------------------

#Import Libraries
import pandas as pd
import numpy as np

df = pd.read_csv('3800131223.csv', sep=',', names=['PROCESS_DATE', 'EFFECTIVE_DATE', 'ACCOUNT_NUMBER','TRANSACTION_ID','SUB_TRANSACTION_ID', 'SYMBOL', 'TRANSACTION_DESC','TRANSACTION_TYPE', 
                                                 'QUANTITY', 'COMMISSION', 'PRICE', 'AMOUNT_VALUE', 'SECURITY_TYPE', 'CURRENCY', 'REP'], skiprows=1, header=None)

df.head(5)
Output:
PROCESS_DATE EFFECTIVE_DATE ACCOUNT_NUMBER TRANSACTION_ID SUB_TRANSACTION_ID SYMBOL TRANSACTION_DESC TRANSACTION_TYPE QUANTITY COMMISSION PRICE AMOUNT_VALUE SECURITY_TYPE CURRENCY REP 0 20201228 20201230 380013122 45925736.0 NaN BABA ALIBABA GROUP HOLDING LTD - AVG. PRICE DETAILS... BUY 20 6.49 222.15 -4449.49 EQ USD NaN 1 20201228 20201230 380013122 45928308.0 NaN SPOT SPOTIFY TECHNOLOGY S A - AVG. PRICE DETAILS ON... BUY 20 6.49 318.00 -6366.49 EQ USD NaN 2 20201224 20201224 380013122 NaN 5981053.0 SU DIV - SUNCOR ENERGY INC DIV 100 0.00 0.00 21.00 NaN CAD NaN 3 20201221 20201221 380013122 NaN 5975559.0 MFC DIV - MANULIFE FINANCIAL CORP DIV 502 0.00 0.00 140.56 NaN CAD NaN 4 20201211 20201201 380013122 21844836.0 0.0 NaN Interest 202011 INT 0 0.00 0.00 -319.52 NaN USD NaN

df.loc['20201224'][1]
Error:
--------------------------------------------------------------------------- KeyError Traceback (most recent call last) <ipython-input-66-2ff131d1c895> in <module> ----> 1 df.loc['20201224'][1] ~/opt/anaconda3/lib/python3.8/site-packages/pandas/core/indexing.py in __getitem__(self, key) 877 878 maybe_callable = com.apply_if_callable(key, self.obj) --> 879 return self._getitem_axis(maybe_callable, axis=axis) 880 881 def _is_scalar_access(self, key: Tuple): ~/opt/anaconda3/lib/python3.8/site-packages/pandas/core/indexing.py in _getitem_axis(self, key, axis) 1108 # fall thru to straight lookup 1109 self._validate_key(key, axis) -> 1110 return self._get_label(key, axis=axis) 1111 1112 def _get_slice_axis(self, slice_obj: slice, axis: int): ~/opt/anaconda3/lib/python3.8/site-packages/pandas/core/indexing.py in _get_label(self, label, axis) 1057 def _get_label(self, label, axis: int): 1058 # GH#5667 this will fail if the label is not present in the axis. -> 1059 return self.obj.xs(label, axis=axis) 1060 1061 def _handle_lowerdim_multi_index_axis0(self, tup: Tuple): ~/opt/anaconda3/lib/python3.8/site-packages/pandas/core/generic.py in xs(self, key, axis, level, drop_level) 3489 loc, new_index = self.index.get_loc_level(key, drop_level=drop_level) 3490 else: -> 3491 loc = self.index.get_loc(key) 3492 3493 if isinstance(loc, np.ndarray): ~/opt/anaconda3/lib/python3.8/site-packages/pandas/core/indexes/range.py in get_loc(self, key, method, tolerance) 356 except ValueError as err: 357 raise KeyError(key) from err --> 358 raise KeyError(key) 359 return super().get_loc(key, method=method, tolerance=tolerance) 360 KeyError: '20201224'
Any thoughts. Does anyone know what I am doing wrong.

thanks,

YK!

here is another piece of information
Output:
<class 'pandas.core.frame.DataFrame'> RangeIndex: 1312 entries, 0 to 1311 Data columns (total 15 columns): # Column Non-Null Count Dtype --- ------ -------------- ----- 0 PROCESS_DATE 1312 non-null int64 1 EFFECTIVE_DATE 1312 non-null int64 2 ACCOUNT_NUMBER 1312 non-null int64 3 TRANSACTION_ID 1166 non-null float64 4 SUB_TRANSACTION_ID 403 non-null float64 5 SYMBOL 970 non-null object 6 TRANSACTION_DESC 1312 non-null object 7 TRANSACTION_TYPE 1312 non-null object 8 QUANTITY 1312 non-null int64 9 COMMISSION 1312 non-null float64 10 PRICE 1312 non-null float64 11 AMOUNT_VALUE 1312 non-null float64 12 SECURITY_TYPE 909 non-null object 13 CURRENCY 1312 non-null object 14 REP 0 non-null float64 dtypes: float64(6), int64(4), object(5) memory usage: 153.9+ KB