Jan-05-2021, 10:33 PM
Hello everyone,
I have setup dataframes before but I always struggle at the early stages. There is some key piece of information that I am missing and I am hoping someone here can point out what I always do wrong.
I have imported data (from my brokerage...my stocks transactions from last 8 years). There are 15 columns and 1312 rows.
At this point, I am just playing around with the data. I am trying to use the iloc function to pull some numbers but it isn't working. Does anyone know why the iloc function wouldn't work.
I am using the PROCESS_DATE to find the next column figure.
Is it cause I have wrong variables (all columns should be string). Is it cause I have not designated a single column as index.
thanks,
YK!
here is another piece of information
I have setup dataframes before but I always struggle at the early stages. There is some key piece of information that I am missing and I am hoping someone here can point out what I always do wrong.
I have imported data (from my brokerage...my stocks transactions from last 8 years). There are 15 columns and 1312 rows.
At this point, I am just playing around with the data. I am trying to use the iloc function to pull some numbers but it isn't working. Does anyone know why the iloc function wouldn't work.
I am using the PROCESS_DATE to find the next column figure.
Is it cause I have wrong variables (all columns should be string). Is it cause I have not designated a single column as index.
#Section 1 -------------------Data Cleanup------------------------------------- #Import Libraries import pandas as pd import numpy as np df = pd.read_csv('3800131223.csv', sep=',', names=['PROCESS_DATE', 'EFFECTIVE_DATE', 'ACCOUNT_NUMBER','TRANSACTION_ID','SUB_TRANSACTION_ID', 'SYMBOL', 'TRANSACTION_DESC','TRANSACTION_TYPE', 'QUANTITY', 'COMMISSION', 'PRICE', 'AMOUNT_VALUE', 'SECURITY_TYPE', 'CURRENCY', 'REP'], skiprows=1, header=None) df.head(5)
Output:PROCESS_DATE EFFECTIVE_DATE ACCOUNT_NUMBER TRANSACTION_ID SUB_TRANSACTION_ID SYMBOL TRANSACTION_DESC TRANSACTION_TYPE QUANTITY COMMISSION PRICE AMOUNT_VALUE SECURITY_TYPE CURRENCY REP
0 20201228 20201230 380013122 45925736.0 NaN BABA ALIBABA GROUP HOLDING LTD - AVG. PRICE DETAILS... BUY 20 6.49 222.15 -4449.49 EQ USD NaN
1 20201228 20201230 380013122 45928308.0 NaN SPOT SPOTIFY TECHNOLOGY S A - AVG. PRICE DETAILS ON... BUY 20 6.49 318.00 -6366.49 EQ USD NaN
2 20201224 20201224 380013122 NaN 5981053.0 SU DIV - SUNCOR ENERGY INC DIV 100 0.00 0.00 21.00 NaN CAD NaN
3 20201221 20201221 380013122 NaN 5975559.0 MFC DIV - MANULIFE FINANCIAL CORP DIV 502 0.00 0.00 140.56 NaN CAD NaN
4 20201211 20201201 380013122 21844836.0 0.0 NaN Interest 202011 INT 0 0.00 0.00 -319.52 NaN USD NaN
df.loc['20201224'][1]
Error:---------------------------------------------------------------------------
KeyError Traceback (most recent call last)
<ipython-input-66-2ff131d1c895> in <module>
----> 1 df.loc['20201224'][1]
~/opt/anaconda3/lib/python3.8/site-packages/pandas/core/indexing.py in __getitem__(self, key)
877
878 maybe_callable = com.apply_if_callable(key, self.obj)
--> 879 return self._getitem_axis(maybe_callable, axis=axis)
880
881 def _is_scalar_access(self, key: Tuple):
~/opt/anaconda3/lib/python3.8/site-packages/pandas/core/indexing.py in _getitem_axis(self, key, axis)
1108 # fall thru to straight lookup
1109 self._validate_key(key, axis)
-> 1110 return self._get_label(key, axis=axis)
1111
1112 def _get_slice_axis(self, slice_obj: slice, axis: int):
~/opt/anaconda3/lib/python3.8/site-packages/pandas/core/indexing.py in _get_label(self, label, axis)
1057 def _get_label(self, label, axis: int):
1058 # GH#5667 this will fail if the label is not present in the axis.
-> 1059 return self.obj.xs(label, axis=axis)
1060
1061 def _handle_lowerdim_multi_index_axis0(self, tup: Tuple):
~/opt/anaconda3/lib/python3.8/site-packages/pandas/core/generic.py in xs(self, key, axis, level, drop_level)
3489 loc, new_index = self.index.get_loc_level(key, drop_level=drop_level)
3490 else:
-> 3491 loc = self.index.get_loc(key)
3492
3493 if isinstance(loc, np.ndarray):
~/opt/anaconda3/lib/python3.8/site-packages/pandas/core/indexes/range.py in get_loc(self, key, method, tolerance)
356 except ValueError as err:
357 raise KeyError(key) from err
--> 358 raise KeyError(key)
359 return super().get_loc(key, method=method, tolerance=tolerance)
360
KeyError: '20201224'
Any thoughts. Does anyone know what I am doing wrong. thanks,
YK!
here is another piece of information
Output:<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1312 entries, 0 to 1311
Data columns (total 15 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 PROCESS_DATE 1312 non-null int64
1 EFFECTIVE_DATE 1312 non-null int64
2 ACCOUNT_NUMBER 1312 non-null int64
3 TRANSACTION_ID 1166 non-null float64
4 SUB_TRANSACTION_ID 403 non-null float64
5 SYMBOL 970 non-null object
6 TRANSACTION_DESC 1312 non-null object
7 TRANSACTION_TYPE 1312 non-null object
8 QUANTITY 1312 non-null int64
9 COMMISSION 1312 non-null float64
10 PRICE 1312 non-null float64
11 AMOUNT_VALUE 1312 non-null float64
12 SECURITY_TYPE 909 non-null object
13 CURRENCY 1312 non-null object
14 REP 0 non-null float64
dtypes: float64(6), int64(4), object(5)
memory usage: 153.9+ KB