Python Forum
Pandas DataFrame combine rows by column value, where Date Rows are NULL
#1
Scenario: Parse a PDF bank statement and transform it into a clean, formatted CSV file.

What I've tried: I managed to parse the PDF file (tabular format) using the camelot library, but failed to produce the desired result in terms of formatting.

Code:

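import camelot
import pandas as pd
 
# Lattice flavor (camelot's default): list the tables detected on page 3
tables = camelot.read_pdf('test.pdf', pages='3')
 
for i, table in enumerate(tables):
    print(f'table_id:{i}')
    print(f'page:{table.page}')
    print(f'coordinates:{table._bbox}')
 
# Re-read the page with the stream flavor
tables = camelot.read_pdf('test.pdf', flavor='stream', pages='3')
 
# df was never assigned in the original post;
# assumption: the statement is the first detected table
df = tables[0].df
 
# Promote the first row to the header
columns = df.iloc[0]
 
df.columns = columns
df = df.drop(0)
df.head()
 
# Strip currency symbols and dashes; regex=False treats '$' as a literal character
for c in df.select_dtypes('object').columns:
    df[c] = df[c].str.replace('$', '', regex=False)
    df[c] = df[c].str.replace('-', '', regex=False)
 
def convert_to_float(num):
    """Parse a string such as '1,234.56' as a float; return 0 for blank or non-numeric cells."""
    try:
        return float(num.replace(',', ''))
    except (AttributeError, ValueError):
        return 0
 
for col in ['Deposits', 'Withdrawals', 'Balance']:
    df[col] = df[col].map(convert_to_float)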
My_Result:

https://ibb.co/VYLczdr

Desired_Output:

https://ibb.co/2NZby99


The logic I came up with is to merge each row whose date column is NaN into the row above it (row n-1), but I don't know whether this logic is right. Can anyone help me sort this out properly?

I tried pandas groupby and aggregation functions, but they merge the whole data set, dropping the NaN rows and collapsing duplicate dates, which is not suitable because every entry is necessary.
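
One possible way to express the "merge into the row above" idea without collapsing duplicate dates is to group on a running transaction id instead of on the date itself. The sketch below is only an illustration under assumptions: the sample data is made up, the column names "Date" and "Description" are guesses (only 'Deposits', 'Withdrawals' and 'Balance' appear in the code above), and merge_continuation_rows is a hypothetical helper name.

```python
import numpy as np
import pandas as pd

# Made-up sample mimicking the parsed table: continuation rows have NaN in "Date".
# "Date" and "Description" are assumed column names.
df = pd.DataFrame({
    "Date": ["01/02", np.nan, "01/03", "01/03", np.nan, np.nan],
    "Description": ["POS PURCHASE", "STORE 123", "PAYROLL",
                    "ATM WITHDRAWAL", "MAIN ST", "BRANCH 7"],
    "Deposits": [0.0, 0.0, 1500.0, 0.0, 0.0, 0.0],
    "Withdrawals": [25.5, 0.0, 0.0, 100.0, 0.0, 0.0],
    "Balance": [974.5, 0.0, 2474.5, 2374.5, 0.0, 0.0],
})

def merge_continuation_rows(frame, date_col="Date", text_col="Description"):
    """Fold rows with a missing date into the nearest dated row above them."""
    # Every dated row starts a new transaction; undated rows inherit the
    # running count, so two entries on the same date stay separate.
    txn_id = frame[date_col].notna().cumsum().rename("txn_id")

    # Keep the first value of each column, but join the text fragments.
    agg = {col: "first" for col in frame.columns}
    agg[text_col] = lambda s: " ".join(s.dropna().astype(str).str.strip())

    return frame.groupby(txn_id, sort=False).agg(agg).reset_index(drop=True)

merged = merge_continuation_rows(df)
print(merged)
```

If the blank date cells come through as empty strings rather than NaN, they would need to be converted first, for example with df["Date"].replace("", np.nan), before the grouping key is built.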