Pandas DataFrame combine rows by column value, where Date Rows are NULL

rhat398 · (This post was last modified: May-04-2021, 10:51 PM by rhat398.)

Scenario: Parse a PDF bank statement and transform it into a clean, formatted CSV file.

What I've tried: I managed to parse the PDF file (tabular format) using the camelot library, but failed to produce the desired result in terms of formatting.

Code:

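import camelot
import pandas as pd

# Lattice mode (the default): list the tables detected on page 3
tables = camelot.read_pdf('test.pdf', pages='3')

for i, table in enumerate(tables):
    print(f'table_id:{i}')
    print(f'page:{table.page}')
    print(f'coordinates:{table._bbox}')

# Stream mode: parse the same page using whitespace between columns
tables = camelot.read_pdf('test.pdf', flavor='stream', pages='3')

# Assumption: the statement is the first table detected on the page
df = tables[0].df

# Promote the first row to the header
columns = df.iloc[0]
df.columns = columns
df = df.drop(0)
df.head()

# Strip currency symbols and dashes; regex=False so '$' is matched literally
for c in df.select_dtypes('object').columns:
    df[c] = df[c].str.replace('$', '', regex=False)
    df[c] = df[c].str.replace('-', '', regex=False)

def convert_to_float(num):
    """Convert '1,234.56' to 1234.56; return 0 for blanks or non-numeric text."""
    try:
        return float(num.replace(',', ''))
    except (AttributeError, ValueError):
        return 0

for col in ['Deposits', 'Withdrawals', 'Balance']:
    df[col] = df[col].map(convert_to_float)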

My_Result:

https://ibb.co/VYLczdr

Desired_Output:

https://ibb.co/2NZby99

The logic I came up with is to move those rows up, I guess to row n-1, if the Date column is NaN. I don't know if this logic is right or not. Can anyone help me sort this out properly?

I tried pandas groupby and aggregation functions, but they only merge the whole data set and remove NaN values and duplicate dates, which is not suitable because every entry is necessary.
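
One possible way to express the "merge rows with an empty Date into the row above" idea is sketched below. It does not group by the date value itself (which would collapse separate transactions sharing a date); instead it groups by a running counter that increases each time a non-empty Date appears. The sample data and the Description column name are made up for illustration, since the real table is only shown in the screenshots; Date, Deposits, Withdrawals and Balance are taken from the post.

```python
import numpy as np
import pandas as pd

# Hypothetical sample mimicking the parsed table: continuation rows have an empty Date
df = pd.DataFrame({
    "Date": ["01/02", "", "01/03", "", ""],
    "Description": ["POS PURCHASE", "STORE 123", "TRANSFER", "REF 456", "FROM SAVINGS"],
    "Withdrawals": ["10.00", "", "", "", ""],
    "Deposits": ["", "", "200.00", "", ""],
    "Balance": ["90.00", "", "290.00", "", ""],
})

# Treat empty or whitespace-only cells as missing values
df = df.replace(r"^\s*$", np.nan, regex=True)

# Every non-missing Date starts a new transaction; the cumulative sum
# gives each transaction (and its continuation rows) the same id
txn_id = df["Date"].notna().cumsum().rename("txn")

def join_text(series):
    """Concatenate the non-missing text fragments of one transaction."""
    return " ".join(series.dropna().astype(str))

merged = (
    df.groupby(txn_id)
      .agg({
          "Date": "first",             # 'first' skips missing values
          "Description": join_text,    # glue the wrapped description lines together
          "Withdrawals": "first",
          "Deposits": "first",
          "Balance": "first",
      })
      .reset_index(drop=True)
)

print(merged)
```

Because the grouping key is the running counter rather than the date, two transactions on the same day stay as two rows. This is best run before the numeric conversion step, so that blank amounts are still recognisable as missing.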
