Python Forum
Pandas DataFrame combine rows by column value, where Date Rows are NULL
Scenario: Parse a PDF bank statement and transform it into a clean, formatted CSV file.

What I've tried: I managed to parse the PDF file (tabular format) using the camelot library, but failed to produce the desired result in terms of formatting.

Code:

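import camelot
import pandas as pd

# Lattice (default) flavor: list the tables camelot detects on page 3
tables = camelot.read_pdf('test.pdf', pages='3')

for i, table in enumerate(tables):
    print(f'table_id:{i}')
    print(f'page:{table.page}')
    print(f'coordinates:{table._bbox}')

# Stream flavor suits statements without ruling lines between cells
tables = camelot.read_pdf('test.pdf', flavor='stream', pages='3')
df = tables[0].df  # assumes the statement is the first table found; .df is a DataFrame of strings

# Promote the first row to the header
columns = df.iloc[0]
df.columns = columns
df = df.drop(0)
df.head()

# Strip currency symbols and dashes; regex=False so '$' is always literal, never a regex anchor
for c in df.select_dtypes('object').columns:
    df[c] = df[c].str.replace('$', '', regex=False)
    df[c] = df[c].str.replace('-', '', regex=False)

def convert_to_float(num):
    """Turn '1,234.56' into 1234.56; blank or unparseable cells become 0.0."""
    try:
        return float(num.replace(',', ''))
    except (AttributeError, ValueError):
        return 0.0

for col in ['Deposits', 'Withdrawals', 'Balance']:
    df[col] = df[col].map(convert_to_float)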
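As a minimal sketch of an alternative to the row-by-row convert_to_float, the same cleanup can be done column-wise with pd.to_numeric(errors='coerce'), which turns blank or unparseable cells into NaN that can then be filled with 0. The sample values below are made up for illustration and only mimic the all-string cells camelot produces.

```python
import pandas as pd

# Hypothetical sample in the shape camelot returns: every cell is a string
df = pd.DataFrame({
    'Deposits': ['$1,250.00', '', '$40.10'],
    'Withdrawals': ['', '$-300.00', ''],
    'Balance': ['$5,000.00', '$4,700.00', '$4,740.10'],
})

for col in ['Deposits', 'Withdrawals', 'Balance']:
    cleaned = (
        df[col]
        .str.replace('$', '', regex=False)  # literal dollar sign
        .str.replace('-', '', regex=False)  # drop minus/dash markers
        .str.replace(',', '', regex=False)  # thousands separators
    )
    # Blank or unparseable cells become NaN, then 0.0 (same as convert_to_float)
    df[col] = pd.to_numeric(cleaned, errors='coerce').fillna(0.0)

print(df)
```

This keeps every column as float64 in one pass, instead of calling a Python function per cell.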
My result:

https://ibb.co/VYLczdr

Desired output:

https://ibb.co/2NZby99


The logic I came up with is to merge a row into the row above it (n-1) whenever its Date column is NaN, but I'm not sure whether this logic is right. Can anyone help me sort this out properly?
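The "fold into row n-1" idea is sound for statements where a wrapped description spills onto extra lines with no date. A minimal sketch of that literal logic, assuming a hypothetical Description column and made-up rows (Balance is left out for brevity): walk the rows in order and, whenever Date is missing, append the text to the previous transaction and add its amounts. If the blank dates come through as empty strings rather than NaN (camelot cells are plain strings), convert them first with df['Date'] = df['Date'].replace('', np.nan).

```python
import numpy as np
import pandas as pd

# Hypothetical parsed statement: wrapped descriptions spill onto rows whose
# Date is NaN (the Description column and all values are made up)
df = pd.DataFrame({
    'Date': ['Jan 02', np.nan, 'Jan 02', 'Jan 05', np.nan, np.nan],
    'Description': ['POS PURCHASE', 'COFFEE SHOP', 'TRANSFER',
                    'ONLINE PAYMENT', 'REF 123', 'THANK YOU'],
    'Deposits': [0.0, 0.0, 500.0, 0.0, 0.0, 0.0],
    'Withdrawals': [4.5, 0.0, 0.0, 120.0, 0.0, 0.0],
})

merged = []  # one dict per real transaction
for row in df.to_dict('records'):
    if pd.isna(row['Date']) and merged:
        # Continuation row: fold it into the previous (n-1) transaction
        merged[-1]['Description'] += ' ' + row['Description']
        merged[-1]['Deposits'] += row['Deposits']
        merged[-1]['Withdrawals'] += row['Withdrawals']
    else:
        merged.append(dict(row))  # a dated row starts a new transaction

result = pd.DataFrame(merged)
print(result)
```

Both "Jan 02" transactions survive as separate rows, because rows are only merged when Date is missing, never because two dates are equal.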

I tried pandas groupby with aggregation functions, but that merged every row sharing a date and dropped the NaN rows and duplicate dates, which is not suitable because every entry is needed.
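Grouping by Date collapses separate transactions that happen on the same day. One way around that, sketched below with the same made-up data and hypothetical Description column, is to group by a running transaction counter instead: df['Date'].notna().cumsum() increases only on rows that carry a date, so continuation rows inherit the id of the row above while same-day transactions get distinct ids.

```python
import numpy as np
import pandas as pd

# Same hypothetical layout: continuation rows have no Date
df = pd.DataFrame({
    'Date': ['Jan 02', np.nan, 'Jan 02', 'Jan 05', np.nan, np.nan],
    'Description': ['POS PURCHASE', 'COFFEE SHOP', 'TRANSFER',
                    'ONLINE PAYMENT', 'REF 123', 'THANK YOU'],
    'Deposits': [0.0, 0.0, 500.0, 0.0, 0.0, 0.0],
    'Withdrawals': [4.5, 0.0, 0.0, 120.0, 0.0, 0.0],
})

# New id at every dated row; undated rows keep the id of the row above
txn_id = df['Date'].notna().cumsum().rename('txn')

result = (
    df.groupby(txn_id)
    .agg({
        'Date': 'first',          # first non-null date in the group
        'Description': ' '.join,  # stitch the wrapped text back together
        'Deposits': 'sum',
        'Withdrawals': 'sum',
    })
    .reset_index(drop=True)
)
print(result)
```

This is the vectorized equivalent of the row-folding loop, and the merged frame can then be written out with result.to_csv(..., index=False).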