Python Forum
Pandas DataFrame combine rows by column value, where Date Rows are NULL
Scenario: Parse a PDF bank statement and transform it into a clean, formatted CSV file.

What I've tried: I managed to parse the PDF file (tabular format) using the camelot library, but I could not get the output into the desired format.

Code:

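import camelot
import pandas as pd

# First pass with the default flavor, only to inspect what camelot detects on page 3
tables = camelot.read_pdf('test.pdf', pages='3')

for i, table in enumerate(tables):
    print(f'table_id:{i}')
    print(f'page:{table.page}')
    print(f'coordinates:{table._bbox}')

# Second pass with the stream flavor, for tables separated by whitespace
tables = camelot.read_pdf('test.pdf', flavor='stream', pages='3')

# Take the first detected table as a DataFrame
# (assumes the statement is table 0 on this page)
df = tables[0].df

# Promote the first row to the column header
columns = df.iloc[0]

df.columns = columns
df = df.drop(0)
df.head()

# Strip '$' and '-' as literal characters; without regex=False, older pandas
# treats '$' as a regex end-of-string anchor and removes nothing.
# Note: this also strips '-' from dates and from negative amounts.
for c in df.select_dtypes('object').columns:
    df[c] = df[c].str.replace('$', '', regex=False)
    df[c] = df[c].str.replace('-', '', regex=False)

def convert_to_float(num):
    # Remove thousands separators; fall back to 0 for empty or non-numeric cells
    try:
        return float(num.replace(',', ''))
    except (AttributeError, ValueError):
        return 0

for col in ['Deposits', 'Withdrawals', 'Balance']:
    df[col] = df[col].map(convert_to_float)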
My_Result:

https://ibb.co/VYLczdr

Desired_Output:

https://ibb.co/2NZby99


The logic I came up with is to merge each row whose Date column is NaN into the row above it (row n-1), but I don't know whether this logic is right. Can anyone help me sort this out properly?

I tried pandas groupby and aggregation functions, but they merge the whole dataset, dropping the NaN rows and collapsing duplicate dates, which is not suitable because every entry must be kept.
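
For reference, here is a minimal sketch of the "merge into the row above" idea, not a definitive solution. Instead of grouping on the date itself (which collapses repeated dates), it builds a running transaction id that increases at every row that has a date, so each dated row and the undated rows below it form one group. The sample data, the `Description` column, and the helper name `merge_continuation_rows` are all assumptions made for illustration; only `Deposits`, `Withdrawals`, and `Balance` come from the code above. It also assumes the undated rows hold empty strings or NaN in the Date column.

```python
import numpy as np
import pandas as pd

# Hypothetical sample shaped like a parsed statement: rows without a date
# are continuations of the transaction above them. Both transactions share
# the same date on purpose, to show that duplicates are kept apart.
raw = pd.DataFrame({
    "Date": ["01/02/2021", "", "01/02/2021", "", ""],
    "Description": ["POS PURCHASE", "STORE 123", "TRANSFER", "FROM SAVINGS", "REF 42"],
    "Withdrawals": ["25.00", "", "", "", ""],
    "Deposits": ["", "", "100.00", "", ""],
    "Balance": ["975.00", "", "1,075.00", "", ""],
})


def merge_continuation_rows(df, date_col="Date", text_cols=("Description",)):
    """Fold rows with no date into the nearest dated row above them."""
    # Treat empty or whitespace-only cells as missing values
    df = df.replace(r"^\s*$", np.nan, regex=True)

    # The counter goes up by one at each dated row, so a dated row and the
    # undated rows that follow it share one id, even if dates repeat
    txn_id = df[date_col].notna().cumsum().rename("txn_id")

    # Keep the first non-missing value per column, but join text columns
    agg = {col: "first" for col in df.columns}
    for col in text_cols:
        agg[col] = lambda s: " ".join(s.dropna().astype(str))

    return df.groupby(txn_id).agg(agg).reset_index(drop=True)


clean = merge_continuation_rows(raw)
print(clean)
```

With this sample the five parsed rows become two transactions, and the description fragments are joined with a space. If the amounts should be numeric, `convert_to_float` from the code above could be applied after the merge, so that missing cells are not turned into 0 before the rows are combined.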