Python Forum
Pandas DataFrame combine rows by column value, where Date Rows are NULL
Scenario: Parse a PDF bank statement and transform it into a clean, formatted CSV file.

What I've tried: I managed to parse the PDF file (tabular format) using the camelot library, but failed to produce the desired formatting.

Code:

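import camelot
import pandas as pd

# Default (lattice) mode: inspect what Camelot detects on page 3
tables = camelot.read_pdf('test.pdf', pages='3')

for i, table in enumerate(tables):
    print(f'table_id:{i}')
    print(f'page:{table.page}')
    print(f'coordinates:{table._bbox}')

# Stream mode: parse the same page again without relying on ruling lines
tables = camelot.read_pdf('test.pdf', flavor='stream', pages='3')

# Assumption: the first detected table is the statement table
df = tables[0].df

# Promote the first row to the column header
columns = df.iloc[0]

df.columns = columns
df = df.drop(0)
df.head()

# Strip '$' and '-' from every text column.
# regex=False so '$' is a literal character, not the end-of-string anchor.
for c in df.select_dtypes('object').columns:
    df[c] = df[c].str.replace('$', '', regex=False)
    df[c] = df[c].str.replace('-', '', regex=False)

def convert_to_float(num):
    """Parse a string like '1,234.56'; return 0 for blank or non-numeric cells."""
    try:
        return float(num.replace(',', ''))
    except (AttributeError, ValueError):
        return 0

for col in ['Deposits', 'Withdrawals', 'Balance']:
    df[col] = df[col].map(convert_to_float)
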
My_Result:

https://ibb.co/VYLczdr

Desired_Output:

https://ibb.co/2NZby99


The logic I came up with is to merge each row into the row above it (n-1) when its date column is NaN, but I don't know whether this logic is right. Can anyone help me sort this out properly?

I tried pandas groupby and aggregation functions, but they merge the whole data set and remove the NaN and duplicate dates, which is not suitable because every entry is necessary.
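
One possible way to express the "merge into the row above" idea without grouping on the date itself is sketched below. It is only a sketch under several assumptions: the column names Date and Description, the sample rows, and the helper name merge_continuation_rows are all made up for illustration; blank cells are assumed to be empty strings; and the merge is assumed to run before the amounts are converted to floats. A running count of non-empty dates gives every transaction its own group key, so two transactions on the same date stay separate.

```python
import numpy as np
import pandas as pd

# Hypothetical sample shaped like a parsed statement: rows with a blank
# Date are assumed to be continuation lines of the transaction above.
df = pd.DataFrame({
    "Date": ["01/02", "", "01/02", "", ""],
    "Description": ["POS PURCHASE", "STORE 123",
                    "ATM WITHDRAWAL", "MAIN ST", "REF 998"],
    "Withdrawals": ["25.00", "", "100.00", "", ""],
    "Deposits": ["", "", "", "", ""],
    "Balance": ["975.00", "", "875.00", "", ""],
})


def merge_continuation_rows(frame, date_col="Date", text_col="Description"):
    """Fold rows with an empty date into the nearest dated row above."""
    # Treat blank or whitespace-only cells as missing values.
    cleaned = frame.replace(r"^\s*$", np.nan, regex=True)

    # Each non-empty date starts a new group; continuation rows inherit
    # the group number of the dated row above them.
    row_group = cleaned[date_col].notna().cumsum().rename("row_group")

    # Keep the first non-missing value per column, but join the text pieces.
    agg = {col: "first" for col in cleaned.columns}
    agg[text_col] = lambda s: " ".join(s.dropna().astype(str))

    return cleaned.groupby(row_group).agg(agg).reset_index(drop=True)


merged = merge_continuation_rows(df)
print(merged)
```

Grouping on this running counter rather than on Date is what keeps repeated dates as separate entries. If an amount can appear on a continuation row instead of the dated row, "first" still picks it up, because missing values are skipped within each group.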