Python Forum
Pandas DataFrame combine rows by column value, where Date Rows are NULL
#1
Scenario: Parse a PDF bank statement and transform it into a clean, formatted CSV file.

What I've tried: I managed to parse the PDF file (tabular format) using the camelot library, but I could not get the output formatted the way I want.

Code:

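import camelot
import pandas as pd

# First pass with camelot's default flavor ('lattice'), only to inspect what is detected.
tables = camelot.read_pdf('test.pdf', pages='3')

for i, table in enumerate(tables):
    print(f'table_id:{i}')
    print(f'page:{table.page}')
    print(f'coordinates:{table._bbox}')

# Second pass in 'stream' mode, which relies on whitespace rather than ruling lines.
tables = camelot.read_pdf('test.pdf', flavor='stream', pages='3')

# Assumption: the first detected table is the statement table.
df = tables[0].df

# Promote the first row to the header, then drop it from the data.
columns = df.iloc[0]

df.columns = columns
df = df.drop(0)
df.head()

# Strip '$' and '-' from every text column; regex=False treats '$' as a literal character.
for c in df.select_dtypes('object').columns:
    df[c] = df[c].str.replace('$', '', regex=False)
    df[c] = df[c].str.replace('-', '', regex=False)

def convert_to_float(num):
    """Parse a string such as '1,234.56' as a float; return 0 for blank or invalid values."""
    try:
        return float(num.replace(',', ''))
    except (AttributeError, ValueError):
        return 0

for col in ['Deposits', 'Withdrawals', 'Balance']:
    df[col] = df[col].map(convert_to_float)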
My_Result:

https://ibb.co/VYLczdr

Desired_Output:

https://ibb.co/2NZby99


The logic I came up with is to move a row up into the previous row (n-1) whenever its date column is NaN, but I don't know whether this logic is right. Can anyone help me sort this out properly?

I tried pandas groupby and aggregation functions, but they merge the whole data set, removing the NaN rows and collapsing duplicate dates. That is not suitable because every entry is necessary.
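
One possible way to express the "move the row up when the date is missing" idea is to group on a running count of dated rows instead of on the date itself, so two transactions with the same date stay separate. The following is only a minimal sketch under assumptions: the column names `Date` and `Description` are guesses (only `Deposits`, `Withdrawals` and `Balance` appear in the code above), the helper names are hypothetical, and it is meant to run on camelot's raw string cells before the numeric conversion.

```python
import pandas as pd


def _first_non_empty(values):
    """Return the first value that is neither missing nor blank, else ''."""
    for v in values:
        if pd.notna(v) and str(v).strip() != "":
            return v
    return ""


def _join_text(values):
    """Join the non-blank pieces of a wrapped text cell with single spaces."""
    parts = [str(v).strip() for v in values if pd.notna(v)]
    return " ".join(p for p in parts if p)


def merge_continuation_rows(df, date_col="Date", text_col="Description"):
    """Fold each undated row into the nearest dated row above it.

    date_col and text_col are assumed column names; adjust to the real header.
    """
    dates = df[date_col]
    # A row starts a new transaction when its date cell is present and not blank.
    has_date = dates.notna() & (dates.astype(str).str.strip() != "")
    # Running count of dated rows: undated rows inherit the id of the row above.
    group_id = has_date.cumsum().to_numpy()
    # Text is concatenated; every other column keeps its first non-blank value.
    agg = {
        col: (_join_text if col == text_col else _first_non_empty)
        for col in df.columns
    }
    return df.groupby(group_id).agg(agg).reset_index(drop=True)


if __name__ == "__main__":
    # Made-up rows that imitate a wrapped description, not real statement data.
    sample = pd.DataFrame(
        {
            "Date": ["01/02", "", "01/03", None, "01/03"],
            "Description": ["POS PURCHASE", "STORE 123", "PAYROLL", "ACME CORP", "ATM"],
            "Withdrawals": ["25.00", "", "", "", "40.00"],
            "Deposits": ["", "", "500.00", "", ""],
            "Balance": ["975.00", "", "1,475.00", "", "1,435.00"],
        }
    )
    print(merge_continuation_rows(sample))
```

Because the grouping key is the running count rather than the date, rows that share a date are not collapsed. Any undated rows that appear before the first dated row end up in a group of their own.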