Python Forum
Correctly read a malformed CSV file data
Thread Rating:
  • 0 Vote(s) - 0 Average
  • 1
  • 2
  • 3
  • 4
  • 5
Correctly read a malformed CSV file data
#1
I have a malformed CSV file where I need to create a proper dataframe.
[Image: s!An03iU493hAgnpxZ2rP2oUzB9BJNNw?e=djFMls]

I get an error when reading with pandas
Error:
ParserError: Error tokenizing data. C error: Buffer overflow caught - possible malformed input file.

so I used csv module in order to rectify the error in csv data.

import csv
with open(r'C:\Users\Klllmmm\Downloads\test_CSV_file.csv', 'r') as file:
    csv_file = csv.DictReader(file)
    df = pd.DataFrame(csv_file)
Output:
df.head() Out[75]: SOURCE CATEGORY PERIOD_NAME \ 0 3 7 Sep-22 1 3 7 Sep-22 2 3 7 Apr-22 3 EP A 4926954 1068 EP 21-APR-2022 Receipts None 4 3 7 Apr-22 BATCH_NAME JOURNAL_NAME 0 ext002 EP A 5005720 1167 EP 23-SEP-2022 Receipts 1 ext002 EP A 5005720 1167 EP 23-SEP-2022 Receipts 2 ext001 None 3 None None 4 ext001 None
How can I combine the data from next line if the "JOURNAL_NAME" field is None and remove the next line.

This is the expected format of the table.

[Image: s!An03iU493hAgnpxaCh7UB-IHNWpGBQ?e=OLO7l1]

Appreciate it if someone can help on how to rectify the data.

Attached Files

Thumbnail(s)
       

.csv   test_CSV_file.csv (Size: 415 bytes / Downloads: 103)
Reply
#2
read the whole file in memory and replace \n with just " "

import pandas as pd
from io import StringIO
file_path = r'path-to-file\test_CSV_file.csv'

with open(file_path, 'r') as f:
    data = f.read().replace('\n ', ' ')

df = pd.read_csv(StringIO(data))
print(df)
output

Output:
SOURCE CATEGORY PERIOD_NAME BATCH_NAME JOURNAL_NAME 0 3 7 Sep-22 ext002 EP A 5005720 1167 EP 23-SEP-2022 Receipts 1 3 7 Sep-22 ext002 EP A 5005720 1167 EP 23-SEP-2022 Receipts 2 3 7 Apr-22 ext001 EP A 4926954 1068 EP 21-APR-2022 Receipts 3 3 7 Apr-22 ext001 EP A 4926954 1068 EP 21-APR-2022 Receipts 4 3 6 Apr-22 ext001 EP A 4926954 1068 EP 21-APR-2022 Deposits 5 3 7 Aug-22 ext002 EP A 4962225 1131 EP 03-AUG-2022 Receipts
If you want you can write data back to file in order to fix it, instead of using io.StringIO to directly work with data in pandas
klllmmm likes this post
If you can't explain it to a six year old, you don't understand it yourself, Albert Einstein
How to Ask Questions The Smart Way: link and another link
Create MCV example
Debug small programs

Reply
#3
@buran

This is perfect....
Thank you so much!!


(Jan-25-2023, 02:18 PM)buran Wrote: read the whole file in memory and replace \n with just ""

import pandas as pd
from io import StringIO
file_path = r'path-to-file\test_CSV_file.csv'

with open(file_path, 'r') as f:
    data = f.read().replace('\n ', ' ')

df = pd.read_csv(StringIO(data))
print(df)
output

Output:
SOURCE CATEGORY PERIOD_NAME BATCH_NAME JOURNAL_NAME 0 3 7 Sep-22 ext002 EP A 5005720 1167 EP 23-SEP-2022 Receipts 1 3 7 Sep-22 ext002 EP A 5005720 1167 EP 23-SEP-2022 Receipts 2 3 7 Apr-22 ext001 EP A 4926954 1068 EP 21-APR-2022 Receipts 3 3 7 Apr-22 ext001 EP A 4926954 1068 EP 21-APR-2022 Receipts 4 3 6 Apr-22 ext001 EP A 4926954 1068 EP 21-APR-2022 Deposits 5 3 7 Aug-22 ext002 EP A 4962225 1131 EP 03-AUG-2022 Receipts
If you want you can write data back to file in order to fix it, instead of using io.StringIO to directly work with data in pandas
Reply


Possibly Related Threads…
Thread Author Replies Views Last Post
  Help with to check an Input list data with a data read from an external source sacharyya 3 318 Mar-09-2024, 12:33 PM
Last Post: Pedroski55
  Writing a Linear Search algorithm - malformed string representation Drone4four 10 832 Jan-10-2024, 08:39 AM
Last Post: gulshan212
  Recommended way to read/create PDF file? Winfried 3 2,784 Nov-26-2023, 07:51 AM
Last Post: Pedroski55
  python Read each xlsx file and write it into csv with pipe delimiter mg24 4 1,308 Nov-09-2023, 10:56 AM
Last Post: mg24
  read file txt on my pc to telegram bot api Tupa 0 1,048 Jul-06-2023, 01:52 AM
Last Post: Tupa
  parse/read from file seperated by dots giovanne 5 1,043 Jun-26-2023, 12:26 PM
Last Post: DeaD_EyE
  Formatting a date time string read from a csv file DosAtPython 5 1,160 Jun-19-2023, 02:12 PM
Last Post: DosAtPython
  How do I read and write a binary file in Python? blackears 6 6,013 Jun-06-2023, 06:37 PM
Last Post: rajeshgk
  Read csv file with inconsistent delimiter gracenz 2 1,140 Mar-27-2023, 08:59 PM
Last Post: deanhystad
  Read text file, modify it then write back Pavel_47 5 1,500 Feb-18-2023, 02:49 PM
Last Post: deanhystad

Forum Jump:

User Panel Messages

Announcements
Announcement #1 8/1/2020
Announcement #2 8/2/2020
Announcement #3 8/6/2020