I have a malformed CSV file that I need to turn into a proper DataFrame.
![[Image: s!An03iU493hAgnpxZ2rP2oUzB9BJNNw?e=djFMls]](https://1drv.ms/u/s!An03iU493hAgnpxZ2rP2oUzB9BJNNw?e=djFMls)
This is the expected format of the table.
![[Image: s!An03iU493hAgnpxaCh7UB-IHNWpGBQ?e=OLO7l1]](https://1drv.ms/u/s!An03iU493hAgnpxaCh7UB-IHNWpGBQ?e=OLO7l1)
I get an error when reading the file with pandas:
ParserError: Error tokenizing data. C error: Buffer overflow caught - possible malformed input file.
So I used the csv module to try to work around the error in the CSV data:
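import csv
import pandas as pd  # needed for pd.DataFrame below

# DictReader fills fields missing from a short line with None
with open(r'C:\Users\Klllmmm\Downloads\test_CSV_file.csv', 'r') as file:
    csv_file = csv.DictReader(file)
    df = pd.DataFrame(csv_file)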
Output of df.head():
SOURCE CATEGORY PERIOD_NAME \
0 3 7 Sep-22
1 3 7 Sep-22
2 3 7 Apr-22
3 EP A 4926954 1068 EP 21-APR-2022 Receipts None
4 3 7 Apr-22
BATCH_NAME JOURNAL_NAME
0 ext002 EP A 5005720 1167 EP 23-SEP-2022 Receipts
1 ext002 EP A 5005720 1167 EP 23-SEP-2022 Receipts
2 ext001 None
3 None None
4 ext001 None
How can I merge the data from the next line into the current row when the "JOURNAL_NAME" field is None, and then remove that next line?
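What I have in mind is roughly the sketch below, though I am not sure it is the right approach. It assumes the file is comma-delimited and that a broken record always spills onto exactly one following line holding only the journal name. The helper name `merge_broken_rows` and the `SAMPLE` text are made up for illustration; the sample is only my guess at what the raw lines look like, based on the output above.

```python
import csv
import io

# Hypothetical sample: a guess at the raw file, based on the df.head() output.
# The comma delimiter is an assumption.
SAMPLE = """SOURCE,CATEGORY,PERIOD_NAME,BATCH_NAME,JOURNAL_NAME
3,7,Sep-22,ext002,EP A 5005720 1167 EP 23-SEP-2022 Receipts
3,7,Apr-22,ext001
EP A 4926954 1068 EP 21-APR-2022 Receipts
"""


def merge_broken_rows(rows, first_col="SOURCE", last_col="JOURNAL_NAME"):
    """Fold a spilled-over line back into the row before it (hypothetical helper)."""
    fixed = []
    for row in rows:
        previous = fixed[-1] if fixed else None
        # A spilled line has a value only in the first column; DictReader
        # sets every other field to None.
        is_spill = all(v is None for k, v in row.items() if k != first_col)
        if previous is not None and previous[last_col] is None and is_spill:
            # Move the stray text into the missing last field and drop the line.
            previous[last_col] = row[first_col]
        else:
            fixed.append(dict(row))
    return fixed


rows = list(csv.DictReader(io.StringIO(SAMPLE)))
fixed = merge_broken_rows(rows)
for row in fixed:
    print(row)
```

For the real file I would replace `io.StringIO(SAMPLE)` with the opened file and pass `fixed` to `pd.DataFrame`. This would not handle a record that breaks across more than two lines, or one that breaks at a different column.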
I would appreciate any help on how to rectify the data.