I have a malformed CSV file from which I need to create a proper dataframe.
I get the following error when reading the file with pandas:
ParserError: Error tokenizing data. C error: Buffer overflow caught - possible malformed input file.
So I used the csv module to try to work around the error in the CSV data:
import csv
import pandas as pd

with open(r'C:\Users\Klllmmm\Downloads\test_CSV_file.csv', 'r') as file:
    csv_file = csv.DictReader(file)
    df = pd.DataFrame(csv_file)
Output of df.head():
Out[75]:
SOURCE CATEGORY PERIOD_NAME \
0 3 7 Sep-22
1 3 7 Sep-22
2 3 7 Apr-22
3 EP A 4926954 1068 EP 21-APR-2022 Receipts None
4 3 7 Apr-22
BATCH_NAME JOURNAL_NAME
0 ext002 EP A 5005720 1167 EP 23-SEP-2022 Receipts
1 ext002 EP A 5005720 1167 EP 23-SEP-2022 Receipts
2 ext001 None
3 None None
4 ext001 None
How can I combine the data from the next line when the "JOURNAL_NAME" field is None, and then remove that next line? This is the expected format of the table.
I would appreciate it if someone could help me rectify the data.
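To make the merge I am describing concrete, here is a rough sketch of the idea, not a tested solution for my real file. It assumes that a broken record is simply a row with fewer fields than the header, and that the missing fields (here the JOURNAL_NAME text) sit on the line or lines that directly follow it. The function name `merge_continuation_rows` and the sample rows are made up for illustration, modelled on the output shown above.

```python
import csv
import io


def merge_continuation_rows(lines):
    """Glue short rows to the row(s) that follow until they match the header width."""
    reader = csv.reader(lines)
    header = next(reader)
    records = []
    pending = []  # fields collected so far for a record that is still incomplete
    for row in reader:
        if not row:  # skip completely blank lines
            continue
        pending.extend(row)
        if len(pending) >= len(header):
            # zip() drops any surplus fields beyond the header width
            records.append(dict(zip(header, pending)))
            pending = []
    if pending:
        # Trailing incomplete record: keep it, padding missing fields with None
        pending += [None] * (len(header) - len(pending))
        records.append(dict(zip(header, pending)))
    return records


# Hypothetical sample mimicking the rows in the df.head() output above
sample = io.StringIO(
    "SOURCE,CATEGORY,PERIOD_NAME,BATCH_NAME,JOURNAL_NAME\n"
    "3,7,Sep-22,ext002,EP A 5005720 1167 EP 23-SEP-2022 Receipts\n"
    "3,7,Apr-22,ext001\n"
    "EP A 4926954 1068 EP 21-APR-2022 Receipts\n"
)
records = merge_continuation_rows(sample)
```

With the real file, the open file handle would be passed in place of `sample`, and the resulting list of dicts could be given to `pd.DataFrame(records)`. This only works if my assumption about where the line breaks fall is right. A continuation line that itself contains commas would shift the columns.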