Correctly read a malformed CSV file data
Python Forum (https://python-forum.io), General Coding Help, thread /thread-39285.html
Correctly read a malformed CSV file data - klllmmm - Jan-25-2023

I have a malformed CSV file from which I need to create a proper dataframe. Reading it with pandas raises an error, so I used the csv module to try to rectify the CSV data.

    import csv
    import pandas as pd

    with open(r'C:\Users\Klllmmm\Downloads\test_CSV_file.csv', 'r') as file:
        csv_file = csv.DictReader(file)
        df = pd.DataFrame(csv_file)

How can I combine the data from the next line if the "JOURNAL_NAME" field is None, and then remove that next line? This is the expected format of the table. I would appreciate it if someone could help with how to rectify the data.

RE: Correctly read a malformed CSV file data - buran - Jan-25-2023

Read the whole file into memory and replace each "\n " (a newline followed by a space) with just " ".

    import pandas as pd
    from io import StringIO

    file_path = r'path-to-file\test_CSV_file.csv'
    with open(file_path, 'r') as f:
        data = f.read().replace('\n ', ' ')
    df = pd.read_csv(StringIO(data))
    print(df)

If you want, you can write the data back to the file in order to fix it, instead of using io.StringIO to work with the data directly in pandas.
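The last suggestion, writing the repaired data back to a file, could look roughly like the sketch below. The function name repair_csv, the sample data and the file names are hypothetical and only there for illustration. Like the answer's code, it assumes every continuation line starts with a space.

```python
import tempfile
from pathlib import Path


def repair_csv(src, dst):
    """Rejoin broken records in src and write the repaired text to dst."""
    text = Path(src).read_text()
    # A newline followed by a space marks a break inside a record
    # (assumption taken from the answer's replace call).
    fixed = text.replace("\n ", " ")
    Path(dst).write_text(fixed)
    return fixed


# Hypothetical sample: the first record is split across two lines.
sample = (
    "ID,JOURNAL_NAME,AMOUNT\n"
    "1,Sales\n"
    " Journal,100\n"
    "2,Purchase Journal,200\n"
)

with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / "broken.csv"
    dst = Path(tmp) / "fixed.csv"
    src.write_text(sample)
    fixed = repair_csv(src, dst)
    print(dst.read_text())
```

Once the file is repaired, pd.read_csv can be pointed at the new file directly, with no StringIO needed.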
RE: Correctly read a malformed CSV file data - klllmmm - Jan-25-2023

@buran This is perfect. Thank you so much!
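For anyone who wants to stay with the csv module, as in the first post, here is a rough sketch of merging a short row with the line that follows it. This is a different technique from the string replacement in the answer. It assumes a record is broken in the middle of a field, so the first part has fewer fields than the header and the first field of the next line completes it. The function name merge_broken_rows and the sample data are hypothetical.

```python
import csv
from io import StringIO


def merge_broken_rows(lines):
    """Return a list of dicts, joining rows that were split across lines."""
    reader = csv.reader(lines)
    header = next(reader)
    records = []
    pending = None  # a record that does not have all of its fields yet
    for row in reader:
        if not row:
            continue  # skip blank lines
        if pending is None:
            pending = row
        else:
            # The first field of this line finishes the last field of
            # the pending record; the remaining fields are appended.
            pending[-1] = f"{pending[-1]} {row[0].strip()}"
            pending.extend(row[1:])
        if len(pending) >= len(header):
            records.append(dict(zip(header, pending)))
            pending = None
    if pending is not None:
        # Keep an incomplete trailing record instead of dropping it.
        records.append(dict(zip(header, pending)))
    return records


# Hypothetical sample: the first record is split across two lines.
sample = (
    "ID,JOURNAL_NAME,AMOUNT\n"
    "1,Sales\n"
    " Journal,100\n"
    "2,Purchase Journal,200\n"
)

records = merge_broken_rows(StringIO(sample))
for record in records:
    print(record)
```

The resulting list of dicts can be passed to pd.DataFrame(records), the same way the DictReader was used in the first post. Note that all values stay strings here, whereas pd.read_csv infers column types.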