Python Forum
Read csv file with inconsistent delimiter
#1
Hi everyone.

Could you help me read this csv file properly? It has incomplete data (e.g. rows 14 to 23) and inconsistent delimiters.
Row 23 in particular appears to have been captured incorrectly.
The data contains 8 columns: Release_Date, Title, Overview, Popularity, Vote_Count, Vote_Average, Original_Language, Genre.

Thanking you in advance.

Attached file: testing.csv (5.94 KB)
#2
Just skip the rows that start with a `-`.

import csv
from datetime import datetime as DateTime


def read_broken_csv(file):
    with open(file, newline="") as fd:
        reader = csv.reader(fd)
        next(reader, None)  # discard the header row
        for row in reader:
            # skip blank lines and rows whose first field starts with a `-`
            if not row or row[0].lstrip().startswith("-"):
                continue
            # the date conversion fails if the first field is not MM/DD/YYYY
            try:
                row[0] = DateTime.strptime(row[0], "%m/%d/%Y").date()
            except ValueError:
                # skip the row if the date format is wrong
                continue
            # split the genre field into a tuple of stripped genre names
            row[-1] = tuple(map(str.strip, row[-1].split(",")))

            yield row


for row in read_broken_csv("Downloads/testing.csv"):
    print(row)
#3
To fix this, you need to identify when a linefeed is not the end of a row but a continuation of the description. You could look for " - " at the start of the "Release_Date" field to find the continuations. If the Overview is "NaN", append the description to the previous row's description. If the Overview is a number, use the Overview, Popularity, Vote_Count, Vote_Average, and Original_Language values as the Popularity, Vote_Count, Vote_Average, Original_Language, and Genre for the row. Even then, you still have that odd link after the genre. It is quite complicated.
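As a rough illustration of the continuation idea, here is a minimal sketch that folds a continuation row back into the previous row's Overview. It only covers the simplest case, where the continuation line holds nothing but overview text. The function name `merge_continuations`, the `OVERVIEW` index, and the sample data are all made up for the example; they are not taken from testing.csv, and the "NaN" and shifted-column cases are not handled.

```python
import csv
import io

OVERVIEW = 2  # assumed position of the Overview column in the 8-column layout


def merge_continuations(lines):
    """Yield csv rows, appending continuation rows to the previous Overview."""
    previous = None
    for row in csv.reader(lines):
        if not row:
            continue  # ignore blank lines
        if previous is not None and row[0].lstrip().startswith("-"):
            # The reader split the continuation text on commas: rejoin it,
            # then drop the leading dash and surrounding whitespace.
            fragment = ",".join(row).strip().lstrip("-").strip()
            previous[OVERVIEW] = f"{previous[OVERVIEW]} {fragment}"
            continue
        if previous is not None:
            yield previous
        previous = row
    if previous is not None:
        yield previous


# Invented sample data, only to show the shape of the problem.
sample = """Release_Date,Title,Overview,Popularity,Vote_Count,Vote_Average,Original_Language,Genre
12/15/2021,Some Title,First part of the overview,10.5,100,7.1,en,Action
 - second part, with a comma
03/01/2022,Other Title,Short overview,3.2,50,6.0,en,Drama
"""

rows = list(merge_continuations(io.StringIO(sample)))
for row in rows:
    print(row)
```

The function holds back each row until it has seen the next one, because only then is it known whether a continuation follows.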

Or you could just fix whatever is generating these files and have it place quotes around the Title and Overview. I edited your csv file and wrapped all the Titles and Overviews in quotes. I left the linefeeds in the pixie bakeoff description, deleted the link after the pixie bakeoff genre (Animated), and removed all the commas that had been added to the csv file because the Overview was not quoted. The modified csv file reads fine.
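If the generator happens to be a Python script, one way to get the quoting is the `csv` module's `QUOTE_ALL` option. This is only a sketch under that assumption, and the row values are invented. It shows that a quoted field can contain commas and linefeeds and still read back as a single field.

```python
import csv
import io

header = ["Release_Date", "Title", "Overview", "Popularity",
          "Vote_Count", "Vote_Average", "Original_Language", "Genre"]
# Invented row: the Title has a comma, the Overview has a comma and a linefeed.
row = ["12/15/2021", "A Title, With Comma",
       "Line one of the overview,\nline two.",
       "10.5", "100", "7.1", "en", "Animation"]

# An in-memory buffer stands in for the file; with a real file, open it
# with newline="" as well.
buffer = io.StringIO(newline="")
writer = csv.writer(buffer, quoting=csv.QUOTE_ALL)  # quote every field
writer.writerow(header)
writer.writerow(row)

# Read the data back: the quoted Overview comes back as one field.
buffer.seek(0)
rows = list(csv.reader(buffer))
print(rows[1][2])
```

The default `QUOTE_MINIMAL` setting would also quote any field containing a comma or linefeed; `QUOTE_ALL` just makes the quoting uniform.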