Python Forum
CSV file with irregular structure
#10
(May-22-2017, 06:11 PM)ulrich48155 Wrote: It seems like every tweet that was separated by blank lines is now separated by blank rows.

I am afraid I have no idea what that means (and what is the difference between a blank line and a blank row?).

Yes, it seems that your data is "dirty": the pipe ("|") is used inside the body of tweets as well as being the field separator. If (and only if) you know that stray pipes can appear only inside the tweet body (fifth field?) and that a correct line should have exactly six pipes, then you can preprocess the file: keep the first four pipes and the last two, and replace all the others with some character of your choice. That way your lines will have the correct number of separators and there won't be any pipes in the text of the tweet. You can do the replacement with something like
splits = line.split("|")
new_line = "|".join(splits[:4] + ["+".join(splits[4:-2])] + splits[-2:])  # replaces offending pipes with +
applied to each line of your "compressed" file.
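
A slightly fuller sketch of the same idea, in case it helps. It wraps the two lines above in a function and adds a guard, because on a line with fewer than seven fields the slice trick would insert an extra empty field. The names repair_line and repair_lines are made up for illustration, and the layout (seven fields, tweet text in the fifth) is the same assumption as above, so check it against your real data first.

```python
def repair_line(line):
    """Fold surplus pipes in the tweet text (assumed fifth field) into '+'."""
    splits = line.rstrip("\n").split("|")
    if len(splits) <= 7:
        # Already six pipes or fewer (including blank lines): nothing to fold.
        return "|".join(splits)
    # Keep the first four and last two fields; merge everything in between.
    return "|".join(splits[:4] + ["+".join(splits[4:-2])] + splits[-2:])


def repair_lines(lines):
    """Yield repaired lines (without trailing newlines) from any iterable of lines."""
    for line in lines:
        yield repair_line(line)


if __name__ == "__main__":
    sample = ["1|2|3|4|some|text|5|6\n", "1|2|3|4|clean text|5|6\n"]
    for fixed in repair_lines(sample):
        print(fixed)
```

Since repair_lines takes any iterable of lines, you can pass it an open file object and write each result to a new file. Lines that come out with fewer than six pipes are returned unchanged, so you may want to count pipes afterwards and inspect those by hand.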


Messages In This Thread
CSV file with irregular structure - by ulrich48155 - May-16-2017, 10:40 AM
RE: CSV file with irregular structure - by Larz60+ - May-16-2017, 10:57 AM
RE: CSV file with irregular structure - by Ofnuts - May-16-2017, 12:17 PM
RE: CSV file with irregular structure - by Ofnuts - May-16-2017, 07:46 PM
RE: CSV file with irregular structure - by zivoni - May-16-2017, 11:03 PM
RE: CSV file with irregular structure - by zivoni - May-20-2017, 09:34 AM
RE: CSV file with irregular structure - by zivoni - May-22-2017, 07:35 PM
