Python Forum
CSV file with irregular structure
#6
If you only need to "compress" your file to one tweet per line, you can define a tweet as everything from a line that starts with a timestamp followed by | up to the line where the next tweet starts (or the file ends). Iterate over the lines of your file and check whether each one starts with a timestamp followed by |: if it does, start a new tweet; otherwise append the line to the current tweet.

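import re

# A tweet starts on a line that begins with "YYYY-MM-DD HH:MM:SS.ffffff|"
pattern = re.compile(r"^\d{4}-\d\d-\d\d \d\d:\d\d:\d\d\.\d{6}\|")

with open('tweets_sample.csv') as infile, open('tweets_compress.psv', 'w') as outfile:
    # The first line starts the first record; the default avoids StopIteration on an empty file
    outfile.write(next(infile, '').strip())
    for line in infile:
        if pattern.match(line):  # a new tweet starts, so end the previous record
            outfile.write('\n')
        outfile.write(line.strip())
    outfile.write('\n')  # terminate the last record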
Note that this can still split a tweet in two if the body of the tweet contains a newline followed by timestamp|.
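The same idea can also be written as a small generator, which is easier to test without touching files. This is only a sketch: the name compress_tweets and the joiner parameter are made up for illustration, and the sample lines are invented, not taken from the original data. Unlike the loop above, it joins continuation lines with a space, so words on either side of a line break do not run together.

```python
import re
from typing import Iterable, Iterator

# Same timestamp| pattern as in the post above.
TIMESTAMP = re.compile(r"^\d{4}-\d\d-\d\d \d\d:\d\d:\d\d\.\d{6}\|")


def compress_tweets(lines: Iterable[str], joiner: str = " ") -> Iterator[str]:
    """Yield one string per tweet (hypothetical helper, not from the thread).

    A new record starts at the first line and at every line that begins
    with timestamp|; any other line is appended to the current record.
    """
    current = None
    for line in lines:
        text = line.rstrip("\r\n")
        if current is None or TIMESTAMP.match(text):
            if current is not None:
                yield current  # the previous tweet is complete
            current = text
        else:
            current += joiner + text
    if current is not None:
        yield current  # flush the last tweet


# Invented sample input: the first tweet spans two physical lines.
sample = [
    "2017-05-16 10:40:00.123456|first tweet\n",
    "continues here\n",
    "2017-05-16 10:41:00.654321|second tweet\n",
]
records = list(compress_tweets(sample))
```

To process a real file, pass the open file object as lines and write each yielded record followed by '\n'. The limitation mentioned above still applies: a tweet whose body contains a line starting with timestamp| will be split.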


Messages In This Thread
CSV file with irregular structure - by ulrich48155 - May-16-2017, 10:40 AM
RE: CSV file with irregular structure - by Larz60+ - May-16-2017, 10:57 AM
RE: CSV file with irregular structure - by Ofnuts - May-16-2017, 12:17 PM
RE: CSV file with irregular structure - by Ofnuts - May-16-2017, 07:46 PM
RE: CSV file with irregular structure - by zivoni - May-16-2017, 11:03 PM
RE: CSV file with irregular structure - by zivoni - May-20-2017, 09:34 AM
RE: CSV file with irregular structure - by zivoni - May-22-2017, 07:35 PM
