If you only need to "compress" your file to one tweet per line, you can define a tweet as anything that starts on a line beginning with a timestamp followed by | and ends where the next tweet starts (or the file ends). Iterate over the lines of your file and, for each line, check whether it starts with a timestamp followed by |: if it does, start a new tweet; otherwise, append the line to the current tweet:

    import re

    # a tweet starts on a line beginning with "timestamp|"
    pattern = re.compile(r"^\d{4}-\d\d-\d\d \d\d:\d\d:\d\d\.\d{6}\|")

    with open('tweets_sample.csv') as infile, open('tweets_compress.psv', 'w') as outfile:
        outfile.write(next(infile).strip())  # to start first tweet
        for line in infile:
            if pattern.match(line):  # new tweet starts, previous one ends ...
                outfile.write('\n')
            outfile.write(line.strip())

Note that this can wrongly split a tweet message if the body of the tweet contains a newline immediately followed by something that looks like timestamp|.
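
If you want to try the same idea without touching files, the loop above could be wrapped in a generator along these lines. This is only a sketch: `compress_tweets` and its `sep` parameter are names made up for illustration, the sample lines are invented data, and joining continuation lines with a space (and skipping blank ones) is an assumption about what you want, since the code above concatenates them directly.

```python
import re

# Same "timestamp|" prefix as in the answer above.
TWEET_START = re.compile(r"^\d{4}-\d\d-\d\d \d\d:\d\d:\d\d\.\d{6}\|")


def compress_tweets(lines, sep=" "):
    """Yield one string per tweet from an iterable of raw lines.

    Hypothetical helper: the first line always starts a record (as in the
    loop above), later lines start a new record only if they match
    TWEET_START, and everything else is appended to the current record.
    """
    current = None  # pieces of the tweet being assembled
    for line in lines:
        text = line.strip()
        if current is None or TWEET_START.match(line):
            if current is not None:
                yield sep.join(current)  # previous tweet is complete
            current = [text]
        elif text:  # assumption: blank continuation lines are dropped
            current.append(text)
    if current is not None:
        yield sep.join(current)  # flush the last tweet


# Invented sample data, only to show the shape of the input.
sample = [
    "2020-01-01 12:00:00.000000|first tweet\n",
    "continues here\n",
    "\n",
    "2020-01-01 12:00:01.000000|second tweet\n",
]
result = list(compress_tweets(sample))
```

Because a file object is itself an iterable of lines, you could pass `infile` directly and write each yielded string followed by `'\n'` to `outfile`. The limitation mentioned above still applies: a tweet body containing a newline followed by timestamp| would be split.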