May-16-2017, 10:40 AM
I have a CSV file with about 500,000 lines of tweets. The file is structured as follows:
Timestamp1 | Topic | Timestamp2 | User | Message | RetweetCount | Location
The problem is that some fields, especially the messages, are sometimes split across two or even three rows instead of sitting in one. Sometimes there is even a blank row between the parts of one message.
Here is an example:
http://imgur.com/a/R0pbz (the upper row is good, the one below it isn't)
I also uploaded a sample file:
https://mega.nz/#!rZ5GwRrC!yKsHs26ZZZtXE...mIfIbp-fqs
How can I fix this?
I thought maybe there is a way to tell Python to combine the data into one row.
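A minimal sketch of that idea, under assumptions that may not match the real file: the fields are separated by `|`, every complete row has exactly the 7 fields listed above, and the delimiter never appears inside a message. The function name `merge_broken_rows` and the sample lines are made up for illustration. It skips blank rows and keeps gluing lines together until the pending row has enough delimiters to be complete.

```python
EXPECTED_FIELDS = 7  # Timestamp1, Topic, Timestamp2, User, Message, RetweetCount, Location


def merge_broken_rows(lines, delimiter="|", expected_fields=EXPECTED_FIELDS):
    """Yield complete rows, joining continuation lines onto the pending row.

    Assumes the delimiter never occurs inside a field, so a row is complete
    once it contains expected_fields - 1 delimiters.
    """
    pending = ""
    for raw in lines:
        line = raw.rstrip("\r\n")
        if not line.strip():
            continue  # blank row between the parts of one message
        # Join a continuation line to the pending row with a single space.
        pending = f"{pending} {line.strip()}" if pending else line
        if pending.count(delimiter) >= expected_fields - 1:
            yield pending
            pending = ""
    if pending:
        yield pending  # leftover row that never reached the full field count


if __name__ == "__main__":
    # Hypothetical sample: one good row, then one row broken across lines.
    sample = [
        "t1|topic|t2|user|hello world|3|Berlin\n",
        "t1|topic|t2|user|first part\n",
        "\n",
        "second part\n",
        "third part|0|Paris\n",
    ]
    for row in merge_broken_rows(sample):
        print(row)
```

For the real file, the same function could be fed an open file object, for example `merge_broken_rows(open("tweets.csv", encoding="utf-8"))`, where the file name and encoding are placeholders. If the file is actually comma-separated with quoted fields, the `csv` module may already handle the embedded line breaks and counting delimiters like this would not be reliable.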
Thanks in advance!