(May-17-2017, 06:31 PM)ulrich48155 Wrote: Do you think it would even be possible to really connect the parts that were split over different rows? Now, when opening the compressed file with Excel, some of the rows are separated by blank columns. In order to do the sentiment analysis, this would still require a great deal of manual work.
Do you mean two tweets separated by empty space on one line? That code writes a newline only before a row that starts with timestamp|, so if some tweets start with, say, <space>timestamp|, they will be concatenated to the previous one. This could be solved by changing the pattern to match such lines as well (at the cost of a higher risk of splitting a tweet in the middle of its message). Please post a small sample of your input data that leads to the "empty columns".
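Something like the following is a rough sketch of that pattern change, not the exact code from earlier in the thread. It assumes the timestamp is a plain run of digits followed by |, which may not match your file, so adjust the regex once you know the real format. The names RECORD_START and join_split_records are made up for illustration.

```python
import re

# Assumption: a record starts with optional whitespace, a numeric
# timestamp, and then "|". Change \d+ if your timestamps look different.
RECORD_START = re.compile(r"^\s*\d+\|")


def join_split_records(lines):
    """Merge continuation lines into the record they belong to."""
    records = []
    for raw in lines:
        line = raw.strip()
        if not line:
            # Skip blank lines entirely.
            continue
        if RECORD_START.match(raw) or not records:
            # New record (leading whitespace is dropped by strip()),
            # or stray text before the first record.
            records.append(line)
        else:
            # Continuation of the previous tweet: join with one space.
            records[-1] += " " + line
    return records


if __name__ == "__main__":
    sample = [
        "1494000000|first part\n",
        "second part\n",
        " 1494000001|next tweet\n",
    ]
    for record in join_split_records(sample):
        print(record)
```

Allowing leading whitespace in the pattern is what makes the third sample line start a new record. The trade-off is the one mentioned above: a tweet whose text happens to contain a line starting with digits and | would be split in the wrong place.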