Apr-04-2017, 11:44 PM
As Ofnuts suggested, you should verify the downloaded file. The archive.org page has an XML file with metadata that includes SHA1 checksums, so you can compare the checksum of your file against it (and you can do the same for the file downloaded to your PC).
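If it helps, here is a minimal sketch of the checksum comparison in Python. The function names `sha1_of_file` and `matches_expected` are just my own, and the expected value is whatever SHA1 string you copy by hand from the XML metadata file; I am not assuming anything about that file's layout.

```python
import hashlib


def sha1_of_file(path, chunk_size=1024 * 1024):
    """Return the SHA1 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha1()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so large archives do not have to fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(path, expected_sha1):
    """Compare the file's digest with a checksum pasted from the metadata."""
    # Hypothetical helper: tolerate stray whitespace and upper-case hex.
    return sha1_of_file(path) == expected_sha1.strip().lower()
```

Run it once on the file on EC2 and once on the copy on your PC. If both digests match the published one, the download is fine and the problem is somewhere else.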
I am not sure whether a tweet export must start with "created_at". It could depend on the software used for the export: in Python, JSON objects are loaded as dicts, and dicts don't keep key order (except in 3.6, where CPython preserves insertion order as an implementation detail). Actually, I don't think the files on EC2 are corrupt (bzip2 should report an error or crash when you try to extract a corrupted archive); more likely it is just a different version ...
If the keys in that file have different names, or some are missing (I don't know whether lang was used 5 years ago), then parsing it would raise an error in the innermost loop and nothing would be written.
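To illustrate the missing-key problem, here is a rough sketch of a more tolerant reader. It is not your script: I am assuming one JSON object per line inside the .bz2 file, and the field names `created_at`, `lang` and `text` and the function names are only examples. The point is that `dict.get` returns `None` for an absent key instead of raising `KeyError`, so one old-format tweet does not abort the whole run.

```python
import bz2
import json


def extract_fields(tweet, wanted=("created_at", "lang", "text")):
    """Pick the wanted keys; missing ones become None instead of raising."""
    return {key: tweet.get(key) for key in wanted}


def read_tweets(path):
    """Read a bz2 file of line-delimited JSON; return (rows, skipped_count)."""
    rows, skipped = [], 0
    with bz2.open(path, "rt", encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue
            try:
                tweet = json.loads(line)
            except ValueError:
                # json.JSONDecodeError is a subclass of ValueError.
                skipped += 1
                continue
            if not isinstance(tweet, dict):
                # Valid JSON but not an object, so there are no keys to read.
                skipped += 1
                continue
            rows.append(extract_fields(tweet))
    return rows, skipped
```

Printing the skipped count, and how many rows have `None` for `lang`, should quickly show whether the older files really use different keys.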