Apr-17-2017, 07:01 PM
When I think about it, there could be a problem with np.float applied to a pandas Series; maybe the following would work:

df['long'] = df.geo.str.replace(pattern, r"\1").astype('float')

Regardless of that, it is much better to extract the coordinates directly from the json than to convert it to a string and then extract them from the string. I have added lines 15-18 and modified lines 3, 6, 19 (and 11), so it should create a .csv with lat and long fields instead of geo.
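To illustrate the replace-and-cast idea above, here is a minimal sketch on a toy Series. The sample data and the regex pattern are my own assumptions for demonstration, not the original pattern from your code:

```python
import pandas as pd

# Hypothetical sample: 'geo' already flattened to a string like "[lat, long]"
df = pd.DataFrame({'geo': ["[12.34, 56.78]", "[-1.5, 3.25]"]})

# Hypothetical pattern: capture the second coordinate (longitude)
pattern = r"\[[^,]+,\s*([^\]]+)\]"

# Replace the whole string with the captured group, then cast to float
df['long'] = df.geo.str.replace(pattern, r"\1", regex=True).astype('float')
print(df['long'].tolist())  # [56.78, 3.25]
```

Note that newer pandas versions require `regex=True` to be passed explicitly for pattern replacement in `str.replace`.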
import csv, json, os

elements_keys = ['created_at', 'text', 'lang']

with open('file.csv', 'w') as csvfile:
    writer = csv.writer(csvfile)
    writer.writerow(elements_keys + ['lat', 'long'])  # header
    for dirs, subdirs, files in os.walk('/DIR'):
        for file in files:
            if file.endswith('.json'):
                with open(os.path.join(dirs, file), 'r') as input_file:  # added os.path.join
                    for line in input_file:
                        try:
                            tweet = json.loads(line)
                            try:
                                coords = tweet['geo']['coordinates']  # trying to get lat, long
                            except Exception:
                                coords = [None, None]
                            row = [tweet[key] for key in elements_keys] + coords
                            writer.writerow(row)  # writing tweet into file
                        except Exception:
                            continue

This code is getting uglier; as you are processing tons of small files, it would probably be better to split it into a few small functions, say one to traverse the directories and a second one to convert a single file to .csv. That would create tons of small csv files, but these can be concatenated after the script finishes (in the shell, if you create them without headers).