Apr-17-2017, 07:01 PM
When I think about it, there could be a problem with np.float applied to a pandas Series; maybe the following would work:

df['long'] = df.geo.str.replace(pattern, r"\1").astype('float')

Regardless of that, it is much better to extract the coordinates directly from the json than to convert it to a string and then extract them from the string. I have added lines 15-18 and modified lines 3, 6, 19 (and 11), so it should create a .csv with lat and long fields instead of geo.
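To illustrate the replace-and-cast idea above, here is a minimal sketch on a toy Series. The sample data and the regex pattern are my own assumptions for demonstration, not the original pattern from your code:

```python
import pandas as pd

# Hypothetical sample: 'geo' already flattened to a string like "[lat, long]"
df = pd.DataFrame({'geo': ["[12.34, 56.78]", "[-1.5, 3.25]"]})

# Hypothetical pattern: capture the second coordinate (longitude)
pattern = r"\[[^,]+,\s*([^\]]+)\]"

# Replace the whole string with the captured group, then cast to float
df['long'] = df.geo.str.replace(pattern, r"\1", regex=True).astype('float')
print(df['long'].tolist())  # [56.78, 3.25]
```

Note that newer pandas versions require `regex=True` to be passed explicitly for pattern replacement in `str.replace`.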
import csv, json, os

elements_keys = ['created_at', 'text', 'lang']

with open('file.csv', 'w') as csvfile:
    writer = csv.writer(csvfile)
    writer.writerow(elements_keys + ['lat', 'long'])  # header
    for dirs, subdirs, files in os.walk('/DIR'):
        for file in files:
            if file.endswith('.json'):
                with open(os.path.join(dirs, file), 'r') as input_file:  # added os.path.join
                    for line in input_file:
                        try:
                            tweet = json.loads(line)
                            try:
                                coords = tweet['geo']['coordinates']  # trying to get lat, long
                            except Exception:
                                coords = [None, None]
                            row = [tweet[key] for key in elements_keys] + coords
                            writer.writerow(row)  # writing tweet into file
                        except Exception:
                            continue

This code is getting uglier; as you are processing tons of small files, it would probably be better to split it into a few small functions, say one to traverse the directories and a second one to convert a single file to .csv. That would create tons of small csv files, but these can be concatenated after the script finishes (in the shell, if you create them without headers).