Python Forum
Removing characters from columns in data frame
Thinking about it, there could be a problem with applying np.float to a pandas Series; maybe the following code would work:
df['long'] = df.geo.str.replace(pattern, r"\1", regex=True).astype('float')
Regardless, it's much better to extract the coordinates directly from the JSON than to convert it to a string and then extract them from that string. I have added lines 15-18 and modified lines 3, 6, 19 (and 11), so it should create a .csv with lat and long fields instead of geo.

import csv, json, os

elements_keys = ['created_at', 'text', 'lang']
with open('file.csv', 'w', newline='') as csvfile:  # newline='' as the csv module recommends
    writer = csv.writer(csvfile)
    writer.writerow(elements_keys + ['lat', 'long'])  # header

    for dirs, subdirs, files in os.walk('/DIR'):
        for file in files:
            if file.endswith('.json'):
                with open(os.path.join(dirs,file), 'r') as input_file:  # added os.path.join
                    for line in input_file:
                        try:
                            tweet = json.loads(line)
                            try:
                                coords = tweet['geo']['coordinates']  # trying to get lat, long
                            except (KeyError, TypeError):  # no 'geo' key, or 'geo' is None
                                coords = [None, None]
                            row = [tweet[key] for key in elements_keys] + coords
                            writer.writerow(row)  # writing tweet into file
                        except (ValueError, KeyError, TypeError):  # skip malformed or incomplete lines
                            continue
This code is getting uglier. As you are processing tons of small files, it would probably be better to split it into a few small functions: say, one to traverse the directories and a second to convert one file to .csv. That would create tons of small .csv files, which can be concatenated after the script finishes (in the shell, if you create them without a header).
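A minimal sketch of that split, assuming the same three keys and the same one-tweet-per-line input as above. The function names (find_json_files, tweet_to_row, convert_file, convert_tree) are made up for illustration, and writing each .csv next to its .json file is just one possible choice:

```python
import csv
import json
import os

ELEMENTS_KEYS = ['created_at', 'text', 'lang']


def find_json_files(root):
    """Yield the path of every .json file below root (hypothetical helper)."""
    for dirpath, _subdirs, files in os.walk(root):
        for name in files:
            if name.endswith('.json'):
                yield os.path.join(dirpath, name)


def tweet_to_row(line):
    """Turn one JSON line into a row, or return None if the line is unusable."""
    try:
        tweet = json.loads(line)
        row = [tweet[key] for key in ELEMENTS_KEYS]
    except (ValueError, KeyError, TypeError):
        return None  # malformed JSON or a missing key
    try:
        coords = list(tweet['geo']['coordinates'])
    except (KeyError, TypeError):
        coords = [None, None]  # tweet without coordinates
    return row + coords


def convert_file(json_path, csv_path):
    """Write one headerless .csv for one .json file; return the row count."""
    count = 0
    with open(json_path, 'r') as src, open(csv_path, 'w', newline='') as dst:
        writer = csv.writer(dst)
        for line in src:
            row = tweet_to_row(line)
            if row is not None:
                writer.writerow(row)
                count += 1
    return count


def convert_tree(root):
    """Convert every .json file below root to a .csv with the same base name."""
    for json_path in find_json_files(root):
        convert_file(json_path, json_path[:-len('.json')] + '.csv')
```

Calling convert_tree('/DIR') would then leave one headerless .csv beside each .json file, ready to be concatenated afterwards.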

RE: Removing characters from columns in data frame - by zivoni - Apr-17-2017, 07:01 PM