Python Forum
Removing characters from columns in data frame
Hi! So, I came up with the following code to extract Twitter data from JSON and create a data frame with several columns:

# Import libraries
import json
import pandas as pd

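# Extract data from JSON (the file holds one JSON object per line)
tweets = []
with open('00.json') as f:
  for line in f:
    try:
      tweets.append(json.loads(line))
    except ValueError:  # skip lines that are not valid JSON
      pass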

# Tweets often have missing data, therefore use -if- when extracting "keys"
tweet = tweets[0]
ids = [tweet['id_str'] for tweet in tweets if 'id_str' in tweet] 
text = [tweet['text'] for tweet in tweets if 'text' in tweet]
lang = [tweet['lang'] for tweet in tweets if 'lang' in tweet]
geo = [tweet['geo'] for tweet in tweets if 'geo' in tweet]
place = [tweet['place'] for tweet in tweets if 'place' in tweet]

# Create a data frame (using pd.Index may be "incorrect", but I am a noob)
df=pd.DataFrame({'Ids':pd.Index(ids),
                'Text':pd.Index(text),
                'Lang':pd.Index(lang),
                'Geo':pd.Index(geo),
                'Place':pd.Index(place)})

# Convert "object" to "string" type (apply returns a new Series; the result is not assigned back here)
df.Lang.apply(str)
df.Geo.apply(str)

# Select tweets in English and with geo tag
df[(df['Lang']==('en',)) & (df['Geo'] != (None,))]
So far, everything seems more or less fine. 
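One caveat about the extraction step, offered as a sketch rather than a definitive fix: because each list is filtered with its own `if`, the lists can end up with different lengths when a key is missing, and the rows may then no longer line up. Building one record per tweet with `dict.get` keeps the columns aligned. The sample tweets and the `fields` mapping below are invented for illustration; only the key names come from the code above.

```python
import pandas as pd

# Hypothetical sample standing in for the parsed lines of '00.json'
tweets = [
    {"id_str": "396154642666913792", "text": "hello", "lang": "en",
     "geo": {"coordinates": [41.63349811, -93.65831894], "type": "Point"},
     "place": None},
    {"id_str": "2", "text": "hola", "lang": "es", "geo": None},  # no 'place' key
    {"delete": {"status": {"id_str": "3"}}},  # not a tweet: no top-level 'id_str'
]

# Column name -> tweet key (the mapping itself is an assumption for this sketch)
fields = {"Ids": "id_str", "Text": "text", "Lang": "lang",
          "Geo": "geo", "Place": "place"}

# One dict per tweet; dict.get returns None for a missing key,
# so every row has the same columns and stays aligned
rows = [{col: tweet.get(key) for col, key in fields.items()}
        for tweet in tweets if "id_str" in tweet]

df = pd.DataFrame(rows, columns=list(fields))

# English tweets that carry a geo tag
english_geo = df[(df["Lang"] == "en") & df["Geo"].notna()]
```

With plain values in each cell, the filter can compare against `"en"` directly instead of against a one-element tuple.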

Now, the problem. 

For example:

The "Ids" value is recorded as "(396154642666913792,)";
or the "Geo" value is recorded as "({'coordinates': [41.63349811, -93.65831894], 'type': 'Point'},)".

Question: How do I remove the "extra" characters -- i.e., (), {}, 'coordinates':, etc.?

Thank you in advance for help!
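One possible way to clean an existing frame, sketched under assumptions: the "extra" characters look like the printed form of a one-element tuple wrapping the real value, so the sketch unwraps the tuple and reads the coordinates from the geo dict instead of stripping characters from text. The frame below is a hypothetical reconstruction of the values quoted above; the helpers `unwrap` and `coordinate` and the `Lat`/`Lon` column names are made up for illustration, and the first coordinate is assumed to be the latitude.

```python
import pandas as pd

# Hypothetical frame reproducing the quoted values (one-element tuples)
df = pd.DataFrame({
    "Ids": [("396154642666913792",), ("2",)],
    "Geo": [({"coordinates": [41.63349811, -93.65831894], "type": "Point"},),
            (None,)],
})

def unwrap(value):
    """Return the sole item of a one-element tuple; leave anything else as is."""
    if isinstance(value, tuple) and len(value) == 1:
        return value[0]
    return value

def coordinate(geo, position):
    """Pick one coordinate from a geo dict, or None when there is no geo tag."""
    if isinstance(geo, dict) and geo.get("coordinates"):
        return geo["coordinates"][position]
    return None

# Series.apply returns a new Series, so the result is assigned back
for col in ["Ids", "Geo"]:
    df[col] = df[col].apply(unwrap)

# Assumption: position 0 is latitude and position 1 is longitude
df["Lat"] = df["Geo"].apply(lambda geo: coordinate(geo, 0))
df["Lon"] = df["Geo"].apply(lambda geo: coordinate(geo, 1))
```

Keeping the coordinates in their own numeric columns avoids string manipulation entirely, and rows without a geo tag simply end up with missing values.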
Messages In This Thread
Removing characters from columns in data frame - by kiton - Mar-02-2017, 10:19 PM
