Python Forum
Automating the code using os.walk
#11
In the code you posted, you convert and concatenate JSON files into one big CSV file. You repeatedly read one line from one of your JSON files, extract the four values you are interested in, and store them in your elements, so they can be converted to a dataframe and exported as CSV at the end.

Instead of adding tweet values to elements, you could write each tweet directly to the CSV file. That way you won't accumulate all the tweets in memory: you read one line with a tweet, parse that single JSON object, and write it to the file, so memory requirements stay low.

Your code could look something like this (untested; the main change is the added writer and write step):
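import csv
import json
import os

elements_keys = ['created_at', 'text', 'lang', 'geo']
# newline='' lets the csv module control line endings itself
with open('outfile.csv', 'w', newline='') as csvfile:
    writer = csv.writer(csvfile)
    writer.writerow(elements_keys)   # header

    for dirpath, subdirs, files in os.walk('/home/Dir'):
        for file in files:
            if file.endswith('.json'):
                # os.walk yields bare file names, so join them with their directory
                with open(os.path.join(dirpath, file), 'r') as input_file:
                    for line in input_file:
                        try:
                            tweet = json.loads(line)
                            row = [tweet[key] for key in elements_keys]
                            writer.writerow(row)     # write this tweet straight to the csv
                        except (ValueError, KeyError, TypeError):
                            # skip lines that are not valid JSON or lack one of the keys
                            continue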
I am not sure about the performance of csv.writer when writing line by line; maybe it would be better to accumulate rows in an auxiliary list and write them all at once every 10000 or so rows with writer.writerows(). But for a start I would try it one row at a time (and on a smaller number of files).
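If writing row by row does turn out to be slow, the batching idea could be sketched roughly like this. It is untested on real tweet dumps; the names iter_rows, convert and BATCH_SIZE are made up for illustration, UTF-8 input is an assumption, and the batch size of 10000 is just the guess from above, not a measured value.

```python
import csv
import json
import os

ELEMENT_KEYS = ['created_at', 'text', 'lang', 'geo']
BATCH_SIZE = 10000  # assumed value, tune it by measuring


def iter_rows(root_dir, keys=ELEMENT_KEYS):
    """Yield one CSV row per parseable tweet found under root_dir."""
    for dirpath, _subdirs, files in os.walk(root_dir):
        for name in files:
            if not name.endswith('.json'):
                continue
            path = os.path.join(dirpath, name)
            # encoding='utf-8' is an assumption about the input files
            with open(path, 'r', encoding='utf-8') as input_file:
                for line in input_file:
                    try:
                        tweet = json.loads(line)
                        yield [tweet[key] for key in keys]
                    except (ValueError, KeyError, TypeError):
                        continue  # not valid JSON, or not a complete tweet


def convert(root_dir, out_path, batch_size=BATCH_SIZE):
    """Write rows to out_path in batches and return the number of rows written."""
    total = 0
    batch = []
    with open(out_path, 'w', newline='', encoding='utf-8') as csvfile:
        writer = csv.writer(csvfile)
        writer.writerow(ELEMENT_KEYS)  # header
        for row in iter_rows(root_dir):
            batch.append(row)
            if len(batch) >= batch_size:
                writer.writerows(batch)  # one call for the whole batch
                total += len(batch)
                batch.clear()
        if batch:  # write whatever is left after the last full batch
            writer.writerows(batch)
            total += len(batch)
    return total
```

Calling convert('/home/Dir', 'outfile.csv') would then do the same job as the loop above, holding at most batch_size rows in memory at a time. Whether this is actually faster is something to time on your own data, since the file object already buffers writes.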