Apr-03-2017, 05:46 PM
In your posted code you are converting and concatenating JSON files into one big CSV file. You repeatedly read one line from one of your JSON files, extract the four values you are interested in, and store them in your elements, so they can be converted to a dataframe and exported as a CSV at the end.
Instead of adding tweet values to elements, you could write each tweet directly to the CSV file - you won't "accumulate" all tweets in memory, you will just read one line with a tweet, parse that single JSON object, and write it into the file, so memory requirements will be low.
Your code could look something like this (untested; only the writer and the write call added):
import csv, json, os

elements_keys = ['created_at', 'text', 'lang', 'geo']

with open('outfile.csv', 'w', newline='') as csvfile:
    writer = csv.writer(csvfile)
    writer.writerow(elements_keys)  # header
    for root, subdirs, files in os.walk('/home/Dir'):
        for file in files:
            if file.endswith('.json'):
                # os.walk yields bare file names, so join with the directory
                with open(os.path.join(root, file), 'r') as input_file:
                    for line in input_file:
                        try:
                            tweet = json.loads(line)
                            row = [tweet[key] for key in elements_keys]
                            writer.writerow(row)  # write tweet straight into the file
                        except (ValueError, KeyError):
                            continue

I am not sure about the performance of csv.writer when writing line by line; maybe it would be better to accumulate rows in an auxiliary list and write them all at once every ~10000 rows with csv.writerows(). But to start I would try it one by one (and on a smaller number of files).
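If line-by-line writing turns out to be too slow, the batching idea could look roughly like this - a sketch, assuming the same '/home/Dir' tree and the same four keys as above; the 10000-row threshold is just an example value:

```python
import csv
import json
import os

elements_keys = ['created_at', 'text', 'lang', 'geo']
BATCH_SIZE = 10000  # example threshold; tune to taste

with open('outfile.csv', 'w', newline='') as csvfile:
    writer = csv.writer(csvfile)
    writer.writerow(elements_keys)  # header
    rows = []  # auxiliary list holding at most BATCH_SIZE rows
    for root, subdirs, files in os.walk('/home/Dir'):
        for file in files:
            if not file.endswith('.json'):
                continue
            with open(os.path.join(root, file), 'r') as input_file:
                for line in input_file:
                    try:
                        tweet = json.loads(line)
                        rows.append([tweet[key] for key in elements_keys])
                    except (ValueError, KeyError):
                        continue
                    if len(rows) >= BATCH_SIZE:
                        writer.writerows(rows)  # flush the whole batch at once
                        rows = []
    writer.writerows(rows)  # flush whatever is left over
```

Memory use stays bounded by BATCH_SIZE rows instead of one, which is still tiny compared to accumulating every tweet.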