Python Forum

Merge JSON files prioritizing the updated values from most recent file
Hi folks!

I have a series of JSON files similar to the example below. The "update" field in "header" is the file creation timestamp, and "hours" holds the timestamps for the "x" and "y" values.

I would like to combine them into a single CSV file to import into Excel. The problem is that most of the files contain updated "x" and "y" values compared with the preceding file. For example, the first two timestamps of a file are the same as the last two timestamps of the previous file, but the "x" and "y" values have been updated. So for the same timestamps, the newer file holds updated, more accurate "x" and "y" values.

With my limited knowledge, I tried to write a script that ignores the older values from the older file by comparing the "update" timestamps. It works only when I ignore the "x" and "y" values, in which case it just prints the series of hours.

Rather than investigating further why it does not work properly, I would like to ask for guidance on choosing the right approach. I am sure there are more convenient ways of doing it.

Thanks!


{
"header":{
"update":1555054504000
},
"data":{
"hours":[
1555038000000,
1555048800000,
1555059600000,
1555070400000,
1555081200000,
1555092000000,
1555102800000,
1555113600000
],
"x":[
241.095130810609,
235.6587698951538,
234.52988957999375,
238.14886341887896,
240.9792842156129,
234.37616327308106,
236.4281670519553,
239.34914914407685
],
"y":[
273.9192290114759,
271.7583893617311,
270.7841492576362,
277.412376380971,
279.51083939292204,
280.7639255517393,
277.92215250624633,
272.7410417669065
]
}
}
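To make the layout concrete, here is a minimal sketch of how the three parallel lists line up into one record per timestamp. It assumes the "hours" and "update" values are milliseconds since the Unix epoch (they look like it, but that is a guess), and the helper names to_rows and ms_to_utc are made up for illustration.

```python
import json
from datetime import datetime, timezone

# Trimmed copy of the sample above (first two entries only).
sample = json.loads("""
{
  "header": {"update": 1555054504000},
  "data": {
    "hours": [1555038000000, 1555048800000],
    "x": [241.095130810609, 235.6587698951538],
    "y": [273.9192290114759, 271.7583893617311]
  }
}
""")


def to_rows(doc):
    """Pair up the parallel lists: one (hour_ms, x, y) tuple per timestamp."""
    data = doc["data"]
    return list(zip(data["hours"], data["x"], data["y"]))


def ms_to_utc(ms):
    """Convert a timestamp to an aware UTC datetime.

    Assumption: the value is milliseconds since the Unix epoch.
    """
    return datetime.fromtimestamp(ms / 1000, tz=timezone.utc)


rows = to_rows(sample)
for hour_ms, x, y in rows:
    print(ms_to_utc(hour_ms).isoformat(), x, y)
```

Under that assumption the first timestamp comes out as 2019-04-12T03:00:00+00:00, which is plausible for data whose file was created later the same day.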

#! /usr/bin/python3
import os, json

path_to_json = 'data/'
# os.listdir() returns names in arbitrary order, so sort them for a stable file sequence
json_files = sorted(pos_json for pos_json in os.listdir(path_to_json) if pos_json.endswith('.json'))

files = list(enumerate(json_files))

def appendJsonContent(fileNumber):
    information = []
    fileName = files[fileNumber][1]
    with open(os.path.join(path_to_json, fileName)) as jsonFile:
        json_text = json.load(jsonFile)
        for e, hours in enumerate(json_text['data']['hours']):
            x = json_text['data']['x'][e]
            y = json_text['data']['y'][e]
            information.append(str(hours) + " | " + str(x) + " | " + str(y))
    return information

# Return how many entries of file i+1 do not already appear in file i
def uniqueDataNumber(i):
    firstSetData = appendJsonContent(i)
    secondSetData = appendJsonContent(i+1)
    mergedSetData = list(set(firstSetData + secondSetData))
    return len(mergedSetData)-len(firstSetData)

for index in range(0, len(files), 2):
    for e in range(uniqueDataNumber(0)):
        print(appendJsonContent(index)[e])
    for e in range(uniqueDataNumber(1)):
        print(appendJsonContent(index+1)[e])
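One possible approach, sketched here rather than offered as the definitive fix: load every file, sort the parsed documents by header "update", and store the values in a dict keyed by the "hours" timestamp, so that values from a newer file simply overwrite those from an older one. The function names (load_docs, merge_docs, write_csv) and the tiny in-memory demo data are hypothetical.

```python
import csv
import json
import os


def load_docs(folder):
    """Parse every .json file in the folder (file order does not matter here)."""
    docs = []
    for name in os.listdir(folder):
        if name.endswith(".json"):
            with open(os.path.join(folder, name)) as fh:
                docs.append(json.load(fh))
    return docs


def merge_docs(docs):
    """Merge parsed files; for a repeated timestamp the latest "update" wins."""
    merged = {}
    # Process oldest first so newer files overwrite older values.
    for doc in sorted(docs, key=lambda d: d["header"]["update"]):
        data = doc["data"]
        for hour, x, y in zip(data["hours"], data["x"], data["y"]):
            merged[hour] = (x, y)
    # Return rows sorted by timestamp.
    return [(hour, x, y) for hour, (x, y) in sorted(merged.items())]


def write_csv(rows, out_path):
    """Write the merged rows to a CSV file with a header line."""
    with open(out_path, "w", newline="") as fh:
        writer = csv.writer(fh)
        writer.writerow(["hours", "x", "y"])
        writer.writerows(rows)


# Tiny made-up demo: two overlapping "files" given out of order.
older = {"header": {"update": 1},
         "data": {"hours": [10, 20, 30], "x": [1.0, 2.0, 3.0], "y": [1.5, 2.5, 3.5]}}
newer = {"header": {"update": 2},
         "data": {"hours": [20, 30, 40], "x": [2.1, 3.1, 4.1], "y": [2.6, 3.6, 4.6]}}

demo_rows = merge_docs([newer, older])
for row in demo_rows:
    print(row)
```

For the real data, something like write_csv(merge_docs(load_docs('data/')), 'merged.csv') should produce a file Excel can open; the output name merged.csv is only an example. Keying on the timestamp avoids comparing whole "hours | x | y" strings, which stop matching as soon as "x" or "y" is updated.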