Python Forum

Full Version: Dealing with a .json nightmare... ideas?
Hi guys,
I'm working with an API and dealing with a pretty large JSON dump. It's about 1000 lines, and somewhere in there is the particular information I'm looking for. I would like to know how best to parse it and pull out what I need. The data all gets written to a file called 'RECdemp.json'.

Something like this?
collection={}
r=requests.get(url, headers=h, params=p)
r=str(r)
for index in enumerate(r):
    key_name = 'col{0}.format(index)
    collection[key_name] = responses
key1 = collection['col1']
key2 = collection['col2']
key3 = collection['col3']
return (key1, key2, key3)
Now, this doesn't work... and I have no idea why it doesn't work, because I don't really understand what that block of code is doing. I use a similar for loop to parse/iterate through another file to collect specific results, but I'm not good enough yet to write these without assistance.
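
For anyone wondering why the loop above fails, here is a minimal sketch. It assumes no network access, so the string '<Response [200]>' stands in for what str(r) produces on a successful requests response object; fake_response_text is a hypothetical name used only for illustration.

```python
# Stand-in for str(r): converting a requests Response object to a string
# gives its repr (e.g. '<Response [200]>'), not the JSON body.
fake_response_text = "<Response [200]>"

# enumerate() yields (index, item) tuples, so `index` in the original loop
# is really a tuple, and iterating over a string walks it character by character.
pairs = list(enumerate(fake_response_text))
first_pair = pairs[0]

# Unpacking the tuple gives the index and the single character separately.
position, character = first_pair
print(first_pair)
print(position, character)
```

As posted, the snippet also has an unterminated string on the key_name line, refers to an undefined name responses, and uses return outside a function, each of which would stop it on its own.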

I think what I will do to correlate sent SMS messages with their replies, so that each reply goes to the proper recipient, is keep a sort of "dictionary" file that records who belongs to which message ID, and reference this file when delivering the response back to the sender. But first I need to sift through all this JSON!
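
The lookup-file idea could be sketched roughly as below. This is only an illustration under assumptions: the file name message_owners.json, the sender labels, and the helper names are all hypothetical, and a temporary directory is used so nothing is written next to the script.

```python
import json
import tempfile
from pathlib import Path


def save_owners(path, owners):
    """Write the {message_id: sender} mapping to a JSON file."""
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(owners, fh, indent=2)


def load_owners(path):
    """Read the mapping back; JSON object keys always come back as strings."""
    with open(path, encoding="utf-8") as fh:
        return json.load(fh)


# Hypothetical data: message IDs from the sample, sender names made up.
owners = {4620511: "alice", 4620584: "bob"}

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "message_owners.json"  # hypothetical file name
    save_owners(path, owners)
    loaded = load_owners(path)

# Integer keys were converted to strings on the way out, so look up with str().
sender = loaded.get(str(4620511), "unknown")
print(sender)
```

One thing to watch for with this design: JSON only allows string keys, so integer message IDs come back as strings after a round trip through the file.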

Here's what some of the json looks like. The values in particular that I am interested in and would like to pull are: 'id' (first key) and 'body' (second key).

Quote:{'id': 1005672, 'messages': [{'id': 4461048, 'body': 'Mnow test test', 'conversationId': 1005672, 'locationId': 2045, 'contactId': 12792806, 'assignedUserId': 0, 'status': 'RECEIVED', 'error': None, 'kind': 'INCOMING', 'outgoing': False, 'reviewRequest': False, 'type': 'SMS', 'readDate': 0, 'respondedDate': 0, 'sentDate': 1576783232355, 'attachments': []}, {'id': 4461049, 'body': 'THIS NUMBER DOES NOT CURRENTLY ACCEPT TEXT MESSAGES PLEASE CALL TO WORK WITH ONE OF OUR INTAKE SPECIALISTS', 'conversationId': 1005672, 'locationId': 2045, 'contactId': 12792806, 'assignedUserId': 0, 'status': 'RECEIVED', 'error': None, 'kind': 'AUTO_RESPONSE', 'outgoing': True, 'reviewRequest': False, 'type': 'SMS', 'readDate': 0, 'respondedDate': 0, 'sentDate': 1576783233546, 'attachments': []}, {'id': 4620511, 'body': 'test sms,test sms', 'conversationId': 1005672, 'locationId': 2045, 'contactId': 12792806, 'assignedUserId': 17297, 'status': 'DELIVERED', 'error': None, 'kind': 'API', 'outgoing': True, 'reviewRequest': False, 'type': 'SMS', 'readDate': 0, 'respondedDate': 0, 'sentDate': 1577987093930, 'attachments': []}, {'id': 4620584, 'body': 'test sms', 'conversationId': 1005672, 'locationId': 2045, 'contactId': 12792806, 'assignedUserId': 17297, 'status': 'DELIVERED', 'error': None, 'kind': 'API', 'outgoing': True, 'reviewRequest': False, 'type': 'SMS', 'readDate': 0, 'respondedDate': 0, 'sentDate': 1577987195228, 'attachments': []}, {'id': 4646648, 'body': 'test sms', 'conversationId': 1005672, 'locationId': 2045, 'contactId': 12792806, 'assignedUserId': 17297, 'status': 'DELIVERED', 'error': None, 'kind': 'API', 'outgoing': True, 'reviewRequest': False, 'type': 'SMS', 'readDate': 0, 'respondedDate': 0, 'sentDate': 1578083596336, 'attachments': []}, {'id': 4646877, 'body': 'test sms', 'conversationId': 1005672, 'locationId': 2045, 'contactId': 12792806, 'assignedUserId': 17297, 'status': 'DELIVERED', 'error': None, 'kind': 'API', 'outgoing': True, 'reviewRequest': False, 
'type': 'SMS', 'readDate': 0, 'respondedDate': 0, 'sentDate': 1578084072306, 'attachments': []}, {'id': 4671891, 'body': 'test sms', 'conversationId': 1005672, 'locationId': 2045, 'contactId': 12792806, 'assignedUserId': 17297, 'status': 'DELIVERED', 'error': None, 'kind': 'API', 'outgoing': True, 'reviewRequest': False, 'type': 'SMS', 'readDate': 0, 'respon
Why not use the json module from the standard library?
Or you can have requests process the JSON directly: https://requests.readthedocs.io/en/maste...se-content
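
A minimal sketch of both suggestions, assuming the API returns a JSON object shaped like the quoted sample. To keep it runnable offline, a short hand-written JSON string (raw_text, hypothetical) stands in for the response body; with a real requests response r, r.json() would return the same kind of Python object as json.loads(r.text).

```python
import json

# Hypothetical stand-in for the text of the API response (r.text).
raw_text = """
{"id": 1005672,
 "messages": [
   {"id": 4461048, "body": "Mnow test test", "error": null, "outgoing": false},
   {"id": 4620511, "body": "test sms,test sms", "error": null, "outgoing": true}
 ]}
"""

# json.loads turns JSON text into ordinary Python dicts, lists, and values
# (null -> None, false -> False, true -> True).
data = json.loads(raw_text)

conversation_id = data["id"]
first_body = data["messages"][0]["body"]
print(conversation_id, first_body)

# Reading from the saved file would look like this instead,
# assuming the file holds valid JSON:
# with open("RECdemp.json", encoding="utf-8") as fh:
#     data = json.load(fh)
```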
I will often use pretty print just to quickly discern keys and values in a JSON dump. However, don't rewrite the file in that form; it is merely a debugging tool. It should bring some clarity to the structure, though.
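
As a rough illustration of the pretty-print tip, the sketch below uses pprint from the standard library and json.dumps with indent; the sample dict is a trimmed, hand-copied piece of the quoted data.

```python
import json
from pprint import pformat

# Trimmed copy of one conversation from the quoted sample.
sample = {
    "id": 1005672,
    "messages": [
        {"id": 4461048, "body": "Mnow test test", "status": "RECEIVED"},
    ],
}

# pformat returns the pretty-printed text (pprint would print it directly).
pretty_python = pformat(sample, width=60)
print(pretty_python)

# json.dumps with indent gives an indented JSON view of the same data.
pretty_json = json.dumps(sample, indent=2)
print(pretty_json)

# Top-level keys are often the quickest thing to check first.
top_level_keys = sorted(sample)
print(top_level_keys)
```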
Thank you all for your time and suggestions. I will check them out...

However, once I'm able to get the json dumped in a more readable fashion, is there a best-practice or standard way to select just the key/value pairs out of the bunch that I need?

I'm guessing there has to be a more effective way than to make a bunch of empty variables/objects, looping through the chunk of json and assigning each key/value to an object?
(Jan-28-2020, 02:23 PM)t4keheart Wrote: However, once I'm able to get the json dumped in a more readable fashion
You are going to LOAD the JSON response into a Python object (either via the requests.Response.json() method or using the json module from the standard library).

We need to see the whole JSON, because the structure isn't clear from the excerpt. I guess you will actually have a list of dicts. Each dict will have a key "messages" whose value is a list of dicts, each dict being a separate message.

Based on the above guess, it will be something like (pseudocode):
for item in json_data:
    for message in item['messages']:
        message_id = message['id']
        message_body = message['body']
I fixed up your partial JSON data so I could test it.
Make a new dictionary and assign the values you need to it, e.g. id and body.
>>> record = {}
>>> for item in json_data['messages']:
...     record[item['body']] = item['id']    
...  
   
>>> record
{'Mnow test test': 4461048,
 'test 9999': 4461049,
 'test sms': 4671891,
 'test sms,test sms': 4620511}

>>> record.get('test 9999', 'Not in record')
4461049

>>> record.get('car', 'Not in record')
'Not in record'
@snippsat - probably id is unique per message, while body may be repeated
(Jan-28-2020, 05:24 PM)buran Wrote: @snippsat - probably id is unique per message, while body may be repeated
Yes, of course, agreed, so here it is turned around.
>>> record = {}
>>> for item in json_data['messages']:
...     record[item['id']] = item['body']   
     
>>> record
{4461048: 'Mnow test test',
 4461049: 'test 9999',
 4620511: 'test sms,test sms',
 4620584: 'test sms',
 4646648: 'test sms',
 4646877: 'test sms',
 4671891: 'test sms'}

>>> record.get(4461049, 'Not in record')
'test 9999'

>>> record.get(9999999, 'Not in record')
'Not in record'
Wow, couldn't be more impressed with the amount of kindness and level of detail in explanation I've received since joining this forum. I really appreciate your guys' time.

Yes, the ID is the unique identifier (there are both conversation IDs and individual message IDs). The key/value pairs I'm interested in are the conversation/message ID and the body.

The code examples are fantastic; I'm certain I will be able to conjure a workable function from them.
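
One possible shape for such a function, keeping both the conversation ID and the message ID, is sketched here. It assumes, as guessed earlier in the thread, that the full response is a list of conversation dicts; the function name bodies_by_conversation and the second conversation in the sample list are made up for illustration.

```python
def bodies_by_conversation(conversations):
    """Return {conversation_id: {message_id: body}} from a list of conversations."""
    result = {}
    for conversation in conversations:
        messages = conversation.get("messages", [])
        # Message IDs are unique, so they are safe to use as keys.
        result[conversation["id"]] = {m["id"]: m["body"] for m in messages}
    return result


# First entry is trimmed from the quoted sample; the second is hypothetical.
conversations = [
    {"id": 1005672, "messages": [
        {"id": 4461048, "body": "Mnow test test"},
        {"id": 4620584, "body": "test sms"},
    ]},
    {"id": 1005999, "messages": []},
]

lookup = bodies_by_conversation(conversations)
print(lookup[1005672][4461048])
```

Keying the inner dict by message ID rather than body avoids the collision noted earlier in the thread, where several messages share the body 'test sms'.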

If you would like the full json, here's an example of the raw json that gets passed back from the api GET request:
json_full