Python Forum

Full Version: Object madness - JSON Notation confusion
Good day, folks. I am still a novice Python programmer, but I have been assigned an urgent work task that Python is the answer to :) So, here I go.

I have the XML output from a commercial software product, which I have converted to JSON via XML2JSON. My goal is to locate and extract some specific strings of data. In most cases I find what I need. However, in certain cases there are multiple rows of data, apparently in a list of nested objects, and I am struggling to write code that extracts them. My basic issue is the syntax for referencing the nested elements.

The data example:

This node is successfully parsed:

"Source": {
"@ID": "11",
"@HostID": "0",
"@Type": "FileSystem",
"@Path": "D:\\PROD\\Product\\Customer\\FileType\\",
"@FileMask": "*.*",
"@DeleteOrig": "1",
"@NewFilesOnly": "0",
"@SearchSubdirs": "0",
"@Unzip": "0",
"@RetryIfNoFiles": "0",
"@UseDefRetryCount": "1",
"@UseDefRetryTimeoutSecs": "1",
"@UseDefRescanSecs": "1",
"@UDMxFi": "1",
"@UDMxBy": "1",
"@ExFo": "Archive",
"Criteria": {
"comp": {
"@a": "[FileDateStamp]",
"@test": "DLT",
"@b": "[DateSubtract([Now],12H)]"
}
}
},

It is parsed using this syntax:
taskSourcePattern = row['Source'].get('@FileMask', '* Source Pattern Not Found *')
Under certain circumstances, there are multiple nodes in the 'Source' object:

"Source": [
{
"@ID": "11",
"@HostID": "0",
"@Type": "FileSystem",
"@Path": "D:\\PRODUCT\\CLIENT\\Outgoing",
"@FileMask": "*.*",
"@DeleteOrig": "1",
"@NewFilesOnly": "0",
"@SearchSubdirs": "0",
"@Unzip": "0",
"@RetryIfNoFiles": "0",
"@UseDefRetryCount": "1",
"@UseDefRetryTimeoutSecs": "1",
"@UseDefRescanSecs": "1",
"@UDMxFi": "1",
"@UDMxBy": "1"
},
{
"@ID": "17",
"@HostID": "0",
"@Type": "FileSystem",
"@Path": "D:\\PRODUCT\\CLIENT\\Outgoing2",
"@FileMask": "*.*",
"@DeleteOrig": "1",
"@NewFilesOnly": "0",
"@SearchSubdirs": "0",
"@Unzip": "0",
"@RetryIfNoFiles": "0",
"@UseDefRetryCount": "1",
"@UseDefRetryTimeoutSecs": "1",
"@UseDefRescanSecs": "1",
"@UDMxFi": "1",
"@UDMxBy": "1"
}
],
In this case, there is an extra '[' and ']' at the beginning and end of the list. As I iterate through this how do I handle the sudden inclusion of multiple nested records? If you look at my other forum post for another project I am working on, I am basically having the same issue. It's a learning problem :) Maybe you can teach me to fish.

Here is the full code list in this very early build:

import json
import sys

with open('output.json') as json_file:  
	data = json.load(json_file)	
	
	for row in data['Exported']['Tasks']['Task']:
		taskName = row['@Name']
		taskID = row['@ID']	
		print(taskID,"\t",taskName)

		if row.get('Source'):			
# Note the defensive code - the value of 2 indicates I'm about to die
			if len(row.get('Source')) != 2:	
				taskSourcePattern = row['Source'].get('@FileMask','* Source Pattern Not Found *')						
				if row['Source'].get('@Path'):			
					taskSourceFolder =  row['Source'].get('@Path','* Source Folder Not Found *')		
				if row['Source'].get('@FolderName'):
					taskSourceFolder =  row['Source'].get('@FolderName','* Source Folder Not Found *')							
				print("\tSource Information: ",taskSourceFolder,taskSourcePattern)
		if row.get('For'):					
# Note the defensive code - the value of 2 indicates I'm about to die
			if len(row.get('For')) != 2:
				taskDestFolder =  row['For']['Destination'].get('@Path','* Destination Folder Not Found *')
				taskDestFile =  row['For']['Destination'].get('@FileName','* Destination File Not Found *')
				print("\tDest: ",taskDestFolder,taskDestFile)											
		print()
Here is an example of the error:

Traceback (most recent call last):
File "test3.py", line 19, in <module>
taskSourcePattern = row['Source'].get('@FileMask','* Source Pattern Not Found *')
AttributeError: 'list' object has no attribute 'get'
Why are you translating?
Why not just read the native JSON format directly?
Is the posted data in the original JSON format?
This is typically a problem that can be solved with [0].
JSON often has lists mixed in with objects.
Here is a quick example with a fix.
my_obj = {
    "name":"John",
    "age":30,
    "cars": [
        { "name":"Ford", "models":[ "Fiesta", "Focus", "Mustang" ] },
        { "name":"BMW", "models":[ "320", "X3", "X5" ] },
        { "name":"Fiat", "models":[ "500", "Panda" ] }
    ]
 }
Test:
>>> my_obj.get('age')
30

>>> my_obj.get('cars')
[{'models': ['Fiesta', 'Focus', 'Mustang'], 'name': 'Ford'},
 {'models': ['320', 'X3', 'X5'], 'name': 'BMW'},
 {'models': ['500', 'Panda'], 'name': 'Fiat'}]

>>> my_obj.get('cars').get('models')
Traceback (most recent call last):
  File "<string>", line 428, in runcode
  File "<interactive input>", line 1, in <module>
AttributeError: 'list' object has no attribute 'get'

>>> # Fix
>>> my_obj.get('cars')[0].get('models')
['Fiesta', 'Focus', 'Mustang']
>>> my_obj.get('cars')[1].get('models')
['320', 'X3', 'X5']
You need to traverse JSON structure recursively if you don't know the exact layout beforehand.
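A rough sketch of such a recursive walk, assuming the parsed JSON consists only of dicts, lists, and scalar values. The function name find_key and the sample data are invented for this example.

```python
def find_key(node, key):
    """Yield every value stored under `key`, at any depth of dicts and lists."""
    if isinstance(node, dict):
        for name, value in node.items():
            if name == key:
                yield value
            # Keep descending: the value itself may contain the key again.
            yield from find_key(value, key)
    elif isinstance(node, list):
        for item in node:
            yield from find_key(item, key)
    # Scalars (str, int, None, ...) have nothing to descend into.


# Hypothetical sample: one task with a single Source, one with a list.
sample = {"Task": [
    {"Source": {"@Path": "D:\\A"}},
    {"Source": [{"@Path": "D:\\B"}, {"@Path": "D:\\C"}]},
]}

paths = list(find_key(sample, "@Path"))
print(paths)
```

This finds the values regardless of whether 'Source' is an object or a list, at the cost of losing track of which task each value came from.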

You can also search the XML directly with the lxml package; something using iterfind should work.
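A hedged sketch of the iterfind idea. It uses the standard library's xml.etree.ElementTree so that it runs without extra installs; lxml.etree offers the same fromstring, iterfind, and get calls. The XML layout below is a guess reconstructed from the JSON keys in the question (the '@' keys are assumed to be XML attributes); the real export may differ.

```python
import xml.etree.ElementTree as ET  # lxml.etree could be swapped in here

# Assumed layout, reconstructed from the JSON keys; not the real export.
XML = """
<Exported>
  <Tasks>
    <Task ID="1" Name="single">
      <Source Path="D:\\A" FileMask="*.*"/>
    </Task>
    <Task ID="2" Name="multi">
      <Source Path="D:\\B" FileMask="*.txt"/>
      <Source Path="D:\\C" FileMask="*.csv"/>
    </Task>
  </Tasks>
</Exported>
""".strip()

root = ET.fromstring(XML)
results = []
for task in root.iterfind("./Tasks/Task"):
    # iterfind yields every matching child, whether there is one or many.
    for source in task.iterfind("Source"):
        results.append((
            task.get("ID"),
            source.get("Path", "* Source Folder Not Found *"),
            source.get("FileMask", "* Source Pattern Not Found *"),
        ))

for task_id, folder, pattern in results:
    print(task_id, "\t", folder, pattern)
```

Working on the XML directly sidesteps the dict-versus-list question, because repeated elements are always iterated the same way.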
Which xml2json library do you use? Is there an option to always produce consistent output, i.e. an array for Source even when there is a single element?