May-02-2018, 04:23 PM
Good day folks. I am still a novice Python programmer, but I got assigned an urgent work task that Python is the answer to :) So, here I go.
I have the XML output from a commercial software product, which I have converted to JSON via XML2JSON. My goal is to locate and extract some specific strings of data. In most cases I can find what I need. However, in certain cases there are multiple rows of data, apparently in a list of nested objects, and I am struggling to write code that extracts them. My basic issue is syntax: I can't work out how to reference the nested elements.
The data example:
This node is successfully parsed:
"Source": {
    "@ID": "11",
    "@HostID": "0",
    "@Type": "FileSystem",
    "@Path": "D:\\PROD\\Product\\Customer\\FileType\\",
    "@FileMask": "*.*",
    "@DeleteOrig": "1",
    "@NewFilesOnly": "0",
    "@SearchSubdirs": "0",
    "@Unzip": "0",
    "@RetryIfNoFiles": "0",
    "@UseDefRetryCount": "1",
    "@UseDefRetryTimeoutSecs": "1",
    "@UseDefRescanSecs": "1",
    "@UDMxFi": "1",
    "@UDMxBy": "1",
    "@ExFo": "Archive",
    "Criteria": {
        "comp": {
            "@a": "[FileDateStamp]",
            "@test": "DLT",
            "@b": "[DateSubtract([Now],12H)]"
        }
    }
},
It is parsed using this syntax:
taskSourcePattern = row['Source'].get('@FileMask','* Source Pattern Not Found *')
Under certain circumstances, there are multiple nodes in the 'Source' object:
"Source": [
    {
        "@ID": "11",
        "@HostID": "0",
        "@Type": "FileSystem",
        "@Path": "D:\\PRODUCT\\CLIENT\\Outgoing",
        "@FileMask": "*.*",
        "@DeleteOrig": "1",
        "@NewFilesOnly": "0",
        "@SearchSubdirs": "0",
        "@Unzip": "0",
        "@RetryIfNoFiles": "0",
        "@UseDefRetryCount": "1",
        "@UseDefRetryTimeoutSecs": "1",
        "@UseDefRescanSecs": "1",
        "@UDMxFi": "1",
        "@UDMxBy": "1"
    },
    {
        "@ID": "17",
        "@HostID": "0",
        "@Type": "FileSystem",
        "@Path": "D:\\PRODUCT\\CLIENT\\Outgoing2",
        "@FileMask": "*.*",
        "@DeleteOrig": "1",
        "@NewFilesOnly": "0",
        "@SearchSubdirs": "0",
        "@Unzip": "0",
        "@RetryIfNoFiles": "0",
        "@UseDefRetryCount": "1",
        "@UseDefRetryTimeoutSecs": "1",
        "@UseDefRescanSecs": "1",
        "@UDMxFi": "1",
        "@UDMxBy": "1"
    }
],
In this case, there is an extra '[' and ']' around the nodes, which makes 'Source' a list rather than a single object. As I iterate through the tasks, how do I handle the sudden inclusion of multiple nested records? If you look at my other forum post for another project I am working on, I am basically having the same issue. It's a learning problem :) Maybe you can teach me to fish.
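To make the difference concrete: once the file is loaded with the json module, a JSON object such as the first 'Source' arrives as a Python dict, and the bracketed form arrives as a Python list of dicts. Below is a minimal sketch of telling the two apart. The sample records are made up and heavily trimmed, and describe_source is a hypothetical helper name, not part of my real script.

```python
import json

# Hypothetical, trimmed-down samples of the two shapes shown above.
single = json.loads('{"Source": {"@ID": "11", "@FileMask": "*.*"}}')
multi = json.loads(
    '{"Source": [{"@ID": "11", "@FileMask": "*.*"},'
    ' {"@ID": "17", "@FileMask": "*.*"}]}'
)


def describe_source(row):
    """Return 'dict', 'list', or 'missing' for a row's Source value."""
    source = row.get('Source')
    if isinstance(source, dict):
        return 'dict'      # one node: .get() works directly
    if isinstance(source, list):
        return 'list'      # several nodes: each element is a dict
    return 'missing'       # no Source key at all


print(describe_source(single))
print(describe_source(multi))
```

A dict has .get(), a list does not, so any code that calls .get() on 'Source' only works for the single-node shape.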
Here is the full code list in this very early build:
import json
import sys

with open('output.json') as json_file:
    data = json.load(json_file)

for row in data['Exported']['Tasks']['Task']:
    taskName = row['@Name']
    taskID = row['@ID']
    print(taskID,"\t",taskName)
    if row.get('Source'):
        # Note the defensive code - the value of 2 indicates I'm about to die
        if len(row.get('Source')) != 2:
            taskSourcePattern = row['Source'].get('@FileMask','* Source Pattern Not Found *')
            if row['Source'].get('@Path'):
                taskSourceFolder = row['Source'].get('@Path','* Source Folder Not Found *')
            if row['Source'].get('@FolderName'):
                taskSourceFolder = row['Source'].get('@FolderName','* Source Folder Not Found *')
            print("\tSource Information: ",taskSourceFolder,taskSourcePattern)
    if row.get('For'):
        # Note the defensive code - the value of 2 indicates I'm about to die
        if len(row.get('For')) != 2:
            taskDestFolder = row['For']['Destination'].get('@Path','* Destination Folder Not Found *')
            taskDestFile = row['For']['Destination'].get('@FileName','* Destination File Not Found *')
            print("\tDest: ",taskDestFolder,taskDestFile)
    print()

Here is an example of the error:
Traceback (most recent call last):
File "test3.py", line 19, in <module>
taskSourcePattern = row['Source'].get('@FileMask','* Source Pattern Not Found *')
AttributeError: 'list' object has no attribute 'get'
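The traceback says that row['Source'] was a list at that point, and lists have no .get(). One possible way around it, offered only as a sketch under assumptions, is to normalize 'Source' to a list first and then loop over it, so the single-node and multi-node shapes go through the same code. The names as_list and source_info are hypothetical helpers, and the sample rows are invented for illustration.

```python
def as_list(value):
    """Wrap a lone dict in a list; pass lists through; treat None as empty."""
    if value is None:
        return []
    if isinstance(value, list):
        return value
    return [value]


def source_info(row):
    """Collect (folder, pattern) pairs for every Source node in a task row."""
    results = []
    for source in as_list(row.get('Source')):
        pattern = source.get('@FileMask', '* Source Pattern Not Found *')
        # Same precedence as the script above: @FolderName wins over @Path.
        folder = (source.get('@FolderName')
                  or source.get('@Path')
                  or '* Source Folder Not Found *')
        results.append((folder, pattern))
    return results


# Made-up rows covering both shapes.
one = {'Source': {'@Path': 'D:\\A', '@FileMask': '*.*'}}
two = {'Source': [{'@Path': 'D:\\A', '@FileMask': '*.*'},
                  {'@Path': 'D:\\B', '@FileMask': '*.txt'}]}

for row in (one, two):
    for folder, pattern in source_info(row):
        print("\tSource Information: ", folder, pattern)
```

With this shape the len(...) != 2 guard is no longer needed for 'Source', since the type of the value is checked directly instead of its length. The same wrapping idea might apply to 'For', but I have not shown data for that node here, so that part is a guess.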