Python Forum

Parse XML - how to handle deep levels/hierarchy - dwill - Apr-15-2018

Hello all -

I am parsing some XML files, and the hierarchy can be shallow or very deep.

I need to check each level (Root -> Parent -> Child -> sub-child -> sub-sub-child, etc.) to determine whether I need to process deeper, and also save the tag and some of the attributes.

I find that I get the root level and then use a "for child in parent" loop multiple times, like this:

from lxml import etree

doc = etree.parse(full_path)
root = doc.getroot()

# set the namespaces used in the .dtsx file
ns = {'DTS': 'www.microsoft.com/SqlServer/Dts', 'SQLTask': 'www.microsoft.com/sqlserver/dts/tasks/sqltask'}

# collect the executables (parents/children) in a list
executables = root.xpath('DTS:Executables/DTS:Executable', namespaces=ns)
for child0 in executables:
    # check each tag or attribute and ONLY continue to process specific types
    if child0.tag == "one of the types I want to process":
        # save the child0 tag and one or more attributes in some variables, then continue to process...
        for child1 in child0:
            # check the type and continue
            if child1.tag == "one of the ones I need to process":
                # save some info to variables
                for child2 in child1:
                    # ....
                    for child3 in child2:
                        ...  # etc.
I am now 6 or 7 levels deep, and it just feels like the wrong way to handle something like this.

In the end, I need to create a CSV that has a single row for each item, along with the items for each child and child type.

thanks for any suggestions or feedback.
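
A recursive walk is one way to avoid writing a separate loop for each level. The following is only a minimal sketch, not a drop-in solution: it uses the standard library's xml.etree.ElementTree so it runs without extra installs (lxml.etree supports the same calls used here), and the sample XML, the walk and to_csv helpers, and the choice of ObjectName as the saved attribute are all assumptions made for illustration.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical sample standing in for a .dtsx package; not taken from the thread.
SAMPLE = """<DTS:Executable xmlns:DTS="www.microsoft.com/SqlServer/Dts" DTS:ObjectName="Package">
  <DTS:Executables>
    <DTS:Executable DTS:ObjectName="Load">
      <DTS:Executables>
        <DTS:Executable DTS:ObjectName="Step1"/>
      </DTS:Executables>
    </DTS:Executable>
  </DTS:Executables>
</DTS:Executable>"""

# Namespaced attribute names are stored as '{namespace}local'.
NAME = "{www.microsoft.com/SqlServer/Dts}ObjectName"


def walk(elem, depth=0, path=""):
    """Yield one (depth, path, tag, name) row per element, parents first."""
    local_tag = elem.tag.split("}")[-1]  # drop the '{namespace}' prefix
    path = f"{path}/{local_tag}"
    yield depth, path, local_tag, elem.get(NAME, "")
    for child in elem:  # the recursion replaces the hand-written nested loops
        yield from walk(child, depth + 1, path)


def to_csv(root):
    """Return the rows produced by walk() as CSV text with a header line."""
    out = io.StringIO()
    writer = csv.writer(out)
    writer.writerow(["depth", "path", "tag", "name"])
    writer.writerows(walk(root))
    return out.getvalue()


root = ET.fromstring(SAMPLE)
rows = list(walk(root))
print(to_csv(root))
```

To process only specific types, a check on local_tag or on an attribute could be added before the yield, or before recursing into a child.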


RE: Parse XML - how to handle deep levels/hierarchy - Larz60+ - Apr-16-2018

Stop starting new threads for the same post. This is your third on the same subject!

I had code that almost worked that I showed on your first post, and would have continued except you went in another direction.
What I was working on would be able to parse any XML completely, without having to know anything about the subject.


RE: Parse XML - how to handle deep levels/hierarchy - dwill - Apr-16-2018

Sorry to have upset you or caused you to have a bad day.

I did try your code from the other post, and it only produced the root level and the level below. I tried to modify it for the types of files that I am working with, which have hierarchies up to 20 levels deep that I need to check and analyze. The suggestions from others actually provided a better route for me. My new post was more about the strategy I was taking to handle the different types of XML files, where some have a shallow tree and others (that I only ran across the other day) have very deep hierarchies.

Thanks


RE: Parse XML - how to handle deep levels/hierarchy - Larz60+ - Apr-16-2018

Not upset, it's just forum rules not to start new threads on the same subject,
see: https://python-forum.io/misc.php?action=help&hid=22
Quote: "and it only produced the root level and the level below"
which I explained in the text.
I probably didn't explain that I would continue with what I started. Stranac's method is probably what you want to use. Once you have the child nodes, you can look at each node's attrib, which behaves like a dictionary. You can also get the number of sub-elements of each child with len(child); if it is greater than one, the sub-elements can be accessed by index, for example child[0].attrib and child[1].attrib.

From there, you can get the values in an attrib with:
for key, value in child.attrib.items():
    ...
Hope that helps
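
As a small illustration of the points above, here is a sketch using the standard library's xml.etree.ElementTree (lxml.etree behaves the same way for these calls). The sample XML and the pairs list are made up for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data, not from the thread.
SAMPLE = '<parent kind="group"><item id="a" size="1"/><item id="b"/></parent>'
parent = ET.fromstring(SAMPLE)

print(len(parent))    # number of direct sub-elements
print(parent.attrib)  # the parent's own attributes

# With more than one sub-element, each can be reached by index.
if len(parent) > 1:
    print(parent[0].attrib, parent[1].attrib)

# Collect (tag, key, value) for every attribute of every direct child.
pairs = []
for child in parent:
    for key, value in child.attrib.items():
        pairs.append((child.tag, key, value))
print(pairs)
```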


RE: Parse XML - how to handle deep levels/hierarchy - dwill - Apr-16-2018

Thanks. I guess I didn't think of the threads being the same subject. But, I see your point. In the future, I will add questions to the same thread if they are on the same overall subject.

And, sorry, I missed the part that you were going to continue to add to your original solution.

Your suggestion of using the attrib and len(child) and then a for loop over the attrib items is fantastic!

Hopefully, this will be my last post on this thread. :)

Thanks so much for the help.


RE: Parse XML - how to handle deep levels/hierarchy - Larz60+ - Apr-17-2018

Here's a script that will expose the contents of an XML file
without regard to the actual data.
This might prove helpful:
import os

from lxml import etree as et

import XMLpaths  # local helper module, not a published package


class ParseUsing_lxml:
    def __init__(self, infile):
        self.xpath = XMLpaths.XMLpaths()
        self.infile = os.path.abspath(self.xpath.xmlpath / infile)
        self.tree = et.parse(self.infile)
        self.root = self.tree.getroot()
        ptree = et.tostring(self.root, pretty_print=True).decode("utf-8")
        # print(ptree)
        self.children = []
        self.get_children(self.root)
        # print(self.children)
        self.tree_dict = {}
        self.parse_tree()

    def get_children(self, root):
        # Collect every descendant of root, depth first, in document order
        for child in root:
            self.children.append(child)
            if len(child):
                self.get_children(child)

    def parse_tree(self):
        sep1 = '=' * 90
        sep2 = '-' * 90
        root = self.root
        children = self.children
        print('\nRoot:')
        print(f'Root attributes: {root.attrib}')
        for key, value in root.items():
            print(f'key: {key}, vtype: {type(value)}, value: {value}')
        print(sep1)
        print('Child attributes')
        for n, child in enumerate(children):
            print(f'Child{n} child: {child}')
            print(f'Child{n} attributes: {child.attrib}')
            for key, value in child.items():
                print(f'key: {key}, vtype: {type(value)}, value: {value}')
            print(sep2)

def tryit():
    pl = ParseUsing_lxml('ziggy.xml')

if __name__ == '__main__':
    tryit()
produces:
Output:
Root:
Root attributes: {'{www.example.com/myExample/Xyz}Id': 'Package', '{www.example.com/myExample/Xyz}CreationDate': '2/21/2018 11:11:48 AM', '{www.example.com/myExample/Xyz}XYZID': '{FB8BE06B-76B6-44DA-B2C7-043BD0989CBF}', '{www.example.com/myExample/Xyz}ObjectName': 'MyTestProject', '{www.example.com/myExample/Xyz}VersionGUID': '{8D9F7CDA-590E-44C3-8896-786D27167F7D}'}
key: {www.example.com/myExample/Xyz}Id, vtype: <class 'str'>, value: Package
key: {www.example.com/myExample/Xyz}CreationDate, vtype: <class 'str'>, value: 2/21/2018 11:11:48 AM
key: {www.example.com/myExample/Xyz}XYZID, vtype: <class 'str'>, value: {FB8BE06B-76B6-44DA-B2C7-043BD0989CBF}
key: {www.example.com/myExample/Xyz}ObjectName, vtype: <class 'str'>, value: MyTestProject
key: {www.example.com/myExample/Xyz}VersionGUID, vtype: <class 'str'>, value: {8D9F7CDA-590E-44C3-8896-786D27167F7D}
==========================================================================================
Child attributes
Child0 child: <Element {www.example.com/myExample/Xyz}Property at 0x2ccbe08>
Child0 attributes: {'{www.example.com/myExample/Xyz}Name': 'PackageFormatVersion'}
key: {www.example.com/myExample/Xyz}Name, vtype: <class 'str'>, value: PackageFormatVersion
------------------------------------------------------------------------------------------
Child1 child: <Element {www.example.com/myExample/Xyz}ConnectionManagers at 0x2ccbe48>
Child1 attributes: {}
------------------------------------------------------------------------------------------
Child2 child: <Element {www.example.com/myExample/Xyz}ConnectionManager at 0x2ccbe88>
Child2 attributes: {'{www.example.com/myExample/Xyz}refId': 'Package.ConnectionManagers[RTG093939BB.AdminDB]', '{www.example.com/myExample/Xyz}CreationName': 'OLEDB', '{www.example.com/myExample/Xyz}XYZID': '{C67B6283-781F-4B0E-A9A7-376A157B6F16}', '{www.example.com/myExample/Xyz}ObjectName': 'RTG093939BB.AdminDB'}
key: {www.example.com/myExample/Xyz}refId, vtype: <class 'str'>, value: Package.ConnectionManagers[RTG093939BB.AdminDB]
key: {www.example.com/myExample/Xyz}CreationName, vtype: <class 'str'>, value: OLEDB
key: {www.example.com/myExample/Xyz}XYZID, vtype: <class 'str'>, value: {C67B6283-781F-4B0E-A9A7-376A157B6F16}
key: {www.example.com/myExample/Xyz}ObjectName, vtype: <class 'str'>, value: RTG093939BB.AdminDB
------------------------------------------------------------------------------------------
Child3 child: <Element {www.example.com/myExample/Xyz}ObjectData at 0x2ccbec8>
Child3 attributes: {}
------------------------------------------------------------------------------------------
Child4 child: <Element {www.example.com/myExample/Xyz}ConnectionManager at 0x2ccbf08>
Child4 attributes: {'{www.example.com/myExample/Xyz}ConnectionString': 'Data Source=RTG093939BB;Initial Catalog=AdminDB;Provider=SQLNCLI11.1;Integrated Security=SSPI;Auto Translate=False;'}
key: {www.example.com/myExample/Xyz}ConnectionString, vtype: <class 'str'>, value: Data Source=RTG093939BB;Initial Catalog=AdminDB;Provider=SQLNCLI11.1;Integrated Security=SSPI;Auto Translate=False;
------------------------------------------------------------------------------------------
Child5 child: <Element {www.example.com/myExample/Xyz}ConnectionManager at 0x2ccbf88>
Child5 attributes: {'{www.example.com/myExample/Xyz}refId': 'Package.ConnectionManagers[RTG093955XT.Stage]', '{www.example.com/myExample/Xyz}CreationName': 'OLEDB', '{www.example.com/myExample/Xyz}XYZID': '{8B4F57EA-03EA-49FA-B4BD-828A89FE5A32}', '{www.example.com/myExample/Xyz}ObjectName': 'RTG093955XT.Stage'}
key: {www.example.com/myExample/Xyz}refId, vtype: <class 'str'>, value: Package.ConnectionManagers[RTG093955XT.Stage]
key: {www.example.com/myExample/Xyz}CreationName, vtype: <class 'str'>, value: OLEDB
key: {www.example.com/myExample/Xyz}XYZID, vtype: <class 'str'>, value: {8B4F57EA-03EA-49FA-B4BD-828A89FE5A32}
key: {www.example.com/myExample/Xyz}ObjectName, vtype: <class 'str'>, value: RTG093955XT.Stage
------------------------------------------------------------------------------------------
Child6 child: <Element {www.example.com/myExample/Xyz}ObjectData at 0x2ccbfc8>
Child6 attributes: {}
------------------------------------------------------------------------------------------
Child7 child: <Element {www.example.com/myExample/Xyz}ConnectionManager at 0x2cdb048>
Child7 attributes: {'{www.example.com/myExample/Xyz}ConnectionString': 'Data Source=RTG093955XT;Initial Catalog=Stage;Provider=SQLNCLI11.1;Integrated Security=SSPI;Auto Translate=False;'}
key: {www.example.com/myExample/Xyz}ConnectionString, vtype: <class 'str'>, value: Data Source=RTG093955XT;Initial Catalog=Stage;Provider=SQLNCLI11.1;Integrated Security=SSPI;Auto Translate=False;
------------------------------------------------------------------------------------------
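
For comparison, the element method iter() visits a tree in the same depth-first document order as the recursive get_children method in the script. The sketch below checks that on a small made-up tree; it uses the standard library's xml.etree.ElementTree, and the standalone get_children function is an assumed rewrite of the method so the example can run on its own.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample tree, not from the thread.
SAMPLE = "<a><b><c/><d/></b><e><f><g/></f></e></a>"
root = ET.fromstring(SAMPLE)


def get_children(node, found=None):
    """Recursively collect all descendants of node in document order."""
    found = [] if found is None else found
    for child in node:
        found.append(child)
        if len(child):
            get_children(child, found)
    return found


recursive = [el.tag for el in get_children(root)]
# iter() yields the starting element first, so skip the root itself.
iterated = [el.tag for el in root.iter() if el is not root]
print(recursive)
print(iterated)
```

Note that iter() returns a flat sequence, so the nesting depth is lost unless it is tracked separately.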



RE: Parse XML - how to handle deep levels/hierarchy - dwill - Apr-17-2018

Hello -

This looks great, amazing. I was going to try it out, but I need to find the XMLpaths module that you import. I cannot seem to find it, which is strange to me. I searched online (python module XMLpaths), but got no results.

Once I can find that module and install it, I will check this out, because you are right: it does look very promising and valuable.

Thanks again


RE: Parse XML - how to handle deep levels/hierarchy - Larz60+ - Apr-17-2018

Oops, that's one of mine,
here it is:
from pathlib import Path

class XMLpaths:
    def __init__(self):
        self.homepath = Path('.')
        self.rootpath = self.homepath / '..'
        self.docpath = self.rootpath / 'doc'
        self.docpath.mkdir(exist_ok=True)
        self.datapath = self.rootpath / 'data'
        self.datapath.mkdir(exist_ok=True)
        self.xmlpath = self.datapath / 'xml'
        self.xmlpath.mkdir(exist_ok=True)

if __name__ == '__main__':
    # Running standalone once will create any missing directory
    # and will not harm any existing directories.
    XMLpaths()
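
The class above builds its directories relative to the current working directory. If that is inconvenient, the same layout can be created under any base directory. The sketch below is an assumed variant, not part of the original module: make_xml_dirs is a hypothetical helper, and a temporary directory stands in for the real project root.

```python
import tempfile
from pathlib import Path


def make_xml_dirs(base):
    """Create base/doc and base/data/xml if missing; return the xml directory."""
    base = Path(base)
    (base / "doc").mkdir(parents=True, exist_ok=True)
    xml_dir = base / "data" / "xml"
    # parents=True also creates base/data when it does not exist yet
    xml_dir.mkdir(parents=True, exist_ok=True)
    return xml_dir


with tempfile.TemporaryDirectory() as tmp:
    xml_dir = make_xml_dirs(tmp)
    created = xml_dir.is_dir()
    # Calling it again is harmless because of exist_ok=True.
    same = make_xml_dirs(tmp) == xml_dir
    print(created, same)
```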



RE: Parse XML - how to handle deep levels/hierarchy - dwill - Apr-17-2018

Thanks. And, this works great. Provides a lot of valuable details.


thanks again!