Python Forum
Parse XML - how to handle deep levels/hierarchy
#1
Hello all -

I am parsing some xml files and the hierarchy can be shallow or very deep.

I need to check each level Root -> Parent -> Child -> sub-child -> sub-sub-child...etc to determine if I need to process more/deeper and also save the tag and some of the attributes.

I find that I get the root level and then use a "for child in parent" loop multiple times, like this:

from lxml import etree

doc = etree.parse(full_path)
root = doc.getroot()

# set the namespaces used in the .dtsx file
ns = {'DTS': 'www.microsoft.com/SqlServer/Dts', 'SQLTask': 'www.microsoft.com/sqlserver/dts/tasks/sqltask'}

# collect the executables (parents/children) in an object
executables = root.xpath('DTS:Executables/DTS:Executable', namespaces=ns)
for child0 in executables:
    # check each tag or attribute and ONLY continue to process specific types
    # (child0type and child1type are placeholders for whatever tag/attribute is checked)
    if child0type == "one of the types I want to process":
        # save the child0type tag and one or more attributes in some variables, then continue to process...
        for child1 in child0:
            # check the type and continue
            if child1type == "one of the ones I need to process":
                # save some info to variables
                for child2 in child1:
                    ...
                    for child3 in child2:
                        ...  # etc.
I am now 6 or 7 levels deep, and it just feels like the wrong way to handle something like this.

In the end, I need to create a CSV that has a single row/line for each item, including the items for each child and child type.

Thanks for any suggestions or feedback.
Reply
#2
Please stop starting new threads on the same subject. This is your third!

I had code that almost worked, which I showed in your first thread, and I would have continued with it except that you went in another direction.
What I was working on would be able to parse any XML completely, without having to know anything about the subject.
Reply
#3
Sorry to have upset you or caused you to have a bad day.

I did try your code from the other post and it only produced the root level and the level below. I tried to modify it for the types of files that I am working with, which have hierarchies up to 20 levels deep that I need to check and analyze. The suggestions from others actually provided a better route for me. My new post was more about the strategy I was taking to handle the different types of XML files, where some have a shallow tree and others (which I only ran across the other day) have very deep hierarchies.

thanks
Reply
#4
Not upset, it's just forum rules not to start new threads on the same subject,
see: https://python-forum.io/misc.php?action=help&hid=22
Quote: and it only produced the root level and the level below
which I explained in the text.
I probably didn't explain that I would continue with what I started. Stranac's method is probably what you want to use. Once you have the child nodes, you can look at each node's attrib, which behaves like a dictionary. You can also get the number of sub-elements of each child with len(child); if that is greater than zero, the sub-elements can be accessed by index, for example child[0].attrib, and child[1].attrib when there are at least two.

From there, you can get the values in an attrib with:
for key, value in child.attrib.items():
    ...
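A minimal sketch of how these pieces could fit together for trees of any depth, using one recursive function instead of one nested loop per level. It uses the standard library's xml.etree.ElementTree so it runs without extra installs; lxml elements support the same iteration, len(), get() and attrib calls. The sample XML, the "Type" and "Name" attributes, WANTED_TYPES and walk() are all made up for illustration and are not taken from the files discussed here.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample; the "Type" and "Name" attributes are invented for illustration.
SAMPLE = """<Package Name="Demo">
  <Executable Type="Sequence" Name="Outer">
    <Executable Type="SQLTask" Name="Inner"/>
    <Executable Type="Ignored" Name="Skipped">
      <Executable Type="SQLTask" Name="Hidden"/>
    </Executable>
  </Executable>
</Package>"""

WANTED_TYPES = {"Sequence", "SQLTask"}  # assumption: the types worth processing


def walk(element, depth=0, rows=None):
    """Visit the descendants of element, pruning subtrees of unwanted types."""
    if rows is None:
        rows = []
    for child in element:
        if child.get("Type") not in WANTED_TYPES:
            continue  # skip this child and everything below it
        rows.append((depth, child.tag, dict(child.attrib)))
        if len(child):  # len() is the number of direct sub-elements
            walk(child, depth + 1, rows)
    return rows


root = ET.fromstring(SAMPLE)
rows = walk(root)
for depth, tag, attrs in rows:
    print("  " * depth + tag, attrs)
```

The depth is passed along as an argument, so the same function handles a tree that is 2 levels deep or 20 levels deep.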
Hope that helps
Reply
#5
Thanks. I guess I didn't think of the threads being the same subject. But, I see your point. In the future, I will add questions to the same thread if they are on the same overall subject.

And, sorry, I missed the part that you were going to continue to add to your original solution.

Your suggestion of using the attrib and len(child) and then a for loop over the attrib items is fantastic!

Hopefully, this will be my last post on this thread. :)

thanks so much for the help.
Reply
#6
Here's a script that will expose the contents of an XML file
without regard to the actual data.
This might prove helpful:
import os

from lxml import etree as et

import XMLpaths  # local helper module, not an installable package


class ParseUsing_lxml:
    def __init__(self, infile):
        self.xpath = XMLpaths.XMLpaths()
        self.infile = os.path.abspath(self.xpath.xmlpath / infile)
        self.tree = et.parse(self.infile)
        self.root = self.tree.getroot()
        ptree = et.tostring(self.root, pretty_print=True).decode("utf-8")
        # print(ptree)
        self.children = []
        self.get_children(self.root)
        # print(self.children)
        self.tree_dict = {}
        self.parse_tree()

    def get_children(self, root):
        # Depth-first walk: collect every descendant of root in document order.
        for child in root:
            self.children.append(child)
            if len(child):
                self.get_children(child)

    def parse_tree(self):
        sep1 = '=' * 90
        sep2 = '-' * 90
        root = self.root
        children = self.children
        print('\nRoot:')
        print(f'Root attributes: {root.attrib}')
        for key, value in root.items():
            print(f'key: {key}, vtype: {type(value)}, value: {value}')
        print(sep1)
        print('Child attributes')
        for n, child in enumerate(children):
            print(f'Child{n} child: {child}')
            print(f'Child{n} attributes: {child.attrib}')
            for key, value in child.items():
                print(f'key: {key}, vtype: {type(value)}, value: {value}')
            print(sep2)

def tryit():
    pl = ParseUsing_lxml('ziggy.xml')

if __name__ == '__main__':
    tryit()
produces:
Output:
Root:
Root attributes: {'{www.example.com/myExample/Xyz}Id': 'Package', '{www.example.com/myExample/Xyz}CreationDate': '2/21/2018 11:11:48 AM', '{www.example.com/myExample/Xyz}XYZID': '{FB8BE06B-76B6-44DA-B2C7-043BD0989CBF}', '{www.example.com/myExample/Xyz}ObjectName': 'MyTestProject', '{www.example.com/myExample/Xyz}VersionGUID': '{8D9F7CDA-590E-44C3-8896-786D27167F7D}'}
key: {www.example.com/myExample/Xyz}Id, vtype: <class 'str'>, value: Package
key: {www.example.com/myExample/Xyz}CreationDate, vtype: <class 'str'>, value: 2/21/2018 11:11:48 AM
key: {www.example.com/myExample/Xyz}XYZID, vtype: <class 'str'>, value: {FB8BE06B-76B6-44DA-B2C7-043BD0989CBF}
key: {www.example.com/myExample/Xyz}ObjectName, vtype: <class 'str'>, value: MyTestProject
key: {www.example.com/myExample/Xyz}VersionGUID, vtype: <class 'str'>, value: {8D9F7CDA-590E-44C3-8896-786D27167F7D}
==========================================================================================
Child attributes
Child0 child: <Element {www.example.com/myExample/Xyz}Property at 0x2ccbe08>
Child0 attributes: {'{www.example.com/myExample/Xyz}Name': 'PackageFormatVersion'}
key: {www.example.com/myExample/Xyz}Name, vtype: <class 'str'>, value: PackageFormatVersion
------------------------------------------------------------------------------------------
Child1 child: <Element {www.example.com/myExample/Xyz}ConnectionManagers at 0x2ccbe48>
Child1 attributes: {}
------------------------------------------------------------------------------------------
Child2 child: <Element {www.example.com/myExample/Xyz}ConnectionManager at 0x2ccbe88>
Child2 attributes: {'{www.example.com/myExample/Xyz}refId': 'Package.ConnectionManagers[RTG093939BB.AdminDB]', '{www.example.com/myExample/Xyz}CreationName': 'OLEDB', '{www.example.com/myExample/Xyz}XYZID': '{C67B6283-781F-4B0E-A9A7-376A157B6F16}', '{www.example.com/myExample/Xyz}ObjectName': 'RTG093939BB.AdminDB'}
key: {www.example.com/myExample/Xyz}refId, vtype: <class 'str'>, value: Package.ConnectionManagers[RTG093939BB.AdminDB]
key: {www.example.com/myExample/Xyz}CreationName, vtype: <class 'str'>, value: OLEDB
key: {www.example.com/myExample/Xyz}XYZID, vtype: <class 'str'>, value: {C67B6283-781F-4B0E-A9A7-376A157B6F16}
key: {www.example.com/myExample/Xyz}ObjectName, vtype: <class 'str'>, value: RTG093939BB.AdminDB
------------------------------------------------------------------------------------------
Child3 child: <Element {www.example.com/myExample/Xyz}ObjectData at 0x2ccbec8>
Child3 attributes: {}
------------------------------------------------------------------------------------------
Child4 child: <Element {www.example.com/myExample/Xyz}ConnectionManager at 0x2ccbf08>
Child4 attributes: {'{www.example.com/myExample/Xyz}ConnectionString': 'Data Source=RTG093939BB;Initial Catalog=AdminDB;Provider=SQLNCLI11.1;Integrated Security=SSPI;Auto Translate=False;'}
key: {www.example.com/myExample/Xyz}ConnectionString, vtype: <class 'str'>, value: Data Source=RTG093939BB;Initial Catalog=AdminDB;Provider=SQLNCLI11.1;Integrated Security=SSPI;Auto Translate=False;
------------------------------------------------------------------------------------------
Child5 child: <Element {www.example.com/myExample/Xyz}ConnectionManager at 0x2ccbf88>
Child5 attributes: {'{www.example.com/myExample/Xyz}refId': 'Package.ConnectionManagers[RTG093955XT.Stage]', '{www.example.com/myExample/Xyz}CreationName': 'OLEDB', '{www.example.com/myExample/Xyz}XYZID': '{8B4F57EA-03EA-49FA-B4BD-828A89FE5A32}', '{www.example.com/myExample/Xyz}ObjectName': 'RTG093955XT.Stage'}
key: {www.example.com/myExample/Xyz}refId, vtype: <class 'str'>, value: Package.ConnectionManagers[RTG093955XT.Stage]
key: {www.example.com/myExample/Xyz}CreationName, vtype: <class 'str'>, value: OLEDB
key: {www.example.com/myExample/Xyz}XYZID, vtype: <class 'str'>, value: {8B4F57EA-03EA-49FA-B4BD-828A89FE5A32}
key: {www.example.com/myExample/Xyz}ObjectName, vtype: <class 'str'>, value: RTG093955XT.Stage
------------------------------------------------------------------------------------------
Child6 child: <Element {www.example.com/myExample/Xyz}ObjectData at 0x2ccbfc8>
Child6 attributes: {}
------------------------------------------------------------------------------------------
Child7 child: <Element {www.example.com/myExample/Xyz}ConnectionManager at 0x2cdb048>
Child7 attributes: {'{www.example.com/myExample/Xyz}ConnectionString': 'Data Source=RTG093955XT;Initial Catalog=Stage;Provider=SQLNCLI11.1;Integrated Security=SSPI;Auto Translate=False;'}
key: {www.example.com/myExample/Xyz}ConnectionString, vtype: <class 'str'>, value: Data Source=RTG093955XT;Initial Catalog=Stage;Provider=SQLNCLI11.1;Integrated Security=SSPI;Auto Translate=False;
------------------------------------------------------------------------------------------
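For the one-row-per-item CSV asked for in the first post, a flat listing like the one above could be written out roughly as follows. This is only a sketch under assumptions: it uses the standard library's xml.etree.ElementTree and csv modules rather than lxml, the sample XML is a cut-down imitation of the output above, and local_name(), element_rows() and the column names are hypothetical helpers chosen for illustration.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical sample, loosely modelled on the output above.
SAMPLE = """<X:Package xmlns:X="www.example.com/myExample/Xyz" X:ObjectName="MyTestProject">
  <X:ConnectionManagers>
    <X:ConnectionManager X:ObjectName="AdminDB" X:CreationName="OLEDB"/>
  </X:ConnectionManagers>
</X:Package>"""


def local_name(qualified):
    """Strip a leading '{namespace}' from a tag or attribute name."""
    return qualified.rpartition("}")[2]


def element_rows(element, path=""):
    """Yield (path, attribute, value) for element and every descendant."""
    here = f"{path}/{local_name(element.tag)}"
    if element.attrib:
        for key, value in element.attrib.items():
            yield here, local_name(key), value
    else:
        yield here, "", ""  # keep attribute-less elements visible
    for child in element:
        yield from element_rows(child, here)


root = ET.fromstring(SAMPLE)
buffer = io.StringIO()  # stands in for a real file opened with newline=""
writer = csv.writer(buffer, lineterminator="\n")
writer.writerow(["path", "attribute", "value"])
writer.writerows(element_rows(root))
csv_text = buffer.getvalue()
print(csv_text)
```

Each row carries the full path from the root, so the depth of the hierarchy no longer matters when the rows are filtered or sorted later.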
Reply
#7
Hello -

This looks great, amazing. I was going to try it out, but I need to find the XMLpaths module that you import. I cannot seem to find it, which is strange to me. I searched online (python module XMLpaths), but got no results.

Once I can find that module and import/install it, I will check this out, because you are right, it does look very promising and valuable.

thanks again
Reply
#8
Oops, that's one of mine.
Here it is:
from pathlib import Path

class XMLpaths:
    def __init__(self):
        self.homepath = Path('.')
        self.rootpath = self.homepath / ('..')
        self.docpath = self.rootpath / 'doc'
        self.docpath.mkdir(exist_ok=True)
        self.datapath = self.rootpath / 'data'
        self.datapath.mkdir(exist_ok=True)
        self.xmlpath = self.datapath / 'xml'
        self.xmlpath.mkdir(exist_ok=True)

if __name__ == '__main__':
    '''
    running standalone once will create any missing directory, and will not harm any existing directories 
    '''
    XMLpaths()
Reply
#9
Thanks. And, this works great. Provides a lot of valuable details.


thanks again!
Reply

