Python Forum

Hello All -

I am trying to parse some XML files that are saturated with namespaces. I have been able to parse XML files without namespaces, and I have found several articles online on parsing with namespaces, but the namespaces in my files don't look like most of the examples I have found.

Below is an example of the XML. I am trying to get to the ConnectionString attribute.

I have tried using findall with the full hierarchy, like:

conn_mgrs = root.findall('ConnectionManagers/ConnectionManager/ObjectData/ConnectionManager')

I also tried using the namespace argument, like:

ns = {'XYZ': 'www.example.com/myExample/Xyz'}
conn_mgrs = root.findall('ConnectionManagers/ConnectionManager/ObjectData/ConnectionManager', ns)

Both just return a null element.

My next move will probably be to strip out the namespace prefixes and then parse the file, but I figured I'd check with others to see if someone knows a way to resolve this.
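If stripping the prefixes is the route taken, it can be done on the parsed tree instead of on the file text. Below is a minimal sketch, assuming the file uses a single namespace (stripping merges names from different namespaces, so it is unsafe for mixed documents). `SAMPLE` is a trimmed copy of the XML in this post, and `strip_namespaces` is a hypothetical helper, not part of ElementTree.

```python
import xml.etree.ElementTree as et

# Trimmed copy of the XML from the question (one connection manager)
SAMPLE = """\
<XYZ:Executable xmlns:XYZ="www.example.com/myExample/Xyz" XYZ:Id="Package">
  <XYZ:ConnectionManagers>
    <XYZ:ConnectionManager XYZ:ObjectName="RTG093939BB.AdminDB">
      <XYZ:ObjectData>
        <XYZ:ConnectionManager
          XYZ:ConnectionString="Data Source=RTG093939BB;Initial Catalog=AdminDB;Provider=SQLNCLI11.1;Integrated Security=SSPI;Auto Translate=False;" />
      </XYZ:ObjectData>
    </XYZ:ConnectionManager>
  </XYZ:ConnectionManagers>
</XYZ:Executable>
"""


def strip_namespaces(root):
    """Hypothetical helper: drop the '{uri}' part of every tag and attribute name, in place."""
    for elem in root.iter():
        # Comments/processing instructions (if kept by the parser) have non-string tags
        if isinstance(elem.tag, str) and elem.tag.startswith('{'):
            elem.tag = elem.tag.split('}', 1)[1]
        # Attribute names use the same '{uri}name' form; rename each key in place
        for key in list(elem.attrib):
            if key.startswith('{'):
                elem.attrib[key.split('}', 1)[1]] = elem.attrib.pop(key)
    return root


root = strip_namespaces(et.fromstring(SAMPLE))
# With the namespaces gone, the original un-prefixed path now matches
conn_mgrs = root.findall('ConnectionManagers/ConnectionManager/ObjectData/ConnectionManager')
for cm in conn_mgrs:
    print(cm.get('ConnectionString'))
```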

Thanks for any help

The XML looks like this:

<XYZ:Executable xmlns:XYZ="www.example.com/myExample/Xyz"
  XYZ:Id="Package"
  XYZ:CreationDate="2/21/2018 11:11:48 AM"
  XYZ:XYZID="{FB8BE06B-76B6-44DA-B2C7-043BD0989CBF}"
  XYZ:ObjectName="MyTestProject"
  XYZ:VersionGUID="{8D9F7CDA-590E-44C3-8896-786D27167F7D}">
  <XYZ:Property
    XYZ:Name="PackageFormatVersion">6</XYZ:Property>
  <XYZ:ConnectionManagers>
    <XYZ:ConnectionManager
      XYZ:refId="Package.ConnectionManagers[RTG093939BB.AdminDB]"
      XYZ:CreationName="OLEDB"
      XYZ:XYZID="{C67B6283-781F-4B0E-A9A7-376A157B6F16}"
      XYZ:ObjectName="RTG093939BB.AdminDB">
      <XYZ:ObjectData>
        <XYZ:ConnectionManager
          XYZ:ConnectionString="Data Source=RTG093939BB;Initial Catalog=AdminDB;Provider=SQLNCLI11.1;Integrated Security=SSPI;Auto Translate=False;" />
      </XYZ:ObjectData>
    </XYZ:ConnectionManager>
    <XYZ:ConnectionManager
      XYZ:refId="Package.ConnectionManagers[RTG093955XT.Stage]"
      XYZ:CreationName="OLEDB"
      XYZ:XYZID="{8B4F57EA-03EA-49FA-B4BD-828A89FE5A32}"
      XYZ:ObjectName="RTG093955XT.Stage">
      <XYZ:ObjectData>
        <XYZ:ConnectionManager
          XYZ:ConnectionString="Data Source=RTG093955XT;Initial Catalog=Stage;Provider=SQLNCLI11.1;Integrated Security=SSPI;Auto Translate=False;" />
      </XYZ:ObjectData>
    </XYZ:ConnectionManager>
  </XYZ:ConnectionManagers>
</XYZ:Executable>
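Why both attempts come back empty: after parsing, ElementTree no longer keeps the `XYZ:` prefix. Every name is stored as `{namespace-uri}LocalName`, so a path step without a prefix only matches elements in no namespace. A short sketch using a trimmed copy of the XML above (`SAMPLE` and the variable names are illustrative):

```python
import xml.etree.ElementTree as et

# Trimmed copy of the XML from the question
SAMPLE = """\
<XYZ:Executable xmlns:XYZ="www.example.com/myExample/Xyz" XYZ:Id="Package">
  <XYZ:ConnectionManagers>
    <XYZ:ConnectionManager XYZ:ObjectName="RTG093939BB.AdminDB">
      <XYZ:ObjectData>
        <XYZ:ConnectionManager
          XYZ:ConnectionString="Data Source=RTG093939BB;Initial Catalog=AdminDB;Provider=SQLNCLI11.1;Integrated Security=SSPI;Auto Translate=False;" />
      </XYZ:ObjectData>
    </XYZ:ConnectionManager>
  </XYZ:ConnectionManagers>
</XYZ:Executable>
"""

root = et.fromstring(SAMPLE)
# The prefix is replaced by the full URI in braces
print(root.tag)  # {www.example.com/myExample/Xyz}Executable

ns = {'XYZ': 'www.example.com/myExample/Xyz'}
path = 'ConnectionManagers/ConnectionManager/ObjectData/ConnectionManager'

# No prefixes in the path, so only un-namespaced tags could match
plain = root.findall(path)
print(plain)  # []

# The mapping is only consulted for steps written as 'XYZ:tag', so this is still empty
with_unused_ns = root.findall(path, ns)
print(with_unused_ns)  # []
```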

I've had some limited success parsing your XML with etree, but I'm having difficulty with the ConnectionManagers info, as its attributes show up as a dictionary without any keys.

Here's what I did:

I copied your XML into a file named 'ziggy.xml' (you can change the name to whatever you want in the tryit function).
I started parsing nodes and had no problem with the root node or the Property node.
The dictionary problem (stated previously) shows up with the ConnectionManagers node.

Play with it, perhaps you can figure it out, I need a break.

The code:

import xml.etree.ElementTree as et

class ParseXmlWithNamespace:
    def __init__(self, xml_filename):
        # Parse the file that was passed in (not a hard-coded name)
        self.tree = et.parse(xml_filename)
        self.root = self.tree.getroot()
        self.show_root_info()
        self.show_child_info()

    def show_root_info(self):
        # items() yields (name, value) pairs; names look like '{namespace-uri}LocalName'
        for key, value in self.root.items():
            local_name = key.split('}', 1)[-1]  # drop the '{uri}' part if present
            print(f'{local_name:20}: {value}')

    def show_child_info(self):
        # Report the tag and attributes of each direct child of the root
        for child in self.root:
            print(f'\ntag type: {type(child.tag)}')
            print(f'tag value: {child.tag}')
            print(f'attrib type: {type(child.attrib)}')
            print(f'attrib value: {child.attrib}')
            # Element.attrib is always a dict (possibly empty)
            print(f'    attrib keys: {child.attrib.keys()}')

def tryit():
    px = ParseXmlWithNamespace('ziggy.xml')

if __name__ == '__main__':
    tryit()

Results so far:

Id                  : Package
CreationDate        : 2/21/2018 11:11:48 AM
XYZID               : {FB8BE06B-76B6-44DA-B2C7-043BD0989CBF}
ObjectName          : MyTestProject
VersionGUID         : {8D9F7CDA-590E-44C3-8896-786D27167F7D}

tag type: <class 'str'>
tag value: {www.example.com/myExample/Xyz}Property
attrib type: <class 'dict'>
attrib value: {'{www.example.com/myExample/Xyz}Name': 'PackageFormatVersion'}
    attrib keys: dict_keys(['{www.example.com/myExample/Xyz}Name'])

tag type: <class 'str'>
tag value: {www.example.com/myExample/Xyz}ConnectionManagers
attrib type: <class 'dict'>
attrib value: {}
    attrib keys: dict_keys([])
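The empty dictionary is expected here: `ConnectionManagers` is only a wrapper and has no attributes of its own. The data sits on its descendants. A minimal sketch (trimmed `SAMPLE`, illustrative variable names) that walks the descendants with `iter()` and shows which attributes each `ConnectionManager` carries:

```python
import xml.etree.ElementTree as et

# Trimmed copy of the XML from the question
SAMPLE = """\
<XYZ:Executable xmlns:XYZ="www.example.com/myExample/Xyz" XYZ:Id="Package">
  <XYZ:ConnectionManagers>
    <XYZ:ConnectionManager XYZ:ObjectName="RTG093939BB.AdminDB">
      <XYZ:ObjectData>
        <XYZ:ConnectionManager
          XYZ:ConnectionString="Data Source=RTG093939BB;Initial Catalog=AdminDB;Provider=SQLNCLI11.1;Integrated Security=SSPI;Auto Translate=False;" />
      </XYZ:ObjectData>
    </XYZ:ConnectionManager>
  </XYZ:ConnectionManagers>
</XYZ:Executable>
"""

XYZ = '{www.example.com/myExample/Xyz}'  # the '{uri}' prefix ElementTree puts on names

root = et.fromstring(SAMPLE)
managers = root.find(XYZ + 'ConnectionManagers')
print(managers.attrib)  # {}

# iter(tag) visits every matching descendant in document order:
# the outer ConnectionManager first, then the inner one under ObjectData
attribs = [cm.attrib for cm in managers.iter(XYZ + 'ConnectionManager')]
for a in attribs:
    print(list(a))
```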

Hi -
Thanks so much for the help. I will give this a try and update this post as things happen.

thanks...

(Apr-11-2018, 09:51 PM) dwill wrote:
And, also using the namespace argument like:
ns = {'XYZ': 'www.example.com/myExample/Xyz'}
conn_mgrs = root.findall('ConnectionManagers/ConnectionManager/ObjectData/ConnectionManager', ns)
Both just return a null element.

In this code, you're defining your namespace mapping, but you're not actually using it: every step of the path needs the XYZ: prefix for the mapping to apply.
This gets the elements you want:

>>> root.findall('XYZ:ConnectionManagers/XYZ:ConnectionManager/XYZ:ObjectData/XYZ:ConnectionManager', ns)
[<Element '{www.example.com/myExample/Xyz}ConnectionManager' at 0x000001FADD450818>, <Element '{www.example.com/myExample/Xyz}ConnectionManager' at 0x000001FADD450908>]
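For comparison, ElementTree accepts two other spellings that select the same elements. The first uses the `{uri}tag` form directly, with no mapping. The second uses the `{*}` wildcard, which matches a tag in any namespace and requires Python 3.8 or later. A sketch on a trimmed `SAMPLE` with both connection managers (variable names are illustrative):

```python
import xml.etree.ElementTree as et

# Trimmed copy of the XML from the question (both connection managers)
SAMPLE = """\
<XYZ:Executable xmlns:XYZ="www.example.com/myExample/Xyz">
  <XYZ:ConnectionManagers>
    <XYZ:ConnectionManager XYZ:ObjectName="RTG093939BB.AdminDB">
      <XYZ:ObjectData>
        <XYZ:ConnectionManager XYZ:ConnectionString="Data Source=RTG093939BB;Initial Catalog=AdminDB;" />
      </XYZ:ObjectData>
    </XYZ:ConnectionManager>
    <XYZ:ConnectionManager XYZ:ObjectName="RTG093955XT.Stage">
      <XYZ:ObjectData>
        <XYZ:ConnectionManager XYZ:ConnectionString="Data Source=RTG093955XT;Initial Catalog=Stage;" />
      </XYZ:ObjectData>
    </XYZ:ConnectionManager>
  </XYZ:ConnectionManagers>
</XYZ:Executable>
"""

root = et.fromstring(SAMPLE)
steps = ('ConnectionManagers', 'ConnectionManager', 'ObjectData', 'ConnectionManager')

# 1) Prefix mapping: every step carries 'XYZ:'
ns = {'XYZ': 'www.example.com/myExample/Xyz'}
with_prefix = root.findall('/'.join('XYZ:' + s for s in steps), ns)

# 2) Full '{uri}tag' names; no mapping argument needed
uri = '{www.example.com/myExample/Xyz}'
with_clark = root.findall('/'.join(uri + s for s in steps))

# 3) '{*}' wildcard: any namespace (Python 3.8+)
with_wildcard = root.findall('/'.join('{*}' + s for s in steps))

print(len(with_prefix), len(with_clark), len(with_wildcard))  # 2 2 2
```

The prefix mapping keeps paths readable when a file uses several namespaces. The wildcard is convenient when there is only one.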

I'd also suggest using lxml instead of the built-in xml.etree, as it gives you full XPath 1.0 support and is also much faster.

Hello -
Thanks so much. I just did a quick test and this works great. I will try it on a few other parts of the xml file as well.

One question: as a test, I switched to lxml, and the same code works. But you also suggested using XPath for my searching/parsing. In my example, would it be something like:

conn_mgrs = root.xpath('XYZ:ConnectionManagers/XYZ:ConnectionManager/XYZ:ObjectData/XYZ:ConnectionManager', namespaces=ns)

I did try that, and the output was the same as with the built-in xml.etree. I just wanted to make sure I was understanding your advice.

thank you!

The difference is that the built-in module only lets you use a subset of XPath, making certain things more complicated, or impossible.
For example, to get the ConnectionString attribute in lxml, you can simply do this:

>>> root.xpath('XYZ:ConnectionManagers/XYZ:ConnectionManager/XYZ:ObjectData/XYZ:ConnectionManager/@XYZ:ConnectionString', namespaces=ns)
['Data Source=RTG093939BB;Initial Catalog=AdminDB;Provider=SQLNCLI11.1;Integrated Security=SSPI;Auto Translate=False;', 'Data Source=RTG093955XT;Initial Catalog=Stage;Provider=SQLNCLI11.1;Integrated Security=SSPI;Auto Translate=False;']
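For comparison, the built-in ElementTree cannot end a path in `@attribute`, so getting the same list takes two steps: select the elements, then read the namespaced attribute with `get()` using its `{uri}name` key. A sketch on a trimmed `SAMPLE` (variable names are illustrative):

```python
import xml.etree.ElementTree as et

# Trimmed copy of the XML from the question (both connection managers)
SAMPLE = """\
<XYZ:Executable xmlns:XYZ="www.example.com/myExample/Xyz">
  <XYZ:ConnectionManagers>
    <XYZ:ConnectionManager>
      <XYZ:ObjectData>
        <XYZ:ConnectionManager
          XYZ:ConnectionString="Data Source=RTG093939BB;Initial Catalog=AdminDB;Provider=SQLNCLI11.1;Integrated Security=SSPI;Auto Translate=False;" />
      </XYZ:ObjectData>
    </XYZ:ConnectionManager>
    <XYZ:ConnectionManager>
      <XYZ:ObjectData>
        <XYZ:ConnectionManager
          XYZ:ConnectionString="Data Source=RTG093955XT;Initial Catalog=Stage;Provider=SQLNCLI11.1;Integrated Security=SSPI;Auto Translate=False;" />
      </XYZ:ObjectData>
    </XYZ:ConnectionManager>
  </XYZ:ConnectionManagers>
</XYZ:Executable>
"""

ns = {'XYZ': 'www.example.com/myExample/Xyz'}
root = et.fromstring(SAMPLE)

# Step 1: select the inner ConnectionManager elements
elems = root.findall(
    'XYZ:ConnectionManagers/XYZ:ConnectionManager/XYZ:ObjectData/XYZ:ConnectionManager', ns)

# Step 2: attribute names are stored as '{uri}name', so build that key explicitly
attr = '{%s}ConnectionString' % ns['XYZ']
conn_strings = [e.get(attr) for e in elems]
print(conn_strings)
```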

Hi -

This is much better than what I was using. I had a for loop that got each connection string by navigating down the XML hierarchy. Your method goes right to the place I need, and because there could be multiple connection strings, at least my for loop is much smaller.

I am also trying to get one of the values inside the connection string, "Initial Catalog". I am close, but still missing something. I will post my code so you can see what I am trying and whether it makes sense... in a pythonic world.

Thank you again for all of your help

Hi -
So, this is what I ended up with to get two of the items from this connection manager list:

cnxn_string = root.xpath('XYZ:ConnectionManagers/XYZ:ConnectionManager/XYZ:ObjectData/XYZ:ConnectionManager/@XYZ:ConnectionString', namespaces=ns)
for item in cnxn_string:
    # Each connection string is a ';'-separated list of Key=Value pairs
    c = item.split(';')
    for q in c:
        if q.startswith('Data Source'):
            ds = q[q.find('=') + 1:]  # everything after the first '='
            print(ds)
        if q.startswith('Initial Catalog'):
            ic = q[q.find('=') + 1:]
            print(ic)

Output:
RTG093939BB
AdminDB
RTG093955XT
Stage
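A slightly more general alternative to the `startswith` checks is to turn each connection string into a dictionary once and look up any key by name. This is a minimal sketch, assuming values never contain quoted `;` or `=` characters (true for the strings in this thread). `parse_connection_string` is a hypothetical helper name.

```python
def parse_connection_string(conn_str):
    """Hypothetical helper: split 'Key=Value;Key=Value;' into a dict."""
    parts = {}
    for pair in conn_str.split(';'):
        if not pair.strip():
            continue  # the trailing ';' leaves an empty piece
        key, _, value = pair.partition('=')  # split on the first '=' only
        parts[key.strip()] = value.strip()
    return parts


cs = ('Data Source=RTG093939BB;Initial Catalog=AdminDB;Provider=SQLNCLI11.1;'
      'Integrated Security=SSPI;Auto Translate=False;')
info = parse_connection_string(cs)
print(info['Data Source'], info['Initial Catalog'])  # RTG093939BB AdminDB
```

Inside the existing loop, `info = parse_connection_string(item)` would then give both values without any string slicing.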
