Python Forum
Thread Rating:
  • 0 Vote(s) - 0 Average
  • 1
  • 2
  • 3
  • 4
  • 5
Convert XML into DB table
#1
Hi,

I am new to Python and also new to this forum. Please forgive and correct me if you see any posting rules are violated in this post.

I am using Python version 3.8

I have a big XML file which contains the information about patient, service provider, insurance claim etc. This XML file contains many elements, nested child elements, attributes etc. in it. I want to convert this XML file into database table using Python script. This Python script should be able to read any number of XML file one by one and load data into DB table.

Below is the sample input XML file and output table

BEGIN

<?xml version="1.0" encoding="UTF-8"?>
<employee>
<fname>Christy</fname>
<lname>Charloette</lname>
<mail_address>
<add_line1>5052 Washington Street</add_line1>
<add_line2>Apt. 230</add_line2>
<city>New York</city>
<state>NY</state>
<zip_code>12345</zip_code>
<country>USA</country>
</mail_address>
<perm_address>
<add_line1>2598 Nolen Ave</add_line1>
<add_line2>Apt. 345</add_line2>
<city>New Jersy</city>
<state>NY</state>
<zip_code>54321</zip_code>
<country>USA</country>
</perm_address>
<expertise name="SQL"/>
<expertise name="Python"/>
<expertise name="Testing"/>
<expertise name="Business"/>
</employee>


Output table
------------
Elem_Key Elem_tag Value Parent_Elem
======== ======== ===== ===========
1001 employee
1002 fname Christy 1001
1003 lname Charloette 1001
1004 mail_address 1001
1005 add_line1 5052 Washing 1004
ton Street
1006 add_line2 Apt. 230 1004
1007 city New York 1004
1008 state NY 1004
1009 zip_code 12345 1004
1010 country USA 1004
1011 perm_address 1001
1012 add_line1 2598 Nolen Ave 1011
1013 add_line2 Apt. 345 1011
1014 city New Jersy 1011
1015 state NY 1011
1016 zip_code 54321 1011
1017 country USA 1011
1018 expertise SQL,Python, 1001
Testing,Busi
ness

END

Thanks!
Reply
#2
So, what have you tried so far? Show your code, even if not working.
Reply
#3
So far I didn't do much, just started playing with xml.etree.ElementTree package. Here is what I did on a smaller XML file.

BEGIN XML
<?xml version="1.0" encoding="UTF-8"?>
<employee>
<fname>Christy</fname>
<lname>Charloette</lname>
<home>USA</home>
<expertise name="SQL"/>
<expertise name="Python"/>
<expertise name="Testing"/>
<expertise name="Business"/>
</employee>
END XML

[font=Courier New, Courier, monospace][/font]import xml.etree.ElementTree as ET

tree = ET.parse('Myxml.xml')
root = tree.getroot()

print('Root tag: ', root.tag)
print('Root attr: ', root.attrib)

for child in root:
    print(child.tag, child.attrib)
print()

print([elem.tag for elem in root.iter()])
print()
OUTPUT
======
Root tag: employee
Root attr: {}
fname {}
lname {}
home {}
expertise {'name': 'SQL'}
expertise {'name': 'Python'}
expertise {'name': 'Testing'}
expertise {'name': 'Business'}

['employee', 'fname', 'lname', 'home', 'expertise', 'expertise', 'expertise', 'expertise']

Actually the XML is little more complex than what I showed in my first post. The element could have more than one attributes in one single row. For example: Element "expertise" have name, level and experience attributes in one single row.

BEGIN
<?xml version="1.0" encoding="UTF-8"?>
<employee>
<fname>Christy</fname>
<lname>Charloette</lname>
<mail_address>
<add_line1>5052 Washington Street</add_line1>
<add_line2>Apt. 230</add_line2>
<city>New York</city>
<state>NY</state>
<zip_code>12345</zip_code>
<country>USA</country>
</mail_address>
<perm_address>
<add_line1>2598 Nolen Ave</add_line1>
<add_line2>Apt. 345</add_line2>
<city>New Jersy</city>
<state>NY</state>
<zip_code>54321</zip_code>
<country>USA</country>
</perm_address>
<expertise name="SQL"/>
<expertise name="Python" level="Intermediate"/>
<expertise name="Testing" level="Advanced" experience="5"/>
<expertise name="Business"/>
</employee>


Output table
------------
Elem_Key Elem_tag Value Parent_Elem
======== ======== ===== ===========
1001 employee
1002 fname Christy 1001
1003 lname Charloette 1001
1004 mail_address 1001
1005 add_line1 5052 Washing 1004
ton Street
1006 add_line2 Apt. 230 1004
1007 city New York 1004
1008 state NY 1004
1009 zip_code 12345 1004
1010 country USA 1004
1011 perm_address 1001
1012 add_line1 2598 Nolen Ave 1011
1013 add_line2 Apt. 345 1011
1014 city New Jersy 1011
1015 state NY 1011
1016 zip_code 54321 1011
1017 country USA 1011
1018 expertise 1001
1019 name SQL 1018
1020 expertise 1001
1021 name Python 1020
1022 level Intermediate 1020
1023 expertise 1001
1024 name Testing 1023
1025 level Advanced 1023
1026 experience 5 1023
1027 expertise 1001
1028 name Business 1026

END
Reply
#4
I am trying to understand below piece of code which I copied from internet. I did not understand one behavior of this code which I noted it in output section at the bottom (Highlighted).

To execute this code you need to copy below XML content in "Myxml.xml" file and locate it in directory where python can read it.

Begin XML
<?xml version="1.0" encoding="UTF-8"?>
<employee>
<fname>Christy</fname>
<lname>Charloette</lname>
</employee>
End XML

#!/usr/env/python3

''' Python script to parse the xml file
'''

import xml.dom
from xml.dom import Node
from xml.dom.minidom import parse
import sys

def strip_space(node):
    print('In strip_space...')
    child = node.firstChild
    print('node.firstChild: ', node.firstChild)
    while child != None:
        print('-> In while loop...')
        c = child.nextSibling
        if (child.nodeType == Node.TEXT_NODE and
            len(child.nodeValue.strip()) == 0):
            print('---> If Statement:')
            node.removeChild(child)
        else:
            print('---> Else Statement')
            strip_space(child)
        print('Next sibling: ', c)
        child = c
    return node

def compact(node,indent):
    if node == None: return
    if node.nodeType == Node.ELEMENT_NODE:
        sys.stdout.write(node.nodeName+'[')
        indent += (len(node.nodeName)+1)*" "
        attrs = node.attributes
        first = True
        for i in range(len(attrs)):
            if not first: sys.stdout.write('\n'+indent)
            sys.stdout.write('@'+attrs.item(i).nodeName +
                             '['+attrs.item(i).nodeValue+']')
            first = False
        child = node.firstChild
        while child != None:
            if not first: sys.stdout.write('\n'+indent)
            compact(child,indent)
            first = False
            child = child.nextSibling
        sys.stdout.write(']')
    elif node.nodeType == Node.TEXT_NODE:
        sys.stdout.write(node.nodeValue.strip())

doc = parse('Myxml.xml')
compact(strip_space(doc.documentElement),"")
Output
------
In strip_space...
node.firstChild: <DOM Text node "'\n '">
-> In while loop...
---> If Statement:
Next sibling: <DOM Element: fname at 0x1c136a0>
-> In while loop...
---> Else Statement
In strip_space...
node.firstChild: <DOM Text node "'Christy'"> #-> 1. PROGRAM FOUND "Christy" TEXT NODE
-> In while loop... #-> 2. CONTROL ENTERED INTO WHILE LOOP
---> Else Statement #-> 3. CONTROL ENTERED INTO ELSE PART
In strip_spaceā€¦ #-> 4. CONTROL RECURSIVELY CALLED SAME FUNCTION AGAIN FROM ELSE PART
node.firstChild: None #-> 5. "Christy" TEXT NODE DO NOT HAVE ANY CHILD, SO
# THE VALUE OF VARIABLE "Child" IS NONE
Next sibling: None #-> 6. CONTROL WENT INTO WHILE LOOP AND PRINT "None"
Next sibling: <DOM Text node "'\n '"> #-> 7. CONTROL AGAIN WENT INTO WHILE LOOP AND
#-> PRINT NEXT NODE OF "Christy"

Questions:
1. At step 5, variable "Child" is None. Why it entered into while loop and print message "Next sibling: None" at step 6? At the same time, it did not print message "In while loop..." which is the first message in while loop, why?
2. After Step 6, how the control can entered into while loop and print message ### Next sibling: <DOM Text node "'\n '"> ### when variable child was equal to "None". Also, it did not print message "In while loop...", Why?
Reply


Possibly Related Threads…
Thread Author Replies Views Last Post
Thumbs Up Convert word into pdf and copy table to outlook body in a prescribed format email2kmahe 1 705 Sep-22-2023, 02:33 PM
Last Post: carecavoador
  Convert Json to table format python_student 2 5,063 Sep-28-2022, 12:48 PM
Last Post: python_student
  Convert .xlsx to Format as Table bnadir55 0 860 Aug-11-2022, 06:39 AM
Last Post: bnadir55
  Yahoo_fin, Pandas: how to convert data table structure in csv file detlefschmitt 14 7,559 Feb-15-2021, 12:58 PM
Last Post: detlefschmitt
  Convert table in pandas tgottsc1 2 1,944 Jan-26-2021, 01:58 PM
Last Post: tgottsc1
  convert text file data to HTML table in python Tirumal 5 12,406 Dec-29-2017, 04:44 PM
Last Post: buran

Forum Jump:

User Panel Messages

Announcements
Announcement #1 8/1/2020
Announcement #2 8/2/2020
Announcement #3 8/6/2020