How do I get full XPath extract using Python?

MDRI · Jul-03-2020, 08:43 PM


Thanks for your response to my threads.

I am trying to use the Python code below, but I am getting abbreviated XPaths instead of full XPaths.

What changes does the code need in order to produce full XPaths?

from lxml import etree


def parseXML(xmlFile, outputFile):
    """
    Parse the XML file and write one Field|Value row per node.
    """
    # etree.parse() reads the file itself, so the encoding declaration is
    # handled; etree.fromstring() raises ValueError for a str that has one.
    tree = etree.parse(xmlFile)
    root = tree.getroot()

    with open(outputFile, 'w') as f:  # file is closed automatically
        f.write("%s|%s\n" % ("Field", "Value"))
        for e in root.iter():
            f.write("%s|%s\n" % (tree.getpath(e), e.text))


if __name__ == "__main__":
    print('Loading variables...')
    input_file = 'inputf.xml'  # avoid shadowing the built-in input()
    output_file = input_file + '.csv'

    parseXML(input_file, output_file)

I have a large XML file (inputf.xml), which I pass as the input file in the code above.

Input XML (inputf.xml):


    <?xml version="1.0" encoding="UTF-8"?>
    <DataFileFor>
      <DataR>
        <Id>5070022019330a0050hq</Id>
        <NUM>30221730001019</NUM>
        <Postmark>2020-01-03T09:25:57.000-05:00</Postmark>
        <TNO>47647</TNO>
        ...
    </DataFileFor>


When I grab the node by its path with xml_grep:

xml_grep DataFileFor/DataR/Ret/W2 inputf.xml

I get this output:

<?xml version="1.0" ?>
<xml_grep version="0.7" date="Fri Jun 26 13:07:11 2020">
<file filename="inputf.xml">
  <W2 Id="W2" dName="W2" sId="00000000" sVersionNum="String">
    <CorrectedW2Ind>X</CorrectedW2Ind>
    <EmployeeSSN>000000000</EmployeeSSN>
    <EmployerEIN>000000000</EmployerEIN>
    <EmployerNameControlTxt>S</EmployerNameControlTxt>
    <EmployerName>
      <BusinessNameLine1Txt>String</BusinessNameLine1Txt>
      <BusinessNameLine2Txt>String</BusinessNameLine2Txt>
    </EmployerName>
    <EmployerUSAddress>
      <AddressLine1Txt>String</AddressLine1Txt>
      <AddressLine2Txt>String</AddressLine2Txt>
      <CityNm>String</CityNm>
      <StateAbbreviationCd>AL</StateAbbreviationCd>
      <ZIPCd>000000000</ZIPCd>
      ...
</W2>
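For comparison, lxml's own XPath engine matches a bare name such as Ret only against elements that have no namespace. If Ret and W2 sit in a default namespace (an assumption here, though the *[N] steps in the getpath() output point that way), the same selection in lxml needs either local-name() tests or a prefix mapped to the namespace URI. A minimal sketch, using a made-up sample and a hypothetical URI urn:example:efile:

```python
from lxml import etree

# Hypothetical sample: Ret and everything under it use a default namespace.
# "urn:example:efile" is a made-up URI for illustration only.
SAMPLE = b"""<DataFileFor>
  <DataR>
    <Id>5070022019330a0050hq</Id>
    <Ret xmlns="urn:example:efile">
      <W2 Id="W2"><CorrectedW2Ind>X</CorrectedW2Ind></W2>
    </Ret>
  </DataR>
</DataFileFor>"""

root = etree.fromstring(SAMPLE)

# A plain name test only matches un-namespaced elements, so this is empty.
plain = root.xpath("/DataFileFor/DataR/Ret/W2")

# local-name() compares the bare tag name and ignores the namespace.
agnostic = root.xpath(
    "/DataFileFor/DataR/*[local-name()='Ret']/*[local-name()='W2']"
)

# Alternatively, map a prefix of your choice ("e") to the namespace URI.
mapped = root.xpath(
    "/DataFileFor/DataR/e:Ret/e:W2",
    namespaces={"e": "urn:example:efile"},
)

print(len(plain))             # 0
print(agnostic[0].get("Id"))  # W2
```

The prefix-mapping form is stricter, since it only matches that exact namespace; local-name() is looser and easier to type when you don't care which namespace an element is in.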

With my code, I get abbreviated XPaths instead of full ones. The output paths look like this:

/DataFileFor/DataR/*[8]/*[2]/*[6]/*[3]/*[10]|X
/DataFileFor/DataR/*[8]/*[2]/*[6]/*[3]/*[11]|00000000
/DataFileFor/DataR/*[8]/*[2]/*[6]/*[3]/*[12]|00000000
/DataFileFor/DataR/*[8]/*[2]/*[6]/*[3]/*[13]|S
/DataFileFor/DataR/*[8]/*[2]/*[6]/*[3]/*[14]|String
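A likely explanation for the *[N] steps: lxml's getpath() is built on libxml2, which cannot write an element in a default (unprefixed) namespace as a plain name, so it emits * plus the element's position among all of its element siblings. Assuming that is what happens in inputf.xml, ElementTree.getelementpath() is one way to get named steps instead. It returns an ElementPath expression relative to the root element, with each namespace written out as {uri}tag. The sample and the URI urn:example:efile below are hypothetical:

```python
from lxml import etree

# Hypothetical sample; "urn:example:efile" is a made-up namespace URI.
SAMPLE = b"""<DataFileFor>
  <DataR>
    <Id>5070022019330a0050hq</Id>
    <Ret xmlns="urn:example:efile">
      <W2 Id="W2"><CorrectedW2Ind>X</CorrectedW2Ind></W2>
    </Ret>
  </DataR>
</DataFileFor>"""

tree = etree.ElementTree(etree.fromstring(SAMPLE))
w2 = tree.getroot()[0][1][0]  # DataFileFor -> DataR -> Ret -> W2

# getpath(): absolute XPath; default-namespace steps come out as "*".
xpath = tree.getpath(w2)

# getelementpath(): ElementPath relative to the root, using {uri}tag steps.
epath = tree.getelementpath(w2)

print(xpath)
print(epath)

# The ElementPath string can be passed back to find() to locate the element.
print(tree.find(epath) is w2)
```

The {uri}tag steps are not valid XPath 1.0, so these strings are meant for lxml's find()/findall() rather than for xpath() or other XPath tools.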


Also, the attributes of W2 (Id="W2" dName="W2" sId="00000000" sVersionNum="String") do not show up in the output.

What changes are needed so the output includes them?
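To get name-only paths in the style of xml_grep's DataFileFor/DataR/Ret/W2, plus one row per attribute, one option is to strip the namespace URIs from the tags before calling getpath(). This is a sketch under assumptions, not the only approach: it assumes you never need to tell apart two elements that differ only by namespace, the names parse_xml and strip_namespaces are hypothetical, and the path/@name|value row format for attributes is chosen for illustration.

```python
from lxml import etree


def strip_namespaces(root):
    """Drop namespace URIs from element tags so getpath() can use names."""
    # tag=etree.Element restricts iteration to elements (no comments/PIs).
    for el in root.iter(tag=etree.Element):
        el.tag = etree.QName(el).localname
    # Remove xmlns declarations that no tag uses any more.
    etree.cleanup_namespaces(root)


def parse_xml(xml_file, output_file):
    """Write Field|Value rows for every element and every attribute."""
    # etree.parse() reads the file itself, so the encoding declaration is fine.
    tree = etree.parse(xml_file)
    root = tree.getroot()
    strip_namespaces(root)

    with open(output_file, "w", encoding="utf-8") as out:
        out.write("Field|Value\n")
        for el in root.iter(tag=etree.Element):
            path = tree.getpath(el)
            # Strip whitespace so container elements don't split rows.
            out.write(f"{path}|{(el.text or '').strip()}\n")
            # One extra row per attribute, e.g. .../W2/@Id|W2
            for name, value in el.attrib.items():
                attr = etree.QName(name).localname
                out.write(f"{path}/@{attr}|{value}\n")
```

It would be called as parse_xml('inputf.xml', 'inputf.xml.csv'). After stripping, getpath() adds an [n] index only where siblings share the same name. Because the namespaces are gone, these paths will not match the original file through a plain lxml xpath() query; they are meant for reporting.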

Thanks for your guidance.

MDRI · Sep-18-2020, 02:13 AM

Any thoughts?
