Jul-03-2020, 08:43 PM
How do I get full XPath extract using Python?
Thanks for your responses to my threads.
I am trying to use the Python code below, but I am getting abbreviated XPaths instead of full XPaths.

from lxml import etree, objectify


def parseXML(xmlFile, outputFile):
    """
    Parse the XML function
    """
    with open(xmlFile) as fobj:
        xml = fobj.read()
    f = open(outputFile, 'w')  # open write to file
    root = etree.fromstring(xml)
    f.write("%s|%s\n" % ("Field", "Value"))
    tree = etree.ElementTree(root)
    for e in root.iter():
        f.write("%s|%s\n" % (tree.getpath(e), e.text))
    f.close()


if __name__ == "__main__":
    print('Loading variables...')
    input = 'inputf.xml'
    output = input + '.csv'
    parseXML(input, output)

I have a large XML file like the one below (inputf.xml). I used this file as input = 'inputf.xml' in the code posted above.
Input XML (inputf.xml):
<?xml version="1.0" encoding="UTF-8"?>
<DataFileFor>
<DataR>
<Id>5070022019330a0050hq</Id>
<NUM>30221730001019</NUM>
<Postmark>2020-01-03T09:25:57.000-05:00</Postmark>
<TNO>47647</TNO>
.
.
.
.
.
</DataFileFor>
When I grab the node using xml_grep, I get the following.
xml_grep DataFileFor/DataR/Ret/W2 inputf.xml
Output:
<?xml version="1.0" ?>
<xml_grep version="0.7" date="Fri Jun 26 13:07:11 2020">
<file filename="inputf.xml">
<W2 Id="W2" dName="W2" sId="00000000" sVersionNum="String">
<CorrectedW2Ind>X</CorrectedW2Ind>
<EmployeeSSN>000000000</EmployeeSSN>
<EmployerEIN>000000000</EmployerEIN>
<EmployerNameControlTxt>S</EmployerNameControlTxt>
<EmployerName>
<BusinessNameLine1Txt>String</BusinessNameLine1Txt>
<BusinessNameLine2Txt>String</BusinessNameLine2Txt>
</EmployerName>
<EmployerUSAddress>
<AddressLine1Txt>String</AddressLine1Txt>
<AddressLine2Txt>String</AddressLine2Txt>
<CityNm>String</CityNm>
<StateAbbreviationCd>AL</StateAbbreviationCd>
<ZIPCd>000000000</ZIPCd>
.
.
.
.
.
</W2>
When I use the Python code above, it produces abbreviated XPaths instead of full XPaths. The output XPaths look like this:
[output]
/DataFileFor/DataR/*[8]/*[2]/*[6]/*[3]/*[10]|X
/DataFileFor/DataR/*[8]/*[2]/*[6]/*[3]/*[11]|00000000
/DataFileFor/DataR/*[8]/*[2]/*[6]/*[3]/*[12]|00000000
/DataFileFor/DataR/*[8]/*[2]/*[6]/*[3]/*[13]|S
/DataFileFor/DataR/*[8]/*[2]/*[6]/*[3]/*[14]|String
[/output]
What changes does the code need in order to produce full XPaths?
Also, the attributes
Id="W2" dName="W2" sId="00000000" sVersionNum="String" are not showing up in the output.
What changes does the code need to fix this?
Thanks for your guidance.
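
For reference, here is a minimal sketch of the kind of output I am after, not a verified fix. It builds each path from element names while walking the tree, and writes one extra row per attribute. Assumptions on my part: the `*[n]` steps appear because the elements sit in a default namespace (the excerpt above does not show one, so the `urn:example` namespace in the sample is made up), and the helper names `local_name`, `walk` and `xml_rows` are hypothetical. The sketch uses the standard library's `xml.etree.ElementTree` so it runs without extra packages. lxml elements expose the same `.tag`, `.text`, `.attrib` and child iteration, so the same walk should carry over, but I have not tried it on the real file.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def local_name(tag):
    """Strip a '{namespace}' prefix, if any, from a tag or attribute name."""
    return tag.rsplit("}", 1)[-1]


def walk(element, path):
    """Yield (path, value) rows for an element, its attributes and descendants."""
    yield path, (element.text or "").strip()
    # One row per attribute, addressed as <element path>/@<attribute name>
    for name, value in element.attrib.items():
        yield "%s/@%s" % (path, local_name(name)), value
    seen = Counter()  # counts same-named siblings to build the [n] index
    for child in element:
        if not isinstance(child.tag, str):
            continue  # skip comments and processing instructions
        name = local_name(child.tag)
        seen[name] += 1
        yield from walk(child, "%s/%s[%d]" % (path, name, seen[name]))


def xml_rows(xml_bytes):
    """Parse XML bytes and return a list of (path, value) rows."""
    root = ET.fromstring(xml_bytes)
    return list(walk(root, "/" + local_name(root.tag)))


# Cut-down sample; the xmlns value is a made-up placeholder.
SAMPLE = b"""<?xml version="1.0" encoding="UTF-8"?>
<DataFileFor xmlns="urn:example">
  <DataR>
    <Ret>
      <W2 Id="W2" dName="W2" sId="00000000" sVersionNum="String">
        <CorrectedW2Ind>X</CorrectedW2Ind>
        <EmployeeSSN>000000000</EmployeeSSN>
      </W2>
    </Ret>
  </DataR>
</DataFileFor>"""

if __name__ == "__main__":
    print("Field|Value")
    for path, value in xml_rows(SAMPLE):
        print("%s|%s" % (path, value))
```

The positional index counts only siblings with the same name, so `W2[1]` means the first `W2` under its parent rather than the first child of any kind.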