Jun-27-2020, 02:36 AM
Thanks for your response to my threads.
I am trying to use the Python code below, but I am getting unexpected output.
Python Code
def Process_XML(infile, inxpath, xpathdln):
    """ Process XML Xpath """
    import xml.etree.ElementTree as ET
    from lxml import etree
    tree = etree.parse(infile)
    root = tree.getroot()
    print(root)
    for dln in tree.xpath(xpathdln):
        # Iterate over attributes of datafield
        print(dln.tag + ' = ' + dln.text)
    for df in tree.xpath(inxpath):
        # Iterate over attributes of datafield
        print(df.tag + ' = ' + df.text)
        for attrib_name in df.attrib:
            print( '@' + attrib_name + '=' + df.attrib[attrib_name])
        # subfield is a child of datafield, and iterate
        subfields = df.getchildren()
        for subfield in subfields:
            print (subfield.tag + ' = ' + subfield.text)
    return;

infile1 = 'D:\Python_work\eclipse-workspace\My1stPythonP\input.xml'
inxpath1 = '/DataFileFor/DataR/Ret/W2'
xpathdln = '/DataFileFor/DataR/NO'
Process_XML(infile1, inxpath1, xpathdln)

I have a large XML file (inputf.xml), which looks like the excerpt below. I used this file as the input to the posted code.
Output:
<?xml version="1.0" encoding="UTF-8"?>
<DataFileFor>
<DataR>
<Id>5070022019330a0050hq</Id>
<NUM>30221730001019</NUM>
<Postmark>2020-01-03T09:25:57.000-05:00</Postmark>
<TNO>47647</TNO>
.
.
.
.
.
</DataFileFor>
When I extract the node at that XPath using xml_grep, I get the following.
xml_grep DataFileFor/DataR/Ret/W2 inputf.xml ===> output
Output:
<?xml version="1.0" ?>
<xml_grep version="0.7" date="Fri Jun 26 13:07:11 2020">
<file filename="inputf.xml">
<W2 Id="W2" dName="W2" sId="00000000" sVersionNum="String">
<CorrectedW2Ind>X</CorrectedW2Ind>
<EmployeeSSN>000000000</EmployeeSSN>
<EmployerEIN>000000000</EmployerEIN>
<EmployerNameControlTxt>S</EmployerNameControlTxt>
<EmployerName>
<BusinessNameLine1Txt>String</BusinessNameLine1Txt>
<BusinessNameLine2Txt>String</BusinessNameLine2Txt>
</EmployerName>
<EmployerUSAddress>
<AddressLine1Txt>String</AddressLine1Txt>
<AddressLine2Txt>String</AddressLine2Txt>
<CityNm>String</CityNm>
<StateAbbreviationCd>AL</StateAbbreviationCd>
<ZIPCd>000000000</ZIPCd>
.
.
.
.
.
</W2>
When I run the code, I get:
<Element DataFileFor at 0x2779a00>
NO = 30221730001019
The code works for a small file. I expect the same output as xml_grep gives.
I am struggling to understand why xml_grep is able to extract this XPath from the file, but the posted code is not.
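One way to investigate the difference, offered as a sketch rather than a confirmed diagnosis, is to list the element paths that actually exist in the file and compare them with the XPath. The sketch below uses the standard library's xml.etree.ElementTree instead of lxml so it runs without extra installs. The sample string, the `urn:example` namespace, and the helper name `element_paths` are hypothetical. If the large file declares a default namespace (an assumption; nothing in the output above shows it), its tags would appear with a `{...}` prefix and a plain path would match nothing.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: same element names as the post, plus an assumed
# default namespace to show how it changes the tags.
SAMPLE_NS = (
    '<DataFileFor xmlns="urn:example">'
    '<DataR><NO>30221730001019</NO>'
    '<Ret><W2 Id="W2" dName="W2"/></Ret>'
    '</DataR></DataFileFor>'
)


def element_paths(root):
    """Return the sorted set of slash-separated tag paths found in the tree."""
    paths = set()

    def walk(elem, prefix):
        path = prefix + "/" + elem.tag
        paths.add(path)
        for child in elem:
            walk(child, path)

    walk(root, "")
    return sorted(paths)


root = ET.fromstring(SAMPLE_NS)
for p in element_paths(root):
    print(p)

# A plain path finds nothing here because every tag carries the namespace.
plain = root.findall("./DataR/Ret/W2")
# The {*} wildcard (Python 3.8+) ignores the namespace part of each tag.
wild = root.findall("./{*}DataR/{*}Ret/{*}W2")
print(len(plain), len(wild))
```

If the printed paths show no namespace prefix, the cause is more likely a difference in nesting between the small and large files, which the same listing would also reveal.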
The attributes
Id="W2" dName="W2" sId="00000000" sVersionNum="String"
are not showing up in the output.
What changes does the code need to fix this?
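For illustration, here is a minimal sketch of the kind of output I am after. It makes several assumptions: it uses xml.etree.ElementTree rather than lxml, the sample is a hypothetical miniature of inputf.xml (the nesting under Ret is guessed from the XPath), and `describe` and `report` are made-up helper names. It also guards against `.text` being None, which would otherwise raise a TypeError in the string concatenation, and it reports when a path matches nothing instead of staying silent.

```python
import xml.etree.ElementTree as ET

# Hypothetical miniature of inputf.xml; element names come from the post,
# the exact nesting is assumed.
SAMPLE = """<DataFileFor>
  <DataR>
    <NO>30221730001019</NO>
    <Ret>
      <W2 Id="W2" dName="W2" sId="00000000" sVersionNum="String">
        <CorrectedW2Ind>X</CorrectedW2Ind>
        <EmployerName>
          <BusinessNameLine1Txt>String</BusinessNameLine1Txt>
        </EmployerName>
      </W2>
    </Ret>
  </DataR>
</DataFileFor>"""


def describe(elem):
    """Return printable lines for one element: tag, attributes, direct children."""
    text = (elem.text or "").strip()  # .text is None for empty elements
    lines = [elem.tag + " = " + text]
    for name, value in elem.attrib.items():
        lines.append("@" + name + "=" + value)
    for child in elem:
        child_text = (child.text or "").strip()
        lines.append(child.tag + " = " + child_text)
    return lines


def report(root, path):
    """Describe every element matching path, or say that nothing matched."""
    matches = root.findall(path)
    if not matches:
        return ["no match for " + path]
    out = []
    for match in matches:
        out.extend(describe(match))
    return out


root = ET.fromstring(SAMPLE)
# Paths are relative to the root element (DataFileFor) in ElementTree.
w2_lines = report(root, "./DataR/Ret/W2")
print("\n".join(w2_lines))
```

Only direct children are printed, as in the posted code; nested elements such as BusinessNameLine1Txt would need a recursive walk.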
Thanks for your guidance.