May-16-2021, 05:11 PM
(May-15-2021, 06:14 PM)snippsat Wrote: Here's an example of how I would read it and parse some data.
from bs4 import BeautifulSoup

soup = BeautifulSoup(open('W2Testfile.xml', encoding='utf-8'), 'lxml')
# Just copy the tag name from the doc and lower-case the search, or write it in lower case
sub_id = soup.find('SubmissionId'.lower())
# Tag and text
print(sub_id)
print(sub_id.text)
# ---| Take out a part, e.g. doc 2, then find_all() a tag that occurs several times
doc_2 = soup.find('returndata', {'documentcnt': '2'})
dep_detail = doc_2.find_all('DependentDetail'.lower())
print('-' * 30)
print(dep_detail[0].find('dependentrelationshipcd'))
print(dep_detail[0].find('dependentrelationshipcd').text)
Output:
<submissionid>00000000000000002222</submissionid>
00000000000000002222
------------------------------
<dependentrelationshipcd>SON</dependentrelationshipcd>
SON
Thanks for your guidance.
As I mentioned, this is a big XML file; if I go element by element with explicit navigation as above, pulling it all out is a hard task.
We may have 45K to 50K XML elements to traverse this way.
Does lxml work as a serial DOM parser? How would it handle pulling all of this big XML into a DOM?
Is there a way to pull elements using XPath in lxml?
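To show what I mean, here is a minimal sketch of XPath with lxml. The XML is built in memory (with tag names borrowed from the test file above) just so the snippet is self-contained; in practice it would be etree.parse('W2Testfile.xml'):

```python
from lxml import etree

# Small in-memory document standing in for the real W2Testfile.xml
xml = b"""<Root>
  <ReturnData documentCnt="2">
    <DependentDetail>
      <DependentRelationshipCd>SON</DependentRelationshipCd>
    </DependentDetail>
  </ReturnData>
</Root>"""
root = etree.fromstring(xml)

# XPath finds matching elements anywhere in the tree,
# without navigating parent by parent explicitly
codes = root.xpath('//DependentDetail/DependentRelationshipCd/text()')
print(codes)  # ['SON']
```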
Are there any options in Python to do SAX-style parsing, like in Java?
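By SAX-style I mean event-based streaming so the whole document never has to sit in memory at once. Python has the stdlib xml.sax module, and lxml offers etree.iterparse, which streams events per element. A minimal sketch with made-up tag names (an in-memory file stands in for the real one):

```python
from io import BytesIO
from lxml import etree

# In-memory stand-in for a large XML file on disk
xml = b"""<Root>
  <DependentDetail><DependentRelationshipCd>SON</DependentRelationshipCd></DependentDetail>
  <DependentDetail><DependentRelationshipCd>DAUGHTER</DependentRelationshipCd></DependentDetail>
</Root>"""

codes = []
# iterparse fires an event for each matching element as it is read,
# so a 45K-50K element file is never held as one full DOM
for event, elem in etree.iterparse(BytesIO(xml), events=('end',),
                                   tag='DependentRelationshipCd'):
    codes.append(elem.text)
    elem.clear()  # release the element's memory once processed

print(codes)  # ['SON', 'DAUGHTER']
```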