May-16-2021, 05:11 PM
(May-15-2021, 06:14 PM)snippsat Wrote: Here's an example of how I would read it and parse some data.
from bs4 import BeautifulSoup

soup = BeautifulSoup(open('W2Testfile.xml', encoding='utf-8'), 'lxml')
# Just copy the tag name from the doc and lower-case the search, or write it in lower case
sub_id = soup.find('SubmissionId'.lower())
# Tag and text
print(sub_id)
print(sub_id.text)
# ---| Take out a part, e.g. doc 2, then find_all() a tag that occurs several times
doc_2 = soup.find('returndata', {'documentcnt': '2'})
dep_detail = doc_2.find_all('DependentDetail'.lower())
print('-' * 30)
print(dep_detail[0].find('dependentrelationshipcd'))
print(dep_detail[0].find('dependentrelationshipcd').text)
Output:
<submissionid>00000000000000002222</submissionid>
00000000000000002222
------------------------------
<dependentrelationshipcd>SON</dependentrelationshipcd>
SON
Thanks for your guidance.
As I mentioned, this is a big XML file; if I go element by element with explicit navigation as above, pulling it all out is a hard task.
We may have 45K to 50K XML elements to traverse this way.
Does lxml work as a serial DOM parser? How would it handle pulling all of this big XML into a DOM?
Is there a way to pull elements using XPath in lxml?
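To show what I mean, here is a minimal sketch of XPath with lxml. The XML is built in memory (with tag names borrowed from the test file above) just so the snippet is self-contained; in practice it would be etree.parse('W2Testfile.xml'):

```python
from lxml import etree

# Small in-memory document standing in for the real W2Testfile.xml
xml = b"""<Root>
  <ReturnData documentCnt="2">
    <DependentDetail>
      <DependentRelationshipCd>SON</DependentRelationshipCd>
    </DependentDetail>
  </ReturnData>
</Root>"""
root = etree.fromstring(xml)

# XPath finds matching elements anywhere in the tree,
# without navigating parent by parent explicitly
codes = root.xpath('//DependentDetail/DependentRelationshipCd/text()')
print(codes)  # ['SON']
```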
Are there any options in Python to do SAX-style parsing, like in Java?
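By SAX-style I mean event-based streaming so the whole document never has to sit in memory at once. Python has the stdlib xml.sax module, and lxml offers etree.iterparse, which streams events per element. A minimal sketch with made-up tag names (an in-memory file stands in for the real one):

```python
from io import BytesIO
from lxml import etree

# In-memory stand-in for a large XML file on disk
xml = b"""<Root>
  <DependentDetail><DependentRelationshipCd>SON</DependentRelationshipCd></DependentDetail>
  <DependentDetail><DependentRelationshipCd>DAUGHTER</DependentRelationshipCd></DependentDetail>
</Root>"""

codes = []
# iterparse fires an event for each matching element as it is read,
# so a 45K-50K element file is never held as one full DOM
for event, elem in etree.iterparse(BytesIO(xml), events=('end',),
                                   tag='DependentRelationshipCd'):
    codes.append(elem.text)
    elem.clear()  # release the element's memory once processed

print(codes)  # ['SON', 'DAUGHTER']
```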