Python Forum
Best way to process large/complex XML/schema ?
#7
(May-15-2021, 06:14 PM)snippsat Wrote: Here is an example of how I would read it and parse some data.
from bs4 import BeautifulSoup

# The 'lxml' parser treats the file as HTML, so tag and attribute names are lowercased
with open('W2Testfile.xml', encoding='utf-8') as f:
    soup = BeautifulSoup(f, 'lxml')

# Copy the tag name from the document and lowercase it, or write it in lower case
sub_id = soup.find('SubmissionId'.lower())
# Tag and text
print(sub_id)
print(sub_id.text)

# Take out one part, e.g. document 2, then find_all a tag that occurs several times
doc_2 = soup.find('returndata', {'documentcnt': '2'})
dep_detail = doc_2.find_all('DependentDetail'.lower())
print('-' * 30)
print(dep_detail[0].find('dependentrelationshipcd'))
print(dep_detail[0].find('dependentrelationshipcd').text)
Output:
<submissionid>00000000000000002222</submissionid>
00000000000000002222
------------------------------
<dependentrelationshipcd>SON</dependentrelationshipcd>
SON

Thanks for your guidance.

As I mentioned, this is a big XML file. If I go element by element with explicit navigation as above, pulling out the data will be a hard task.

We may have 45K to 50K XML elements to traverse this way.

Does lxml work as a DOM parser that reads the document serially? How would we handle pulling all of this big XML into a DOM?

Is there a way to pull elements using XPath in lxml?
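
On the XPath question, a minimal sketch of path-based selection may help. It uses the standard library's xml.etree.ElementTree so that it runs without extra installs. lxml.etree offers the same find/findall calls and additionally exposes full XPath 1.0 through an xpath() method on elements. The sample document is hypothetical: the element names come from the quoted post, but the nesting is assumed, and the sketch ignores XML namespaces, which the real W2Testfile.xml may well use.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: element names mirror the quoted post, but the
# real file's layout (and any XML namespaces) is an assumption.
SAMPLE = """\
<Return>
  <SubmissionId>00000000000000002222</SubmissionId>
  <ReturnData documentCnt="2">
    <DependentDetail>
      <DependentRelationshipCd>SON</DependentRelationshipCd>
    </DependentDetail>
    <DependentDetail>
      <DependentRelationshipCd>DAUGHTER</DependentRelationshipCd>
    </DependentDetail>
  </ReturnData>
</Return>
"""

root = ET.fromstring(SAMPLE)

# Text of a direct child of the root element
submission_id = root.findtext("SubmissionId")

# One path expression replaces the nested find()/find_all() calls:
# pick the ReturnData element with documentCnt="2", then walk down to the codes.
path = ".//ReturnData[@documentCnt='2']/DependentDetail/DependentRelationshipCd"
relationship_codes = [el.text for el in root.findall(path)]

print(submission_id)
print(relationship_codes)
```

With 45K to 50K elements, a handful of path expressions like this is usually easier to maintain than element-by-element navigation, and tag names keep their original case because the file is parsed as XML rather than HTML.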

Are there any options in Python for parallel parsing, like SAX (Java)?
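
For SAX-style processing, Python ships an event-driven xml.sax module, and both xml.etree.ElementTree and lxml.etree provide iterparse(), which streams through the file instead of building the whole tree first. The sketch below uses the standard library's iterparse. The sample bytes and the helper name iter_relationship_codes are hypothetical, and the layout of the real file is assumed.

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical sample: element names mirror the quoted post; layout is assumed.
SAMPLE = b"""\
<Return>
  <ReturnData documentCnt="2">
    <DependentDetail>
      <DependentRelationshipCd>SON</DependentRelationshipCd>
    </DependentDetail>
    <DependentDetail>
      <DependentRelationshipCd>DAUGHTER</DependentRelationshipCd>
    </DependentDetail>
  </ReturnData>
</Return>
"""


def iter_relationship_codes(source):
    """Yield each DependentRelationshipCd value while streaming through source."""
    # An "end" event fires once an element's closing tag has been read,
    # so its children are complete at that point.
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == "DependentDetail":
            yield elem.findtext("DependentRelationshipCd")
            # Drop this element's children and text now that it is processed.
            elem.clear()


# A file path or an open binary file works in place of the BytesIO object.
codes = list(iter_relationship_codes(io.BytesIO(SAMPLE)))
print(codes)
```

This reads the document sequentially rather than in parallel. The gain is in memory: each DependentDetail is handled and cleared as soon as it is complete.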

Thanks for your guidance.

Messages In This Thread
RE: Best way to process large/complex XML/schema ? - by MDRI - May-16-2021, 05:11 PM
