Python Forum
Copy xml content from webpage and save to locally without special characters
I land on an .xml web page that is generated from my earlier inputs (<//aep2/xml/trace/NIK243164_AI_14652732.xml>). I want to read the entire content of this XML, save it locally, and compare it with an existing baseline XML.

I've tried the options below, and each one has its own problem:

1)

with open('test.xml', 'w') as f:
    f.write(driver.page_source)

This extracts the content from the web page, but it wraps it in HTML tags at the start and end and adds special characters to each tag. Is there a built-in function to convert the page source to XML automatically?
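A minimal sketch of one way to undo that wrapping, assuming the browser has placed the XML inside the HTML as escaped text (for example `&lt;MsgType&gt;` inside a `<pre>` element). That layout is an assumption, not something confirmed for this page, and the helper name `xml_from_page_source` is hypothetical. It uses only the standard library's `html.parser`:

```python
from html.parser import HTMLParser


class _TextCollector(HTMLParser):
    """Collects the text nodes of an HTML document, ignoring the tags."""

    def __init__(self):
        # convert_charrefs=True makes the parser turn &lt; &gt; &amp; etc.
        # back into the characters they stand for.
        super().__init__(convert_charrefs=True)
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def xml_from_page_source(page_source):
    """Return the unescaped text content of an HTML-wrapped XML page.

    Assumption: the only text in the page is the escaped XML itself.
    If the wrapper also contains a <title> or <style> with text, that
    text would end up in the result and would need to be filtered out.
    """
    collector = _TextCollector()
    collector.feed(page_source)
    collector.close()  # flush any text still buffered by the parser
    return "".join(collector.parts).strip()
```

With Selenium this would be called as `xml_from_page_source(driver.page_source)` and the result written to `test.xml`. If the browser renders the XML as real DOM elements instead of escaped text, this approach would not apply.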

2)

import shutil
import urllib.request

headers = {'User-Agent': 'Mozilla'}
request = urllib.request.Request(url1, headers=headers)
with urllib.request.urlopen(request) as response:
    print(response.status)
    if response.status == 200:
        # Copy the raw response bytes straight into the file
        with open('test.xml', 'wb') as f:
            shutil.copyfileobj(response, f)

This is unable to read the XML. The response is an application-defined error: 'there was an error generating the xml'. However, the XML is definitely generated after each script run; I can see it in the browser because I'm not running the cases headless at the moment.

3)

urllib.request.urlretrieve(url1, "test.xml")

This gives the same issue as 2).
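One possible explanation for options 2) and 3) is that the server only generates the XML for the logged-in browser session, and a fresh `urllib` request carries none of that session's cookies. This is a guess, not something confirmed for this site. A minimal sketch under that assumption, reusing the cookies Selenium already holds (`driver.get_cookies()` returns a list of dicts with `name` and `value` keys); the helper names `cookie_header`, `build_request` and `download_xml` are hypothetical:

```python
import urllib.request


def cookie_header(cookies):
    """Build a Cookie header value from Selenium-style cookie dicts."""
    return "; ".join(f"{c['name']}={c['value']}" for c in cookies)


def build_request(url, cookies, user_agent="Mozilla/5.0"):
    """Create a request that carries the browser session's cookies."""
    headers = {"User-Agent": user_agent, "Cookie": cookie_header(cookies)}
    return urllib.request.Request(url, headers=headers)


def download_xml(url, cookies, path):
    """Fetch the raw bytes at url and write them to path unchanged."""
    with urllib.request.urlopen(build_request(url, cookies)) as response:
        with open(path, "wb") as f:
            f.write(response.read())
```

After the script has navigated to the XML page, the call would look like `download_xml(driver.current_url, driver.get_cookies(), "test.xml")`. If the server also checks other headers or a one-time token, cookies alone may not be enough.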

Please help with a solution to read a web page that has an .xml extension, copy just the content, and write it to another file.

Webpage content looks like below:

<?xml version='1.0' encoding='UTF-8'?><IM415 xmlns="http://www.ros.ie/schemas/customs/IM415"> <Declaration> <MsgType>H1</MsgType> <DeclarationType_1_1>IM</DeclarationType_1_1> <AdditionalDeclarationType_1_2>A</AdditionalDeclarationType_1_2> <LRN_2_5>NIK243172_16K0O3</LRN_2_5> <ValuationInformation> <InvoiceCurrency_4_10>EUR</InvoiceCurrency_4_10> <InvoiceAmount_4_11>5000</InvoiceAmount_4_11> <InternalCurrency_4_12>EUR</InternalCurrency_4_12> </ValuationInformation> <GoodsInformation> <GrossMass_6_5>300</GrossMass_6_5> <TotalPackageNumber_6_18>15</TotalPackageNumber_6_18> </GoodsInformation> <TransportInformation> <BorderTransportMode_7_4>3</BorderTransportMode_7_4> <ActiveBorderTransportMeansNationality_7_15>IE</ActiveBorderTransportMeansNationality_7_15> </TransportInformation> <CustomsOffices> <PresentationCustomsOffice_5_26>IEDUB100</PresentationCustomsOffice_5_26> <CustomsOfficeLodgement>IEDUB100</CustomsOfficeLodgement> </CustomsOffices> <Parties> <Declarant> <Declarant_3_18 xmlns="">IE8218454B</Declarant_3_18> </Declarant> <Representative> <Representative_3_20 xmlns="">IE8218454B</Representative_3_20> </Representative> <PersonPayingCustomsDuty_3_46>IE9726356R</PersonPayingCustomsDuty_3_46> </Parties> <PreferredPaymentMethod_4_8>E</PreferredPaymentMethod_4_8> </Declaration> <GoodsShipment> <DocumentsAuthorisations> <AdditionalInformation_2_2> <AdditionalInformationCode xmlns="">00500</AdditionalInformationCode> </AdditionalInformation_2_2> <ProducedDocumentsWritingOff_2_03> <DocumentType xmlns="">1D24</DocumentType> <DocumentIdentifier xmlns="">202403211000</DocumentIdentifier> </ProducedDocumentsWritingOff_2_03> <ProducedDocumentsWritingOff_2_03> <DocumentType xmlns="">1D94</DocumentType> <DocumentIdentifier xmlns="">9214991</DocumentIdentifier> </ProducedDocumentsWritingOff_2_03> <ProducedDocumentsWritingOff_2_03>
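For the comparison against the baseline mentioned at the start, a plain text diff can fail on harmless differences such as indentation or attribute order. A minimal sketch that compares canonical forms instead, using `xml.etree.ElementTree.canonicalize` (available from Python 3.8); the helper names `canonical` and `same_xml` are hypothetical, and it assumes both documents are complete, well-formed XML:

```python
import xml.etree.ElementTree as ET


def canonical(xml_text):
    """Return the C14N 2.0 canonical form of an XML string.

    strip_text=True drops leading and trailing whitespace in text
    content, so pretty-printed and single-line versions of the same
    document canonicalize to the same string.
    """
    return ET.canonicalize(xml_data=xml_text, strip_text=True)


def same_xml(downloaded, baseline):
    """True if both XML strings are equal after canonicalization."""
    return canonical(downloaded) == canonical(baseline)
```

Fields that legitimately change between runs (a generated reference number, for instance) would still make the comparison fail and would need to be removed or masked first.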
Messages In This Thread
Copy xml content from webpage and save to locally without special characters - by Nik1811 - Mar-21-2024, 03:45 PM
