Mar-21-2024, 03:45 PM
I land on an .xml webpage that is generated from my earlier inputs (<//aep2/xml/trace/NIK243164_AI_14652732.xml>). I want to read the entire content of this XML, save it locally, and compare it with an existing baseline XML.
I've tried the options below, each of which has its own problems:
1)
with open('test.xml', 'w') as f:
    f.write(driver.page_source)
This extracts the content from the webpage, but it adds HTML tags at the start and end and also adds special characters to each tag. Is there a built-in function to convert the page_source content to XML automatically?
2)
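import shutil
import urllib.request

headers = {'User-Agent': 'Mozilla'}
request = urllib.request.Request(url1, headers=headers)
with urllib.request.urlopen(request) as response:
    print(response.status)  # urllib responses expose .status, not .status_code
    if response.status == 200:
        with open('test.xml', 'wb') as f:
            # copy the raw response bytes straight into the file
            shutil.copyfileobj(response, f)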
This is unable to read the XML. I get a user-defined error back: 'there was an error generating the xml'. However, the XML is definitely generated each time my script runs; I can see it in the browser because I'm not running the cases headless at the moment.
3)
urllib.request.urlretrieve(url1, "test.xml")
Same issue as in 2).
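One possible explanation for 2) and 3), which is only an assumption and not something I have confirmed, is that urllib opens a fresh session without the cookies the Selenium browser holds, so the server cannot find the generated XML. A minimal sketch of passing the browser's cookies along with the request is below. The helper names cookie_header, build_request and download are made up for illustration; driver.get_cookies() is the Selenium call that returns a list of dicts with 'name' and 'value' keys.

```python
import shutil
import urllib.request


def cookie_header(cookies):
    """Join Selenium-style cookie dicts into a single Cookie header value."""
    return "; ".join(f"{c['name']}={c['value']}" for c in cookies)


def build_request(url, cookies):
    """Build a urllib request that carries the browser's cookies."""
    headers = {"User-Agent": "Mozilla", "Cookie": cookie_header(cookies)}
    return urllib.request.Request(url, headers=headers)


def download(url, cookies, dest):
    """Fetch url with the given cookies and write the raw bytes to dest."""
    with urllib.request.urlopen(build_request(url, cookies)) as response:
        with open(dest, "wb") as f:
            shutil.copyfileobj(response, f)


# Intended usage (assumes an already logged-in Selenium driver):
# download(url1, driver.get_cookies(), "test.xml")
```

If the server ties the XML to something other than cookies (for example a one-time token), this would not be enough.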
Please help with a solution to read a webpage that has an .xml extension, copy just its content, and write it to another file.
The webpage content looks like this:
<?xml version='1.0' encoding='UTF-8'?><IM415 xmlns="http://www.ros.ie/schemas/customs/IM415"> <Declaration> <MsgType>H1</MsgType> <DeclarationType_1_1>IM</DeclarationType_1_1> <AdditionalDeclarationType_1_2>A</AdditionalDeclarationType_1_2> <LRN_2_5>NIK243172_16K0O3</LRN_2_5> <ValuationInformation> <InvoiceCurrency_4_10>EUR</InvoiceCurrency_4_10> <InvoiceAmount_4_11>5000</InvoiceAmount_4_11> <InternalCurrency_4_12>EUR</InternalCurrency_4_12> </ValuationInformation> <GoodsInformation> <GrossMass_6_5>300</GrossMass_6_5> <TotalPackageNumber_6_18>15</TotalPackageNumber_6_18> </GoodsInformation> <TransportInformation> <BorderTransportMode_7_4>3</BorderTransportMode_7_4> <ActiveBorderTransportMeansNationality_7_15>IE</ActiveBorderTransportMeansNationality_7_15> </TransportInformation> <CustomsOffices> <PresentationCustomsOffice_5_26>IEDUB100</PresentationCustomsOffice_5_26> <CustomsOfficeLodgement>IEDUB100</CustomsOfficeLodgement> </CustomsOffices> <Parties> <Declarant> <Declarant_3_18 xmlns="">IE8218454B</Declarant_3_18> </Declarant> <Representative> <Representative_3_20 xmlns="">IE8218454B</Representative_3_20> </Representative> <PersonPayingCustomsDuty_3_46>IE9726356R</PersonPayingCustomsDuty_3_46> </Parties> <PreferredPaymentMethod_4_8>E</PreferredPaymentMethod_4_8> </Declaration> <GoodsShipment> <DocumentsAuthorisations> <AdditionalInformation_2_2> <AdditionalInformationCode xmlns="">00500</AdditionalInformationCode> </AdditionalInformation_2_2> <ProducedDocumentsWritingOff_2_03> <DocumentType xmlns="">1D24</DocumentType> <DocumentIdentifier xmlns="">202403211000</DocumentIdentifier> </ProducedDocumentsWritingOff_2_03> <ProducedDocumentsWritingOff_2_03> <DocumentType xmlns="">1D94</DocumentType> <DocumentIdentifier xmlns="">9214991</DocumentIdentifier> </ProducedDocumentsWritingOff_2_03> <ProducedDocumentsWritingOff_2_03>
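For the comparison step, this is a rough sketch of what I have in mind once the XML is saved locally. It assumes Python 3.8+ (for xml.etree.ElementTree.canonicalize) and that differences in whitespace and attribute order should be ignored; the function names canonical and same_xml are my own.

```python
import xml.etree.ElementTree as ET


def canonical(xml_text):
    """Return the C14N form of xml_text with surrounding whitespace stripped."""
    return ET.canonicalize(xml_data=xml_text, strip_text=True)


def same_xml(actual, baseline):
    """True if both documents are equal after canonicalization."""
    return canonical(actual) == canonical(baseline)


if __name__ == "__main__":
    compact = "<a><b>1</b></a>"
    pretty = "<a>\n  <b>1</b>\n</a>"
    print(same_xml(compact, pretty))  # formatting-only difference
```

For files on disk, ET.canonicalize(from_file="test.xml", strip_text=True) can be used instead of passing the text.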