Python Forum
Copy xml content from webpage and save locally without special characters
#12 (Nik1811, Mar-22-2024, 07:33 PM)
Yes, that's correct. Line 54 prints to check that it's hitting the right URL; that is the raw XML getting generated.

What do you think is the best way to read the entire contents of the raw URL and copy it to another file?

I'm thinking of a few different options, but I need your input on the BeautifulSoup side.
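
One option (a sketch only, not tested against the Thyme-IT site, and the filename declaration.xml is just a placeholder): copy Selenium's cookies into a requests.Session so the download is authenticated like the browser, then write the raw bytes straight to disk.

import requests

session = requests.Session()

# Carry the logged-in Selenium session over to requests;
# a plain requests.get() starts a fresh, unauthenticated session.
for cookie in driver.get_cookies():
    session.cookies.set(cookie["name"], cookie["value"])

response = session.get(driver.current_url)
response.raise_for_status()

# Write the raw bytes untouched; decoding and re-encoding is what
# usually mangles special characters.
with open("declaration.xml", "wb") as f:
    f.write(response.content)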

(Mar-22-2024, 07:16 PM)snippsat Wrote: I removed your code as it contained login info.
Here is the code without the login info.
import requests
from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.chrome.service import Service
from webdriver_manager.chrome import ChromeDriverManager
from generic.Login import Login


options = Options()
options.add_experimental_option("detach", True)

driver = webdriver.Chrome(service=Service(ChromeDriverManager().install()), options=options)


class AIS_Import(Login):
    Login.login_func(driver, "https://www.thyme-it.net/aep2/sadlist/sadlist.html", "thymeit", "xxxx", "xxxx")

    '''Navigate to AIS Import template'''
    driver.find_element("xpath", "/html/body/div/div[3]/ul/li[1]/a").click()
    driver.find_element("xpath", "/html/body/div/div[3]/ul/li[1]/ul/li[2]/a").click()
    driver.find_element("xpath", "//a[text()= 'AIS Import']").click()

    '''Click on the first template (should be AUTO:...)'''
    links = driver.find_elements("xpath", "//table//a")
    links[0].click()


    '''Select all checkboxes from dropdown'''
    ais_checkbox = driver.find_elements("xpath", "//input[@type='checkbox']")
    count_checkbox = 0

    for val in ais_checkbox:
        count_checkbox += 1
        try:
            val.click()
        except Exception as e:
            print(e)
    print(count_checkbox)


    '''Finish and send the declaration to Revenue'''
    driver.find_element("id", "finished").click()

    '''XML File comparison'''
    '''Get the XML'''
    driver.find_element("xpath", "//input[@value='Get XML']").click()

    # NOTE: requests.get() opens a new, unauthenticated session; it does not
    # share Selenium's login cookies, so it may not return the same XML
    # the browser sees at this URL.
    response = requests.get(driver.current_url)
    print(response)
    soup = BeautifulSoup(response.content, 'lxml-xml')
    print(soup)
So what does line 54 print? Is it a URL to the raw .xml, i.e. the same as opening a link in the browser that ends in .xml, like this one:
https://www.w3schools.com/xml/plant_catalog.xml
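
If it is a plain .xml URL like that, a minimal sketch for saving it locally (using the public example URL above, which needs no login) could be:

import requests
from bs4 import BeautifulSoup

url = "https://www.w3schools.com/xml/plant_catalog.xml"
response = requests.get(url)
response.raise_for_status()

# Parse with the same lxml-xml parser used in the code above.
soup = BeautifulSoup(response.content, "lxml-xml")
print(soup.find("COMMON"))  # first <COMMON> element of the catalog

# Save a local copy, declaring UTF-8 so special characters survive the round trip.
with open("plant_catalog.xml", "w", encoding="utf-8") as f:
    f.write(str(soup))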