Copy xml content from webpage and save locally without special characters

***snippsat*** · (This post was last modified: Mar-22-2024, 07:16 PM by snippsat.)

I removed your code as it contained login info.
Here is the code without login info.

from xml.etree.ElementTree import fromstring
import requests
from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver import Keys
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.chrome.service import Service
from webdriver_manager.chrome import ChromeDriverManager
from generic.Login import Login
from lxml import html, etree
from xml.etree import ElementTree as ET


options = Options()
options.add_experimental_option("detach", True)

driver = webdriver.Chrome(service=Service(ChromeDriverManager().install()), options=options)


class AIS_Import(Login):
    Login.login_func(driver, "https://www.thyme-it.net/aep2/sadlist/sadlist.html", "thymeit", "xxxx", "xxxx")

    '''Navigate to AIS Import template'''
    driver.find_element("xpath", "/html/body/div/div[3]/ul/li[1]/a").click()
    driver.find_element("xpath", "/html/body/div/div[3]/ul/li[1]/ul/li[2]/a").click()
    driver.find_element("xpath", "//a[text()= 'AIS Import']").click()

    '''Click on the first template(should be AUTO:..'''
    links = driver.find_elements("xpath", "//table//a")
    links[0].click()


    '''Select all checkboxes from dropdown'''
    ais_checkbox = driver.find_elements("xpath", "//input[@type='checkbox']")
    count_checkbox = 0

    for val in ais_checkbox:
        count_checkbox += 1
        try:
            val.click()
        except Exception as e:
            print(e)
    print(count_checkbox)


    '''Finish and send the declaration to Revenue'''
    driver.find_element("id", "finished").click()

    '''XML File comparison'''
    '''Get the XML'''
    driver.find_element("xpath", "//input[@value='Get XML']").click()

    response = requests.get(driver.current_url)
    print(response)
    soup = BeautifulSoup(response.content, 'lxml-xml')
    print(soup)

So what does line 54 print? Is it a URL to the raw .xml, e.g. the same as opening the link below in a browser, which has an .xml ending?

https://www.w3schools.com/xml/plant_catalog.xml

Nik1811 · (This post was last modified: Mar-22-2024, 07:34 PM by Nik1811.)

Yes, that's correct. Line 54 is there to check whether it's hitting the right URL, i.e. the raw XML that gets generated.

What do you think is the best way to read the entire contents of the raw URL and copy it to another file?

I'm thinking of a few different options, but need your input on BeautifulSoup.

***snippsat*** · Mar-22-2024, 08:13 PM

Do you get an error if you try to save the content of the URL as .xml? If it works, you will get test1.xml in the folder you run the code from.

response = requests.get(driver.current_url)
print(response)
with open('test1.xml', 'wb') as fp:
    fp.write(response.content)
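
One way to check that what was downloaded really is XML (and not an empty or otherwise malformed response) is to try parsing it. This is only a minimal sketch using the standard library; `is_well_formed_xml` is a hypothetical helper name, not something from the code in this thread.

```python
import xml.etree.ElementTree as ET


def is_well_formed_xml(data: bytes) -> bool:
    """Return True if data parses as XML, False otherwise."""
    try:
        ET.fromstring(data)
    except ET.ParseError:
        return False
    return True


# In the script above this could be called as is_well_formed_xml(response.content)
print(is_well_formed_xml(b"<CATALOG><PLANT>Bloodroot</PLANT></CATALOG>"))
print(is_well_formed_xml(b""))
```

Note that this only checks well-formedness; it says nothing about whether the XML is the declaration you expected.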

***snippsat*** · Mar-23-2024, 02:12 AM

I logged in to the server to do a quick test.
The problem I get when trying to download the .xml with Requests
is that Selenium holds the session cookies, and they do not automatically carry over to Requests.
You can transfer the session cookies over to Requests like this:

session = requests.Session()
selenium_cookies = driver.get_cookies()
# Add each Selenium cookie to the Requests Session
for cookie in selenium_cookies:
    session.cookies.set(cookie['name'], cookie['value'])

print(driver.current_url)  # This must be a URL with a .xml ending
response = session.get(driver.current_url)
with open('test1.xml', 'wb') as file:
    file.write(response.content)
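
The same idea can be wrapped in a small function that also keeps each cookie's domain and path, which may matter if the site sets cookies for more than one host. This is a sketch under assumptions: `session_from_selenium_cookies` is a hypothetical helper name, and the example cookie is made up for illustration. The input is expected to be a list of dicts shaped like the output of `driver.get_cookies()`.

```python
import requests


def session_from_selenium_cookies(selenium_cookies):
    """Build a requests.Session from cookie dicts shaped like driver.get_cookies() output."""
    session = requests.Session()
    for cookie in selenium_cookies:
        session.cookies.set(
            cookie["name"],
            cookie["value"],
            domain=cookie.get("domain", ""),
            path=cookie.get("path", "/"),
        )
    return session


# Made-up cookie for illustration; in the real script pass driver.get_cookies()
example_cookies = [
    {"name": "JSESSIONID", "value": "abc123", "domain": "example.com", "path": "/"},
]
session = session_from_selenium_cookies(example_cookies)
print(session.cookies.get("JSESSIONID"))
```

After that, `session.get(driver.current_url)` can be used as above. Calling `response.raise_for_status()` before writing the file is one way to notice a failed request early.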

Nik1811 · (This post was last modified: Apr-10-2024, 01:35 PM by Nik1811.)

Thank you very much @snippsat for this solution.
You are absolutely correct. The session cookies were not retained, which is why I was getting a blank response.
I'm going to use this solution now :)

Meanwhile I found an alternative that might benefit someone :)

import glob
import os
import shutil
import time

import pyautogui

# Click 'Save As' in the browser UI and confirm the dialog
pyautogui.hotkey('ctrl', 's')
time.sleep(2)
pyautogui.press('enter')
time.sleep(2)

# Copy the contents of the latest downloaded file to A_XML.xml for comparison
list_of_files = glob.iglob('C:/Users/Nikita/Downloads/*')  # * means all; for a specific format use e.g. *.csv
latest_file = max(list_of_files, key=os.path.getctime)
print(latest_file)

# Open the path stored in latest_file (not the literal string 'latest_file')
with open(latest_file, 'r') as source, open('A_XML.xml', 'w') as dest:
    shutil.copyfileobj(source, dest)
