Python Forum
Copy xml content from webpage and save to locally without special characters
#11
I removed your code as it contained login info.
Here is the code without the login info.
from xml.etree.ElementTree import fromstring
import requests
from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver import Keys
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.chrome.service import Service
from webdriver_manager.chrome import ChromeDriverManager
from generic.Login import Login
from lxml import html, etree
from xml.etree import ElementTree as ET


options = Options()
options.add_experimental_option("detach", True)

driver = webdriver.Chrome(service=Service(ChromeDriverManager().install()), options=options)


class AIS_Import(Login):
    Login.login_func(driver, "https://www.thyme-it.net/aep2/sadlist/sadlist.html", "thymeit", "xxxx", "xxxx")

    '''Navigate to AIS Import template'''
    driver.find_element("xpath", "/html/body/div/div[3]/ul/li[1]/a").click()
    driver.find_element("xpath", "/html/body/div/div[3]/ul/li[1]/ul/li[2]/a").click()
    driver.find_element("xpath", "//a[text()= 'AIS Import']").click()

    '''Click on the first template(should be AUTO:..'''
    links = driver.find_elements("xpath", "//table//a")
    links[0].click()


    '''Select all checkboxes from dropdown'''
    ais_checkbox = driver.find_elements("xpath", "//input[@type='checkbox']")
    count_checkbox = 0

    for val in ais_checkbox:
        count_checkbox += 1
        try:
            val.click()
        except Exception as e:
            print(e)
    print(count_checkbox)


    '''Finish and send the declaration to Revenue'''
    driver.find_element("id", "finished").click()

    '''XML File comparison'''
    '''Get the XML'''
    driver.find_element("xpath", "//input[@value='Get XML']").click()

    response = requests.get(driver.current_url)
    print(response)
    soup = BeautifulSoup(response.content, 'lxml-xml')
    print(soup)
So what does line 54 print? Is it a URL (to the raw .xml), e.g. the same as opening the link below in a browser, which ends in .xml?
https://www.w3schools.com/xml/plant_catalog.xml
#12
Yes, that's correct. Line 54 prints the URL so I can check that it's hitting the right one, i.e. the raw XML being generated.

What do you think is the best way to read the entire contents of the raw URL and copy it to another file?

I'm thinking of a few different options, but I need your input on BeautifulSoup.
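One option for reading the whole document and copying it to another file, sketched with the standard library's `xml.etree.ElementTree` instead of BeautifulSoup (a swapped-in choice, so no extra parser is needed). It assumes `response.content` already holds the raw XML bytes; `copy_xml` and the sample document are hypothetical.

```python
import xml.etree.ElementTree as ET


def copy_xml(raw: bytes, dest_path: str) -> list[tuple[str, str]]:
    """Parse raw XML bytes, collect (tag, text) for every element,
    and write the parsed tree to dest_path as UTF-8 with an XML declaration."""
    root = ET.fromstring(raw)  # raises ET.ParseError if raw is not well-formed XML
    # root.iter() yields the root and all of its descendants in document order
    contents = [(elem.tag, (elem.text or "").strip()) for elem in root.iter()]
    ET.ElementTree(root).write(dest_path, encoding="utf-8", xml_declaration=True)
    return contents
```

For example, `copy_xml(response.content, 'A_XML.xml')` returns pairs such as `('COMMON', 'Bloodroot')` for the plant catalog linked above. Note that parsing drops XML comments and rewrites the declaration, and namespaced tags come back as `{uri}tag`; if you only need a byte-for-byte copy, writing `response.content` to a file opened in `'wb'` mode is simpler.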

(Mar-22-2024, 07:16 PM) snippsat wrote: I removed your code as it contained login info.
#13
Do you get an error if you try to save the URL with the .xml ending? If it works, you will get test1.xml in the folder you run the code from.
response = requests.get(driver.current_url)
print(response)
with open('test1.xml', 'wb') as fp:
    fp.write(response.content)
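Writing `response.content` in `'wb'` mode stores the server's bytes unchanged, so the encoding declared in the XML is preserved. The file is written even when the body is empty or is an HTML page (for example a login page), so a small check before saving can help. A minimal sketch, assuming the body should be well-formed XML; `save_xml_bytes` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET


def save_xml_bytes(content: bytes, path: str) -> None:
    """Write content to path unchanged, but only if it is non-empty, well-formed XML."""
    if not content.strip():
        raise ValueError("empty response body (not logged in, or cookies missing?)")
    try:
        root = ET.fromstring(content)
    except ET.ParseError as exc:
        raise ValueError(f"response is not well-formed XML: {exc}") from exc
    # Strip any "{namespace}" prefix so XHTML pages are caught as well
    if root.tag.rsplit("}", 1)[-1].lower() == "html":
        raise ValueError("got an HTML page instead of XML")
    with open(path, "wb") as fp:  # binary mode: bytes are written exactly as received
        fp.write(content)
```

Used in place of the `with open(...)` block: `save_xml_bytes(response.content, 'test1.xml')`.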
#14
I logged in to the server to do a quick test.
The problem I get when trying to download the .xml with Requests
is that the session cookies Selenium holds do not automatically carry over to Requests.
You can transfer the session cookies to Requests like this:
session = requests.Session()
selenium_cookies = driver.get_cookies()
# Add each Selenium cookie to the Requests Session
for cookie in selenium_cookies:
    session.cookies.set(cookie['name'], cookie['value'])

print(driver.current_url)  # This must be a URL with a .xml ending
response = session.get(driver.current_url)
with open('test1.xml', 'wb') as file:
    file.write(response.content)
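The loop above copies only each cookie's name and value, so the cookies are not tied to the server they came from. A sketch that also carries over the domain, path, secure flag and expiry, assuming the keys Selenium's `driver.get_cookies()` dicts use (`domain`, `path`, `secure`, `expiry`) and the keyword names that `session.cookies.set()` forwards to `requests.cookies.create_cookie`; `to_requests_cookie_args` and `copy_cookies` are hypothetical helpers.

```python
def to_requests_cookie_args(cookie: dict) -> dict:
    """Map one Selenium cookie dict to keyword arguments for session.cookies.set()."""
    args = {
        "domain": cookie.get("domain", ""),  # ties the cookie to the server it came from
        "path": cookie.get("path", "/"),
        "secure": cookie.get("secure", False),  # secure cookies are only sent over HTTPS
    }
    if "expiry" in cookie:  # Selenium omits "expiry" for session cookies
        args["expires"] = cookie["expiry"]
    return args


def copy_cookies(selenium_cookies: list, cookie_jar) -> None:
    """Copy cookies from driver.get_cookies() into a Requests jar (session.cookies)."""
    for cookie in selenium_cookies:
        cookie_jar.set(cookie["name"], cookie["value"], **to_requests_cookie_args(cookie))
```

Usage: call `copy_cookies(driver.get_cookies(), session.cookies)` in place of the `for` loop, then `session.get(driver.current_url)` as before.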
#15
Thank you very much @snippsat for this solution.
You are absolutely correct. It was the session cookies not being retained, which is why I was getting a blank response.
I'm going to use this solution now :)

Meanwhile, I did find an alternative that might benefit someone :)

import glob
import os
import shutil
import time

import pyautogui

# Click on 'Save As' in the browser UI: Ctrl+S opens the dialog, Enter accepts the default name
pyautogui.hotkey('ctrl', 's')
time.sleep(2)
pyautogui.press('enter')
time.sleep(2)

# Copy the contents of the latest downloaded file to A_XML.xml for comparison
list_of_files = glob.iglob('C:/Users/Nikita/Downloads/*')  # * means all files; for a specific format use e.g. *.xml
latest_file = max(list_of_files, key=os.path.getctime)
print(latest_file)

# Binary mode copies the bytes unchanged, so non-ASCII characters are not re-encoded
with open(latest_file, 'rb') as source, open('A_XML.xml', 'wb') as dest:
    shutil.copyfileobj(source, dest)
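The Downloads lookup above can also be sketched with `pathlib`, under a few assumptions: the saved file ends in `.xml` (Chrome normally gives in-progress downloads a `.crdownload` name, so the pattern skips them), modification time is used instead of `getctime`, and polling replaces the fixed `time.sleep(2)`. `newest_file`, `wait_for_newer_file` and `copy_latest_download` are hypothetical helpers.

```python
import shutil
import time
from pathlib import Path


def newest_file(folder: str, pattern: str = "*.xml") -> Path:
    """Return the most recently modified file in folder that matches pattern."""
    files = [p for p in Path(folder).glob(pattern) if p.is_file()]
    if not files:
        raise FileNotFoundError(f"no files matching {pattern!r} in {folder}")
    # st_mtime means "last modified" on every OS; os.path.getctime is creation
    # time on Windows but last metadata change on Unix
    return max(files, key=lambda p: p.stat().st_mtime)


def wait_for_newer_file(folder: str, since: float, pattern: str = "*.xml",
                        timeout: float = 30.0) -> Path:
    """Poll until a matching file was modified after `since` (a time.time() value)."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            latest = newest_file(folder, pattern)
            if latest.stat().st_mtime > since:
                return latest
        except FileNotFoundError:
            pass  # nothing saved yet, or the file vanished between listing and stat
        time.sleep(0.5)
    raise TimeoutError(f"no new {pattern} file in {folder} after {timeout} s")


def copy_latest_download(folder: str, dest: str, since: float,
                         pattern: str = "*.xml", timeout: float = 30.0) -> Path:
    """Wait for the new file, then copy it byte-for-byte to dest."""
    src = wait_for_newer_file(folder, since, pattern, timeout)
    shutil.copyfile(src, dest)  # binary copy, so the XML's encoding is untouched
    return src
```

Usage: record `start = time.time()` before `pyautogui.hotkey('ctrl', 's')`, then call `copy_latest_download('C:/Users/Nikita/Downloads', 'A_XML.xml', since=start)`.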
(Mar-23-2024, 02:12 AM) snippsat wrote: I logged in to the server to do a quick test.

