Thread: Can not download the PDF (/thread-4627.html) - Python Forum (https://python-forum.io)
RE: Can not download the PDF - thomas2004ch - Sep-02-2017

This helps! The login page comes up. I replaced 'Foo' with my email address and 'Bar' with my last name, and it logs in. Here is what I got:

    C:\ProgramData\Anaconda3>python.exe log_test.py
    Welcome to the Technical Analysis of STOCKS & COMMODITIES Subscribers’ Area

Now I have appended the code for downloading, but the PDF still cannot be downloaded correctly, just as before. Here is my whole code:

    """
    Spyder Editor

    This is a temporary script file.
    """
    from selenium import webdriver
    from bs4 import BeautifulSoup
    import time
    import urllib.request

    caps = webdriver.DesiredCapabilities().FIREFOX
    caps["marionette"] = True
    browser = webdriver.Firefox(capabilities=caps)
    web_url = 'http://technical.traders.com/sub/sublogin2.asp'
    browser.get(web_url)

    # Fill in the login form and submit it
    user_name = browser.find_element_by_css_selector('#SubID > input[type="text"]')
    user_name.send_keys("[email protected]")
    password = browser.find_element_by_css_selector('#SubName > input[type="text"]')
    password.send_keys("MyLastname")
    time.sleep(2)
    submit = browser.find_element_by_css_selector('#SubButton > input[type="submit"]')
    submit.click()
    time.sleep(2)

    # Give source code to BeautifulSoup
    soup = BeautifulSoup(browser.page_source, 'lxml')
    log_in = soup.find('h2')
    print(log_in.text)
    time.sleep(2)

    def download_file(download_url, fileName):
        print(download_url)
        response = urllib.request.urlopen(download_url)
        file = open(fileName, 'wb')
        print(response)
        file.write(response.read())
        file.close()
        response.close()
        print("Completed")

    urlstr = "http://technical.traders.com/archive/article.asp?file=\V26\C07\131INTR.pdf"
    download_file(urlstr, "D:\\eBooks\\Stocks_andCommodities\\2008\\Jul\\mypdf.pdf")

I think maybe we have to simulate the input of the PDF URL and the clicking of download/save with Selenium?
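A side note on the `urlstr` line in the script above: in an ordinary Python string literal, `\131` is an octal escape, so the URL that is actually sent differs from the one typed. The sketch below shows the effect and one way around it. Using a raw string and percent-encoding the `file` parameter with `urllib.parse.quote` is a suggestion here, not something the thread prescribes, and whether the server accepts the encoded form is an assumption.

```python
from urllib.parse import quote

# "\131" is an octal escape for chr(0o131), which is the letter "Y".
plain = "\131INTR.pdf"
# A raw string keeps the backslash and the digits exactly as typed.
raw = r"\131INTR.pdf"

print(plain)  # YINTR.pdf
print(raw)    # \131INTR.pdf

# Build the query string from a raw string and percent-encode the backslashes.
base = "http://technical.traders.com/archive/article.asp"
file_param = r"\V26\C07\131INTR.pdf"
url = base + "?file=" + quote(file_param, safe="")
print(url)
```

This only fixes the path. It does not address the login problem, because `urllib.request.urlopen` still knows nothing about the session that Selenium opened.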
RE: Can not download the PDF - snippsat - Sep-02-2017

(Sep-02-2017, 04:49 AM) thomas2004ch Wrote: I think maybe we have to simulate the input of the PDF-Url and the clicking of download/save with Selenium?

Yes, and then get the new page source and try to download as I did in my previous post here.

    browser.find_element_by_css_selector("a[href*='Foo']").click()
    link = browser.page_source

I remember I had to use a Firefox profile for this before. Here is an old post.

RE: Can not download the PDF - thomas2004ch - Sep-02-2017

Hi, I am a little bit confused by your code. I did the following:

    """
    Spyder Editor

    This is a temporary script file.
    """
    from selenium import webdriver
    from bs4 import BeautifulSoup
    import time
    import requests

    caps = webdriver.DesiredCapabilities().FIREFOX
    caps["marionette"] = True
    browser = webdriver.Firefox(capabilities=caps)
    web_url = 'http://technical.traders.com/sub/sublogin2.asp'
    browser.get(web_url)

    # Fill in the login form and submit it
    user_name = browser.find_element_by_css_selector('#SubID > input[type="text"]')
    user_name.send_keys("[email protected]")
    password = browser.find_element_by_css_selector('#SubName > input[type="text"]')
    password.send_keys("MyLastname")
    time.sleep(2)
    submit = browser.find_element_by_css_selector('#SubButton > input[type="submit"]')
    submit.click()
    time.sleep(2)

    # Give source code to BeautifulSoup
    soup = BeautifulSoup(browser.page_source, 'lxml')
    log_in = soup.find('h2')
    print(log_in.text)
    time.sleep(2)

    #--- Website
    url = 'http://technical.traders.com/archive/articlefinal.asp?file=\V26\C07\\131INTR.pdf'
    browser.get(url)
    browser.find_element_by_css_selector("a[href*='Foo']").click()
    link = browser.page_source
    time.sleep(5)

    file_name = 'D:/eBooks/Stocks_andCommodities/2008/Jul/mypdf.pdf'
    with open(file_name, "wb") as pdf:
        pdf.write(link)

I can see that the PDF page opens, but I got an error.

1. Is my code correct?
2. What should I replace 'Foo' with?
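The Firefox profile approach snippsat mentions is not shown in this thread, so the following is only a minimal sketch of how it is commonly done. It assumes the Selenium 3 API used in the scripts above (`FirefoxProfile`, `firefox_profile=`), and the helper names `pdf_download_prefs` and `make_browser` are made up for illustration. The preferences tell Firefox to save PDFs into a fixed folder without asking, instead of opening them in the built-in viewer.

```python
def pdf_download_prefs(download_dir):
    """Return Firefox preferences that save PDFs to download_dir without a dialog."""
    return {
        "browser.download.folderList": 2,  # 2 = use the custom directory below
        "browser.download.dir": download_dir,
        "browser.helperApps.neverAsk.saveToDisk": "application/pdf",
        "pdfjs.disabled": True,  # do not open PDFs in the built-in viewer
    }


def make_browser(download_dir):
    """Start Firefox with the preferences above (Selenium 3 style, hypothetical helper)."""
    # Imported here so pdf_download_prefs can be used without Selenium installed.
    from selenium import webdriver

    profile = webdriver.FirefoxProfile()
    for name, value in pdf_download_prefs(download_dir).items():
        profile.set_preference(name, value)
    caps = webdriver.DesiredCapabilities().FIREFOX
    caps["marionette"] = True
    return webdriver.Firefox(firefox_profile=profile, capabilities=caps)
```

With a browser created this way, for example `make_browser("D:\\eBooks\\Stocks_andCommodities\\2008\\Jul")`, clicking the PDF link after logging in should write the file into that folder. This assumes the server sends the file with the `application/pdf` content type.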
RE: Can not download the PDF - snippsat - Sep-02-2017

(Sep-02-2017, 11:06 AM) thomas2004ch Wrote: 1.

No.

Quote: 2.

I cannot see your source code, so you have to find the link yourself, e.g. by CSS selector. With the Chrome or Firefox dev tools you get the selector by right-clicking the link and choosing Inspect, then right-clicking the highlighted element in the source and choosing Copy --> Selector.

The problem now is that, to download with Selenium, you have to use the Firefox profile method I posted above. To download with Requests as I showed here, you probably have to get the authorization header, in the form of a session cookie, from Selenium, then pass those parameters into Requests and try the download. As mentioned, this is more difficult stuff and not so easy to grasp if you are new to this.

RE: Can not download the PDF - thomas2004ch - Sep-03-2017

Many thanks, I will try it. Another point: maybe I can use Selenium directly, since I remember Selenium has a recorder?
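The cookie hand-off snippsat describes in the reply above could look roughly like the sketch below. It is a guess at the idea, not code from the thread. It assumes the site tracks the login through cookies, and it uses `urllib.request` from the standard library in place of Requests to stay dependency-free. The function names are invented for illustration. The input is the list of dicts that Selenium's `browser.get_cookies()` returns.

```python
import urllib.request


def cookie_header(selenium_cookies):
    """Build a Cookie header value from the list returned by browser.get_cookies()."""
    return "; ".join(f"{c['name']}={c['value']}" for c in selenium_cookies)


def looks_like_pdf(data):
    """PDF files start with the bytes %PDF-; an HTML login page does not."""
    return data.startswith(b"%PDF-")


def download_with_cookies(url, selenium_cookies, file_name):
    """Fetch url with the browser's session cookies and save it if it is a PDF."""
    request = urllib.request.Request(
        url, headers={"Cookie": cookie_header(selenium_cookies)}
    )
    with urllib.request.urlopen(request) as response:
        data = response.read()
    if not looks_like_pdf(data):
        # Most likely the server answered with the login page or an error page.
        raise ValueError("Server did not return a PDF")
    with open(file_name, "wb") as pdf:
        pdf.write(data)
```

After the login step in the scripts above, the call would be `download_with_cookies(url, browser.get_cookies(), file_name)`. The `%PDF-` check is there because saving whatever comes back silently produces a broken file when the session is not accepted.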