Can not download the PDF - Printable Version

RE: Can not download the PDF - thomas2004ch - Sep-02-2017

This helped! The login page comes up.

I replaced 'Foo' with my email address and 'Bar' with my last name, and the login worked!

And here is what I got:
C:\ProgramData\Anaconda3>python.exe log_test.py
Welcome to the Technical Analysis of STOCKS & COMMODITIES Subscribers’ Area

Then I appended the download code, but as before the PDF is not downloaded correctly. Here is my whole code:

"""
Spyder Editor

This is a temporary script file.
"""
from selenium import webdriver
from bs4 import BeautifulSoup
import time
import urllib.request
 
caps = webdriver.DesiredCapabilities().FIREFOX
caps["marionette"] = True
browser = webdriver.Firefox(capabilities=caps)
web_url = 'http://technical.traders.com/sub/sublogin2.asp'
browser.get(web_url)

user_name = browser.find_element_by_css_selector('#SubID > input[type="text"]')
user_name.send_keys("[email protected]")
password = browser.find_element_by_css_selector('#SubName > input[type="text"]')
password.send_keys("MyLastname")
time.sleep(2)
submit = browser.find_element_by_css_selector('#SubButton > input[type="submit"]')
submit.click()
time.sleep(2)
 
# Give source code to BeautifulSoup
soup = BeautifulSoup(browser.page_source, 'lxml')
log_in = soup.find('h2')
print(log_in.text)

time.sleep(2)

def download_file(download_url, fileName):
    print(download_url)
    response = urllib.request.urlopen(download_url)
    file = open(fileName, 'wb')
    print(response)
    file.write(response.read())
    file.close()
    response.close()
    print("Completed")

urlstr = "http://technical.traders.com/archive/article.asp?file=\V26\C07\131INTR.pdf"

download_file(urlstr, "D:\\eBooks\\Stocks_andCommodities\\2008\\Jul\\mypdf.pdf")
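
A note on the URL literal above: in an ordinary Python string, `\131` is an octal escape that evaluates to the single character `Y`, so the URL actually requested may differ from the one typed. A minimal sketch of the effect, with a raw string as one possible fix (whether the server accepts unencoded backslashes in the `file` parameter is an assumption here):

```python
# In a normal string literal, "\131" is an octal escape:
# 0o131 == 89 == ord("Y"), so four typed characters collapse into one.
plain = "C07\131INTR.pdf"
# A raw string (r"...") keeps every backslash exactly as typed.
raw = r"C07\131INTR.pdf"

print(plain)  # C07YINTR.pdf
print(raw)    # C07\131INTR.pdf

# Building the thread's URL with a raw string for the file part
base = "http://technical.traders.com/archive/article.asp?file="
urlstr = base + r"\V26\C07\131INTR.pdf"
print(urlstr)
```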

I think maybe we have to simulate the input of the PDF-Url and the clicking of download/save with Selenium?

RE: Can not download the PDF - snippsat - Sep-02-2017

(Sep-02-2017, 04:49 AM)thomas2004ch Wrote: I think maybe we have to simulate the input of the PDF-Url and the clicking of download/save with Selenium?

Yes, and then get the new page source and try to download as I did in my previous post here.

browser.find_element_by_css_selector("a[href*='Foo']").click()
link = browser.page_source

I remember I had to use a Firefox profile for this before.
Here is an old post.
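
The old post is not reproduced in this thread, so the following is only a sketch of what a Firefox profile for unattended PDF downloads might look like. The preference names are standard Firefox settings. The helper names `pdf_download_prefs` and `make_browser` are made up for illustration, and the constructor call follows the Selenium 3 style used elsewhere in this thread (newer Selenium versions set the same preferences through Firefox options instead).

```python
def pdf_download_prefs(download_dir):
    """Firefox preferences that save PDFs to disk without showing a dialog."""
    return {
        # 2 = use the custom directory given in browser.download.dir
        "browser.download.folderList": 2,
        "browser.download.dir": download_dir,
        # Do not ask what to do with this MIME type; just save it
        "browser.helperApps.neverAsk.saveToDisk": "application/pdf",
        # Turn off the built-in PDF viewer so the file is downloaded instead
        "pdfjs.disabled": True,
    }


def make_browser(download_dir):
    """Start Firefox with the preferences above (needs selenium and geckodriver)."""
    # Imported here so the preference dict can be inspected without selenium
    from selenium import webdriver

    profile = webdriver.FirefoxProfile()
    for name, value in pdf_download_prefs(download_dir).items():
        profile.set_preference(name, value)
    return webdriver.Firefox(firefox_profile=profile)
```

With this in place, `browser = make_browser(...)` with a download folder of your choice would replace the `webdriver.Firefox(...)` line, and opening or clicking the PDF link should then save the file into that folder.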

RE: Can not download the PDF - thomas2004ch - Sep-02-2017

Hi,

I am a little confused by your code. I did the following:

"""
Spyder Editor

This is a temporary script file.
"""
from selenium import webdriver
from bs4 import BeautifulSoup
import time
import requests
 
caps = webdriver.DesiredCapabilities().FIREFOX
caps["marionette"] = True
browser = webdriver.Firefox(capabilities=caps)
web_url = 'http://technical.traders.com/sub/sublogin2.asp'
browser.get(web_url)

user_name = browser.find_element_by_css_selector('#SubID > input[type="text"]')
user_name.send_keys("[email protected]")
password = browser.find_element_by_css_selector('#SubName > input[type="text"]')
password.send_keys("MyLastname")
time.sleep(2)
submit = browser.find_element_by_css_selector('#SubButton > input[type="submit"]')
submit.click()
time.sleep(2)
 
# Give source code to BeautifulSoup
soup = BeautifulSoup(browser.page_source, 'lxml')
log_in = soup.find('h2')
print(log_in.text)

time.sleep(2)

#--- Website
url = 'http://technical.traders.com/archive/articlefinal.asp?file=\V26\C07\\131INTR.pdf'
browser.get(url)
browser.find_element_by_css_selector("a[href*='Foo']").click()
link = browser.page_source
time.sleep(5)

file_name = 'D:/eBooks/Stocks_andCommodities/2008/Jul/mypdf.pdf'
with open(file_name, "wb") as pdf:
    pdf.write(link)

I can see that the PDF page opens, but I get an error:

Error:  File "C:\ProgramData\Anaconda3\lib\site-packages\selenium\webdriver\remote\errorhandler.py", line 194, in check_response
    raise exception_class(message, screen, stacktrace)

NoSuchElementException: Unable to locate element: a[href*='Foo']

1. Is my code correct?

2. What should I replace 'Foo' with?

RE: Can not download the PDF - snippsat - Sep-02-2017

(Sep-02-2017, 11:06 AM)thomas2004ch Wrote: 1. Is my code correct?
2. What should I replace 'Foo' with?

I cannot see the page's source code, so you have to find the link yourself, e.g. by CSS selector.
With the Chrome or Firefox dev tools you can get the selector by right-clicking the link and choosing Inspect, then right-clicking the highlighted element in the source and choosing Copy --> Selector.
The problem now is that, to download with Selenium, you have to use the Firefox profile method I posted above.
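
'Foo' in the selector was only a placeholder: `a[href*='Foo']` matches any link whose `href` contains that substring. One way to pick a real substring is to list the links in the page source. A minimal standard-library sketch, where the sample HTML is invented for illustration and in practice `browser.page_source` would be passed in:

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect the href of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.hrefs.append(href)


def find_links(page_source, needle):
    """Return hrefs containing `needle`, like the CSS selector a[href*='needle']."""
    parser = LinkCollector()
    parser.feed(page_source)
    return [h for h in parser.hrefs if needle in h]


# Invented sample; in the thread this would be browser.page_source
sample = '<a href="/home.asp">Home</a> <a href="/files/demo.pdf">PDF</a>'
print(find_links(sample, ".pdf"))  # ['/files/demo.pdf']
```

If the list comes back empty for every substring you try, the page probably has no matching link at all, which would explain the NoSuchElementException.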

To download with Requests, as I showed here, you probably have to get the authorization from Selenium in the form of a session cookie.
Then pass those parameters into Requests and try to download.
As mentioned, this is more difficult stuff and not so easy to grasp if you are new to this.
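
A minimal sketch of that idea, assuming the site identifies the logged-in session purely through cookies (not confirmed in this thread). `get_cookies()` is Selenium's call for reading the browser's cookies. The helper names `cookies_for_requests`, `looks_like_pdf` and `download_with_session` are made up for illustration.

```python
def cookies_for_requests(selenium_cookies):
    """Turn Selenium's list of cookie dicts into a {name: value} dict."""
    return {c["name"]: c["value"] for c in selenium_cookies}


def looks_like_pdf(data):
    """A real PDF starts with the bytes %PDF; an HTML login page does not."""
    return data.startswith(b"%PDF")


def download_with_session(browser, url, file_name):
    """Fetch url with Requests, reusing the cookies of a logged-in Selenium browser."""
    # Imported here so the two helpers above work without requests installed
    import requests

    cookies = cookies_for_requests(browser.get_cookies())
    response = requests.get(url, cookies=cookies)
    response.raise_for_status()
    if not looks_like_pdf(response.content):
        raise ValueError("Server did not return a PDF; the login may not have carried over")
    # response.content is bytes, which is what a file opened with "wb" expects
    with open(file_name, "wb") as pdf:
        pdf.write(response.content)
```

Checking for the `%PDF` header before writing avoids silently saving an HTML login or error page under a `.pdf` name.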

RE: Can not download the PDF - thomas2004ch - Sep-03-2017

Many thanks, I will try it.

Another point: maybe I can use Selenium directly, since I remember that Selenium has a recorder?