Python Forum
Can not download the PDF
#11
Quote:You must look at the content and see whether it's the same content as in the logged-in browser.
I've checked the content, it is the same as the login page.

Quote:See if download link is in content or if you need to navigate more.
No, the download link is not in the content. Does this mean the login is not successful?

(Aug-31-2017, 12:19 PM)Larz60+ Wrote:
Quote:It seems better than before, since "print(p.content)" prints much more content than before.
Please show what you are talking about.
It's difficult to visualize if you don't provide that.

The content is exactly the same as in the login page:
http://technical.traders.com/sub/sublogin2.asp

It seems I cannot upload a file here in the forum. I would like to, since the content is quite long. If the admin doesn't mind, I can paste it here.

Besides, I want to know whether my download code below is correct:

...
    # Download code
    with open(fileName, "wb") as pdf:
        pdf.write(r.content)
...
#12
Quote:The content is exactly the same as in the login page:
http://technical.traders.com/sub/sublogin2.asp
You have changed my code. EMAIL is wrong; the field is called name in the login form.
# Fill in your details
payload = {
'EMAIL': '[email protected]',
'LASTNAME': 'MyLastname'
}
Change to:
payload = {
'name': '[email protected]',
'LASTNAME': 'MyLastname'
}
 
(Aug-31-2017, 02:06 PM)thomas2004ch Wrote: Besides, I want to know whether my download code below is correct?
Yes. Here is a test with an openly available sample PDF:
import requests

file_name = 'pdf_sample.pdf'
file_url = 'http://che.org.il/wp-content/uploads/2016/12/pdf-sample.pdf'
url_get = requests.get(file_url)
with open(file_name, "wb") as pdf:
    pdf.write(url_get.content)
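One caveat with the sample above: when a login fails, the server usually answers with an HTML page instead of the file, and the code will happily save that HTML under a .pdf name. A minimal sketch of a guard, assuming it is enough to check for the `%PDF-` signature that PDF files start with; `save_if_pdf` is a hypothetical helper name, not part of requests:

```python
from pathlib import Path

PDF_MAGIC = b"%PDF-"  # signature at the start of a PDF file


def save_if_pdf(content: bytes, file_name: str) -> bool:
    """Write content to file_name only if it looks like a PDF.

    Returns True if the file was written, False otherwise.
    """
    if not content.startswith(PDF_MAGIC):
        # Probably an HTML login or error page, so do not save it.
        return False
    Path(file_name).write_bytes(content)
    return True
```

With the sample above it would be called as `save_if_pdf(url_get.content, file_name)`; a False result is a strong hint that the request was not authorized.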
#13
Sorry, I forgot to tell you that I did use 'name' as in your code. Since it didn't work, I changed it to 'EMAIL'. I have now changed it back to 'name', but it doesn't help.

Here is my whole code again:

import requests

login_url = 'http://technical.traders.com/sub/sublogin2.asp'
file_url = 'http://technical.traders.com/archive/articlefinal.asp?file=\V26\C07\\131INTR.pdf'
fileName = 'D:/eBooks/Stocks_andCommodities/2008/Jul/mypdf.pdf'

# Fill in your details
payload = {
'name': '[email protected]',
'LASTNAME': 'MyLastname'
}

# Use "with" to ensure the session context is closed after use.
with requests.Session() as s:
    p = s.post(login_url, data=payload)
    # see if successful login
    print(p.content)

    # An authorized request
    r = s.get(file_url, stream=True)
    # Download code
    with open(fileName, "wb+") as pdf:
        pdf.write(r.content)

p.close()

I guess the login is still not successful?
#14
What response are you getting when you try to login?

I think it's important to make sure you understand that this isn't a python, or a requests, or a session issue.  The issue is that you're probably not supplying the right parameters to the right url.  An easy way to test out what a server needs/doesn't need, is to use a program such as Fiddler, and to try to copy what your browser is sending (very helpful for forms that use javascript to craft different requests than what you'd assume just from looking at the page).

For this particular url, for instance, if you open your browser's dev tools and look at the network tab (all three major browsers have a network tab), you can see what's passed when you try to login.  And you're missing half the parameters.

Try this:
payload = {
    "EMAIL": your_email,
    "LASTNAME": your_lastname,
    "FORMID": "STORE",
    "submitbtn": "Submit"
}
#15
I found something interesting. I first cleared the whole cache and closed the browser. Then I opened the browser again, copied the PDF URL and tried to open the PDF file. This time I was redirected to a login page, but this login page is different from the one I use in my program. It is: http://technical.traders.com/archive/archivelogin.asp. On this page I am asked to enter the Subscriber_ID and my last name. But I don't have the Subscriber_ID.

Normally I open the other login page, the one given in my program: http://technical.traders.com/sub/sublogin2.asp. On this page I am asked to enter my email address and my last name. After logging in, I can open and download the PDF manually.

This means I have to ask for the Subscriber_ID and use that, not the email address, in my program.
#16
I don't think it matters, neither of those are the page you should be submitting to.  If you open the network tab and watch what happens, or even just inspect the page to see what the form's action attribute is, you'll see that both pages submit to the exact same place: http://technical.traders.com/sub/sublog.asp
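If you'd rather check that from Python than from the browser, here is a rough sketch using only the standard library. The markup in `sample` is made up for illustration (the real page may well differ), and `FormInspector` and `inspect_forms` are names invented for this example, not part of any library:

```python
from html.parser import HTMLParser


class FormInspector(HTMLParser):
    """Collect each form's action, method, and named fields."""

    def __init__(self):
        super().__init__()
        self.forms = []
        self._current = None  # the form we are currently inside, if any

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "form":
            self._current = {
                "action": attrs.get("action"),
                "method": (attrs.get("method") or "get").lower(),
                "fields": {},
            }
            self.forms.append(self._current)
        elif tag in ("input", "select", "textarea"):
            name = attrs.get("name")
            if self._current is not None and name:
                self._current["fields"][name] = attrs.get("value") or ""

    def handle_endtag(self, tag):
        if tag == "form":
            self._current = None


def inspect_forms(html):
    """Return a list of dicts describing the forms found in html."""
    parser = FormInspector()
    parser.feed(html)
    return parser.forms


# Hypothetical markup, only to show the shape of the result.
sample = """
<form action="sublog.asp" method="POST">
  <input type="text" name="EMAIL">
  <input type="text" name="LASTNAME">
  <input type="hidden" name="FORMID" value="STORE">
  <input type="submit" name="submitbtn" value="Submit">
</form>
"""

for form in inspect_forms(sample):
    print(form["method"], form["action"], form["fields"])
```

In practice you would feed it the text of the login page you fetched. A relative action such as the one in the sample can be resolved against the page URL with `urllib.parse.urljoin`.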
#17
(Aug-31-2017, 06:21 PM)nilamo Wrote: I don't think it matters, neither of those are the page you should be submitting to.  If you open the network tab and watch what happens, or even just inspect the page to see what the form's action attribute is, you'll see that both pages submit to the exact same place: http://technical.traders.com/sub/sublog.asp

Have you tried to open http://technical.traders.com/sub/sublog.asp? It leads to a Log In Error page: http://technical.traders.com/sub/error.asp

I think this url should be the correct login page: http://technical.traders.com/sub/sublogin.asp

But when I run my program, I see nothing in the Network tab of the dev tools. It seems the login page doesn't accept the login request from my Python program?

I also changed the payload as follows:
payload = {
    "ID": '123456',
    "LASTNAME": 'Mylastname',
    "FORMID": "STORE",
    "submitbtn": "Submit"
} 
But it still doesn't work.
#18
Quote:I think this url should be the correct login page: http://technical.traders.com/sub/sublogin.asp

That login page sends this POST request:
payload = {
    "ID": '123',
    "LASTNAME": 'Mylastname',
    "file": "",
    "submitbtn": "Submit"
} 
Quote:Normally I open another login page given in my program, it is: http://technical.traders.com/sub/sublogin2.asp
That login page sends this POST request, as @nilamo posted:
payload = { 
    "EMAIL": "[email protected]",
    "LASTNAME": 'Mylastname',
    "FORMID": "STORE",
    "submitbtn": "Submit"
} 
Remove p.close(); the session has to stay open for the cookies (which are used for authorization) to work.
The point of using with is that the resource is closed automatically.
Also remove stream=True.
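Putting the advice from this thread together, the flow could look roughly like the sketch below. It is untested against the real site (none of us has a working login), it assumes the form's action URL mentioned above is the right place to post to, and `login_and_download` and `main` are hypothetical names. The credentials are placeholders.

```python
def login_and_download(session, login_url, payload, file_url, file_name):
    """Log in through session, fetch file_url, and save it if it is a PDF.

    session is a requests.Session (or anything with compatible
    post() and get() methods). Returns True if a PDF was saved.
    """
    # Post the form fields to the form's action URL, not to the page
    # that merely displays the form.
    login_response = session.post(login_url, data=payload)
    login_response.raise_for_status()

    # Same session, so cookies set during login are sent along.
    file_response = session.get(file_url)
    file_response.raise_for_status()

    # A failed login typically yields an HTML page rather than the file.
    if not file_response.content.startswith(b"%PDF-"):
        return False

    with open(file_name, "wb") as pdf:
        pdf.write(file_response.content)
    return True


def main(file_url, file_name):
    import requests  # third-party: pip install requests

    payload = {
        "EMAIL": "you@example.com",  # placeholder
        "LASTNAME": "MyLastname",    # placeholder
        "FORMID": "STORE",
        "submitbtn": "Submit",
    }
    # Assumption: this is the URL both login forms submit to.
    login_url = "http://technical.traders.com/sub/sublog.asp"
    with requests.Session() as s:
        saved = login_and_download(s, login_url, payload, file_url, file_name)
    print("PDF saved" if saved else "Did not get a PDF, login probably failed")
```

Call `main(file_url, fileName)` with your own values. If it reports that no PDF came back, print the login response text and compare it with what the browser shows after a successful login.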
#19
Thanks. But it still doesn't work.

The problem is: if I simply refresh the login page "http://technical.traders.com/sub/sublogin2.asp", I can see a lot of messages in the Network tab of the dev tools. But when I run the program, no messages are shown at all. I am not sure whether this has something to do with the JavaScript on the login page?

The item 'submitbtn' in the payload{...} simulates the click action on Submit, right?
#20
(Sep-01-2017, 04:09 AM)thomas2004ch Wrote: The item 'submitbtn' in the payload{...} simulates the click action on Submit, right?
Yes, but there can be other things blocking the login. That is hard for us to test, since we don't have a working login to check with.
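To make that "yes" a bit more concrete: a named submit button is sent as one more name=value pair in the form body, nothing more. A small sketch with the standard library, assuming the form is sent as application/x-www-form-urlencoded (the default for HTML forms, and what requests does when `data=` is a dict); the values are placeholders:

```python
from urllib.parse import urlencode

payload = {
    "EMAIL": "me@example.com",   # placeholder
    "LASTNAME": "Mylastname",    # placeholder
    "FORMID": "STORE",
    "submitbtn": "Submit",
}

# This is the kind of string that ends up in the body of the POST request.
body = urlencode(payload)
print(body)
```

Also note that the browser's Network tab only lists requests made by the browser itself, so a request sent from a Python script will never show up there. To see what your script sent and received, look at the response object instead, for example `p.status_code`, `p.history` and `p.request.body`.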

There are other ways of doing this, for example by using Selenium and PhantomJS.
Example:
from selenium import webdriver
from bs4 import BeautifulSoup
import time

# Activate Phantom and deactivate Chrome to not load browser
#browser = webdriver.PhantomJS()
browser = webdriver.Chrome()
web_url = 'http://technical.traders.com/sub/sublogin2.asp'
browser.get(web_url)
user_name = browser.find_element_by_css_selector('#SubID > input[type="text"]')
user_name.send_keys("Foo")
password = browser.find_element_by_css_selector('#SubName > input[type="text"]')
password.send_keys("Bar")
time.sleep(2)
submit = browser.find_element_by_css_selector('#SubButton > input[type="submit"]')
submit.click()
time.sleep(2)

# Give source code to BeautifulSoup
soup = BeautifulSoup(browser.page_source, 'lxml')
log_in = soup.find('h2')
print(log_in.text)
Output:
Log In Error
This tries to log me in. I pass the page source to BeautifulSoup and parse out the login error (a working login would give the authorized page).
If I had a working login this would probably work, because it now simulates a browser.
This is a bit more advanced, and you may not have seen it before.