Can not download the PDF

thomas2004ch · (This post was last modified: Aug-30-2017, 08:16 PM by thomas2004ch.)

Hi all,

I am trying to download a PDF file. My program is below. It runs without any error, but when I open the downloaded PDF file, it just contains some HTML response code, not the content of the PDF file. And I get the error message: Error by loading the PDF file.

Does anyone have an idea?

[font=Courier New, Courier, monospace]
import requests
file_url = "XXX"

 
fileName = 'D:/eBooks/mypdf.pdf'
r = requests.get(file_url, stream = True)
 
with open(fileName, "wb") as pdf:
    for chunk in r.iter_content(chunk_size=1024):
 
         # writing one chunk at a time to pdf file
         if chunk:
             pdf.write(chunk)
             
pdf.close()             
[/font]

Since I can't put the URL in the code section, I'll write it down here:

file_url = "http://technical.traders.com/archive/articlefinal.asp?file=\V26\C07\\131INTR.pdf"

But I can open and download the PDF manually. Why?

**Larz60+** · (This post was last modified: Aug-30-2017, 08:45 PM by Larz60+.)

One thing I see right off: remove the pdf.close() statement. It's not needed when using with...
The URL that you show requires a login; you won't get the PDF without one.
I am also assuming that the font formatting is not part of your code.
Do you get an error traceback? If so, please post the entire traceback verbatim.
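
On the pdf.close() point, a minimal sketch of why the extra call is redundant: the with block closes the file as soon as the block ends. The temporary path and the demo bytes are made up for illustration and are not from the original program.

```python
import os
import tempfile

# Hypothetical scratch location, so the sketch does not touch D:/eBooks.
path = os.path.join(tempfile.mkdtemp(), "demo.pdf")

with open(path, "wb") as pdf:
    pdf.write(b"%PDF-1.4 demo bytes")
    inside = pdf.closed   # still open inside the block

after = pdf.closed        # closed automatically once the block ends

print(inside, after)
```

Calling pdf.close() after the block does nothing useful, because the file is already closed at that point.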

***snippsat*** · Aug-30-2017, 08:44 PM

You have to be logged in to download that pdf.
So you need to write code to log in first, then download.

thomas2004ch · (This post was last modified: Aug-31-2017, 04:30 AM by thomas2004ch.)

(Aug-30-2017, 08:44 PM)Larz60+ Wrote: One thing I see right off, remove the pdf.close() statement, Not needed when using with...

I've tried removing the pdf.close(). The result is the same.

Quote:The URL that you show requires a login, it won't get the pdf without.

Yes. But I can open it and download it after logging in manually.

Do you mean I have to log in from the code? How?

Quote:I am also assuming that the font formatting is not part of your code.

If you see any font formatting, it is not part of my code.

Quote:Do you get an error traceback? If so, please post the entire traceback verbatim.

How do I get the error traceback? I use Spyder from Anaconda3.

2.
Yes

(Aug-30-2017, 08:44 PM)snippsat Wrote: So you need to write code to log in first, then download.

How do I write the code for the login?

The login on the page does not use a username and password, but an email address and last name.

thomas2004ch · Aug-31-2017, 06:40 AM

Further info:

This PDF is from an online magazine that I subscribe to. After logging in with my email address and last name, I can open and download the PDF file manually. Since I don't want to do this manually, I wrote this small program to do it. When I run my program there is no error, and there is no request for login data either. The file mypdf.pdf is indeed created, but its content is not a valid PDF. It looks as follows:
...

<!DOCTYPE html>
<head>

<script language="JavaScript">
...

**Larz60+** · Aug-31-2017, 07:10 AM

Quote:I've tried removing the pdf.close(). The result is the same.

Remove it, it doesn't belong there.

Quote:You mean I have to set the login in the code? How?

There are several ways to do this, see: https://stackoverflow.com/questions/2910...ith-python

Quote:I am also assuming that the font formatting is not part of your code.

I didn't think so.

Quote:How do I get the error traceback? I use Spyder from Anaconda3.

A traceback will begin with: 'Traceback (most recent call last):'
If you have one, post it in full and use error tags (the icon of a circle with an X).
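
For reference, a small sketch of what such a traceback looks like and how to capture one as text. The failing function risky() is a made-up example and has nothing to do with the download program.

```python
import traceback


def risky():
    # Deliberately fails so there is something to report.
    return 1 / 0


try:
    risky()
except ZeroDivisionError:
    # format_exc() returns the same text Python would print to the console.
    report = traceback.format_exc()

print(report)
```

If the program finishes without printing a block that starts with that line, there is no traceback to post.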

***snippsat*** · (This post was last modified: Aug-31-2017, 09:50 AM by snippsat.)

(Aug-31-2017, 06:40 AM)thomas2004ch Wrote: The file mypdf.pdf is indeed created, but its content is not a valid PDF. It looks as follows:

You are downloading the source of the page that the website serves when you are not logged in.
Look further down the source and you will find:

<h2 class="grad">Register or Log In &mdash; Traders.com and STOCKS &amp; COMMODITIES magazine</h2>

When you use a browser, it automatically logs you in.
Log out of the website or clear the browser cache, then try your own link; you will see that you do not get the PDF from the link address.
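
One way to catch this case in code is to look at the first bytes of what was downloaded before saving it: a PDF file starts with the signature %PDF-, while the login page starts with HTML. This is only a sketch; the helper names looks_like_pdf and diagnose are made up for illustration.

```python
PDF_SIGNATURE = b"%PDF-"


def looks_like_pdf(first_bytes):
    """True if the data starts with the PDF file signature."""
    return first_bytes.startswith(PDF_SIGNATURE)


def diagnose(first_bytes, content_type=""):
    """Classify a download as 'pdf', 'html' or 'unknown'."""
    if looks_like_pdf(first_bytes):
        return "pdf"
    start = first_bytes.lstrip().lower()
    if "html" in content_type.lower() or start.startswith((b"<!doctype", b"<html")):
        return "html"
    return "unknown"


print(diagnose(b"%PDF-1.4\n"))
print(diagnose(b"\r\n<!DOCTYPE html>\n<head>", "text/html"))
```

With the code from the first post, this could be called as diagnose(r.content[:1024], r.headers.get("Content-Type", "")). A result of 'html' means the site answered with a web page, most likely the login page, instead of the file.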

thomas2004ch Wrote:How to write the code for login?

See Requests Authentication.
You may need a session.
Here is an example; I have been looking at the source of the login page.
You log in with "Your Subscriber ID:" and "Your Last Name OR Company:"

import requests

# Fill in your details
payload = {
    'name': 'your_id',
    'LASTNAME': 'your lastname or company'
}

# Use "with" to ensure the session context is closed after use.
with requests.Session() as s:
    p = s.post('LOGIN_URL', data=payload)
    # see if successful login
    print(p.content)

    # An authorized request
    r = s.get('Your Pdf link',stream=True)
    # Download code
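
The "# Download code" step could look roughly like the sketch below, which reuses the chunked writing from the first post. save_response is a made-up helper name, and FakeResponse is a hypothetical stand-in for the object returned by s.get(), included only so the sketch runs without a network connection.

```python
import os
import tempfile


def save_response(response, path, chunk_size=8192):
    """Stream a response body to disk and return the number of bytes written."""
    # Raise on 4xx/5xx answers instead of saving an error page.
    response.raise_for_status()
    written = 0
    with open(path, "wb") as pdf:
        for chunk in response.iter_content(chunk_size=chunk_size):
            if chunk:
                pdf.write(chunk)
                written += len(chunk)
    return written


class FakeResponse:
    """Hypothetical offline stand-in for a requests response."""

    def __init__(self, body):
        self.body = body

    def raise_for_status(self):
        pass  # pretend the status was 200

    def iter_content(self, chunk_size=1):
        for start in range(0, len(self.body), chunk_size):
            yield self.body[start:start + chunk_size]


demo_path = os.path.join(tempfile.mkdtemp(), "mypdf.pdf")
demo_bytes = save_response(FakeResponse(b"%PDF-1.4 demo"), demo_path, chunk_size=4)
print(demo_bytes)
```

Inside the session block this would be called as save_response(r, fileName). Note that raise_for_status() only catches HTTP error codes; a login page delivered with status 200 still gets saved, so the content has to be checked as well.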

thomas2004ch · Aug-31-2017, 12:12 PM

Hi,
Thanks for the reply.

It seems to be better than before, since "print(p.content)" prints much more content than before. And I haven't got any error message. Does this mean the login is successful?

But when I try to download the file, it seems to fail. Here is my code:

import requests

login_url = XXXX
file_url = XXXX
fileName = XXXX

# Fill in your details
payload = {
    'EMAIL': '[email protected]',
    'LASTNAME': 'MyLastname'
}
 
# Use "with" to ensure the session context is closed after use.
with requests.Session() as s:
    p = s.post(login_url, data=payload)
    # see if successful login
    print(p.content)
 
    # An authorized request
    r = s.get(file_url, stream=True)
    # Download code
    with open(fileName, "wb") as pdf:
        pdf.write(r.content)
    
p.close()

login_url = "http://technical.traders.com/sub/sublogin2.asp"
file_url = "http://technical.traders.com/archive/articlefinal.asp?file=\V26\C07\\131INTR.pdf"
fileName = 'D:/eBooks/Stocks_andCommodities/2008/Jul/mypdf.pdf'
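
A side note on the file_url string: it only works because Python leaves unknown escapes such as \V and \C untouched, which newer Python versions warn about. A raw string spells the same value without that risk. The sketch below also percent-encodes the query; that the server accepts the encoded form is an assumption and has not been checked against this site.

```python
from urllib.parse import urlencode

base_url = "http://technical.traders.com/archive/articlefinal.asp"

# Raw string: every backslash is taken literally, no escape processing.
file_param = r"\V26\C07\131INTR.pdf"

# Build the query string; backslashes become %5C.
query = urlencode({"file": file_param})
file_url = base_url + "?" + query

print(file_url)
```

With requests, the same effect is available by passing params={"file": file_param} to s.get(base_url, ...), which builds the query string for you.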

**Larz60+** · (This post was last modified: Aug-31-2017, 12:20 PM by Larz60+.)

Quote:It seems to be better than before, since "print(p.content)" prints much more content than before.

Please show what you are talking about.
It's difficult to visualize if you don't provide that.

***snippsat*** · Aug-31-2017, 01:25 PM

(Aug-31-2017, 12:12 PM)thomas2004ch Wrote: It seems to be better than before, since "print(p.content)" prints much more content than before. And I haven't got any error message. Does this mean the login is successful?

You must look at the content and see if it's the same content as in a logged-in browser.
See if the download link is in the content or if you need to navigate further.

Learn to use Chrome DevTools or Firefox Developer Tools to look at the content of a website.
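
The same check can be roughed out in code instead of reading the printed output by eye. The sketch below assumes the "Register or Log In" heading quoted earlier in the thread marks the logged-out page; the helper names and the sample HTML are made up for illustration.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect the href of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def shows_login_form(html):
    """True if the page still carries the logged-out heading."""
    return "Register or Log In" in html


def find_pdf_links(html):
    """Return all link targets that mention '.pdf'."""
    collector = LinkCollector()
    collector.feed(html)
    return [link for link in collector.links if ".pdf" in link.lower()]


# Hypothetical page content, standing in for p.text from the session.
sample = '<h2>Archive</h2><a href="/home">Home</a><a href="/archive/a.PDF">Article</a>'
print(shows_login_form(sample), find_pdf_links(sample))
```

With the session code, the input would be p.text (the decoded form of p.content). If shows_login_form returns True, the login did not take effect.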
