Download a link that re-directs to a login page

justanotherpythonnoob · (This post was last modified: Oct-22-2020, 01:44 PM by justanotherpythonnoob.)

Been struggling with this one for a while so hoping someone can give me a few ideas.

I've written a script to download links from an email. This works pretty well most of the time. The majority of the script just parses an email to harvest the links, then downloads them using wget:

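import re
import wget

# Capture every https:// URL that is immediately followed by '>' in the email text
link = r'(https://(?:[a-zA-Z]|[0-9]|[$-_@.&+]|[!*\(\),]|(?:%[0-9a-fA-F][0-9a-fA-F]))+)>'
pc = re.findall(link, searchtext)

for url in pc:
    wget.download(url, path)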

So far, so good.

Recently, the website changed the location the link points to, and it now requires authentication. An example link is here; it's a long link that redirects to a login page here.

If I run the script, it generates a ton of errors with this code at the end:

Error:
raise HTTPError(req.full_url, code,
urllib.error.HTTPError: HTTP Error 302: The HTTP server returned a redirect error that would lead to an infinite loop.
The last 30x error message was:
Found

So I tried it in my browser and it now redirects to this page requesting authentication.

Inspecting the form shows a few fields called session[email] and session[password], and once you click login, it posts this info before redirecting to a landing page of sorts for the project.
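Login forms often carry hidden inputs (a CSRF token, for example) next to the visible fields, and the server may reject a POST that leaves them out. Whether this particular form has any is an assumption; the sketch below only shows one way to list every input name in a page so nothing is missed. The markup and the `authenticity_token` field are hypothetical and not copied from the real login page.

```python
from html.parser import HTMLParser


class FormFieldParser(HTMLParser):
    """Collect name/value pairs from every <input> tag in a page."""

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        if tag != "input":
            return
        attr = dict(attrs)
        name = attr.get("name")
        if name:
            # Inputs without a value attribute are stored as empty strings
            self.fields[name] = attr.get("value") or ""


def extract_form_fields(html):
    """Return a dict of input names to their default values."""
    parser = FormFieldParser()
    parser.feed(html)
    return parser.fields


# Hypothetical markup, for illustration only
sample = """
<form action="/sessions" method="post">
  <input type="hidden" name="authenticity_token" value="abc123">
  <input type="email" name="session[email]">
  <input type="password" name="session[password]">
</form>
"""

fields = extract_form_fields(sample)
fields["session[email]"] = "(email address here)"
fields["session[password]"] = "(password here)"
print(fields)
```

In a real run the HTML would come from a GET of the login page made with the same session that later sends the POST, so any cookies set along the way are kept.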

I've tried to login first using requests.

import requests
s = requests.Session()
data = {"session[email]":"(email address here)", "session[password]":"(password here)"}
url = "https://login.procore.com/sessions"
r = s.post(url, data=data)

When I check r, I get response 200. So I send a second request to get the file, but that one comes back with response 401.

import requests
s = requests.Session()
data = {"session[email]":"(email address here)", "session[password]":"(password here)"}
url = "https://login.procore.com/sessions"
r = s.post(url, data=data)
getfile="https://app.procore.com/783343/project/submittal_logs/document_downloader?attachment_id=2534930332&item_id=25772901&item_type=SubmittalLog&project_id=783343"
r1 = s.get(getfile)

That returns a 401 error. I also tried the wget method after signing in, but it still returns 302.
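A 200 on the POST does not by itself prove the login worked: requests follows redirects by default, so the 200 may simply be the login page rendered again. One way to check, sketched below, is to look at `Response.history` and `Response.url`. The helper names are made up for this example, and the stand-in objects and their URLs are hypothetical so the snippet runs without touching the network.

```python
from types import SimpleNamespace
from urllib.parse import urlparse


def redirect_chain(response):
    """Return (status_code, url) for each hop, ending with the final response."""
    hops = list(response.history) + [response]
    return [(hop.status_code, hop.url) for hop in hops]


def landed_on_login(response, login_host="login.procore.com"):
    """Guess whether the final URL is still on the login host."""
    return urlparse(response.url).netloc == login_host


# Stand-ins shaped like requests.Response objects (hypothetical URLs)
first = SimpleNamespace(status_code=302, url="https://app.example.com/file")
final = SimpleNamespace(
    status_code=200,
    url="https://login.procore.com/login",
    history=[first],
)

for status, url in redirect_chain(final):
    print(status, url)
print("still on login host:", landed_on_login(final))
```

With the real session, passing `r` or `r1` to `redirect_chain` would show where each request actually ended up.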

So I feel like I'm either over-complicating it or I've driven past the point and totally missed something so obvious I will bang my head against the desk for a half hour.

So if anyone has any advice on this, would be greatly appreciated. And if you've gotten this far, thanks for reading through this novel!

Aspire2Inspire · Oct-23-2020, 03:27 PM

Did you know you can send cookies with requests, alongside the headers, when you make the request? This is what I do when I'm mass scraping sites that show a captcha to users who aren't logged in.
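A minimal sketch of that idea, assuming you log in through the browser first and copy the Cookie request header from its developer tools. The cookie names and values here are hypothetical, and both helper names are invented for the example. `requests` is imported inside the second function only so the parsing helper can be tried without it installed.

```python
from http.cookies import SimpleCookie


def cookie_header_to_dict(header):
    """Turn a 'name=value; name2=value2' Cookie header into a dict."""
    jar = SimpleCookie()
    jar.load(header)
    return {name: morsel.value for name, morsel in jar.items()}


def session_with_cookies(cookies):
    """Return a requests.Session preloaded with the given cookies."""
    import requests  # third-party package

    s = requests.Session()
    s.cookies.update(cookies)
    return s


# Hypothetical cookie header copied from a logged-in browser tab
copied = "_session_id=abc123; remember_token=xyz789"
cookies = cookie_header_to_dict(copied)
print(cookies)
```

After that, `session_with_cookies(cookies).get(getfile)` would send the copied cookies with the download request. They only work until the site expires the browser session.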
