Python Forum
Download a link that re-directs to a login page
#1
Been struggling with this one for a while so hoping someone can give me a few ideas.

I've written a script to download links from an email. This works pretty well most of the time. The majority of the script just parses an email to harvest the links, then downloads them using wget:

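import re
import wget

# Match https:// URLs that are immediately followed by '>' in the email text.
link = r'(https://(?:[a-zA-Z]|[0-9]|[$-_@.&+]|[!*\(\),]|(?:%[0-9a-fA-F][0-9a-fA-F]))+)>'
pc = re.findall(link, searchtext)

for url in pc:
    wget.download(url, path)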
So far, so good.

Recently, the website changed where the links point to, and the new location requires authentication. An example link is here; it's a long link that redirects to a login page here.

If I run the script, it generates a ton of errors with this code at the end:

Error:
raise HTTPError(req.full_url, code, urllib.error.HTTPError: HTTP Error 302: The HTTP server returned a redirect error that would lead to an infinite loop. The last 30x error message was: Found
So I tried it in my browser and it now redirects to this page requesting authentication.

Inspecting the form shows a few fields called session[email] and session[password], and once you click login, it posts this info before redirecting to a landing page of sorts for the project.
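
Login forms often carry hidden inputs (a CSRF token, for example) that have to be posted along with the visible fields. Whether this particular form has any is an assumption, not something confirmed above. A minimal sketch for listing them from the login page's HTML, using only the standard library; the sample form and the field name authenticity_token are hypothetical:

```python
from html.parser import HTMLParser


class HiddenInputParser(HTMLParser):
    """Collect name/value pairs from <input type="hidden"> elements."""

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        if tag != "input":
            return
        attr = dict(attrs)
        if (attr.get("type") or "").lower() == "hidden" and attr.get("name"):
            self.fields[attr["name"]] = attr.get("value") or ""


def hidden_fields(html):
    """Return a dict of the hidden form fields found in an HTML string."""
    parser = HiddenInputParser()
    parser.feed(html)
    return parser.fields


# Hypothetical login form; the real page's field names may differ.
sample = """
<form action="/sessions" method="post">
  <input type="hidden" name="authenticity_token" value="abc123">
  <input type="email" name="session[email]">
  <input type="password" name="session[password]">
</form>
"""
print(hidden_fields(sample))
```

In practice the HTML would come from a GET of the login page made with the same session that later sends the POST, and any hidden fields found would be merged into the posted data.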

I've tried to log in first using requests.

import requests
s = requests.Session()
data = {"session[email]":"(email address here)", "session[password]":"(password here)"}
url = "https://login.procore.com/sessions"
r = s.post(url, data=data)
When I check r, I get a 200 response. So I send a second request, a GET for the file, but that one returns a 401.

import requests
s = requests.Session()
data = {"session[email]":"(email address here)", "session[password]":"(password here)"}
url = "https://login.procore.com/sessions"
r = s.post(url, data=data)
getfile="https://app.procore.com/783343/project/submittal_logs/document_downloader?attachment_id=2534930332&item_id=25772901&item_type=SubmittalLog&project_id=783343"
r1 = s.get(getfile)
That returns a 401 error. I also tried the wget method after signing in, but it still returns 302.
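
Two things may be worth checking here. A 200 from the POST does not by itself prove the login worked, since a site can answer a failed login by re-serving the login page with a 200. And as far as I know, wget.download makes its own separate request, so it never sees the cookies held by the requests session. Below is a rough sketch of two helpers: one summarises what the response and session actually contain, the other downloads the file through the same session. The function names are made up for illustration, and both expect a requests.Session and a requests response:

```python
def describe_response(response, session):
    """Summarise a requests response and the cookies the session now holds."""
    return {
        "status": response.status_code,
        "final_url": response.url,
        # Each hop in the redirect chain: (status code, Location header).
        "redirects": [
            (hop.status_code, hop.headers.get("Location"))
            for hop in response.history
        ],
        "cookies": sorted(session.cookies.keys()),
    }


def download_with_session(session, url, dest_path, chunk_size=8192):
    """Stream url to dest_path using the session's cookies."""
    with session.get(url, stream=True) as r:
        r.raise_for_status()  # raises on 401 etc. instead of saving an error page
        with open(dest_path, "wb") as fh:
            for chunk in r.iter_content(chunk_size=chunk_size):
                fh.write(chunk)
    return dest_path


# Usage with the names from the script above (not run here):
# print(describe_response(r, s))
# download_with_session(s, getfile, "submittal.pdf")  # file name is a placeholder
```

If the final URL after the POST is still the login page, or the session holds no new cookies, the login most likely did not go through, and the 401 on the download would follow from that.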

So I feel like I'm either over-complicating it or I've driven past the point and totally missed something so obvious I will bang my head against the desk for a half hour.

So if anyone has any advice on this, it would be greatly appreciated. And if you've gotten this far, thanks for reading through this novel!
#2
Did you know you can send cookies with requests by including them in the request? This is what I do when I'm mass scraping sites that show a captcha to users who aren't logged in.
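
A minimal sketch of that cookie approach, assuming you log in with a normal browser first and copy the session cookie out of its developer tools. The cookie name _example_session, its value, and the choice of app.procore.com as the cookie domain are all placeholders; the real cookie may have a different name and be scoped to a different domain:

```python
import requests


def session_with_cookies(cookies, domain):
    """Build a requests.Session preloaded with the given cookies."""
    s = requests.Session()
    for name, value in cookies.items():
        # Scope each cookie to the domain so it is only sent there.
        s.cookies.set(name, value, domain=domain)
    return s


# Placeholder name and value: copy the real ones from the browser.
s = session_with_cookies(
    {"_example_session": "PASTE_VALUE_HERE"},
    "app.procore.com",
)
# r1 = s.get(getfile)  # getfile is the download URL from post #1
```

Passing cookies={...} directly to requests.get also works for a one-off request; a session is just more convenient when several files are downloaded in a row.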