Logging into a website with requests
#1
I'm trying to log into globenewswire.com with requests using my reader account. After logging in, I want to go to a page on my account that requires the login to access, and parse it for news.
This is the URL I'm trying to log in from: https://login.globenewswire.com/?ReturnUrl=%2fReaderAccount%3frunSearchId%3d41894572&runSearchId=41894572#login
Here it is so far:
import requests
import bs4 as bs
url = 'https://login.globenewswire.com/?ReturnUrl=%2fReaderAccount%3frunSearchId%3d41894572&runSearchId=41894572#login'
USER = 'myusernamehere'
PASS = 'mypasswordhere'
user_pass = {
    'emailAddress': USER,
    'password': PASS,
}
session = requests.Session()
session.post(url, data=user_pass)
r = session.get('https://globenewswire.com/Search?runSearchId=41894572')  # The page on my account which requires login to access.
soup = bs.BeautifulSoup(r.content, 'lxml')
title_list = soup.find_all('p', class_="company-title")  # Parsing for news titles.
print(r.text)  # Prints out source code just to see it.
print('len(title_list) =', len(title_list))  # len(title_list) should be 10 if I run my code right, but it gives me 0.
I understand that logging in with requests is slightly different for every website. I was able to get the names of the fields to post my username and password in, but I don't know where to go from here. Do I have to send cookies? Am I posting wrong? P.S. My code doesn't raise an error; it just doesn't take me to the page I want. Any help is appreciated.
#2
(Jul-04-2018, 04:59 AM)HiImNew Wrote: Do I have to send cookies?
You may have to send both headers and cookies; it depends on the login.
I can't look at the login myself because I get sent to security questions for an unregistered email.
You can look at some posts where I've helped with this before.
With Requests link.
With Selenium link.
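Roughly, the pattern looks like this with requests. The login URL, header values, and form field names below are only placeholders; the real ones depend on the site and have to be read out of the browser's network tab:
import requests

LOGIN_URL = 'https://example.com/login'  # placeholder, not the real endpoint
headers = {
    'User-Agent': 'Mozilla/5.0',  # some sites reject the default requests user agent
    'Referer': LOGIN_URL,
}
payload = {'emailAddress': 'you@example.com', 'password': 'secret'}  # field names vary per site

with requests.Session() as s:  # a Session keeps cookies between requests
    s.post(LOGIN_URL, data=payload, headers=headers)
    r = s.get('https://example.com/protected-page', headers=headers)
    print(r.status_code)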
#3
Edit: Wrong thread, my bad.

I tried logging in manually and then copying over my requests headers like this:
import requests
import bs4 as bs

headers = {
    'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8',
    'Accept-Encoding': 'gzip, deflate, br',
    'Accept-Language': 'en-US,en;q=0.9',
    'Cache-Control': 'max-age=0',
    'Connection': 'keep-alive',
    'Cookie': 'dragonticket=FB11B8820BF33AA9808BA9293666G456G45G634FFE26615EFB7772D65058250A88E871E7D8FAABF29988EF8ED91FB4E084E8F6F80EC3FFE809F00FBFF23E5A53BD328A703F32CDRTHDRTD564C3C77216F9EE6129EDFB0F37886D4G56G456G8A719536F600F69EB059BEG56445G645G6456G45GTYFMGYILF04556CE5H45645DG56G45A82F1B419C07EFA161D39B19A8B5C7D189G456G4566C7B42; dragonAuthInd=dragonAuthInd; ASP.NET_SessionId=bgmaerkwqv2ub0fqf2fgzoz2; __pnrculture=en-US; GNWTracker=5a73b457-63e8-48c3-92da-364fbdb3c74e; __RequestVerificationToken_Lw__=24Z/UDfY86IwZCtnh0DdrPYcDP1dPzoqAuW3S0+MajTitg5Y/phCNDtHc9kT74aL6Vs5NnCq2Vfk9JbJmYoLq/qSnQz7X6YPGsjG6howdWcynQjuLZ34/4yFEd8TOAVm8n0VgA==; NSC_W.HDT-QOS.443=ffffffff09291e1b45525d5f4f58455e445a4a42378b; __utma=202784462.1871429885.1530780973.1530780973.1530780973.1; __utmc=202784462; __utmz=202784462.1530780973.1.1.utmcsr=(direct)|utmccn=(direct)|utmcmd=(none); __utmb=202784462.1.10.1530780973',
    'Host': 'globenewswire.com',
    'Upgrade-Insecure-Requests': '1',
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/67.0.3396.99 Safari/537.36'
}
requestsession = requests.Session()
r2 = requestsession.get('https://globenewswire.com/Search?runSearchId=41894572', headers=headers)
soup = bs.BeautifulSoup(r2.content, 'lxml')
title_list = soup.find_all('p', class_="company-title")
print('len(title_list) =', len(title_list))
for i in title_list:
    print(i)
If the code runs the way it's supposed to, len(title_list) should equal 10, and surprisingly, it does. Thank you for the posts you shared, snippsat. However, when I log out manually, this code no longer works. I think these cookies are generated for each session, and I do not want to copy and paste them into my code by hand every time. So far, I can manually grab every cookie except for the dragonticket, because that one requires a few redirects. However, the dragonticket seems to be necessary, as my code doesn't run without it. The redirects go like this:

https://login.globenewswire.com/Auth/Che...rArcotUser
https://login.globenewswire.com/Auth/LoginNew # This is where you receive your dragon ticket.
http://globenewswire.com/?culture=en-US
https://globenewswire.com/?culture=en-US # This is the homepage you are taken to, and you must submit your dragon ticket to get here.

How would I tell requests to snag my dragonticket, which arrives in a response header? I already know how to grab response headers, but I can't do it across multiple redirects. (I also don't know whether requests is actually performing these redirects.) Is there a way for requests to grab cookies during redirects? Once you have your dragonticket, you are basically good to go. Right now, this is what I have:
import requests
import bs4 as bs
login_headers = {
    'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8',
    'Accept-Encoding': 'gzip, deflate, br',
    'Accept-Language': 'en-US,en;q=0.9',
    'Connection': 'keep-alive',
    'Host': 'login.globenewswire.com',
    'Upgrade-Insecure-Requests': '1',
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/67.0.3396.99 Safari/537.36'
}
requestsession = requests.Session()
url = 'https://login.globenewswire.com/#login'
login_page = requestsession.get(url, headers=login_headers)
soup = bs.BeautifulSoup(login_page.content, 'lxml')
VALUE = soup.find('input', {'name': '__RequestVerificationToken'}).get('value')  # Gets the __RequestVerificationToken, which is needed to log in.
USER = 'myusernamehere'
PASS = 'mypasswordhere'
login_data = {
    'emailAddress': USER,
    'password': PASS
}
cookie = {
    '__RequestVerificationToken': VALUE,
}
post_to_url = 'https://login.globenewswire.com/Auth/Login?ReturnUrl=%2FSecurity%2FLogin%3Fculture%3Den-US'
x = requestsession.post(post_to_url, headers=login_headers, cookies=cookie, params=login_data, stream=True)
cookie2 = requestsession.cookies.get_dict()
print(cookie2)
By then the code should have logged in and received its dragonticket in a response header, but when I print cookie2, I don't see the dragonticket.
What am I doing wrong?
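For what it's worth, here is a sketch of how one could inspect the redirect chain, assuming requests is following the redirects (which it does by default): every intermediate response ends up in the final response's history list, and the Session's cookie jar collects any Set-Cookie header it sees along the way.
# Continues from the snippet above; x is the response from the login POST.
for resp in x.history:  # one entry per redirect that requests followed
    print(resp.status_code, resp.url)
    print('  Set-Cookie:', resp.headers.get('Set-Cookie'))

print(x.status_code, x.url)  # the final response after all redirects
print(requestsession.cookies.get_dict())  # everything the Session has collected so far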
#4
I found an easier way to do it. For anyone interested, they have an RSS feed that generates a unique URL for each account and does not require a login. You can navigate your saved searches to find your own custom RSS feed URL. However, I do not think that URL should be shared, as it is tied to your account.
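For reference, reading titles out of an RSS feed with the same libraries would look roughly like this; the feed URL below is a placeholder for your own private one, and the 'xml' parser assumes lxml is installed:
import requests
import bs4 as bs

FEED_URL = 'https://example.com/your-private-rss-feed'  # placeholder; use your own feed URL

r = requests.get(FEED_URL)
soup = bs.BeautifulSoup(r.content, 'xml')  # the 'xml' feature uses lxml under the hood
for item in soup.find_all('item'):  # each RSS entry is an <item> element
    print(item.title.text)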

