Problem with logging in on website - python w/ requests - Printable Version

+- Python Forum (https://python-forum.io)
+-- Forum: Python Coding (https://python-forum.io/forum-7.html)
+--- Forum: Web Scraping & Web Development (https://python-forum.io/forum-13.html)
+--- Thread: Problem with logging in on website - python w/ requests (/thread-29846.html)
Problem with logging in on website - python w/ requests - GoldeNx - Sep-22-2020

Hello, I'm learning how to write requests in Python. I'm trying to log in to the zalando-lounge.pl website, but I keep receiving a 403 error. I passed all the headers and it still doesn't work. That's my code:

import requests
import re
from bs4 import BeautifulSoup

session = requests.Session()

login_data = {
    'email': '[email protected]',
    'password': 'mypasswordexample',
    'onlyLogin': 'true'
}

mainPage = 'https://www.zalando-lounge.pl/#/login'
loginPage = 'https://www.zalando-lounge.pl/onboarding-api/login'
productPage = 'https://www.zalando-lounge.pl/campaigns/ZZO124A/categories/136197597/articles/ZZLNME013-Q00'

main_page = session.get(mainPage)
print(main_page.status_code)

# Turn the session's cookie dict into a Cookie header string
cookie = session.cookies.get_dict()
cookie = re.sub("'", '', str(cookie))
cookie = re.sub(": ", "=", cookie)
cookie = re.sub(",", ";", cookie)
cookie = re.sub("{", "", cookie)
cookie = re.sub("}", "", cookie)
cookie = cookie + "; G_ENABLED_IDPS=google"
print(cookie)

headers = {
    'authority': 'www.zalando-lounge.pl',
    'path': '/onboarding-api/login',
    'scheme': 'https',
    'accept': '*/*',
    'accept-encoding': 'gzip, deflate, br',
    'accept-language': 'pl-PL,pl;q=0.9,en-US;q=0.8,en;q=0.7',
    'content-length': '83',
    'content-type': 'application/json',
    'cookie': cookie,
    'origin': 'https://www.zalando-lounge.pl',
    'referer': 'https://www.zalando-lounge.pl/',
    'sec-fetch-dest': 'empty',
    'sec-fetch-mode': 'cors',
    'sec-fetch-site': 'same-origin',
    'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/85.0.4183.102 Safari/537.36'
}

login_process = session.post(loginPage, data=login_data, headers=headers)
print(login_process.status_code)
print(session.cookies.get_dict())

product_page = session.get(productPage, headers=headers)
source_code = product_page.content
soup = BeautifulSoup(source_code, "html.parser")
xd = soup.find_all("span")
print(xd)

That's what my generated cookie string looks like:
Zalando-Client-Id=67a554d4-6e24-4add-9537-2a0126c23aff; _abck=2707BF125938F7009191BAB069D39FFE~-1~YAAQNb17XMn+w7F0AQAAZ7WMsgQm+rbBeYpB3zcMq4l/hfZv58CeR4gKnKRf0hKQNVa7x7GNVrlhGoXrOiQ/bQ4CT9zfsp0MR0KLEZ9ZF56qhNzK7HBu8yfYbLkGB73COwUPpErpArOQtcSRNBLj06LBXBm6zhG+o4oAnQIiJstmMTNH8LXXZMxfXi+CjKQfVmYl/VF7JRyfz2x/f4ZrJ9NVRiH1Y9KN7sQo5wu9dJb4g/TsNIWARmfiKWKQP15xVXP8ymnedUTc2YhILmRmJdWyc+5QH7qMt1yzYpmY4iVz/Svm/MEoVxaSU4RUJocd1g==~-1~-1~-1; bm_sz=20B9BCDB198389EC02111957222E7E46~YAAQNb17XMj+w7F0AQAAZ7WMsgljGBLIRyR4lwG8R3FdEv3aJiK/dXikpp4MEw5B9j1UJBw7ZQ0fUnibdCbiSwIBXiOLKmKv9shq9uad3qm8WISRq+K1JqcS6KLzKEF8Wdwt48CaeE/kLaUnG0IVQXtfcr4pTuAWtgBpvidZfkRxOiV0gFTBopqPN9E2MgnNvotBGmx6Vg==; frsx=AAEAACY1shlbU3xPQ-ZOqAl1rdpCFsRcDGs0DShk4y0q--8KEeR4tsHG-b1Kedj8K6aRvugNyUZkeJnaho_NSI0iLlbCWWbr34AvyT-05JCY3v5oLJu39aTQO5RdmYSBZ0Lda7PgHUak9DeLqTWbP27iPxbBY3QQf3GD1vxfXfGE64d6Bcp2wqk1DnVsERFrWRF0SChCzt_6G0scIdsonOw=; zl_acquire_modal_version=v2; zl_webview_appversion=; zl_webview_ga_cid=; zl_webview_ga_tid=; zl_webviewos=; G_ENABLED_IDPS=google

And that's how the request headers look when I log in to the website using the browser: https://i.stack.imgur.com/NxzfP.png

Could you tell me what I am doing wrong? Thank you in advance.

RE: Problem with logging in on website - python w/ requests - snippsat - Sep-23-2020

In general I think you need to use Selenium for heavy JavaScript sites like this. Cookies are usually sent automatically within a session. Here is something I have done before:

import requests
from bs4 import BeautifulSoup

headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/68.0.3440.106 Safari/537.36'
}
login_data = {
    'email': '[email protected]',
    'password': 'mypasswordexample',
    'onlyLogin': 'true'
}

with requests.Session() as s:
    s.post('https://www.zalando-lounge.com/#/login', headers=headers, params=login_data)
    # Logged in! Session cookies are saved for future requests
    # print(s.cookies)
    # Test
    response = s.get('Url inside')  # cookies sent automatically!
    soup = BeautifulSoup(response.content, 'lxml')
    welcome = soup.find('something inside')
    print(welcome)

So even the message that that combination of username and password doesn't match is done by JavaScript, in this case React. Selenium is the easiest way, or you need to really look into how the site works and maybe catch the JSON/Ajax response.

RE: Problem with logging in on website - python w/ requests - GoldeNx - Sep-23-2020

(Sep-23-2020, 09:42 AM)snippsat Wrote: In general I think you need to use Selenium for heavy JavaScript sites like this.

Selenium is too slow for me. That's why I'm trying to write this with requests.

RE: Problem with logging in on website - python w/ requests - buran - Sep-23-2020

Was this asked on StackOverflow? I think I have recently seen it, or a similar one, asked there.

RE: Problem with logging in on website - python w/ requests - GoldeNx - Sep-23-2020

(Sep-23-2020, 12:37 PM)buran Wrote: Was this asked on StackOverflow? I think recently I have seen it or similar one asked there

Yes, it was, but no one helped me.

RE: Problem with logging in on website - python w/ requests - buran - Sep-23-2020

(Sep-23-2020, 04:08 PM)GoldeNx Wrote: Yes, it was but onone helped me.

If you cross-post on different platforms you should post a link, so that everyone is aware. Otherwise people may spend time helping you while your problem was already solved elsewhere. This question on StackOverflow

RE: Problem with logging in on website - python w/ requests - snippsat - Sep-25-2020

(Sep-23-2020, 12:35 PM)GoldeNx Wrote: Selenium is too slow for me. That's why I try to write this with requests

You can run Selenium headless (without loading the browser window), which is faster. When logged in, you can send the source browser.page_source to BS to parse with lxml. This is way easier than trying to do this with Requests/BS on a JavaScript-heavy site.
Also, once you have .page_source, you can do several lookups at the same speed. For example, the bid_size and price_sales lookups below (or more of them) all run quickly once the setup and the capture of .page_source are done.

from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.keys import Keys
from bs4 import BeautifulSoup
import time, sys

#--| Setup
options = Options()
options.add_argument("--headless")
options.add_argument('--disable-gpu')
#options.add_argument('--log-level=3')
browser = webdriver.Chrome(executable_path=r'C:\cmder\bin\chromedriver.exe', options=options)

#--| Parse or automation
browser.get('https://www.morningstar.com/stocks/XOSL/XXL/quote.html')
#time.sleep(1)
soup = BeautifulSoup(browser.page_source, 'lxml')
bid_size = soup.select('div.dp-value.price-down.ng-binding.ng-scope')
price_sales = soup.select('li:nth-child(9) > div > div.dp-value.ng-binding')
print(price_sales[0].text.strip())
print(bid_size[0].text.strip())
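[Editor's note] One detail in the original post worth flagging: the login request copies the browser's content-type: application/json header but passes the credentials with data=, which form-encodes the body, so the server receives a body that does not match its declared type. With requests, passing json= instead serialises the payload as JSON and sets the header itself. The endpoint URL and field names below are taken from the thread; the sketch only prepares the two requests locally (no network call) to show the encoding difference, and is not a full working login against the site's bot protection.

```python
import json
import requests

payload = {
    'email': '[email protected]',
    'password': 'mypasswordexample',
    'onlyLogin': 'true',
}
url = 'https://www.zalando-lounge.pl/onboarding-api/login'

# data=... form-encodes the payload; json=... serialises it as JSON
# and sets the Content-Type header automatically.
form_req = requests.Request('POST', url, data=payload).prepare()
json_req = requests.Request('POST', url, json=payload).prepare()

print(form_req.headers['Content-Type'])  # application/x-www-form-urlencoded
print(json_req.headers['Content-Type'])  # application/json
print(json_req.body)                     # the JSON-serialised credentials
```

The same mismatch applies to the hand-built Cookie header: a requests.Session already resends its cookies on every request, so the regex-based string construction in the first post should not be necessary at all.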