Python Forum

Not able to log in and maintain a LinkedIn session using BeautifulSoup
Hi all,

I am new to Python. I am trying the code below but not getting the desired output.

Libraries used: BeautifulSoup, Requests

Aim: log in to LinkedIn, fetch all the jobs, and write them to an output file.

The login seems to succeed.
I then expect Python to fetch the HTML of the jobs page, the same page that lists all the jobs when I log in from a browser.
Instead, the job portal's login page gets written to the output file.

I expected to stay logged in, since I am reusing the same session.

from bs4 import BeautifulSoup
import requests

# One Session object, so cookies set at login are reused on later requests
session = requests.Session()

login_url = 'https://www.linkedin.com/uas/login-submit'
login_information = {
    'session_key': '[email protected]',
    'session_password': '12xxxxxx',
}
response = session.post(login_url, data=login_information)
if response.status_code != 200:
    raise Exception("Invalid response %s." % response)

# Fetch the jobs page with the same session and save the prettified HTML
job_page = session.get('https://www.linkedin.com/jobs/')
soup = BeautifulSoup(job_page.content, 'html.parser')
html = soup.prettify()

with open("job.html", "w", encoding='utf-8') as file:
    file.write(html)
I do not think that the login is working.
I have removed the password details from your post, as I don't know whether that was a working password.
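
In the code above the status check passed, yet the saved file was the login page, so a 200 status on its own does not show that the login worked. One rough way to check is to test whether the fetched page still contains a password field. The sketch below uses only the standard library; the helper name looks_like_login_page is made up for illustration, and the heuristic (a password input means a login form) is an assumption that may not hold for every page.

```python
from html.parser import HTMLParser


class PasswordFieldFinder(HTMLParser):
    """Sets self.found to True if the page has an <input type="password">."""

    def __init__(self):
        super().__init__()
        self.found = False

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value may be None
        if tag == "input" and (dict(attrs).get("type") or "").lower() == "password":
            self.found = True


def looks_like_login_page(html_text):
    """Heuristic only: a page with a password field is probably a login form."""
    finder = PasswordFieldFinder()
    finder.feed(html_text)
    return finder.found
```

With the original script this could be called as looks_like_login_page(job_page.text) before writing the file, raising an error if it returns True.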

A quick look at the form data that is sent:
loginCsrfParam: 09b5482c-971d-407d-8cfa-3xxxxx
session_key: [email protected]
session_password: 1234567
trk: guest_homepage-basic_sign-in-submit
Note that a CSRF token needs to be sent; this is additional security against CSRF attacks.
You can try to retrieve the CSRF token before logging in. Example, untested:
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/78.0.3904.108 Safari/537.36'
}

session = requests.Session()
login_url = 'https://www.linkedin.com/uas/login-submit'
csrf = session.get(login_url).cookies['csrftoken']
login_information = {
    'session_key':'[email protected]',
    'session_password':'12xxxxx',
    'loginCsrfParam': csrf,
    'trk': 'guest_homepage-basic_sign-in-submit',
}

# Form fields belong in the POST body (data=), not in the query string (params=)
response = session.post(login_url, headers=headers, data=login_information)
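
If the token does not arrive as a cookie, it may instead be embedded in the login form as a hidden input. The sketch below collects the hidden inputs from a page's HTML using only the standard library. It assumes, without having checked LinkedIn's actual markup, that loginCsrfParam from the form data above appears as such a field. The class and function names and the sample HTML are made up for illustration.

```python
from html.parser import HTMLParser


class HiddenInputCollector(HTMLParser):
    """Collects name/value pairs of <input type="hidden"> elements."""

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        if tag != "input":
            return
        attributes = dict(attrs)
        is_hidden = (attributes.get("type") or "").lower() == "hidden"
        if is_hidden and attributes.get("name"):
            self.fields[attributes["name"]] = attributes.get("value") or ""


def hidden_fields(html_text):
    """Return a dict of all hidden form fields found in html_text."""
    collector = HiddenInputCollector()
    collector.feed(html_text)
    return collector.fields


# Made-up sample standing in for the real login page HTML
sample_html = """
<form action="/uas/login-submit" method="post">
  <input type="hidden" name="loginCsrfParam" value="example-token">
  <input type="text" name="session_key">
  <input type="password" name="session_password">
</form>
"""
print(hidden_fields(sample_html).get("loginCsrfParam"))
```

In a real run, the HTML would come from a GET of whichever page shows the login form (that URL is not given in this thread), and the collected fields would be merged into login_information before posting.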
With sites like this, it is often easier to use Selenium to log in.
You can then pass the page source after login (browser.page_source) to e.g. BeautifulSoup for parsing.
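
A minimal, untested sketch of the Selenium route follows. It assumes Selenium 4 with Chrome available, and that the login form's inputs carry the names session_key and session_password seen in the form data above. The function name fetch_jobs_html and the two URL parameters are placeholders; the address of the page showing the login form is not given in this thread.

```python
def fetch_jobs_html(login_page_url, jobs_url, email, password):
    """Log in through a real browser and return the jobs page HTML."""
    # Imported here so this file can be loaded without Selenium installed
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    browser = webdriver.Chrome()
    try:
        browser.get(login_page_url)
        start_url = browser.current_url

        # Field names are taken from the form data above (an assumption
        # about the current page markup)
        browser.find_element(By.NAME, "session_key").send_keys(email)
        password_field = browser.find_element(By.NAME, "session_password")
        password_field.send_keys(password)
        password_field.submit()

        # Wait until the browser has navigated away from the login form
        WebDriverWait(browser, 15).until(EC.url_changes(start_url))

        browser.get(jobs_url)
        return browser.page_source
    finally:
        browser.quit()
```

The returned string can then be parsed as before with BeautifulSoup(html, 'html.parser'). Because the browser handles cookies and tokens itself, no CSRF value has to be extracted by hand.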
Thank you for the help.

I will try it and post an update.
It is still giving me errors.

from bs4 import BeautifulSoup
import requests
import csv

headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/78.0.3904.108 Safari/537.36'
}
 
session = requests.Session()
login_url = 'https://www.linkedin.com/uas/login-submit'
csrf = session.get(login_url).cookies['csrftoken']
login_information = {
    'session_key':'[email protected]',
    'session_password':'pppppppppppp',
    'loginCsrfParam': csrf,
    'trk': 'guest_homepage-basic_sign-in-submit',
}
 
response = session.post(login_url, headers=headers, params=login_information)
if response.status_code != 200:
    raise Exception("Invalid response %s." % response)

job_page = session.get('https://www.linkedin.com/jobs/search?keywords=Data%20Science&location=United%20Kingdom&redirect=false&position=1&pageNum=0')
soup = BeautifulSoup(job_page.content,'html.parser')

with open("login_page.html", "w",encoding='utf-8') as file:
    file.write(str(soup.prettify()))
Error:
C:\Users\admin>login.py
Traceback (most recent call last):
  File "C:\Users\admin\login.py", line 11, in <module>
    csrf = session.get(login_url).cookies['csrftoken']
  File "C:\Users\admin\AppData\Local\Programs\Python\Python38-32\lib\site-packages\requests\cookies.py", line 328, in __getitem__
    return self._find_no_duplicates(name)
  File "C:\Users\admin\AppData\Local\Programs\Python\Python38-32\lib\site-packages\requests\cookies.py", line 399, in _find_no_duplicates
    raise KeyError('name=%r, domain=%r, path=%r' % (name, domain, path))
KeyError: "name='csrftoken', domain=None, path=None"
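
The KeyError means that the response to that GET did not set a cookie named csrftoken; indexing the cookie jar with a missing name raises instead of returning None. A more defensive lookup, sketched below, tries a list of candidate names with .get() and reports which cookies did arrive, which helps when guessing the right name. The helper find_cookie is made up for illustration, and a plain dict stands in for session.cookies (which also supports .get() and .keys()).

```python
def find_cookie(cookies, candidate_names):
    """Return (name, value) for the first candidate present, else (None, None)."""
    for name in candidate_names:
        value = cookies.get(name)
        if value is not None:
            return name, value
    return None, None


# Stand-in for session.cookies; the cookie name and value are made up
received = {"example_cookie": "abc123"}

name, value = find_cookie(received, ["csrftoken"])
if name is None:
    print("No CSRF cookie found; cookies received:", sorted(received.keys()))
```

If none of the received cookies looks like a CSRF token, the token is probably delivered some other way, for example inside the HTML of the login form.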