Login and access website

mariolopes · Feb-05-2018, 10:54 AM

Hi
I want to log in to this website:

https://www.cpcdi.pt/Account/Login
I have the credentials, and I tried the following code:

import requests

URL = 'https://www.cpcdi.pt/Account/Login'

def main():
    # Start a session so we can have persistent cookies
    session = requests.Session()
    # This is the form data that the page sends when logging in
    login_data = {
        'CodCliente': 'mycode',
        'UserName': 'myuser',
        'Password': 'mypass',
        'submit': 'submit',
    }

    # Authenticate
    r = session.post(URL, data=login_data)

    # Try accessing a page that requires you to be logged in
    # (a separate name, so the module-level URL is not shadowed inside main)
    product_url = 'https://www.cpcdi.pt/Produtos/Referencia?referencia=8745B006AA'
    r = session.get(product_url)

# main() has to be called, otherwise nothing in it runs
main()

I get no error from this code, but I'm not sure it works.
I need to download pictures from pages like this one:
https://www.cpcdi.pt/Produtos/Referencia?referencia=8745B006AA

For that, I tried this code:

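import re
import urllib.request
from urllib.parse import urljoin

URL = 'https://www.cpcdi.pt/Produtos/Referencia?referencia=8745B006AA'

req = urllib.request.Request(URL, headers={'User-Agent': 'Mozilla/5.0'})
# read() returns bytes; decode so the regex runs on text, not on "b'...'"
htmltext = urllib.request.urlopen(req).read().decode('utf-8', errors='replace')
if not htmltext:
    print("nothing")
else:
    regex = '<img src="/(.+?)"'
    pattern = re.compile(regex)
    imagem = pattern.findall(htmltext)
    print(imagem[0])
    # download the image; the captured path is relative to the site root
    urllib.request.urlretrieve(urljoin(URL, '/' + imagem[0]), "local-filename.jpg")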

But no luck. I got this error:

Traceback (most recent call last):
  File "C:\python-ficheiros\abrir.py", line 33, in <module>
    print(imagem[0])
IndexError: list index out of range

Any help on this matter?
Thank you

***metulburr*** · (This post was last modified: Feb-05-2018, 01:47 PM by metulburr.)

If you get an IndexError on the first element, then the list is empty, which means imagem == [] and your pattern is not matching. Don't use regex to parse HTML; use BeautifulSoup, as that is what it was made for.

https://python-forum.io/Thread-Web-Scraping-part-1
https://python-forum.io/Thread-Web-scraping-part-2
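To illustrate the BeautifulSoup route, here is a minimal sketch (not taken from the linked tutorials) that collects every <img> tag with a src attribute and turns the paths into absolute URLs with urllib.parse.urljoin. The sample HTML and the /Imagens/produto.jpg path are made up; the first tag puts alt before src, which is exactly the kind of markup the regex '<img src="/(.+?)"' silently misses. On the real site you would pass in the HTML of the product page fetched while logged in.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup


def image_urls(html, page_url):
    """Return absolute URLs for every <img> tag that has a src attribute."""
    soup = BeautifulSoup(html, 'html.parser')  # stdlib parser; 'lxml' also works if installed
    # src=True keeps only tags that actually have a src attribute
    return [urljoin(page_url, img['src']) for img in soup.find_all('img', src=True)]


# Hypothetical markup: attribute order and relative paths vary in real pages
sample = '<div><img alt="produto" src="/Imagens/produto.jpg"><img src="logo.png"><img></div>'
page = 'https://www.cpcdi.pt/Produtos/Referencia?referencia=8745B006AA'
print(image_urls(sample, page))
# ['https://www.cpcdi.pt/Imagens/produto.jpg', 'https://www.cpcdi.pt/Produtos/logo.png']
```

Because attributes are looked up by name, their order in the tag no longer matters, and urljoin handles both root-relative ("/...") and page-relative paths.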

You can download the image with requests:

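import shutil
import requests

url = 'http://example.com/img.png'
response = requests.get(url, stream=True)
response.raise_for_status()          # stop on 4xx/5xx instead of saving an error page
response.raw.decode_content = True   # undo gzip/deflate transfer encoding, if any
with open('img.png', 'wb') as out_file:
    shutil.copyfileobj(response.raw, out_file)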

mariolopes · Feb-05-2018, 04:47 PM

Thank you for your help, but I think the problem is that I can't log in with Python.
In my code:

import requests

URL = 'https://www.cpcdi.pt/Account/Login'

def main():
    # Start a session so we can have persistent cookies
    session = requests.Session()
    # This is the form data that the page sends when logging in
    login_data = {
        'CodCliente': 'xxx',
        'UserName': 'xx',
        'Password': 'xxxx',
        'submit': 'submit',
    }
    # Authenticate
    r = session.post(URL, data=login_data)
    # Try accessing a page that requires you to be logged in
    product_url = 'https://www.cpcdi.pt/Produtos/Referencia?referencia=8745B006AA'
    r = session.get(product_url)
    print(r.text)

# main() has to be called, otherwise nothing in it runs
main()

The last instruction returns nothing, so I can't get past the login. I have the credentials and can log in with the browser, but not with Python. I think the submit button has no name, and I don't know how to handle it.
Is this the best way to log in with Python?
Regards
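Regarding the nameless submit button: when a browser submits a form, only controls with a name attribute end up in the POST body, so a button without a name contributes nothing and does not need to be in login_data. What often matters instead are hidden inputs, such as an anti-forgery token, that the page puts in the form. Below is a minimal sketch, assuming the login page is a plain HTML form, that copies every named input from the form and then overlays your credentials. The sample form and its token value are invented for illustration (ASP.NET MVC sites commonly use a hidden field called __RequestVerificationToken, but whether this site does is an assumption), and as the next reply points out, this site's button runs JavaScript, so requests alone may still not be enough.

```python
from bs4 import BeautifulSoup

# Button-like inputs: submit/image are sent only when they are the one clicked,
# button/reset never; this sketch leaves all of them out
BUTTON_TYPES = {'submit', 'button', 'image', 'reset'}


def form_payload(html, overrides):
    """Return the fields the first <form> in html would submit, updated with overrides.

    Simplified sketch: only <input> elements are read (no <select>/<textarea>),
    and inputs without a name attribute are ignored, as a browser ignores them.
    """
    form = BeautifulSoup(html, 'html.parser').find('form')
    payload = {}
    for field in form.find_all('input', attrs={'name': True}):
        kind = field.get('type', 'text').lower()
        if kind in BUTTON_TYPES:
            continue
        if kind in ('checkbox', 'radio'):
            if not field.has_attr('checked'):
                continue  # unchecked boxes are not submitted
            payload[field['name']] = field.get('value', 'on')
        else:
            payload[field['name']] = field.get('value', '')
    payload.update(overrides)
    return payload


# Invented login form shaped like the one in this thread (field ids from the posts)
sample = '''
<form action="/Account/Login" method="post">
  <input name="__RequestVerificationToken" type="hidden" value="abc123">
  <input id="CodCliente" name="CodCliente" type="text">
  <input id="UserName" name="UserName" type="text">
  <input id="Password" name="Password" type="password">
  <button class="btn" type="submit">Entrar</button>
</form>
'''
print(form_payload(sample, {'CodCliente': 'mycode', 'UserName': 'myuser', 'Password': 'mypass'}))
# {'__RequestVerificationToken': 'abc123', 'CodCliente': 'mycode', 'UserName': 'myuser', 'Password': 'mypass'}
```

With requests, you would first GET the login page through the same requests.Session, build the payload from its r.text, and post it with session.post(URL, data=payload), so that the token and the session cookie belong together.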

***metulburr*** · Feb-06-2018, 01:37 AM

Looking at the page's code, the blue "Entrar" button executes JavaScript, which means you are going to need Selenium to click it. The web-scraping tutorials I linked above have Selenium examples in them too.

***snippsat*** · (This post was last modified: Feb-06-2018, 02:09 AM by snippsat.)

Here is a setup you can look at.
If you're new to this, always start with a visible browser driver such as Chrome, so you can see what's going on; you can go headless later.
I use CSS selectors like #UserName to find the user-name field.
The script fills out all the fields and clicks the log-in button, so if everything is correct you'll be logged in.

from selenium import webdriver
from selenium.webdriver.common.by import By
from bs4 import BeautifulSoup
import time

# A visible Chrome window lets you watch what happens. PhantomJS is no longer
# supported by current Selenium; for headless mode use Chrome options instead:
# options = webdriver.ChromeOptions()
# options.add_argument('--headless')
# browser = webdriver.Chrome(options=options)
browser = webdriver.Chrome()
url = 'https://www.cpcdi.pt/Account/Login'
browser.get(url)
client_code = browser.find_element(By.CSS_SELECTOR, '#CodCliente')
client_code.send_keys("Foo")
user_name = browser.find_element(By.CSS_SELECTOR, '#UserName')
user_name.send_keys("Bar")
password = browser.find_element(By.CSS_SELECTOR, '#Password')
password.send_keys("xxxxxxxxx")
time.sleep(5)
# Click the first <button class="btn">, presumably the "Entrar" log-in button
submit = browser.find_elements(By.CSS_SELECTOR, 'button.btn')
submit[0].click()
time.sleep(5)  # crude wait for the page after login to load

# Give source code to BeautifulSoup
soup = BeautifulSoup(browser.page_source, 'lxml')
title = soup.select('head > title')
print(title[0].text)
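Once Selenium is logged in, the images can still be downloaded with requests as suggested earlier, but only if requests sends the same login cookies. Here is a minimal sketch, assuming `browser` is the logged-in WebDriver from the code above, that copies each cookie returned by Selenium's get_cookies() into a requests.Session. The helper name session_from_browser is hypothetical, and only name, value, domain and path are copied (expiry and the secure/httpOnly flags are ignored).

```python
import requests


def session_from_browser(browser):
    """Hypothetical helper: a requests.Session carrying a Selenium browser's cookies."""
    session = requests.Session()
    # get_cookies() returns one dict per cookie with 'name' and 'value',
    # and usually also 'domain' and 'path'
    for cookie in browser.get_cookies():
        session.cookies.set(
            cookie['name'],
            cookie['value'],
            domain=cookie.get('domain', ''),
            path=cookie.get('path', '/'),
        )
    return session
```

You would then call session = session_from_browser(browser) and use session.get(image_url, stream=True) in place of requests.get(...) in the download snippet. Some sites also tie a login to the User-Agent, in which case setting session.headers['User-Agent'] to the browser's value may be needed as well.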

mariolopes · Feb-06-2018, 10:10 AM

Great help.
Many thanks for that. But there is some strange behavior with my code. Please take a look:

from selenium import webdriver
from selenium.webdriver.common.by import By
from bs4 import BeautifulSoup
import time
from urllib.request import urlopen

browser = webdriver.Firefox()
url = 'https://www.cpcdi.pt/Account/Login'
browser.get(url)
client_code = browser.find_element(By.CSS_SELECTOR, '#CodCliente')
client_code.send_keys("111")
user_name = browser.find_element(By.CSS_SELECTOR, '#UserName')
user_name.send_keys("1222")
password = browser.find_element(By.CSS_SELECTOR, '#Password')
password.send_keys("11222")
time.sleep(5)
submit = browser.find_elements(By.CSS_SELECTOR, 'button.btn')
submit[0].click()
time.sleep(5)

# Give source code to BeautifulSoup
goUrl = "https://www.cpcdi.pt/Produtos/Referencia?referencia=8745B006AA"
browser.get(goUrl)
soup = BeautifulSoup(urlopen(goUrl), 'lxml')
for a in soup.find_all('a', href=True):
    print("Found the URL:", a['href'])

It works fine: the browser goes to the link, but I get the links from the first page, not from the page the browser is on. What am I doing wrong?

***metulburr*** · (This post was last modified: Feb-06-2018, 12:39 PM by metulburr.)

You don't need urlopen() if you're using requests or Selenium, especially if you are using Selenium to get past JavaScript that urllib cannot handle. You are basically opening a new page from Python, separate from Selenium, and passing that to bs4 instead of the source from Selenium. That separate request does not carry the browser's login cookies, so it sees the site as a logged-out visitor.

Quote:

soup = BeautifulSoup(urlopen(goUrl), 'lxml')

Do this instead:

soup = BeautifulSoup(browser.page_source, 'lxml')

PS: you might need some form of delay between browser.get() and reading browser.page_source, but you won't know until you try it.

mariolopes · Feb-07-2018, 09:47 AM

Simply perfect
Thanks, all.
