Python Forum
Login and access website
#1
Hi,
I want to log in to this website:

https://www.cpcdi.pt/Account/Login

I have the credentials and I tried the following code:

import requests
import sys
import urllib.request
import re
URL = 'https://www.cpcdi.pt/Account/Login'
def main():
    # Start a session so we can have persistent cookies
    session = requests.Session()
    # This is the form data that the page sends when logging in
    login_data = {
        'CodCliente': 'mycode',
        'UserName': 'myuser',
        'Password': 'mypass',
        'submit':'submit',
    }

    # Authenticate
    r = session.post(URL, data=login_data)

    # Try accessing a page that requires you to be logged in
    URL='https://www.cpcdi.pt/Produtos/Referencia?referencia=8745B006AA'
    r = session.get(URL)
This code runs without errors, but I'm not sure it's actually working.
I need to download pictures from an address like

URL='https://www.cpcdi.pt/Produtos/Referencia?referencia=8745B006AA'

For that I tried this code:
req = urllib.request.Request(URL, headers={'User-Agent': 'Mozilla/5.0'})
htmltext = urllib.request.urlopen(req).read()
if htmltext is None:
    print("nada")
else:
    regex = '<img src="/(.+?)"'
    pattern = re.compile(regex)
    imagem = pattern.findall(str(htmltext))
    print(imagem[0])
# download the image
urllib.request.urlretrieve(URL + imagem[0], "local-filename.jpg")
But no luck. I got:
Error:
Traceback (most recent call last):
  File "C:\python-ficheiros\abrir.py", line 33, in <module>
    print(imagem[0])
IndexError: list index out of range
Any help on this matter?
Thank you
#2
If you get an IndexError on the first element, then the list is empty, which means imagem == [] and your pattern is not matching. Don't use regex to parse HTML; use BeautifulSoup, as that is what it was made for.

https://python-forum.io/Thread-Web-Scraping-part-1
https://python-forum.io/Thread-Web-scraping-part-2
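As a minimal sketch of the BeautifulSoup approach (using a toy HTML string here rather than the real page, whose markup I haven't checked):

```python
from bs4 import BeautifulSoup

# Toy HTML standing in for the downloaded page source.
html = '<div><img src="/fotos/a.jpg" alt="a"><img src="/fotos/b.jpg" alt="b"></div>'

soup = BeautifulSoup(html, 'html.parser')
# Collect the src attribute of every <img> tag that has one.
imagens = [img['src'] for img in soup.find_all('img', src=True)]
print(imagens)  # ['/fotos/a.jpg', '/fotos/b.jpg']
```

No regex needed, and it won't silently return an empty list when the markup shifts slightly.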

You can download the image via requests:
import shutil
import requests

url = 'http://example.com/img.png'
response = requests.get(url, stream=True)
with open('img.png', 'wb') as out_file:
    shutil.copyfileobj(response.raw, out_file)
#3
Thank you for your help, but I think the problem is that I can't log in with Python.
In my code:
import requests
import sys
import urllib.request
import re
import shutil
URL = 'https://www.cpcdi.pt/Account/Login'
def main():
    # Start a session so we can have persistent cookies
    session = requests.Session()
    # This is the form data that the page sends when logging in
    login_data = {
        'CodCliente': 'xxx',
        'UserName': 'xx',
        'Password': 'xxxx',
        'submit':'submit',
    }
    # Authenticate
    r = session.post(URL, data=login_data)
    # Try accessing a page that requires you to be logged in
    URL='https://www.cpcdi.pt/Produtos/Referencia?referencia=8745B006AA'
    r = session.get(URL)
    print(r.text)
The last instruction returns nothing, so I can't get past the login. I have the credentials and I can access the site with the browser, but not with Python. I think the submit button has no name and I don't know how to handle it.
Is this the best way to log in with Python?
Regards
#4
Looking at the page, the blue "Entrar" button executes JavaScript, which means you are going to need Selenium to click the button. The tutorial links I gave above for scraping websites have Selenium examples in them too.
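(Side note: if you do want to keep trying requests first, ASP.NET login forms often carry hidden fields, e.g. an anti-forgery token, that must be posted back along with the credentials. The sketch below only builds the payload from a toy form; the field names and token are assumptions for illustration, not taken from the real page:)

```python
from bs4 import BeautifulSoup

# Toy stand-in for the login page; the real form may differ.
form_html = """
<form action="/Account/Login" method="post">
  <input name="__RequestVerificationToken" type="hidden" value="abc123">
  <input id="CodCliente" name="CodCliente" type="text">
  <input id="UserName" name="UserName" type="text">
  <input id="Password" name="Password" type="password">
</form>
"""

soup = BeautifulSoup(form_html, 'html.parser')
# Start from every named input (this picks up hidden tokens too)...
payload = {inp['name']: inp.get('value', '')
           for inp in soup.find_all('input', attrs={'name': True})}
# ...then fill in the visible credentials.
payload.update({'CodCliente': 'mycode', 'UserName': 'myuser', 'Password': 'mypass'})
print(payload)
```

You would fetch the real login page with session.get() first, build the payload from its actual form, then session.post() it. If the button only works through JavaScript, though, Selenium is still the way to go.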
#5
Here's a setup you can look at.
If you're new to this, always start with a webdriver where you can see what's going on, like Chrome.
You can go headless later.
I use CSS selectors like #UserName to find the user-name field.
It fills out all the fields and pushes the log-in button, so if everything is correct you'll be logged in.
from selenium import webdriver
from bs4 import BeautifulSoup
import time

# Switch to PhantomJS (headless) and comment out Chrome to avoid loading a browser window
#browser = webdriver.PhantomJS()
browser = webdriver.Chrome()
url = 'https://www.cpcdi.pt/Account/Login'
browser.get(url)
client_code = browser.find_element_by_css_selector('#CodCliente')
client_code.send_keys("Foo")
user_name = browser.find_element_by_css_selector('#UserName')
user_name.send_keys("Bar")
password = browser.find_element_by_css_selector('#Password')
password.send_keys("xxxxxxxxx")
time.sleep(5)
submit = browser.find_elements_by_css_selector('button.btn')
submit[0].click()
time.sleep(5)

# Give source code to BeautifulSoup
soup = BeautifulSoup(browser.page_source, 'lxml')
title = soup.select('head > title')
print(title[0].text)
#6
Great help!
Many thanks for that. But there is some strange behavior in my code. Please look at it:
from selenium import webdriver
from bs4 import BeautifulSoup
import time
from urllib.request import urlopen
# Activate Phantom(headless) and deactivate Chrome to not load browser
#browser = webdriver.PhantomJS()
browser = webdriver.Firefox()
url = 'https://www.cpcdi.pt/Account/Login'
browser.get(url)
client_code = browser.find_element_by_css_selector('#CodCliente')
client_code.send_keys("111")
user_name = browser.find_element_by_css_selector('#UserName')
user_name.send_keys("1222")
password = browser.find_element_by_css_selector('#Password')
password.send_keys("11222")
time.sleep(5)
submit = browser.find_elements_by_css_selector('button.btn')
submit[0].click()
time.sleep(5)
 
# Give source code to BeautifulSoup
goUrl="https://www.cpcdi.pt/Produtos/Referencia?referencia=8745B006AA"
browser.get(goUrl)
soup = BeautifulSoup(urlopen(goUrl), 'lxml')
for a in soup.find_all('a', href=True):
    print ("Found the URL:", a['href'])
The browser goes to the link fine, but I get the links from the first page, not from the page the browser is on. What am I doing wrong?
#7
You don't need urlopen() if you're using requests or Selenium, especially since you are using Selenium precisely to get past JavaScript that urllib cannot. You are basically opening a new page from Python, separate from Selenium, and passing that to bs4 instead of the source from Selenium.

Quote:
soup = BeautifulSoup(urlopen(goUrl), 'lxml')

Do this instead:
soup = BeautifulSoup(browser.page_source, 'lxml')
PS: you might need some form of delay between browser.get() and reading page_source, but I'm not sure until you try it.
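The usual way to do that delay in Selenium is WebDriverWait with expected_conditions rather than a fixed time.sleep(). Under the hood it is just a poll loop; here is a stand-alone sketch of the idea (wait_until is a made-up helper for illustration, not part of Selenium):

```python
import time

def wait_until(predicate, timeout=10.0, poll=0.5):
    """Poll predicate() until it returns a truthy value, or raise on timeout.

    This mimics what Selenium's WebDriverWait(driver, timeout).until(...) does.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = predicate()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError("condition not met within %.1fs" % timeout)
        time.sleep(poll)

# Example: wait for a "page" (here just a list) to receive some content.
page = []
page.append('<a href="/x">')     # pretend the content arrived
print(wait_until(lambda: page))  # ['<a href="/x">']
```

With real Selenium you would instead wait for a concrete condition, e.g. the presence of an element on the product page, before reading browser.page_source.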
#8
Simply perfect
Thank you all.

