Python Forum
Login and access website
#1
Hi,
I want to log in to this website:

https://www.cpcdi.pt/Account/Login
I have the credentials and tried the following code:

import requests
import urllib.request
import re

# Login page given above
URL = 'https://www.cpcdi.pt/Account/Login'

def main():
    # Start a session so we can have persistent cookies
    session = requests.Session()
    # This is the form data that the page sends when logging in
    login_data = {
        'CodCliente': 'mycode',
        'UserName': 'myuser',
        'Password': 'mypass',
        'submit': 'submit',
    }

    # Authenticate
    r = session.post(URL, data=login_data)

    # Try accessing a page that requires you to be logged in
    r = session.get(URL)
I got no error from this code, but I'm not sure that it works.
I need to download pictures from an address like
URL='https://www.cpcdi.pt/Produtos/Referencia?referencia=8745B006AA'

For that I tried this code:
req = urllib.request.Request(URL, headers={'User-Agent': 'Mozilla/5.0'})
htmltext = urllib.request.urlopen(req).read()
if htmltext is None:
    print("nada")
else:
    regex = '<img src="/(.+?)"'
    pattern = re.compile(regex)
    imagem = pattern.findall(str(htmltext))
    print(imagem[0])
# download the image
urllib.request.urlretrieve(URL+imagem[0], "local-filename.jpg")
But no luck. I got
Error:
Traceback (most recent call last):
  File "C:\python-ficheiros\abrir.py", line 33, in <module>
    print(imagem[0])
IndexError: list index out of range
Any help on this matter?
Thank you
#2
If you get an IndexError on the first element, then the list is empty, which means imagem = [] and your pattern is not matching. Don't use regex to parse HTML; use BeautifulSoup, as that is what it was made for.

https://python-forum.io/Thread-Web-Scraping-part-1
https://python-forum.io/Thread-Web-scraping-part-2
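
As a rough sketch of what that looks like: the snippet below pulls every img src out of a page with BeautifulSoup. The HTML string and the image paths in it are made up for illustration (the real product page will differ), and it uses the built-in "html.parser" backend so nothing beyond bs4 is needed. Note that the second image uses single quotes, which the regex above would miss.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the downloaded product page.
html = """
<html><body>
  <img src="/Content/images/logo.png" alt="logo">
  <img alt="no source here">
  <img src='/Imagens/8745B006AA.jpg'>
</body></html>
"""

def image_sources(html):
    """Return the src attribute of every <img> tag that has one."""
    soup = BeautifulSoup(html, "html.parser")
    return [img["src"] for img in soup.find_all("img", src=True)]

sources = image_sources(html)
if sources:
    print(sources[0])
else:
    # An empty list is the case that raised IndexError in the first post.
    print("no <img> tags with a src attribute were found")
```

Checking for an empty list before indexing avoids the IndexError even when the page has no images at all.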

You can download the image with requests:
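import shutil
import requests

# url must hold the absolute address of the image
response = requests.get(url, stream=True)
# Stream the raw response body straight into a local file
with open('img.png', 'wb') as out_file:
    shutil.copyfileobj(response.raw, out_file)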
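
One thing to watch: the url passed to requests.get has to be absolute. A src that starts with "/" is relative to the site root, so gluing it onto the page address (URL+imagem[0] in the first post) does not give a valid image address. A small sketch with the standard library's urljoin, where the image path is an invented example and not taken from the real page:

```python
from urllib.parse import urljoin

# Product page from the first post.
page_url = "https://www.cpcdi.pt/Produtos/Referencia?referencia=8745B006AA"

# Hypothetical src value as it might appear in an <img> tag.
src = "/Imagens/8745B006AA.jpg"

# urljoin resolves the path against the page address the way a browser would.
image_url = urljoin(page_url, src)
print(image_url)
```

The resulting image_url is what you would hand to requests.get(..., stream=True).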
#3
Thank you for your help, but I think the problem is that I can't log in with Python.
Here is my code:
import requests
import urllib.request
import re
import shutil

# Login page
URL = 'https://www.cpcdi.pt/Account/Login'

def main():
    # Start a session so we can have persistent cookies
    session = requests.Session()
    # This is the form data that the page sends when logging in
    login_data = {
        'CodCliente': 'xxx',
        'UserName': 'xx',
        'Password': 'xxxx',
        'submit': 'submit',
    }
    # Authenticate
    r = session.post(URL, data=login_data)
    # Try accessing a page that requires you to be logged in
    r = session.get(URL)
    print(r.text)
The last instruction returns nothing, so I can't get past the login. I have the credentials and can log in with the browser, but not with Python. I think the submit button has no name, and I don't know how to handle that.
Is this the best way to log in with Python?
Regards
#4
Looking at the page source, the blue "Entrar" button executes JavaScript, which means you are going to need Selenium to click the button. The tutorial links I gave above for scraping websites have Selenium examples in them too.
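
On the worry about the button having no name: form controls without a name attribute are not sent with the form, so a nameless button needs no entry in login_data. The sketch below lists the fields a form would actually submit. The HTML is a hypothetical login form (the hidden token field in particular is an assumption, not taken from the real page), and none of this replaces whatever the button's JavaScript does.

```python
from bs4 import BeautifulSoup

# Hypothetical login form; the real page may look different.
login_html = """
<form action="/Account/Login" method="post">
  <input type="hidden" name="__RequestVerificationToken" value="abc123">
  <input type="text" id="CodCliente" name="CodCliente">
  <input type="text" id="UserName" name="UserName">
  <input type="password" id="Password" name="Password">
  <button type="submit" class="btn">Entrar</button>
</form>
"""

def form_fields(html):
    """Map each named <input> in the first form to its default value."""
    soup = BeautifulSoup(html, "html.parser")
    form = soup.find("form")
    fields = {}
    if form is None:
        return fields
    for tag in form.find_all("input"):
        name = tag.get("name")
        if name:  # inputs without a name are never submitted
            fields[name] = tag.get("value", "")
    return fields

fields = form_fields(login_html)
for name, value in fields.items():
    print(name, repr(value))
```

If a hidden field like the token above shows up on the real page, a plain requests POST would have to send it back as well, which is one more reason a browser driven by Selenium is the simpler route here.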
#5
Here is a setup you can look at.
If you are new to this, always start with a webdriver that lets you see what's going on, like Chrome.
You can go headless later.
I use CSS selectors such as #UserName, which finds the user name field.
The script fills out all the fields and pushes the login button, so if everything is correct I'll be logged in.
from selenium import webdriver
from selenium.webdriver.common.by import By
from bs4 import BeautifulSoup
import time

# Login page from the first post
url = 'https://www.cpcdi.pt/Account/Login'

# Activate Phantom(headless) and deactivate Chrome to not load browser
#browser = webdriver.PhantomJS()
browser = webdriver.Chrome()
browser.get(url)
# Fill in the three login fields, located by their element ids
client_code = browser.find_element(By.CSS_SELECTOR, '#CodCliente')
client_code.send_keys("Foo")
user_name = browser.find_element(By.CSS_SELECTOR, '#UserName')
user_name.send_keys("Bar")
password = browser.find_element(By.CSS_SELECTOR, '#Password')
password.send_keys("xxxxxxxxx")
time.sleep(5)
# Click the first button with class "btn" (the login button)
submit = browser.find_elements(By.CSS_SELECTOR, 'button.btn')
submit[0].click()
time.sleep(5)

# Give source code to BeautifulSoup
soup = BeautifulSoup(browser.page_source, 'lxml')
title = soup.select('head > title')
print(title[0].text)
#6
Great help.
Many thanks for that. But there is some strange behavior with my code. Please take a look:
from selenium import webdriver
from selenium.webdriver.common.by import By
from bs4 import BeautifulSoup
import time
from urllib.request import urlopen
# Activate Phantom(headless) and deactivate Chrome to not load browser
#browser = webdriver.PhantomJS()
browser = webdriver.Firefox()
browser.get(url)
client_code = browser.find_element(By.CSS_SELECTOR, '#CodCliente')
client_code.send_keys("111")
user_name = browser.find_element(By.CSS_SELECTOR, '#UserName')
user_name.send_keys("1222")
password = browser.find_element(By.CSS_SELECTOR, '#Password')
password.send_keys("11222")
time.sleep(5)
submit = browser.find_elements(By.CSS_SELECTOR, 'button.btn')
submit[0].click()
time.sleep(5)

# Give source code to BeautifulSoup
browser.get(goUrl)
soup = BeautifulSoup(urlopen(goUrl), 'lxml')
for a in soup.find_all('a', href=True):
    print("Found the URL:", a['href'])
This works fine and the browser goes to the link, but I get the links from the first page, not from the page the browser is on. What am I doing wrong?
#7
You don't need urlopen() if you're using requests or Selenium, especially if you are using Selenium to get past JavaScript that urllib cannot handle. You are basically opening a new page from Python, separate from Selenium, and passing that to bs4 instead of the source from Selenium.

Quote:
soup = BeautifulSoup(urlopen(goUrl), 'lxml')

Do this instead:
soup = BeautifulSoup(browser.page_source, 'lxml')
PS: you might need some form of delay between browser.get() and reading page_source, but I'm unsure until you check it out.
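
Once the HTML comes from browser.page_source, collecting the links is ordinary BeautifulSoup work. Here is a minimal sketch, with a literal string standing in for browser.page_source so it runs without a browser. The markup and the /Produtos/Lista path are invented for the example, and "html.parser" is used in place of 'lxml' only to avoid the extra dependency.

```python
from urllib.parse import urljoin
from bs4 import BeautifulSoup

def extract_links(page_source, base_url):
    """Return every href in the page as an absolute URL."""
    soup = BeautifulSoup(page_source, "html.parser")
    return [urljoin(base_url, a["href"])
            for a in soup.find_all("a", href=True)]

# Stand-in for browser.page_source (hypothetical markup).
page_source = (
    '<a href="/Produtos/Lista">Produtos</a>'
    '<a name="top">anchor without href</a>'
    '<a href="https://example.com/">external</a>'
)

links = extract_links(page_source, "https://www.cpcdi.pt/Account/Login")
for link in links:
    print("Found the URL:", link)
```

With Selenium you would call extract_links(browser.page_source, browser.current_url) after the page has loaded.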
#8
Simply perfect.
Thank you all.