Python SSL web page scraping - Printable Version
+- Python Forum (https://python-forum.io)
+-- Forum: Python Coding (https://python-forum.io/forum-7.html)
+--- Forum: General Coding Help (https://python-forum.io/forum-8.html)
+--- Thread: Python SSL web page scraping (/thread-39177.html)
Python SSL web page scraping - Vadanane - Jan-13-2023

I'm using Python 2.7 with BeautifulSoup to scrape web pages, but I keep running into protocol errors that don't make much sense to me. This only happens on the one website I need to scrape: https://edd.telstra.com/telstra

The code I'm using for basic testing:

    #! /usr/bin/python

    from urllib import urlopen
    from BeautifulSoup import BeautifulSoup
    import re

    # Copy all of the content from the provided web page
    webpage = urlopen("https://edd.telstra.com/telstra/").read()

And I get the following error (running on Ubuntu 12.10):

    Traceback (most recent call last):
      File "e.py", line 8, in <module>
        webpage = urlopen("https://edd.telstra.com/telstra/").read()
      File "/usr/lib/python2.7/urllib.py", line 86, in urlopen
        return opener.open(url)
      File "/usr/lib/python2.7/urllib.py", line 207, in open
        return getattr(self, name)(url)
      File "/usr/lib/python2.7/urllib.py", line 436, in open_https
        h.endheaders(data)
      File "/usr/lib/python2.7/httplib.py", line 958, in endheaders
        self._send_output(message_body)
      File "/usr/lib/python2.7/httplib.py", line 818, in _send_output
        self.send(msg)
      File "/usr/lib/python2.7/httplib.py", line 780, in send
        self.connect()
      File "/usr/lib/python2.7/httplib.py", line 1165, in connect
        self.sock = ssl.wrap_socket(sock, self.key_file, self.cert_file)
      File "/usr/lib/python2.7/ssl.py", line 381, in wrap_socket
        ciphers=ciphers)
      File "/usr/lib/python2.7/ssl.py", line 143, in __init__
        self.do_handshake()
      File "/usr/lib/python2.7/ssl.py", line 305, in do_handshake
        self._sslobj.do_handshake()
    IOError: [Errno socket error] [Errno 1] _ssl.c:504: error:1408F119:SSL routines:SSL3_GET_RECORD:decryption failed or bad record mac

Could someone tell me if there is some parameter that I need to specify to get this page to download in Python? This seems to be a problem only with this web page, as the code above (plus lots of other code I tried) works fine on the other HTTPS/SSL pages I tried.
Thanks for any help!

RE: Python SSL web page scraping - snippsat - Jan-13-2023

There are several big problems here. It should not need saying by now that you should not be using Python 2.7; it has been dead💀 for 3 years.

For the scraping part, the URL leads to a login page and not the main page, so copying the content will only copy the login page. So this is far from working. To give a hint, I would use Selenium to do the login and get to the main page.

    from selenium import webdriver
    from selenium.webdriver.chrome.options import Options
    from selenium.webdriver.chrome.service import Service
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support.ui import WebDriverWait
    from bs4 import BeautifulSoup
    import time

    #--| Setup
    options = Options()
    #options.add_argument("--headless")
    ser = Service(r"C:\cmder\bin\chromedriver.exe")
    browser = webdriver.Chrome(service=ser, options=options)
    #--| Parse or automation
    url = 'https://edd.telstra.com/telstra/'
    browser.get(url)
    browser.implicitly_wait(2)
    user_name = browser.find_element(By.CSS_SELECTOR, '#Username')
    user_name.send_keys('login_name')
    # Raw string so the backslash that escapes the space in the id reaches the CSS selector unchanged
    bus_id = browser.find_element(By.CSS_SELECTOR, r'#Business\ ID')
    bus_id.send_keys('999')

So start like this, then push the login button to get to the main page. Copying the content may then work, or not at all; it depends on what data the main page has available without requesting something.
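On the advice to move off Python 2.7: a minimal sketch of what the original fetch could look like on Python 3, using only the standard library. The helper names `build_context` and `fetch`, the TLS 1.2 floor, the timeout, and the `User-Agent` header are assumptions made for illustration and are not from the thread. Nothing in the thread confirms that this clears the handshake error for this particular server, and it still only retrieves the login page.

```python
import ssl
import urllib.request


def build_context():
    """Return a client TLS context with default verification and a TLS 1.2 floor.

    The TLS 1.2 minimum is an assumption for this sketch, not something
    the thread specifies.
    """
    context = ssl.create_default_context()  # certificate and hostname checks stay enabled
    context.minimum_version = ssl.TLSVersion.TLSv1_2
    return context


def fetch(url, timeout=10):
    """Download a page over HTTPS and return the raw response body as bytes."""
    request = urllib.request.Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urllib.request.urlopen(request, timeout=timeout, context=build_context()) as response:
        return response.read()


# Example call (needs network access; URL taken from the question):
# webpage = fetch("https://edd.telstra.com/telstra/")
```

The bytes returned by `fetch` can be handed to `BeautifulSoup(webpage, "html.parser")` from the `bs4` package, which replaces the old `BeautifulSoup` import used in the question.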