Posts: 1,090
Threads: 143
Joined: Jul 2017
Is there a way to circumvent the access denied page?
I am trying to find importers of agricultural chemicals in various countries for my girlfriend, so she can contact them.
import requests
from bs4 import BeautifulSoup
mylink = "https://www.distrilist.eu/cis/russia/39-import-export-companies-in-russia/"
res = requests.get(mylink)
I get:
Output: res.text
'<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">\n<html><head>\n<title>403 Forbidden</title>\n</head><body>\n<h1>Forbidden</h1>\n<p>You don\'t have permission to access this resource.</p>\n</body></html>\n'
I can open the webpage normally on my computer and see the table I want to get.
I can open the source code and copy and paste the data I want, but it would be nicer with BeautifulSoup! Then I could put the output in a pandas DataFrame and export it to Excel!
Posts: 7,313
Threads: 123
Joined: Sep 2016
Jun-13-2024, 04:34 PM
(This post was last modified: Jun-13-2024, 04:35 PM by snippsat.)
Add a User-Agent header.
import requests
from bs4 import BeautifulSoup
mylink = "https://www.distrilist.eu/cis/russia/39-import-export-companies-in-russia/"
headers = {
'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36'
}
response = requests.get(mylink, headers=headers)
soup = BeautifulSoup(response.content, 'html.parser')
print(soup.select_one('header > h1'))
Output: <h1 class="entry-title">39 Import Export Companies in Russia</h1>
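To continue toward the DataFrame-and-Excel goal from the first post, a minimal sketch of the remaining step; the inline `html` string here is only a stand-in for `response.content` from the request above, and pandas plus an Excel engine such as openpyxl are assumed to be installed:

```python
import pandas as pd
from bs4 import BeautifulSoup

# stand-in for response.content; the live page's table would be
# parsed exactly the same way
html = """
<table>
  <tr><td>Company</td><td>City</td></tr>
  <tr><td>AgroChem Ltd</td><td>Moscow</td></tr>
  <tr><td>RusImport</td><td>Kazan</td></tr>
</table>
"""

soup = BeautifulSoup(html, "html.parser")
# first row is the header, the rest are data rows
header, *rows = [
    [td.get_text(strip=True) for td in tr.find_all("td")]
    for tr in soup.find_all("tr")
]
df = pd.DataFrame(rows, columns=header)

try:
    # .xlsx export needs an engine such as openpyxl
    df.to_excel("companies.xlsx", index=False)
except ImportError:
    # fall back to CSV when no Excel engine is installed
    df.to_csv("companies.csv", index=False)
```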
Pedroski55 likes this post
Posts: 2,125
Threads: 11
Joined: May 2017
Jun-13-2024, 05:12 PM
(This post was last modified: Jun-13-2024, 05:12 PM by DeaD_EyE.)
import csv
import socket

# external dependency
import requests
from bs4 import BeautifulSoup

proxies = {}

# do not use this code
try:
    import socks
    # if you want to use Tor via SOCKS, install pysocks:
    # the import name is socks, but via pip you have to install pysocks,
    # or install requests[socks]
    # (square brackets are often used for a package's optional
    # extra dependencies)
except ImportError:
    print("Could not import pysocks, so Tor cannot be used as a proxy")
    # not using proxies if socks is not installed
else:
    with socket.socket() as sock:
        sock.settimeout(1)
        try:
            sock.connect(("127.0.0.1", 9050))
        except (TimeoutError, ConnectionError):
            print("Tor service seems not to be running")
        else:
            # using socks5; the h signals use of the DNS provided via Tor
            proxies = {"http": "socks5h://127.0.0.1:9050"}
            proxies["https"] = proxies["http"]
# end of tor


def get_export_companies():
    url = "https://www.distrilist.eu/cis/russia/39-import-export-companies-in-russia/"
    # some webservers disallow access if no valid User-Agent is sent
    headers = {"User-Agent": "Mozilla/5.0 (X11; Linux x86_64; rv:127.0) Gecko/20100101 Firefox/127.0"}
    response = requests.get(url, headers=headers, proxies=proxies)
    # parse the raw content (bytes) of the response with bs4
    doc = BeautifulSoup(response.content, "html.parser")
    # table_header is the first element,
    # table_rows is a list of the remaining table rows
    table_header, *table_rows = doc.find_all("tr")
    # yield the header as title; inside the tuple is a generator
    # expression that gets the text and calls str.title() on it
    yield tuple(td.text.title() for td in table_header.find_all("td"))
    # yield the table rows
    for table_row in table_rows:
        # yield a tuple with the table data; inside the tuple is a
        # generator expression that gets the text
        yield tuple(table_data.text for table_data in table_row.find_all("td"))


def save_csv(file):
    # newline="" keeps the csv module from writing blank lines on Windows
    with open(file, "w", encoding="utf8", newline="") as fd:
        csv.writer(fd).writerows(get_export_companies())


save_csv("export_companies.csv")
Pedroski55 likes this post
Posts: 1,090
Threads: 143
Joined: Jul 2017
Thanks both of you!
I did figure out the bit with User Agent in the end:
headers = {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/50.0.2661.102 Safari/537.36'}
result = requests.get(mylink, headers=headers)
This morning, another related problem. Again, I can open the webpage in my browser and see the page source code and the information I want to save, but I am getting the following error when I try to get the page with requests:
Quote:raise SSLError(e, request=request)
requests.exceptions.SSLError: HTTPSConnectionPool(host='www.schneider-group.com', port=443): Max retries exceeded with url: /en/about/contacts/ (Caused by SSLError(SSLCertVerificationError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: unable to get local issuer certificate (_ssl.c:1007)')))
Is there anything I can do about this error?
companyweb = "https://www.schneider-group.com/en/about/contacts/"
result = requests.get(companyweb, headers=headers)
The contact page has various addresses, emails and phone numbers of offices in Western and Eastern Europe.
Posts: 2,125
Threads: 11
Joined: May 2017
Try this:
result = requests.get(companyweb, verify=False)
But this is only a workaround. Normally requests ships all required CA certificates as a bundle and uses them to verify the certs from the webserver. If verification is disabled, then there is no check at all.
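The bundle in question comes from the certifi package, which requests uses by default. A sketch of pointing `verify` at a bundle explicitly instead of disabling it; the live request is left commented out, and the extra-PEM path is a hypothetical placeholder for a locally obtained issuer certificate:

```python
import os

import certifi  # the CA bundle requests verifies against by default

bundle = certifi.where()  # filesystem path to certifi's cacert.pem
print(bundle)

# pass the bundle (or a local PEM containing the missing issuer) explicitly:
# result = requests.get(companyweb, verify=bundle)
# result = requests.get(companyweb, verify="/path/to/extra-ca.pem")  # hypothetical
```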
Posts: 1,090
Threads: 143
Joined: Jul 2017
@ DeaD_EyE Thanks again!
Quote:result = requests.get(companyweb, verify=False)
Warning (from warnings module):
File "/usr/lib/python3/dist-packages/urllib3/connectionpool.py", line 1020
warnings.warn(
InsecureRequestWarning: Unverified HTTPS request is being made to host 'www.schneider-group.com'. Adding certificate verification is strongly advised. See: https://urllib3.readthedocs.io/en/latest...l-warnings
Oh well, I will try the next company!
Posts: 7,313
Threads: 123
Joined: Sep 2016
Jun-14-2024, 05:08 PM
(This post was last modified: Jun-14-2024, 05:08 PM by snippsat.)
Using verify=False will give a warning, but it is not a stopping error; you can still parse fine.
You can also use e.g. Selenium: then there is no warning, and if you want to parse JavaScript-rendered content, that works too.
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
import time
# Setup
# https://storage.googleapis.com/chrome-for-testing-public/125.0.6422.141/win64/chromedriver-win64.zip
options = Options()
options.add_argument("--headless=new")
ser = Service(r"C:\cmder\bin\chromedriver.exe")
browser = webdriver.Chrome(service=ser, options=options)
# Parse or automation
url = "https://www.schneider-group.com/en/about/contacts/"
browser.get(url)
Armenia = browser.find_element(By.CSS_SELECTOR, '#bx_3218110189_13 > p')
print(Armenia.text)
Output: Business Center "Yerevan Plaza", Grigor Lusavorich str. 9, Yerevan, 0015, Armenia
Pedroski55 likes this post
Posts: 1,090
Threads: 143
Joined: Jul 2017
@ snippsat Thanks!
You are right, this
result = requests.get(companyweb, headers=headers, verify=False)
got me the warning:
Output: Warning (from warnings module):
File "/usr/lib/python3/dist-packages/urllib3/connectionpool.py", line 1020
warnings.warn(
InsecureRequestWarning: Unverified HTTPS request is being made to host 'www.schneider-group.com'. Adding certificate verification is strongly advised. See: https://urllib3.readthedocs.io/en/latest/advanced-usage.html#ssl-warnings
but also:
Output: result
<Response [200]>
result.text
Squeezed text (545 lines)
Thanks for the selenium tip, I will try it!