Python Forum

Full Version: Traceback error in PyCharm but not in other IDEs?
I want to switch to PyCharm, but I'm getting a Traceback error with the following code that I haven't gotten in other IDEs. I'm running Python 3.8.2 on a Mac. Any help would be greatly appreciated!

# import libraries
from urllib.request import urlopen
from bs4 import BeautifulSoup

# specify the url
url = "https://www.bbc.com/sport/football/46897172"

# Connect to the website and return the html to the variable ‘page’
try:
    page = urlopen(url)
except:
    print("Error opening the URL")

# parse the html using beautiful soup and store in variable `soup`
soup = BeautifulSoup(page, 'html.parser')

# Take out the <div> of name and get its value
content = soup.find('div', {"class": "story-body sp-story-body gel-body-copy"})

article = ''
for i in content.findAll('p'):
    article = article + ' ' + i.text
print(article)

# Saving the scraped text
with open('scraped_text.txt', 'w') as file:
    file.write(article)
Error:
Traceback (most recent call last):
  File "<input>", line 1, in <module>
  File "/Applications/PyCharm CE.app/Contents/plugins/python-ce/helpers/pydev/_pydev_bundle/pydev_umd.py", line 197, in runfile
    pydev_imports.execfile(filename, global_vars, local_vars)  # execute the script
  File "/Applications/PyCharm CE.app/Contents/plugins/python-ce/helpers/pydev/_pydev_imps/_pydev_execfile.py", line 18, in execfile
    exec(compile(contents+"\n", file, 'exec'), glob, loc)
  File "/Users/dv/Desktop/Test/test.py", line 15, in <module>
    soup = BeautifulSoup(your_mom, 'html.parser')
NameError: name 'page' is not defined
Remove the general except on line 11 so that you can see what exactly the error is. At the moment you hide the actual error with that all-catching except.
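
For illustration, a rough sketch of what catching only the specific error could look like (URLError is my assumption about what urlopen raises here):

from urllib.request import urlopen
from urllib.error import URLError

url = "https://www.bbc.com/sport/football/46897172"

try:
    page = urlopen(url)
except URLError as err:
    # stop and show the real problem instead of swallowing it
    raise SystemExit(f"Error opening the URL: {err}")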

Also note that the error does not match the code provided.
On line #15 you have
soup = BeautifulSoup(page, 'html.parser')
while the error refers to
soup = BeautifulSoup(your_mom, 'html.parser')
Note page vs. your_mom.
Thanks for the reply. I accidentally had two variable names in there because I was swapping them out, thinking "page" might be a reserved name in Python. Anyway, I ran the code with "page" used consistently throughout and removed "except:" from line 11 as well:

from urllib.request import urlopen
from bs4 import BeautifulSoup

# specify the url
url = "https://www.bbc.com/sport/football/46897172"

# Connect to the website and return the html to the variable ‘page’
try:
    page = urlopen(url)

    print("Error opening the URL")

# parse the html using beautiful soup and store in variable `soup`
soup = BeautifulSoup(page, 'html.parser')

# Take out the <div> of name and get its value
content = soup.find('div', {"class": "story-body sp-story-body gel-body-copy"})

article = ''
for i in content.findAll('p'):
    article = article + ' ' + i.text
print(article)

# Saving the scraped text
with open('scraped_text.txt', 'w') as file:
    file.write(article)



Here's the new error:

Error:
Traceback (most recent call last):
  File "<input>", line 1, in <module>
  File "/Applications/PyCharm CE.app/Contents/plugins/python-ce/helpers/pydev/_pydev_bundle/pydev_umd.py", line 197, in runfile
    pydev_imports.execfile(filename, global_vars, local_vars)  # execute the script
  File "/Applications/PyCharm CE.app/Contents/plugins/python-ce/helpers/pydev/_pydev_imps/_pydev_execfile.py", line 18, in execfile
    exec(compile(contents+"\n", file, 'exec'), glob, loc)
  File "/Users/dv/Desktop/Test/test.py", line 15
    soup = BeautifulSoup(page, 'html.parser')
    ^
SyntaxError: invalid syntax
You cannot remove just the except; remove the try as well and fix the indentation.
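
For clarity, a sketch of how the top of the script could look with the try/except dropped entirely (same url as in your code):

from urllib.request import urlopen
from bs4 import BeautifulSoup

url = "https://www.bbc.com/sport/football/46897172"

# no try/except here, so any failure prints its full traceback
page = urlopen(url)
soup = BeautifulSoup(page, 'html.parser')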
Doing that threw a crazy long error – in a nutshell:
Error:
urllib.error.URLError: <urlopen error [SSL: CERTIFICATE_VERIFY_FAILED]
Thanks to that new Traceback, though, I was able to find a fix on Stack Overflow: double-clicking 'Install Certificates.command' in my Python install folder. Not really sure what the issue was, but after doing that the error was gone.
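
For anyone who finds this later: an alternative that reportedly works without running that command is to hand urllib an SSL context built from certifi's CA bundle (this assumes the certifi package is installed):

import ssl
import certifi
from urllib.request import urlopen

# use certifi's CA bundle instead of the (missing) system certificates
ctx = ssl.create_default_context(cafile=certifi.where())
page = urlopen("https://www.bbc.com/sport/football/46897172", context=ctx)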

Thanks again for your help – it's greatly appreciated.
Some advice: drop urllib and use Requests.
Here is how that code would look with that advice; there is more on this in tutorial part-1.
import requests
from bs4 import BeautifulSoup

url = "https://www.bbc.com/sport/football/46897172"
response = requests.get(url)
# lxml is a faster parser
soup = BeautifulSoup(response.content, 'lxml')
# No need for a dict argument; copy the class from the page and just add an underscore: class_
content = soup.find('div', class_="story-body sp-story-body gel-body-copy")

with open('scraped_text.txt', 'w') as f_out:
    for p_tag in content.find_all('p'):
        print(p_tag.text)
        f_out.write(f'{p_tag.text}\n')
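One small addition to the sketch above: if you also want failed requests to surface clearly (the same issue as the bare except earlier in the thread), Requests can raise them for you:

import requests

response = requests.get("https://www.bbc.com/sport/football/46897172")
# raises requests.exceptions.HTTPError for 4xx/5xx responses
response.raise_for_status()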
Thanks for the Requests recommendation and tutorial -- this is a much better solution!
@0ba22b85af, you understand that the OP's problem was resolved about a week ago, right? And posting spam is NOT tolerated - the link in the post has been removed.