Python Forum

Full Version: NameError: name 'bsObj' is not defined
I am not sure what this error is.
Here is the line of code it is talking about:

for link in bsObj.findAll("a", href=re.compile("^(/wiki/)")):

Here is the whole error:

Traceback (most recent call last):
  File "C:\Users\renny and kite\Desktop\web scraping the book\example_11\example_11\example_11.py", line 15, in <module>
    for link in bsObj.findAll("a", href=re.compile("^(/wiki/)")):
NameError: name 'bsObj' is not defined
Press any key to continue . . .

How is the name not defined? Huh
Please post the whole code, exactly as you try to run it.
The error is clear: bsObj is not defined at the point where you try to use it on that line.
We can't tell why, because the rest of the code, where it should have been defined, is not shown.
(Oct-22-2016, 07:24 PM)Blue Dog Wrote: How is the name not defined?

That's what you would have to go back through your code and figure out. The error is telling you that when Python gets to that line of code, it has not seen the name bsObj before, at least in that scope. When I get that error, one of three things has happened: I mistyped the variable name when it was first mentioned, some if/elif logic skipped over the first mention of that name, or I didn't scope it correctly (it should be self.bsObj or something).
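To make those three causes concrete, here is a minimal sketch. The function names and the describe helper are made up for illustration, and a plain string stands in for the real soup object.

```python
def mistyped_name():
    bsobj = "soup"        # first mention is spelled with a lowercase 'o'
    return bsObj          # different spelling, so Python has never seen this name


def skipped_branch(flag):
    if flag:
        bsObj = "soup"    # only assigned when flag is true
    return bsObj          # unbound when the if body was skipped


def make_soup():
    bsObj = "soup"        # local to make_soup, gone when the function returns


def wrong_scope():
    make_soup()
    return bsObj          # make_soup's local name is not visible here


def describe(func, *args):
    """Call func and report the exception class name, or 'ok' if it ran."""
    try:
        func(*args)
    except NameError as exc:  # UnboundLocalError is a subclass of NameError
        return type(exc).__name__
    return "ok"


print(describe(mistyped_name))
print(describe(skipped_branch, False))
print(describe(skipped_branch, True))
print(describe(wrong_scope))
```

In each failing case the fix is the same: make sure the name is assigned, with the same spelling and in the same scope, before the line that uses it.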
Here is the code:

from urllib.request import urlopen
from bs4 import BeautifulSoup
import re
pages = set()
def getLinks(pageUrl):
 global pages
 html = urlopen("http://en.wikipedia.org"+pageUrl)
 try:
    print(bsObj.h1.get_text())
    print(bsObj.find(id ="mw-content-text").findAll("p")[0])
    print(bsObj.find(id="ca-edit").find("span").find("a").attrs['href'])
 except AttributeError:
   
    print("This page is missing something! No worries though!")
for link in bsObj.findAll("a", href=re.compile("^(/wiki/)")):
   if 'href' in link.attrs:
       if link.attrs['href'] not in pages:
#We have encountered a new page
         newPage = link.attrs['href']
         print("----------------\n"+newPage)
         pages.add(newPage)
         getLinks(newPage)
getLinks("")
That's it. I found this on the web and changed some of it. The findAll is defined, I think: print(bsObj.find(id ="mw-content-text").findAll("p")[0])

Thank you Dodgy
Go back and look at the code you copied, and find where you edited the definition of bsObj out of it.
Fix the indentation; the code is a mess.
The first error occurs because no soup object is defined. The code is missing this line:
bsObj = BeautifulSoup(html, 'html.parser')
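As a rough sketch of where that line fits, the example below builds the soup object first and only then searches it for /wiki/ links. It assumes bs4 is installed. SAMPLE_HTML and wiki_links are hypothetical names, and the inline HTML stands in for what urlopen would return, so nothing is fetched from the network. It uses find_all, the current spelling of findAll.

```python
import re

from bs4 import BeautifulSoup

# Hypothetical stand-in for the page that urlopen() would return.
SAMPLE_HTML = """
<html><body>
<h1>Sample page</h1>
<a href="/wiki/Python">Python</a>
<a href="/wiki/Monty_Python">Monty Python</a>
<a href="http://example.com/">External</a>
<a name="no-href">Anchor without href</a>
</body></html>
"""


def wiki_links(html):
    # The missing step: bind the name bsObj before anything uses it.
    bsObj = BeautifulSoup(html, "html.parser")
    links = []
    # Keep only <a> tags whose href starts with /wiki/.
    for link in bsObj.find_all("a", href=re.compile("^(/wiki/)")):
        links.append(link.attrs["href"])
    return links


print(wiki_links(SAMPLE_HTML))
```

In the posted script, the equivalent assignment would go inside getLinks, right after the urlopen call and before the try block.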
from urllib.request import urlopen
from bs4 import BeautifulSoup
import re
pages = set()
def getLinks(pageUrl):
 global pages
 html = urlopen("http://en.wikipedia.org"+pageUrl)
 try:
    print(bsObj.h1.get_text())
    print(bsObj.find(id ="mw-content-text").findAll("p")[0])
    print(bsObj.find(id="ca-edit").find("span").find("a").attrs['href'])
 except AttributeError:
    
    print("This page is missing something! No worries though!")
    for link in bsObj.findAll("a", href=re.compile("^(/wiki/)")):
       if 'href' in link.attrs:
           if link.attrs['href'] not in pages:
#We have encountered a new page
         newPage = link.attrs['href']
         print("----------------\n"+newPage)
         pages.add(newPage)
         getLinks(newPage)
getLinks("")
Is that better? It still will not work, but I am looking for the problem.
It looks like all you have done is change the indentation, and you have still left out the bsObj definition.
The indentation is still wrong (hint: line 17). I presume you are nesting the 'if' statement.
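To show why the indentation matters here, a small sketch with made-up function names: in the reposted code the for loop sits inside the except block, so it would only run after an AttributeError. Dedenting it to the level of try makes it run every time.

```python
def loop_inside_except(items):
    seen = []
    try:
        items[0].upper()          # fails only if the first item is not a string
    except AttributeError:
        print("This page is missing something!")
        for item in items:        # nested in except: runs only after an error
            seen.append(item)
    return seen


def loop_after_try(items):
    seen = []
    try:
        items[0].upper()
    except AttributeError:
        print("This page is missing something!")
    for item in items:            # same level as try: runs every time
        seen.append(item)
    return seen


print(loop_inside_except(["a", "b"]))
print(loop_after_try(["a", "b"]))
```

With string items no exception is raised, so the first function returns an empty list while the second returns every item.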