Python Forum

Pages: 1 2

I am not sure what this error is.
Here is the line of code it is talking about:

for link in bsObj.findAll("a", href=re.compile("^(/wiki/)")):

Here is the whole error:

Traceback (most recent call last):
File "C:\Users\renny and kite\Desktop\web scraping the book\example_11\example_11\example_11.py", line 15, in <module>
for link in bsObj.findAll("a", href=re.compile("^(/wiki/)")):
NameError: name 'bsObj' is not defined
Press any key to continue . . .

How is the name not defined? Huh

Please post the whole code, exactly as you are trying to run it.
The error is clear: bsObj is not defined at the point where you try to use it in that line.

We can't tell, because the rest of the code, where bsObj should have been defined, is not shown.

(Oct-22-2016, 07:24 PM) Blue Dog wrote: How is the name not defined?

That's what you would have to go back through your code and figure out. The error is telling you that when Python gets to that line of code, it has not seen the name bsObj before, at least in that scope. When I get that error, one of three things has happened: I mistyped the variable name when it was first mentioned, some if/elif logic skipped over the first mention of that name, or I didn't scope it correctly (it should be self.bsObj or something).
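To make those three causes concrete, here is a small sketch that reproduces each one. The function and class names (typo_case, skipped_branch_case, Scraper, raises_name_error) are made up for illustration and are not from the original script; the string "soup" just stands in for a real BeautifulSoup object.

```python
def typo_case():
    bsobj = "soup"   # first mention is mistyped (lowercase 'o')
    return bsObj     # NameError: name 'bsObj' is not defined


def skipped_branch_case(flag):
    if flag:
        bsObj = "soup"   # only assigned when flag is true
    # When flag is false this raises UnboundLocalError,
    # which is a subclass of NameError.
    return bsObj


class Scraper:
    def __init__(self):
        self.bsObj = "soup"   # stored on the instance

    def use(self):
        return bsObj          # NameError: should be self.bsObj


def raises_name_error(func, *args):
    """Return True if calling func(*args) raises NameError."""
    try:
        func(*args)
    except NameError:
        return True
    return False


print(raises_name_error(typo_case))
print(raises_name_error(skipped_branch_case, False))
print(raises_name_error(Scraper().use))
```

In every case the fix is the same idea: make sure the name is assigned, under exactly that spelling and in a scope the line can see, before the line that uses it runs.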

Here is the code:

from urllib.request import urlopen
from bs4 import BeautifulSoup
import re
pages = set()
def getLinks(pageUrl):
 global pages
 html = urlopen("http://en.wikipedia.org"+pageUrl)
 try:
    print(bsObj.h1.get_text())
    print(bsObj.find(id ="mw-content-text").findAll("p")[0])
    print(bsObj.find(id="ca-edit").find("span").find("a").attrs['href'])
 except AttributeError:
   
    print("This page is missing something! No worries though!")
for link in bsObj.findAll("a", href=re.compile("^(/wiki/)")):
   if 'href' in link.attrs:
       if link.attrs['href'] not in pages:
#We have encountered a new page
         newPage = link.attrs['href']
         print("----------------\n"+newPage)
         pages.add(newPage)
         getLinks(newPage)
getLinks("")

That's it. I found this on the web and changed some of it. I think findAll is defined here: print(bsObj.find(id ="mw-content-text").findAll("p")[0])

Thank you Dodgy

Go back to the code you copied and find where you removed the definition of bsObj.

Fix the indentation; the code is a mess.
The first error occurs because no soup object is defined. You need:

bsObj = BeautifulSoup(html, 'html.parser')

from urllib.request import urlopen
from bs4 import BeautifulSoup
import re
pages = set()
def getLinks(pageUrl):
 global pages
 html = urlopen("http://en.wikipedia.org"+pageUrl)
 try:
    print(bsObj.h1.get_text())
    print(bsObj.find(id ="mw-content-text").findAll("p")[0])
    print(bsObj.find(id="ca-edit").find("span").find("a").attrs['href'])
 except AttributeError:
    
    print("This page is missing something! No worries though!")
    for link in bsObj.findAll("a", href=re.compile("^(/wiki/)")):
       if 'href' in link.attrs:
           if link.attrs['href'] not in pages:
#We have encountered a new page
         newPage = link.attrs['href']
         print("----------------\n"+newPage)
         pages.add(newPage)
         getLinks(newPage)
getLinks("")

Is that better? It still will not work, but I am looking for the problem.

It looks like all you have done is change the indentation; you have still left out the definition of bsObj.

The indentation is still wrong (hint: line 17). I presume you are nesting the 'if' statement.
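Putting the replies above together, a corrected version might look roughly like the sketch below. It is not necessarily the code from the source Blue Dog copied. It defines bsObj right after the page is fetched, moves the for loop back inside the function at the same level as the try block, and indents the body of the inner 'if' consistently. The opener parameter is an assumption added here so the function can be tried without touching the network, and find_all is used as the current spelling of findAll.

```python
from urllib.request import urlopen
from bs4 import BeautifulSoup
import re

pages = set()


def getLinks(pageUrl, opener=urlopen):
    # opener is a hypothetical extra parameter (not in the original script);
    # it defaults to urlopen and lets you pass in a fake fetcher for testing.
    html = opener("http://en.wikipedia.org" + pageUrl)
    # This is the line missing from the posted code.
    bsObj = BeautifulSoup(html, "html.parser")
    try:
        print(bsObj.h1.get_text())
        print(bsObj.find(id="mw-content-text").find_all("p")[0])
        print(bsObj.find(id="ca-edit").find("span").find("a").attrs["href"])
    except AttributeError:
        print("This page is missing something! No worries though!")

    # The loop sits inside the function, level with the try block,
    # so it runs whether or not the except branch was taken.
    for link in bsObj.find_all("a", href=re.compile("^(/wiki/)")):
        if "href" in link.attrs:
            if link.attrs["href"] not in pages:
                # We have encountered a new page
                newPage = link.attrs["href"]
                print("----------------\n" + newPage)
                pages.add(newPage)
                getLinks(newPage, opener)


# Uncomment to start crawling. This makes real requests to Wikipedia, and
# the unbounded recursion can eventually hit Python's recursion limit.
# getLinks("")
```

Because pages is only mutated (pages.add) and never reassigned, the 'global pages' statement from the original is not needed.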


Blue Dog

buran

Yoriz

ichabod801

Blue Dog

Yoriz

snippsat

Blue Dog

Yoriz

sparkz_alot