Dec-06-2017, 09:49 PM
Hello dear Python experts,
I am apollo, pretty new to Python. I have installed Python on my openSUSE Leap 42.3 box.
Afterwards I installed bs4 with pip (see below).
Note: I did this as root. Hopefully this is correct.
To test whether everything went all right, I tried the script below, saved as p1.py.
Running it, I get an ImportError back: No module named bs4.
Sure thing, the script is Python 2 based.
But I guess that I have misconfigured something in the installation of Python?
Probably I have chosen some wrong paths...
Love to hear from you.
The pip session (run as root):

```text
linux-3645:/home/martin # pip install --upgrade pip
Collecting pip
  Downloading pip-9.0.1-py2.py3-none-any.whl (1.3MB)
    100% |████████████████████████████████| 1.3MB 155kB/s
Installing collected packages: pip
  Found existing installation: pip 7.1.2
    Uninstalling pip-7.1.2:
      Successfully uninstalled pip-7.1.2
Successfully installed pip-9.0.1
linux-3645:/home/martin # pip install beautifulsoup4
Collecting beautifulsoup4
  Downloading beautifulsoup4-4.6.0-py3-none-any.whl (86kB)
    100% |████████████████████████████████| 92kB 394kB/s
Installing collected packages: beautifulsoup4
Successfully installed beautifulsoup4-4.6.0
linux-3645:/home/martin #
```
The test script, p1.py:

```python
import urllib
from bs4 import BeautifulSoup
import urlparse
import mechanize

# Set the startingpoint for the spider and initialize
# the a mechanize browser object

br = mechanize.Browser()

# create lists for the urls in que and visited urls
urls = [url]
visited = [url]

# Since the amount of urls in the list is dynamic
# we just let the spider go until some last url didn't
# have new ones on the webpage
while len(urls) > 0:
    try:
        br.open(urls[0])
        urls.pop(0)
        for link in br.links():
            newurl = urlparse.urljoin(link.base_url, link.url)
            #print newurl
            if newurl not in visited and url in newurl:
                visited.append(newurl)
                urls.append(newurl)
                print newurl
    except:
        print "error"
        urls.pop(0)
```
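Since the script is Python 2 only (`urlparse`, `print` statements), here is a minimal Python 3 sketch of the same queue-plus-visited idea. It is not a port of the original: it swaps mechanize for the standard library's `html.parser`, and it takes a hypothetical `fetch` callable and an in-memory `PAGES` dict in place of real HTTP requests. The names `LinkCollector`, `crawl`, `fetch`, `PAGES` and the `example.test` URLs are all assumptions made up for illustration.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collect the href value of every <a> tag fed to the parser."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def crawl(start_url, fetch):
    """Return start_url plus every URL containing it that is reachable from it.

    fetch is a hypothetical callable that maps a URL to its HTML text.
    """
    queue = deque([start_url])
    visited = [start_url]
    while queue:
        current = queue.popleft()
        try:
            html = fetch(current)
        except Exception:
            # Mirror the original script: report the failure and move on.
            print("error", current)
            continue
        parser = LinkCollector()
        parser.feed(html)
        for href in parser.hrefs:
            new_url = urljoin(current, href)
            # Stay on the site: keep only URLs that contain the start URL.
            if new_url not in visited and start_url in new_url:
                visited.append(new_url)
                queue.append(new_url)
    return visited


# Hypothetical in-memory "site" so the sketch runs without network access.
PAGES = {
    "http://example.test/": '<a href="a.html">A</a> <a href="http://other.test/">X</a>',
    "http://example.test/a.html": '<a href="/">home</a> <a href="b.html">B</a>',
    "http://example.test/b.html": "",
}

print(crawl("http://example.test/", PAGES.__getitem__))
```

The `visited` list plays the same role as in the script: a URL is queued at most once, so the loop ends when no page yields a new link.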
Running it as a normal user:

```text
martin@linux-3645:~/dev/python> python p1.py
Traceback (most recent call last):
  File "p1.py", line 3, in <module>
    from bs4 import BeautifulSoup
ImportError: No module named bs4
martin@linux-3645:~/dev/python>
```
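The pip log shows a `py3-none-any` wheel being downloaded, while the script is started with `python`. One possible explanation (an assumption, nothing in the post confirms it) is that `pip` and `python` belong to different interpreters. A minimal sketch for checking this: it prints which interpreter is running and where, if anywhere, that interpreter finds `bs4`. The helper name `describe_module` is made up for illustration.

```python
from __future__ import print_function  # lets the same file run under Python 2 and 3

import sys


def describe_module(name):
    """Return the file `name` is imported from, or None if the import fails."""
    try:
        module = __import__(name)
    except ImportError:
        return None
    # Built-in modules such as sys carry no __file__ attribute.
    return getattr(module, "__file__", None) or "(no __file__, e.g. built-in)"


print("interpreter: ", sys.executable)
print("version:     ", sys.version.split()[0])
print("bs4 found at:", describe_module("bs4"))
```

Running this once with `python` and once with `python3`, and comparing the result with `pip --version` (which reports the Python version pip is bound to), should show whether the two disagree. If they do, `python -m pip install beautifulsoup4` installs into whichever interpreter `python` actually starts.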