Can't post my code because I'm new and it has links?
from bs4 import BeautifulSoup
from urllib.request import urlopen
url = "http://www.pythonforbeginners.com"
content = urlopen(url).read()
soup = BeautifulSoup(content)
print( soup.prettify())
#print(title)
#>> 'title'? Python For Beginners
#print soup.title.string
#>> ? Python For Beginners
print(soup.p)
Remove the links.
You'll be able to use them after you make a few posts.
Error:
Traceback (most recent call last):
File "/home/ubuntu/workspace/ex50/bin/firstsoup.py", line 1, in <module>
from bs4 import BeautifulSoup
ImportError: No module named 'bs4'
Process exited with code: 1
Sorry for the multiple reply chunks - bs4 is definitely installed, so I don't know what I'm doing wrong. I apologize in advance for my newbie/idiot status!
(Jun-26-2017, 06:40 PM)datafix Wrote: Sorry for the multiple reply chunks - bs4 is definitely installed,
It's not installed for the Python version you're running if you get that message.
Install it with
pip3 install beautifulsoup4
and run the script with
python3 firstsoup.py
Look at my Linux Python 3 environment.
Your script is also out of date: newer BeautifulSoup versions expect you to name a parser, and I'd use Requests instead of urllib. Look at Web-Scraping part-1.
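If you're not sure which Python is actually running your script, a quick check like this can help. It's only a sketch using the standard library; the helper name module_location is made up for illustration. Run it with the same command you use for firstsoup.py and see whether that interpreter can find bs4.

```python
import importlib.util
import sys


def module_location(name):
    """Return where the current interpreter would import `name` from, or None if it can't find it."""
    # find_spec looks the module up without actually importing it
    spec = importlib.util.find_spec(name)
    return spec.origin if spec is not None else None


# Which interpreter is running this script, and which Python version is it?
print("Interpreter:", sys.executable)
print("Version:", sys.version_info.major, sys.version_info.minor)

# None here means this interpreter cannot see bs4, whatever pip reported
print("bs4 found at:", module_location("bs4"))
```

If the last line prints None under python3, then bs4 was most likely installed for a different interpreter, and pip3 install beautifulsoup4 should fix it.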
I thought I was running v3, but in fact I was running v2. So I fixed that, reran it with v3, and got the same import error. Running on Cloud9 - that shouldn't make a difference, do you think?
(Jun-27-2017, 06:34 PM)datafix Wrote: Running on Cloud9 - that shouldn't make a difference, do you think?
I use Cloud9 sometimes myself.
Cloud9 has both Python 2 and 3 installed, so you have to install with
pip3 install module
and run the code with
python3 myscript.py
Here is a run on Cloud9.
# install modules
snippsat:~/workspace/bs4_test $ sudo pip3 install beautifulsoup4 requests lxml
Downloading/unpacking beautifulsoup4
Downloading beautifulsoup4-4.6.0-py3-none-any.whl (86kB): 86kB downloaded
Requirement already satisfied (use --upgrade to upgrade): requests in /usr/lib/python3/dist-packages
Downloading/unpacking lxml
Downloading lxml-3.8.0.tar.gz (3.8MB): 3.8MB downloaded
Running setup.py (path:/tmp/pip_build_root/lxml/setup.py) egg_info for package lxml
Building lxml version 3.8.0.
Building without Cython.
Using build configuration of libxslt 1.1.28
Successfully installed beautifulsoup4 lxml
Cleaning up...
# After installing, you can run an interactive test
snippsat:~/workspace/bs4_test $ python3
Python 3.4.3 (default, Oct 14 2015, 20:28:29)
[GCC 4.8.4] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import requests
>>> from bs4 import BeautifulSoup
>>>
>>> url = 'https://www.python.org/'
>>> url_get = requests.get(url)
>>> soup = BeautifulSoup(url_get.content, 'lxml')
>>> print(soup.select('head > title')[0].text)
Welcome to Python.org
>>>
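The same idea as a script, in case that's easier to adapt. This is only a sketch: the function names title_from_html and fetch_title are made up for illustration, and it names the built-in 'html.parser' instead of 'lxml' so it doesn't depend on lxml being installed. It still assumes beautifulsoup4 and requests are installed for the Python 3 you run it with.

```python
import requests
from bs4 import BeautifulSoup


def title_from_html(html):
    """Return the text of the <title> tag, or None if the page has no title."""
    # Naming the parser explicitly keeps BeautifulSoup from guessing one
    soup = BeautifulSoup(html, "html.parser")
    return soup.title.string if soup.title else None


def fetch_title(url):
    """Download a page with Requests and return its title."""
    response = requests.get(url, timeout=10)
    # Raise an error for 4xx/5xx responses instead of parsing an error page
    response.raise_for_status()
    return title_from_html(response.content)
```

Call fetch_title with a URL, for example the python.org address used in the session above, and print what it returns.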
Thanks so much for all of the feedback. I was off on another project, but I'm back. I'll try out these suggestions and let you know how it worked.
@snippsat - Thanks! Your last post did the trick.
