Error while scraping links with beautiful soup - mgtheboss - Dec-22-2017
I am getting an error while scraping links with Beautiful Soup. Here is the program.
from urllib.request import urlopen
from bs4 import BeautifulSoup
websitecode = urlopen("https://www.google.com").read()
soup=BeautifulSoup(websitecode, "html.parser")
links=soup.findAll("a")
print(links)
Here is the error.
Error: Traceback (most recent call last):
File "test.py", line 7, in <module>
print(links)
File "C:\Python34\lib\encodings\cp437.py", line 19, in encode
return codecs.charmap_encode(input,self.errors,encoding_map)[0]
UnicodeEncodeError: 'charmap' codec can't encode characters in position 1308-1313: character maps to <undefined>
I am using Python 3.4.3. I appreciate the forum's cooperation.
The issue has been resolved by using requests and encode("utf-8").
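The exact code behind that fix is not shown in the thread, so the following is only a minimal sketch of the encode("utf-8") idea. The traceback comes from print() pushing the text through the console's cp437 codec. Encoding the text to UTF-8 bytes yourself and writing those bytes to the underlying binary stream avoids that codec. The helper name write_utf8 and the sample markup are hypothetical, not from the original post.

```python
import io
import sys


def write_utf8(text, stream=None):
    """Write text as UTF-8 bytes to a binary stream.

    Hypothetical helper: it sidesteps the console codec (cp437 in the
    traceback above) by encoding the text before it reaches print().
    """
    if stream is None:
        stream = sys.stdout.buffer  # binary stream underneath sys.stdout
    stream.write(text.encode("utf-8") + b"\n")
    stream.flush()


# Demonstration against an in-memory buffer instead of a real console.
# The sample markup is made up; \u4e2d is a character cp437 cannot encode.
demo = io.BytesIO()
write_utf8('<a href="/x">More \u4e2d</a>', demo)
```

With the program from the first post, the call would be write_utf8(str(links)). On a cp437 console the non-ASCII characters may still display incorrectly, but no UnicodeEncodeError is raised, and output redirected to a file will be valid UTF-8.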
RE: Error while scraping links with beautiful soup - Larz60+ - Dec-22-2017
Try this:
websitecode = urlopen("https://www.google.com").read().decode('utf8')
RE: Error while scraping links with beautiful soup - mgtheboss - Dec-22-2017
@Larz60+
Error: Traceback (most recent call last):
File "test.py", line 3, in <module>
websitecode = urlopen("https://www.google.com").read().decode('utf8')
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xa0 in position 11035: invalid start byte
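This second traceback suggests that the response body was not UTF-8 in this case: 0xa0 cannot start a UTF-8 sequence, but it is a non-breaking space in Latin-1. BeautifulSoup accepts raw bytes and detects the encoding itself, so decoding by hand is not required. If a string is needed anyway, a tolerant decode along the following lines is one option. This is a sketch; decode_body and its fallback order are assumptions, not something from the thread.

```python
def decode_body(raw, declared=None):
    """Decode response bytes without raising UnicodeDecodeError.

    Hypothetical helper. Tries the declared charset first (for urllib,
    response.headers.get_content_charset() can supply it), then UTF-8,
    then falls back to Latin-1, which maps every byte to a character.
    """
    for encoding in (declared, "utf-8"):
        if not encoding:
            continue
        try:
            return raw.decode(encoding)
        except (UnicodeDecodeError, LookupError):
            # Wrong or unknown charset: try the next candidate.
            continue
    return raw.decode("latin-1")


# A body containing the byte from the traceback decodes via the fallback.
sample = decode_body(b"a\xa0b")
```

The Latin-1 fallback never fails, but it can produce the wrong characters if the page actually uses a different encoding, so passing the undecoded bytes to BeautifulSoup remains the simpler route.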
RE: Error while scraping links with beautiful soup - Larz60+ - Dec-22-2017
Which version of Python are you running? It works for me using Python 3.6.4.
from urllib.request import urlopen
from bs4 import BeautifulSoup
websitecode = urlopen("https://www.google.com").read()
soup=BeautifulSoup(websitecode, "html.parser")
links=soup.findAll("a")
print(links)
Results:
Output: [<a class="gb1" href="https://www.google.com/imghp?hl=en&tab=wi">Images</a>, <a class="gb1" href="https://maps.google.com/maps?hl=en&tab=wl">Maps</a>, <a class="gb1" href="https://play.google.com/?hl=en&tab=w8">Play</a>, <a class="gb1" href="https://www.youtube.com/?gl=US&tab=w1">YouTube</a>, <a class="gb1" href="https://news.google.com/nwshp?hl=en&tab=wn">News</a>, <a class="gb1" href="https://mail.google.com/mail/?tab=wm">Gmail</a>, <a class="gb1" href="https://drive.google.com/?tab=wo">Drive</a>, <a class="gb1" href="https://www.google.com/intl/en/options/" style="text-decoration:none"><u>More</u> »</a>, <a class="gb4" href="http://www.google.com/history/optout?hl=en">Web History</a>, <a class="gb4" href="/preferences?hl=en">Settings</a>, <a class="gb4" href="https://accounts.google.com/ServiceLogin?hl=en&passive=true&continue=https://www.google.com/" id="gb_70" target="_top">Sign in</a>, <a href="/search?site=&ie=UTF-8&q=Winter+Solstice+2017&oi=ddle&ct=winter-solstice-2017-northern-hemisphere-5368746536337408-law&hl=en&kgmid=/m/02qjfw4&sa=X&ved=0ahUKEwi697Wny53YAhWMS98KHamFAOsQPQgD"><img alt="Winter Solstice 2017" border="0" height="211" id="hplogo" onload="window.lol&&lol()" src="/logos/doodles/2017/winter-solstice-2017-northern-hemisphere-5368746536337408-law.gif" title="Winter Solstice 2017" width="500"/><br/></a>, <a href="/advanced_search?hl=en&authuser=0">Advanced search</a>, <a href="/language_tools?hl=en&authuser=0">Language tools</a>, <a href="/intl/en/ads/">Advertising Programs</a>, <a href="/services/">Business Solutions</a>, <a href="https://plus.google.com/116899029375914044550" rel="publisher">+Google</a>, <a href="/intl/en/about.html">About Google</a>, <a href="/intl/en/policies/privacy/">Privacy</a>, <a href="/intl/en/policies/terms/">Terms</a>]
RE: Error while scraping links with beautiful soup - mgtheboss - Dec-22-2017
@Larz60+ 3.4.3. It seems to be an issue with version 3.4.3. I appreciate your cooperation.
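A likely explanation for the version difference is that Python 3.6 changed Windows console output to use UTF-8 (PEP 528), whereas 3.4 encodes printed text with the console code page, cp437 in the traceback. On an older interpreter, one workaround is to escape whatever the console codec cannot represent before printing. The sketch below assumes that approach; console_safe is a hypothetical helper name.

```python
import sys


def console_safe(text, encoding=None):
    """Return text with unencodable characters replaced by \\uXXXX escapes.

    Hypothetical helper: 'encoding' defaults to the current stdout
    encoding, falling back to ASCII if that is unavailable.
    """
    enc = encoding or getattr(sys.stdout, "encoding", None) or "ascii"
    # backslashreplace turns each unencodable character into an escape
    # sequence instead of raising UnicodeEncodeError.
    return text.encode(enc, errors="backslashreplace").decode(enc)


# \u4e2d has no cp437 mapping, so it is escaped; plain ASCII is unchanged.
escaped = console_safe("More \u4e2d", "cp437")
```

In the original program this would be print(console_safe(str(links))). Setting the PYTHONIOENCODING environment variable to utf-8 before running the script is another option that needs no code change.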