Python Forum
Error while scraping links with beautiful soup
#1
I am getting an error while scraping links with Beautiful Soup. Here is the program:

from urllib.request import urlopen
from bs4 import BeautifulSoup
websitecode = urlopen("https://www.google.com").read()
soup=BeautifulSoup(websitecode, "html.parser")
links=soup.findAll("a")
print(links)
Here is the error.

Error:
Traceback (most recent call last):
  File "test.py", line 7, in <module>
    print(links)
  File "C:\Python34\lib\encodings\cp437.py", line 19, in encode
    return codecs.charmap_encode(input,self.errors,encoding_map)[0]
UnicodeEncodeError: 'charmap' codec can't encode characters in position 1308-1313: character maps to <undefined>
I am using Python 3.4.3. Thanks in advance for any help.

The issue has been resolved by using requests and encode("utf-8").
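For context, the traceback comes from print(), not from Beautiful Soup. On Python 3.4 the Windows console writes through its code page (cp437 here, as the traceback shows), and that codec has no mapping for some characters in the page, so printing raises UnicodeEncodeError. Python 3.6 switched the Windows console to UTF-8 (PEP 528), which would explain why the same script works on 3.6.4. Encoding with "utf-8" avoids the error but prints a bytes literal (b'...'). Below is a minimal stdlib-only sketch of another option, assuming it is acceptable to show unprintable characters as "?"; the helper name console_safe is hypothetical.

```python
import sys


def console_safe(text, encoding=None):
    """Return a copy of `text` that the given console encoding can print.

    Characters the codec cannot represent are replaced with "?" instead of
    raising UnicodeEncodeError. If no encoding is given, the current
    sys.stdout encoding is used (e.g. 'cp437' on an older Windows console).
    """
    encoding = encoding or getattr(sys.stdout, "encoding", None) or "utf-8"
    # Encode with errors="replace", then decode back to str for print().
    return text.encode(encoding, errors="replace").decode(encoding)


if __name__ == "__main__":
    # Hypothetical sample standing in for str(links) from the program above.
    sample = "More \u00bb snowman: \u2603"
    print(console_safe(sample))
```

In the original program this would be used as print(console_safe(str(links))).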
#2
Try this:

websitecode = urlopen("https://www.google.com").read().decode('utf8')
#3
@Larz60+

Error:
Traceback (most recent call last):
  File "test.py", line 3, in <module>
    websitecode = urlopen("https://www.google.com").read().decode('utf8')
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xa0 in position 11035: invalid start byte
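This error suggests the page was not UTF-8 for this request: byte 0xa0 is a no-break space in ISO-8859-1, which the server likely declared in its Content-Type header. Rather than hard-coding 'utf8', a sketch can decode with whatever charset the server sends (response.headers.get_content_charset()) and fall back to ISO-8859-1 when none is given. The helper names decode_body and fetch_html and the fallback choice are assumptions for illustration. Note that BeautifulSoup also accepts the raw bytes directly, as in the original program, and tries to detect the encoding itself, so decoding by hand is optional.

```python
from urllib.request import urlopen


def decode_body(raw, charset=None, fallback="iso-8859-1"):
    """Decode response bytes with the declared charset, else `fallback`.

    ISO-8859-1 maps every byte to a character, so the fallback never fails;
    errors="replace" covers a declared charset that does not match the bytes.
    """
    return raw.decode(charset or fallback, errors="replace")


def fetch_html(url):
    """Download `url` and decode it using the Content-Type charset, if any."""
    with urlopen(url) as response:
        # response.headers is an http.client.HTTPMessage; this returns the
        # lower-cased charset parameter of Content-Type, or None.
        charset = response.headers.get_content_charset()
        return decode_body(response.read(), charset)
```

Usage: soup = BeautifulSoup(fetch_html("https://www.google.com"), "html.parser").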
#4
Which version of Python are you running? It works for me using Python 3.6.4:
from urllib.request import urlopen
from bs4 import BeautifulSoup

websitecode = urlopen("https://www.google.com").read()
soup=BeautifulSoup(websitecode, "html.parser")
links=soup.findAll("a")
print(links)
Results:
Output:
[<a class="gb1" href="https://www.google.com/imghp?hl=en&amp;tab=wi">Images</a>, <a class="gb1" href="https://maps.google.com/maps?hl=en&amp;tab=wl">Maps</a>, <a class="gb1" href="https://play.google.com/?hl=en&amp;tab=w8">Play</a>, <a class="gb1" href="https://www.youtube.com/?gl=US&amp;tab=w1">YouTube</a>, <a class="gb1" href="https://news.google.com/nwshp?hl=en&amp;tab=wn">News</a>, <a class="gb1" href="https://mail.google.com/mail/?tab=wm">Gmail</a>, <a class="gb1" href="https://drive.google.com/?tab=wo">Drive</a>, <a class="gb1" href="https://www.google.com/intl/en/options/" style="text-decoration:none"><u>More</u> »</a>, <a class="gb4" href="http://www.google.com/history/optout?hl=en">Web History</a>, <a class="gb4" href="/preferences?hl=en">Settings</a>, <a class="gb4" href="https://accounts.google.com/ServiceLogin?hl=en&amp;passive=true&amp;continue=https://www.google.com/" id="gb_70" target="_top">Sign in</a>, <a href="/search?site=&amp;ie=UTF-8&amp;q=Winter+Solstice+2017&amp;oi=ddle&amp;ct=winter-solstice-2017-northern-hemisphere-5368746536337408-law&amp;hl=en&amp;kgmid=/m/02qjfw4&amp;sa=X&amp;ved=0ahUKEwi697Wny53YAhWMS98KHamFAOsQPQgD"><img alt="Winter Solstice 2017" border="0" height="211" id="hplogo" onload="window.lol&amp;&amp;lol()" src="/logos/doodles/2017/winter-solstice-2017-northern-hemisphere-5368746536337408-law.gif" title="Winter Solstice 2017" width="500"/><br/></a>, <a href="/advanced_search?hl=en&amp;authuser=0">Advanced search</a>, <a href="/language_tools?hl=en&amp;authuser=0">Language tools</a>, <a href="/intl/en/ads/">Advertising Programs</a>, <a href="/services/">Business Solutions</a>, <a href="https://plus.google.com/116899029375914044550" rel="publisher">+Google</a>, <a href="/intl/en/about.html">About Google</a>, <a href="/intl/en/policies/privacy/">Privacy</a>, <a href="/intl/en/policies/terms/">Terms</a>]
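The output above is whole <a> tags, several with relative href values (such as /preferences?hl=en) and HTML-escaped &amp;. If only the URLs are wanted, Beautiful Soup can return them with [a["href"] for a in soup.find_all("a", href=True)]. As a dependency-free alternative, swapped in here for illustration and not Beautiful Soup itself, here is a sketch using the standard library's html.parser.HTMLParser that also resolves relative links with urllib.parse.urljoin; the names LinkCollector and extract_links are hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collect absolute href URLs from <a> tags (stdlib-only sketch)."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        # Tag names arrive lower-cased; attrs is a list of (name, value)
        # pairs with entities such as &amp; already unescaped.
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if href:
            # Resolve relative links such as "/preferences?hl=en".
            self.links.append(urljoin(self.base_url, href))


def extract_links(html, base_url):
    """Return the absolute URLs of all <a href=...> tags in `html`."""
    parser = LinkCollector(base_url)
    parser.feed(html)
    parser.close()
    return parser.links
```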
#5
@Larz60+ I'm on 3.4.3, so it seems to be a version issue. Thanks for your help.