Python Forum
bs4 : output html content into a txt file
#1
I am teaching myself Python. The following code raises a UnicodeEncodeError. How should I fix it? Thanks.

import bs4, requests
#----------------------------------------------------------------------------
URL = "https://learnxinyminutes.com/docs/r"
#----------------------------------------------------------------------------
soup = bs4.BeautifulSoup(requests.get(URL).text, "lxml")
with open( r"C:\Users\User\Desktop\Test.txt" ,"w") as oFile:
    oFile.write(str(soup.html))
    oFile.close()
UnicodeEncodeError: 'cp950' codec can't encode character '\xf8' in position 20242: illegal multibyte sequence
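The error above can be reproduced without any network access. This is a minimal sketch, assuming the failing character is U+00F8 (the '\xf8' named in the message) and that open() fell back to the locale's preferred encoding because no encoding= argument was given. The sample string and the helper can_encode are made up for illustration.

```python
import locale

# Encoding that open() uses in text mode when no encoding= argument is given.
# On the poster's machine this was evidently cp950; yours may differ.
default_encoding = locale.getpreferredencoding(False)
print("default text encoding:", default_encoding)

# Hypothetical sample text containing the character from the error message.
sample = "sm\xf8rrebr\xf8d"


def can_encode(text, encoding):
    """Return True if every character of text is representable in encoding."""
    try:
        text.encode(encoding)
    except UnicodeEncodeError:
        return False
    return True


print("cp950 can encode sample:", can_encode(sample, "cp950"))
print("utf-8 can encode sample:", can_encode(sample, "utf-8"))
```

If the first line prints cp950 (or another legacy code page), any character outside that code page will fail on write unless the file is opened with an explicit encoding.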
#2
If you are using Python 2, change line 7 to
oFile.write(soup.html.encode('utf8'))
Better still, use Python 3 (given that you are just starting to learn Python), as support for Python 2 will end soon.
#3
Like this: use content when reading the response in.
Set encoding='utf-8' in open(); with prettify() no str conversion is needed.
This is a Python 3 solution, which we are going to be stricter about recommending in 2018,
so I will not post a Python 2 solution for this.
import requests
from bs4 import BeautifulSoup

url = "https://learnxinyminutes.com/docs/r"
url_get = requests.get(url)
soup = BeautifulSoup(url_get.content, 'lxml')

with open('url.txt', 'w', encoding='utf-8') as f_out:
    f_out.write(soup.prettify())
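The explicit-encoding approach above can be checked offline with a round trip. This is only a sketch: html_text is a made-up stand-in for what soup.prettify() would return, and save_text / load_text are hypothetical helper names, not part of bs4 or requests.

```python
import os
import tempfile

# Stand-in for soup.prettify(); contains the character that cp950 rejected.
html_text = "<html><body><p>sm\xf8rrebr\xf8d</p></body></html>"


def save_text(path, text):
    """Write text as UTF-8, regardless of the locale's default encoding."""
    with open(path, "w", encoding="utf-8") as f_out:
        f_out.write(text)


def load_text(path):
    """Read the file back, decoding with the same encoding used to write it."""
    with open(path, encoding="utf-8") as f_in:
        return f_in.read()


with tempfile.TemporaryDirectory() as tmp_dir:
    out_path = os.path.join(tmp_dir, "url.txt")
    save_text(out_path, html_text)
    round_trip = load_text(out_path)

print("round trip matches:", round_trip == html_text)
```

Passing the same encoding on both write and read is what keeps the result independent of the Windows code page. A text editor opening the file also has to treat it as UTF-8 to show the characters correctly.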