Python Forum

bs4: output html content into a txt file
I'm teaching myself Python. The following code raises a UnicodeEncodeError. How should I fix it? Thanks.

import bs4, requests
#----------------------------------------------------------------------------
URL = "https://learnxinyminutes.com/docs/r"
#----------------------------------------------------------------------------
soup = bs4.BeautifulSoup(requests.get(URL).text, "lxml")
with open( r"C:\Users\User\Desktop\Test.txt" ,"w") as oFile:
    oFile.write(str(soup.html))
    oFile.close()
UnicodeEncodeError: 'cp950' codec can't encode character '\xf8' in position 20242: illegal multibyte sequence
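If you are using Python 2, change line 7 of your code (the oFile.write call) to
oFile.write(str(soup.html.encode('utf8')))
Even better would be to use Python 3 (given that you are starting to learn Python now), as support for Python 2 will end soon. On Python 3, open() in text mode defaults to the locale encoding (cp950 here), so there the fix is to pass encoding='utf-8' to open() instead.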
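To see where the error comes from without touching the network, here is a minimal sketch. The sample string and the can_encode helper are made up for illustration; only the '\xf8' character and the cp950 codec name come from the traceback above.

```python
# Minimal sketch: reproduce the encoding failure without requests or bs4.
# `sample` is a made-up stand-in for the page text; '\xf8' is the character
# named in the traceback.
sample = "S\xf8ren"


def can_encode(text, encoding):
    """Return True if `text` can be encoded with `encoding` (hypothetical helper)."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


cp950_ok = can_encode(sample, "cp950")  # the codec from the error message
utf8_ok = can_encode(sample, "utf-8")   # UTF-8 can represent any Unicode text
print(cp950_ok, utf8_ok)
```

So the page itself is fine; the problem is only which codec is used when the text is written out.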
Like this: use content when reading in the response.
Set encoding='utf-8' in open(), and there is no need for a str() conversion when you use prettify().
This is a Python 3 solution, which is what we are going to be stricter about recommending in 2018.
So I will not post a Python 2 solution for this.
import requests
from bs4 import BeautifulSoup

url = "https://learnxinyminutes.com/docs/r"
url_get = requests.get(url)
soup = BeautifulSoup(url_get.content, 'lxml')

with open('url.txt', 'w', encoding='utf-8') as f_out:
    f_out.write(soup.prettify())
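If you want to check the file-writing part on its own, the following is a rough sketch that skips the download. The text value is a made-up stand-in for soup.prettify(), and the temporary directory is only there so nothing is left behind on disk.

```python
import os
import tempfile

# Made-up stand-in for soup.prettify(); it contains the character from the traceback.
text = "<p>S\xf8ren</p>"

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "url.txt")

    # An explicit encoding means the locale default (cp950 in the question) is never used.
    with open(path, "w", encoding="utf-8") as f_out:
        f_out.write(text)

    # Read the raw bytes back to see what actually landed on disk.
    with open(path, "rb") as f_in:
        raw = f_in.read()

    # Read it back as text, again naming the encoding explicitly.
    with open(path, encoding="utf-8") as f_in:
        round_trip = f_in.read()

print(raw)
print(round_trip == text)
```

Remember to pass the same encoding when you open the file for reading later, otherwise the locale default is used again.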