bs4 : output html content into a txt file
Python Forum (https://python-forum.io), Forum: Web Scraping & Web Development, Thread: /thread-7277.html
bs4 : output html content into a txt file - smallabc - Jan-02-2018

I am self-learning Python. The following code raises a UnicodeEncodeError. How should I fix it? Thanks.

    import bs4, requests
    #----------------------------------------------------------------------------
    URL = "https://learnxinyminutes.com/docs/r"
    #----------------------------------------------------------------------------
    soup = bs4.BeautifulSoup(requests.get(URL).text, "lxml")
    with open(r"C:\Users\User\Desktop\Test.txt", "w") as oFile:
        oFile.write(str(soup.html))
        oFile.close()

    UnicodeEncodeError: 'cp950' codec can't encode character '\xf8' in position 20242: illegal multibyte sequence

RE: bs4 : output html content into a txt file - buran - Jan-02-2018

You are using Python 2, so change line 7 to

    oFile.write(str(soup.html.encode('utf8')))

Even better would be to use Python 3 (given that you are starting to learn Python now), as support for Python 2 will end soon.

RE: bs4 : output html content into a txt file - snippsat - Jan-02-2018

Like this: use content when reading the response, set utf-8 in open(), and skip the str() conversion by using prettify(). This is a Python 3 solution, which is what we are going to be stricter about advising in 2018, so I will not post a Python 2 solution for this.

    import requests
    from bs4 import BeautifulSoup

    url = "https://learnxinyminutes.com/docs/r"
    url_get = requests.get(url)
    soup = BeautifulSoup(url_get.content, 'lxml')
    with open('url.txt', 'w', encoding='utf-8') as f_out:
        f_out.write(soup.prettify())
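
The 'cp950' codec named in the error message suggests the failure happens when the file object encodes the text with the Windows locale default, which is what Python 3's open() falls back to when no encoding is passed. Below is a minimal sketch of that behaviour using only the standard library, so it runs without a network connection. The names are assumptions for illustration: html_text is a made-up stand-in for the string returned by soup.prettify(), and write_text is a hypothetical helper, not part of bs4 or requests.

```python
import os
import tempfile

# Made-up stand-in for soup.prettify(); it contains U+00F8, the character
# reported as '\xf8' in the error message.
html_text = "<html><body><p>S\u00f8ren</p></body></html>"


def write_text(path, text, encoding):
    """Hypothetical helper: write text to path with an explicit encoding."""
    with open(path, "w", encoding=encoding) as f_out:
        f_out.write(text)


out_path = os.path.join(tempfile.mkdtemp(), "page.txt")

# Passing cp950 explicitly imitates the default that open() would choose on a
# Windows machine whose locale encoding is cp950 (an assumption about the
# original poster's setup).
try:
    write_text(out_path, html_text, "cp950")
    cp950_failed = False
except UnicodeEncodeError:
    cp950_failed = True

# With UTF-8 the same text is written and read back unchanged.
write_text(out_path, html_text, "utf-8")
with open(out_path, encoding="utf-8") as f_in:
    round_trip = f_in.read()
```

Choosing the encoding in open() keeps the text as str all the way to the file, so no str() or encode() call is needed at the write step.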