 bs4 : output html content into a txt file
#1
I'm teaching myself Python. The following code raises a UnicodeEncodeError. How should I fix it? Thanks.

import bs4, requests
#----------------------------------------------------------------------------
URL = "https://learnxinyminutes.com/docs/r"
#----------------------------------------------------------------------------
soup = bs4.BeautifulSoup(requests.get(URL).text, "lxml")
with open( r"C:\Users\User\Desktop\Test.txt" ,"w") as oFile:
    oFile.write(str(soup.html))
    oFile.close()
UnicodeEncodeError: 'cp950' codec can't encode character '\xf8' in position 20242: illegal multibyte sequence
#2
If you are using Python 2, change line #7 to
oFile.write(soup.html.encode('utf8'))
Even better would be to use Python 3 (given that you are just starting to learn Python), as support for Python 2 will end soon.
#3
Like this: use content (the raw bytes) when reading in the response.
Set encoding to utf-8 in open(); with prettify() no str conversion is needed.
This is a Python 3 solution, which we are going to be stricter about advising in 2018,
so I will not post a Python 2 solution for this.
import requests
from bs4 import BeautifulSoup

url = "https://learnxinyminutes.com/docs/r"
url_get = requests.get(url)
soup = BeautifulSoup(url_get.content, 'lxml')

with open('url.txt', 'w', encoding='utf-8') as f_out:
    f_out.write(soup.prettify())
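
For what it's worth, here is a minimal offline sketch of why the encoding argument matters. It is not the code from the posts above: the sample string and the temporary file path are made up for illustration, and cp950 is passed explicitly so that the failure from post #1 should reproduce even on a machine whose default encoding is different.

```python
import os
import tempfile

# Sample text containing the character from the traceback in post #1 (U+00F8).
text = "<p>S\xf8ren</p>"

# Hypothetical scratch file; any writable path would do.
path = os.path.join(tempfile.mkdtemp(), "demo.txt")

# Forcing cp950 mimics a Windows locale whose default codec lacks this character.
try:
    with open(path, "w", encoding="cp950") as f_out:
        f_out.write(text)
    cp950_failed = False
except UnicodeEncodeError:
    cp950_failed = True

# With an explicit utf-8 encoding the same text is written without error.
with open(path, "w", encoding="utf-8") as f_out:
    f_out.write(text)

# Read it back with the same encoding to confirm nothing was lost.
with open(path, encoding="utf-8") as f_in:
    round_trip = f_in.read()
```

When open() is called in text mode without an encoding, Python 3 falls back to a locale-dependent default, which is presumably where cp950 came from in the original error.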