Python Forum
how to read chinese character?
#1
Hi, I am trying to translate a Chinese page into English,
but the text I read from the page comes out as gibberish.
How can I "convert" it back to Chinese?
# importing the modules
import requests
from bs4 import BeautifulSoup
 
# target url
url = "https://www.boshisw.com/boshi/14_14309/"
 
# making requests instance
reqs = requests.get(url)
 
# using the BeautifulSoup module
soup = BeautifulSoup(reqs.text, 'html.parser')
 
# displaying the title
print("Title of the website is : ")
for title in soup.find_all('title'):
    print(title.get_text())
Error:
Title of the website is : ÎÒÔÚԭʼÉç»áµ±´å³¤×îÐÂÕ½ÚÁбí_ÎÒÔÚԭʼÉç»áµ±´å³¤×îÐÂÕ½ÚĿ¼_²©ÊËÊéÎÝ
Thank you for reading, have a nice day.
#2
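Change reqs.text to reqs.content.
This means Bs4 is given the raw bytes and works out the encoding itself. With reqs.text, Requests has already decoded the page, possibly with the wrong encoding, so the decoding can get mixed up between Requests and Bs4.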
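To make the mix-up concrete, here is a minimal sketch using only the standard library. It assumes the page is served as GBK and that the text was decoded as ISO-8859-1, which is the fallback Requests uses for text responses whose Content-Type header carries no charset. The garbled title in the first post is consistent with that, but it is an assumption, not something confirmed for this site.

```python
# Assumption: the server sends GBK bytes, and they get decoded as ISO-8859-1.
title = "博仕书屋"  # last part of the title shown in the expected output

raw = title.encode("gbk")            # the bytes a GBK page would contain
garbled = raw.decode("iso-8859-1")   # wrong codec: every byte becomes one character
correct = raw.decode("gbk")          # right codec: two bytes per Chinese character

print(garbled)
print(correct)

# Text that is already garbled this way can be repaired by reversing the
# wrong decode and then applying the right one.
repaired = garbled.encode("iso-8859-1").decode("gbk")
print(repaired == title)
```

The repair step only works when the wrong decode lost no information, which holds for ISO-8859-1 because it maps every byte to a character. Passing bytes to Bs4 avoids the problem in the first place.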
Encodings
Quote: Any HTML or XML document is written in a specific encoding like ASCII or UTF-8.
But when you load that document into Beautiful Soup, you’ll discover it’s been converted to Unicode:
Unicode, Dammit guesses correctly most of the time.
# importing the modules
import requests
from bs4 import BeautifulSoup

# target url
url = "https://www.boshisw.com/boshi/14_14309/"

# making requests instance
reqs = requests.get(url)

# using the BeautifulSoup module
soup = BeautifulSoup(reqs.content, 'html.parser')
print(type(soup))

# displaying the title
print("Title of the website is : ")
for title in soup.find_all('title'):
    print(title.get_text())
Output:
我在原始社会当村长最新章节列表_我在原始社会当村长最新章节目录_博仕书屋
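If the guess ever goes wrong, the encoding can also be checked or forced. The sketch below runs offline: page_bytes is a hypothetical stand-in for reqs.content, built here as a tiny GBK page with a meta charset tag, so it does not reflect the real site's markup. original_encoding and from_encoding are part of Beautiful Soup's API.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for reqs.content: a small GBK-encoded page.
page_bytes = (
    '<html><head><meta charset="gbk">'
    "<title>博仕书屋</title></head><body></body></html>"
).encode("gbk")

# Let Beautiful Soup detect the encoding from the bytes.
soup = BeautifulSoup(page_bytes, "html.parser")
detected = soup.original_encoding   # the encoding Bs4 settled on
title = soup.title.get_text()
print(detected, title)

# Or state the encoding explicitly and skip the guessing.
forced = BeautifulSoup(page_bytes, "html.parser", from_encoding="gbk")
forced_title = forced.title.get_text()
print(forced_title)
```

Another option that keeps reqs.text is to set reqs.encoding = reqs.apparent_encoding before reading it. apparent_encoding is itself a guess made from the response body, so it is worth checking the result.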
#3
(Aug-25-2022, 07:47 PM)snippsat Wrote: Change to reqs.content.
Thank you, I was looking for this for hours.
I gave you a reputation point.