Python Forum
Problem with character sets
Pedroski55 Wrote:When I saved the data I retrieved as a text file, what should be Chinese characters,
You have to be careful to keep the text as Unicode and use utf-8 when you take text out of Python 3.
For example, Requests and BeautifulSoup will keep the correct encoding from a website.
from bs4 import BeautifulSoup
import requests

url = 'http://www.sohu.com'
url_get = requests.get(url)
# Pass the raw bytes (.content) so BeautifulSoup can detect the page encoding itself
soup = BeautifulSoup(url_get.content, 'lxml')
text = soup.select('div.news > p:nth-child(1) > a')
Test:
>>> text
[<a data-param="&amp;_f=index_cpc_0" href="http://www.sohu.com/a/298818150_428290?g=0?code=36b1c5f548e7c32034c382e96f3e401" target="_blank" title="全国政协十三届二次会议在京开幕">全国政协十三届二次会议在京开幕</a>]

>>> text[0].attrs['title']
'全国政协十三届二次会议在京开幕'
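The same behaviour can be checked without a network connection. This is only a rough sketch: the HTML snippet is made up for illustration (it is not the real sohu.com markup), and it uses the built-in 'html.parser' instead of 'lxml' so nothing extra has to be installed. The page bytes are encoded as gb2312 and declare that charset in a meta tag, and BeautifulSoup is given the raw bytes just like url_get.content above.

```python
from bs4 import BeautifulSoup

headline = '全国政协十三届二次会议在京开幕'

# Hypothetical page, invented for this example: declares gb2312 in a meta tag
html_text = (
    '<html><head><meta charset="gb2312"></head>'
    f'<body><div class="news"><p><a title="{headline}">{headline}</a></p></div>'
    '</body></html>'
)

# Simulate what requests would return in .content: raw gb2312 bytes
raw_bytes = html_text.encode('gb2312')

# BeautifulSoup gets bytes and works out the encoding on its own
soup = BeautifulSoup(raw_bytes, 'html.parser')
link = soup.find('a')

print(soup.original_encoding)    # the encoding BeautifulSoup settled on
print(link['title'] == headline)  # the title is an ordinary Python str again
```

Once the text is a normal str inside Python 3, the encoding of the original page no longer matters.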
When saving to disk I do not need to use gb2312; always use utf-8 when the Unicode text shows correctly in Python 3.
Improved Unicode handling was one of the biggest changes in moving from Python 2 to 3.
# Write to disk
ch = '全国政协十三届二次会议在京开幕'
with open('ch.txt', 'w', encoding='utf-8') as f_out:
    f_out.write(ch)

# Read from disk
with open('ch.txt', encoding='utf-8') as f:
    print(f.read())
Written out and read back in, the text is still correct:
Output:
全国政协十三届二次会议在京开幕
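To see why the encoding argument matters, here is a small sketch of what can go wrong when the file is read back with a different encoding than it was written with. The temporary directory and the variable names are my own choices for the demonstration, not part of the original problem.

```python
import os
import tempfile

ch = '全国政协十三届二次会议在京开幕'

# The same str turns into different bytes depending on the encoding
utf8_bytes = ch.encode('utf-8')
gb_bytes = ch.encode('gb2312')
print(len(ch), len(utf8_bytes), len(gb_bytes))

# Write as utf-8 into a temporary folder (hypothetical path, demo only)
path = os.path.join(tempfile.mkdtemp(), 'ch.txt')
with open(path, 'w', encoding='utf-8') as f_out:
    f_out.write(ch)

# Reading with the same encoding gives the original text back
with open(path, encoding='utf-8') as f:
    same = f.read()

# Reading with the wrong encoding gives garbage or raises an error
try:
    with open(path, encoding='gb2312') as f:
        wrong = f.read()
except UnicodeDecodeError:
    wrong = None

print(same == ch)
print(wrong == ch)
```

If a text file shows wrong characters, the usual cause is this kind of mismatch between the encoding used to write and the one used to read, so it helps to pass encoding='utf-8' explicitly on both sides.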

Messages In This Thread
Problem with character sets - by Pedroski55 - Mar-03-2019, 12:31 AM
RE: Problem with character sets - by Pedroski55 - Mar-03-2019, 11:54 PM
RE: Problem with character sets - by snippsat - Mar-04-2019, 12:25 AM
RE: Problem with character sets - by Pedroski55 - Mar-04-2019, 02:09 AM
RE: Problem with character sets - by snippsat - Mar-04-2019, 02:35 AM