Python Forum
Problem with character sets
#1
Yesterday I asked about getting data from a webpage, got some good advice and had a little success. However, there is a problem with the character sets.

If I look at the source code of the starting webpage, it has:

Quote:<meta http-equiv=Content-Type content=text/html;charset=gb2312>

From the source code view, I clicked my way through the links until I found what I wanted. I presume all the subordinate webpages are also GB2312.

I got the first set of data, about 466 lines, with:

Quote:line = soup.find('table').text

When I saved the retrieved data as a text file, the Chinese characters, which display correctly in Firefox, came out as gibberish (small sample here):

Quote:רҵ
ÆÚÊý
ÐÕÃû
ÐÔ±ð
ÊÖ»úºÅÂë
Éí·ÝÖ¤ºÅ
µÇ½ÃÜÂë
ѧºÅ
²é¿´
ÐÞ¸Ä
ɾ³ý

Numbers display correctly.
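
For what it's worth, several of the sample lines are exactly what GB2312 bytes look like when each byte is read as a Latin-1 character. A minimal sketch of that effect, assuming Latin-1 was the wrong codec involved (the helper name `garble` is made up for illustration):

```python
def garble(text: str, real: str = "gb2312", wrong: str = "latin-1") -> str:
    """Encode text with its real codec, then decode the bytes with the wrong one."""
    return text.encode(real).decode(wrong)


if __name__ == "__main__":
    print(garble("姓名"))  # Chinese characters turn into accented Latin letters
    print(garble("2019"))  # ASCII digits are the same bytes in both codecs
```

Digits and other ASCII characters occupy the same single bytes in both encodings, which would explain why the numbers come out right while the Chinese text does not.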

How can I:
A. convert the string in line directly to UTF-8, or
B. tell Python to write this data to a text file encoded as GB2312?
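
Both options could be sketched roughly as follows, using only the standard library. This assumes the garbled string came from GB2312 bytes that were decoded as Latin-1, which is only a guess based on the sample; `repair`, `save_text`, and the file names are hypothetical. `gb18030` is used for decoding because it is a superset of GB2312.

```python
import os
import tempfile


def repair(garbled: str, wrong: str = "latin-1", real: str = "gb18030") -> str:
    """Undo a wrong decode: recover the original bytes, then decode them properly."""
    return garbled.encode(wrong).decode(real)


def save_text(text: str, path: str, encoding: str = "utf-8") -> None:
    """Write text to path in the requested encoding."""
    with open(path, "w", encoding=encoding) as f:
        f.write(text)


if __name__ == "__main__":
    line = "ÐÕÃû"          # stand-in for soup.find('table').text
    fixed = repair(line)   # now an ordinary Python str
    folder = tempfile.mkdtemp()
    # Option A: save as UTF-8
    save_text(fixed, os.path.join(folder, "table_utf8.txt"))
    # Option B: save as GB2312
    save_text(fixed, os.path.join(folder, "table_gb2312.txt"), encoding="gb2312")
```

If the string contains characters outside Latin-1, `repair` raises `UnicodeEncodeError`, which would suggest a different wrong codec was involved. If the page is fetched with requests (the post does not say), setting `response.encoding = 'gb2312'` before reading `response.text` may avoid the wrong decode in the first place.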

I tried the Linux command-line tool iconv on the text file, but it just gave errors, and so did utf8trans.

Thanks for any tips!


Messages In This Thread
Problem with character sets - by Pedroski55 - Mar-03-2019, 12:31 AM
RE: Problem with character sets - by Pedroski55 - Mar-03-2019, 11:54 PM
RE: Problem with character sets - by snippsat - Mar-04-2019, 12:25 AM
RE: Problem with character sets - by Pedroski55 - Mar-04-2019, 02:09 AM
RE: Problem with character sets - by snippsat - Mar-04-2019, 02:35 AM
