Python Forum
testutf.py fails py3, sorta works in py2
#1
I made a little test program, testutf.py, to start exploring UTF-8:

#!/usr/bin/env python3
# -*- coding: utf-8 -*-
from __future__ import division, print_function, unicode_literals
word = '\x41\x73\x75\x6e\x63\x69\xc3\xb3\x6e'
rword = repr(word)
print( rword )
Running it, I get:
Output:
lt1/forums /home/forums 19> py2 testutf.py
u'Asunci\xc3\xb3n'
lt1/forums /home/forums 20> py3 testutf.py
Traceback (most recent call last):
  File "testutf.py", line 6, in <module>
    print( rword )
UnicodeEncodeError: 'ascii' codec can't encode characters in position 7-8: ordinal not in range(128)
lt1/forums /home/forums 21> cat testutf.py
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
from __future__ import division, print_function, unicode_literals
word = '\x41\x73\x75\x6e\x63\x69\xc3\xb3\x6e'
rword = repr(word)
print( rword )
lt1/forums /home/forums 22> ls -l testutf.py
-rw-r--r-- 1 forums forums 193 Jan 22 23:40 testutf.py
lt1/forums /home/forums 23> md5sum testutf.py
27e2a0797beaaf93daf84c8eb2323d02  testutf.py
lt1/forums /home/forums 24>
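In Python 3, print() hands the str to sys.stdout, which encodes it with sys.stdout.encoding. Under Python 3.6 with a C/POSIX locale that encoding is typically ASCII, and Python 3's repr() leaves printable non-ASCII characters such as 'Ã' and '³' unescaped, which would explain why positions 7-8 are rejected. Python 2's repr() escaped them instead, so its output was pure ASCII and printed fine. That this machine's locale was C/POSIX is an assumption; the thread doesn't show it. The sketch below simulates a text stream with a chosen encoding using io.TextIOWrapper over an in-memory buffer; the helper name print_to is hypothetical.

```python
import io


def print_to(encoding, text):
    """Print text to an in-memory text stream that uses `encoding`, the way
    print() writes to sys.stdout, and return the bytes that were produced.
    Raises UnicodeEncodeError if `encoding` can't represent text."""
    raw = io.BytesIO()
    # newline='\n' turns off newline translation so the bytes are predictable
    stream = io.TextIOWrapper(raw, encoding=encoding, newline='\n')
    print(text, file=stream)
    stream.flush()
    data = raw.getvalue()
    stream.detach()  # let go of raw without closing it
    return data


word = '\x41\x73\x75\x6e\x63\x69\xc3\xb3\x6e'
rword = repr(word)  # Python 3's repr() keeps printable non-ASCII characters as-is

try:
    print_to('ascii', rword)
except UnicodeEncodeError as exc:
    print('ascii stream fails:', exc)

print('utf-8 stream gets:', print_to('utf-8', rword))
```

With a UTF-8 stream the same text goes through, and the bytes are the ones rword.encode() produced in the next experiment.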
If I add .encode(), I get this:
Output:
lt1/forums /home/forums 26> md5sum testutf.py
ac5ca2295d64a26ff249a4afad161a79  testutf.py
lt1/forums /home/forums 27> ls -l testutf.py
-rw-r--r-- 1 forums forums 202 Jan 22 23:54 testutf.py
lt1/forums /home/forums 28> cat testutf.py
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
from __future__ import division, print_function, unicode_literals
word = '\x41\x73\x75\x6e\x63\x69\xc3\xb3\x6e'
rword = repr(word)
print( rword.encode() )
lt1/forums /home/forums 29> py3 testutf.py
b"'Asunci\xc3\x83\xc2\xb3n'"
lt1/forums /home/forums 30> py2 testutf.py
u'Asunci\xc3\xb3n'
lt1/forums /home/forums 31>
Now, what do I need to do to output UTF-8 characters to my terminal without the b'' or u'' prefixes?
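The prefixes come from repr(): print() shows a bytes object through str(), which for bytes is its repr, and repr() of a Python 2 unicode string adds the u''. A minimal sketch of the two usual ways out, assuming the goal is to put the characters themselves on the terminal: print the str rather than repr(str) and let print() encode it, or encode once yourself and write the bytes to the binary layer (sys.stdout.buffer in a real script; an io.BytesIO stands in for it here). The helper name write_utf8 is hypothetical.

```python
import io


def write_utf8(text, binary_stream):
    """Encode text as UTF-8 and write the bytes to a binary stream.
    Nothing goes through repr(), so no b'' or u'' prefix is added."""
    binary_stream.write(text.encode('utf-8'))


text = 'Asunci\u00f3n'        # the real character U+00F3, not two escapes
data = text.encode('utf-8')   # b'Asunci\xc3\xb3n'

# str() of a bytes object is its repr, which is where the b'' comes from
shown = str(data)
print(shown)

# In a real script pass sys.stdout.buffer; BytesIO stands in for it here
buf = io.BytesIO()
write_utf8(text + '\n', buf)
print(buf.getvalue())
```

Either route only helps if the str already holds the real 'ó'; the \xc3\xb3 escapes in testutf.py are a different string.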
Tradition is peer pressure from dead people

What do you call someone who speaks three languages? Trilingual. Two languages? Bilingual. One language? American.
#2
Let's try os.write() with .encode(), since we already know os.write() wants bytes, and focus on Python 3:

Output:
lt1/forums /home/forums 52> cat testutf.py
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
from __future__ import division, print_function, unicode_literals
import os
word = '\x41\x73\x75\x6e\x63\x69\xc3\xb3\x6e'
os.write( 1, word.encode() )
lt1/forums /home/forums 53> py3 testutf.py;echo
Asunción
lt1/forums /home/forums 54> py3 testutf.py|od -Ad -tx1 -w16
0000000 41 73 75 6e 63 69 c3 83 c2 b3 6e
0000011
lt1/forums /home/forums 55>
Now something changed the c3 b3 to c3 83 c2 b3. What is that for? That fundamentally breaks os.write(), unless .encode() is doing it.
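It is .encode() doing it, and correctly; os.write() passes the bytes through untouched. word holds the code points U+00C3 ('Ã') and U+00B3 ('³'), not the bytes c3 b3, and UTF-8 encodes every code point from U+0080 to U+07FF as two bytes, 110xxxxx 10xxxxxx, with the high 5 bits of the code point in the first byte and the low 6 bits in the second. So U+00C3 becomes c3 83 and U+00B3 becomes c2 b3. A minimal sketch of that rule (the function name utf8_two_byte is hypothetical), checked against str.encode():

```python
def utf8_two_byte(cp):
    """Encode one code point in U+0080..U+07FF using UTF-8's two-byte form:
    110xxxxx carries the high 5 bits, 10xxxxxx carries the low 6 bits."""
    if not 0x80 <= cp <= 0x7FF:
        raise ValueError('not a two-byte code point: %#x' % cp)
    return bytes([0xC0 | (cp >> 6), 0x80 | (cp & 0x3F)])


for ch in '\xc3\xb3':  # the two non-ASCII code points in testutf.py's word
    print('U+%04X ->' % ord(ch), utf8_two_byte(ord(ch)).hex(), ch.encode('utf-8').hex())
```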
#3
Remove the lines that make no sense for Python 3, which leaves this code (my test, Python 3.6):
#!/usr/bin/env python3
# testutf.py
word = '\x41\x73\x75\x6e\x63\x69\xc3\xb3\x6e'
rword = repr(word)
print(rword)
print(type(rword))
Output:
λ python testutf.py
'Asunción'
<class 'str'>
Test it on repl.it and you will get the same result, so the problem is on your side.
I don't know what encoding this strange mix is; can you just test with real Unicode?
#4
(Jan-23-2017, 07:23 AM) snippsat wrote: Remove lines that make no sense for python 3.

This is real Unicode (from that file). Isn't UTF-8 the default in Python 3? It failed on repl.it for me, and the output you show is wrong, too.

I have narrowed the culprit down to str.encode() and posted the question on Stack Overflow.
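str.encode() is behaving as specified; the difference is in the string. UTF-8 is the default source encoding in Python 3 (PEP 3120), but that only governs how the bytes of the .py file are read. Inside a str literal, \xc3 is an escape for the code point U+00C3, not a byte, so '\xc3\xb3' is two characters, while 'ó' (or '\u00f3') is one. A bytes literal, b'\xc3\xb3', is where \x escapes really mean bytes. A short check of the difference:

```python
escaped = 'Asunci\xc3\xb3n'   # str escapes: code points U+00C3 and U+00B3
real = 'Asunci\u00f3n'        # one code point U+00F3, i.e. 'ó'
raw = b'Asunci\xc3\xb3n'      # bytes literal: here \x escapes are bytes

print(len(escaped), len(real), len(raw))
print(raw.decode('utf-8') == real)   # the bytes are the UTF-8 form of real
print(escaped.encode('utf-8'))       # each escaped code point becomes two bytes
print(real.encode('utf-8'))          # the single 'ó' becomes c3 b3
```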
#5
Does your terminal application support UTF-8? There are several out there that still don't, even today.
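Python can't ask the terminal emulator what it supports; it takes sys.stdout.encoding from the locale (LANG, LC_ALL, LC_CTYPE) and, if set, the PYTHONIOENCODING environment variable. A quick way to see whether that encoding can carry a given text, sketched with a hypothetical can_encode() helper:

```python
import codecs
import sys


def can_encode(text, encoding):
    """Return True if every character of text is representable in encoding."""
    try:
        text.encode(encoding)
    except UnicodeEncodeError:
        return False
    return True


# codecs.lookup() normalises aliases such as 'ANSI_X3.4-1968' (ASCII) or 'UTF8'
enc = getattr(sys.stdout, 'encoding', None) or 'ascii'
print(codecs.lookup(enc).name, can_encode('Asunci\u00f3n', enc))
```

If this reports ascii, the locale is the thing to fix; a UTF-8-capable terminal still needs a UTF-8 locale for Python to use it.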
"As they say in Mexico 'dosvidaniya'. That makes two vidaniyas."
https://freedns.afraid.org
#6
Does your file really contain this \x41...?
It's UTF-8 bytes stored as Latin-1 code points, so encode with latin-1 and decode as UTF-8:
>>> word = '\x41\x73\x75\x6e\x63\x69\xc3\xb3\x6e'
>>> word.encode('latin-1').decode('utf-8')
'Asunción'
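Running the same round trip backwards shows one plausible way such a string arises: UTF-8 bytes decoded as Latin-1, which maps each byte to the code point with the same number. Treat that as an assumption; the thread doesn't say where the escapes came from.

```python
real = 'Asunci\u00f3n'
word = '\x41\x73\x75\x6e\x63\x69\xc3\xb3\x6e'

# UTF-8 bytes read back as Latin-1: each byte becomes one code point
mojibake = real.encode('utf-8').decode('latin-1')
print(mojibake == word)

# The repair from the session above: Latin-1 restores the bytes, UTF-8 decodes them
print(word.encode('latin-1').decode('utf-8') == real)
```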
With real Unicode as input, no encoding step is needed:
#!/usr/bin/env python3
# testutf.py
word = 'ó ♟♜♞☻☺'
rword = repr(word)
print(rword)
print(type(rword))
λ python testutf.py
'ó ♟♜♞☻☺'
<class 'str'>
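This also accounts for the Python 2 versus Python 3 difference in the first post: Python 2's repr() of a unicode string escapes every non-ASCII character, while Python 3's repr() keeps printable ones as they are. In Python 3 the builtin ascii() gives the Python 2 style escaping:

```python
word = 'Asunci\u00f3n'
mix = '\x41\x73\x75\x6e\x63\x69\xc3\xb3\x6e'  # the string from testutf.py

r = repr(word)   # 'Asunción': printable non-ASCII is left alone
a = ascii(word)  # 'Asunci\xf3n' with a literal backslash: non-ASCII escaped

print(a)
print(ascii(mix))  # same escapes as Python 2's u'Asunci\xc3\xb3n', minus the u
print(all(ord(c) < 128 for c in a))
```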
#7
'\x41' == 'A' (65): the first 128 code points of Unicode are ASCII, and each is a one-byte encoding in UTF-8.
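That property is easy to check: for code points 0 to 127 the UTF-8 encoding is the single byte with the same value, so the 'Asunci' part of word is unchanged by .encode(), and the first code point past ASCII, U+0080, already needs two bytes.

```python
# Every ASCII code point encodes to the one byte with the same value
ascii_ok = all(chr(i).encode('utf-8') == bytes([i]) for i in range(128))

print('\x41' == 'A', ord('A'), ascii_ok)
print(chr(0x80).encode('utf-8'))  # first code point past ASCII: two bytes
```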

