Python Forum
when repr() fails
#1
what function converts a string containing unprintable binary characters into source-code-compatible escape sequences, so that each escape sequence restores the original character when it is parsed as part of a string literal?

i thought it would be repr(), but it isn't, since repr() does not convert some character values to an escape sequence.


Output:
Python 3.5.2 (default, Nov 17 2016, 17:05:23)
[GCC 5.4.0 20160609] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> ord('\377')
255
>>> chr(255)
'\xff'
>>> repr(chr(255))
"'\xff'"
>>> print(repr(chr(255)))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'ascii' codec can't encode character '\xff' in position 1: ordinal not in range(128)
>>> a=repr(chr(255))
>>> a
"'\xff'"
>>> a[0]
"'"
>>> a[1]
'\xff'
>>> a[2]
"'"
>>> repr(chr(255))[1]
'\xff'
>>>
in the above, it can be seen that the character with the value 255 gets "converted" to a single character that still has the value 255.  it is only the python interactive interpreter that shows us \xff to represent that value.  i want a function that converts that character (value 255) to the four ascii characters \xff, which work in source code:

Output:
Python 3.5.2 (default, Nov 17 2016, 17:05:23)
[GCC 5.4.0 20160609] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> len('\xff')
1
>>> ord('\xff')
255
>>>
this shows how a script can use either \xff or \377 to get a single character with the value 255.  a function that converts to either the hexadecimal form or the octal form would be suitable.


Output:
lt1/forums /home/forums 13> cat ff.py
c = '\xff'
print(len(c))
print(ord(c))
c = '\377'
print(len(c))
print(ord(c))
lt1/forums /home/forums 14> py3 ff.py
1
255
1
255
lt1/forums /home/forums 15>
then there is reprlib.repr(), but it has exactly the same issue.  before i go code up my own version of repr(), can someone point me to the correct function somewhere within python?
Tradition is peer pressure from dead people

What do you call someone who speaks three languages? Trilingual. Two languages? Bilingual. One language? American.
Reply
#2
I mentioned it in https://python-forum.io/Thread-string-to...4#pid16354
Maybe it is what you need.
"As they say in Mexico 'dosvidaniya'. That makes two vidaniyas."
https://freedns.afraid.org
Reply
#3
You may try adding
export LANG=en_US.UTF-8
to your .bashrc, or whatever serves the same purpose on your system.
Test everything in a Python shell (iPython, Azure Notebook, etc.)
  • Someone gave you advice you liked? Test it - maybe the advice was actually bad.
  • Someone gave you advice you think is bad? Test it before arguing - maybe it was good.
  • You posted a claim that something you did not test works? Be prepared to eat your hat.
Reply
#4
(May-01-2017, 07:29 AM)wavic Wrote: I mentioned it in https://python-forum.io/Thread-string-to...4#pid16354
Maybe it is what you need.

sorry, i don't see it.  the firefox find operation does not find "repr" anywhere but in the source in post #8.2 (code that can fail in some cases due to this repr() issue, which i have since discovered is specific to python3).
Reply
#5
The "unicode_escape" codec is your friend:
Output:
>>> import codecs
>>> x=u'Dèjà-vu\n\t'
>>> codecs.encode(x,'unicode_escape')
'D\\xe8j\\xe0-vu\\n\\t'
>>> print codecs.encode(x,'unicode_escape')
D\xe8j\xe0-vu\n\t
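The session above is Python 2 (note the print statement). A minimal Python 3 sketch of the same idea follows, assuming the text is a str; the helper names escape_text and unescape_text are made up for illustration. In Python 3 the codec returns bytes, so the result is decoded back to str. One caveat: unicode_escape does not escape quote characters, so the result is not automatically safe to paste between quotes.

```python
def escape_text(text):
    """Return an ASCII-only str with non-printable/non-ASCII characters escaped."""
    # In Python 3, encoding with unicode_escape yields bytes; decode to get str.
    return text.encode("unicode_escape").decode("ascii")


def unescape_text(escaped):
    """Reverse escape_text(): turn the escape sequences back into characters."""
    return escaped.encode("ascii").decode("unicode_escape")


x = "D\xe8j\xe0-vu\n\t"
escaped = escape_text(x)
print(escaped)  # D\xe8j\xe0-vu\n\t
assert unescape_text(escaped) == x
```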
Unless noted otherwise, code in my posts should be understood as "coding suggestions", and its use may require more neurones than the two necessary for Ctrl-C/Ctrl-V.
Your one-stop place for all your GIMP needs: gimp-forum.net
Reply
#6
@Skaperen
>>> print(repr(chr(255)))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'ascii' codec can't encode character '\xff' in position 1: ordinal not in range(128)
>>> a=repr(chr(255))
Your shell still uses ASCII as its encoding.
You have posted a different error about this before.

From a clean Mint 18.1 install:
mint@mint ~ $ echo $LANG
en_US.UTF-8
mint@mint ~ $ python3 -c"import sys; print(sys.stdout.encoding)"
UTF-8
mint@mint ~ $ python3
Python 3.5.2 (default, Nov 17 2016, 17:05:23) 
[GCC 5.4.0 20160609] on linux
>>> import sys
>>> sys.stdout.encoding
'UTF-8'
>>> print(repr(chr(255)))
'ÿ'

>>> # To reproduce your error I have to encode to ascii explicitly
>>> a = repr(chr(255))  
>>> a.encode('ascii')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'ascii' codec can't encode character '\xff' in position 1: ordinal not in range(128)
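To see that the error comes from the output stream's encoding rather than from repr() itself, here is a small sketch that simulates a terminal with a chosen encoding. The helper name encode_like_terminal is hypothetical, and an in-memory stream stands in for the real sys.stdout.

```python
import io


def encode_like_terminal(text, encoding):
    """Return the bytes print() would send to a stream that uses `encoding`."""
    raw = io.BytesIO()
    stream = io.TextIOWrapper(raw, encoding=encoding, newline="\n")
    print(text, file=stream)
    stream.flush()
    data = raw.getvalue()
    stream.detach()  # leave the underlying buffer alone once we have the bytes
    return data


s = chr(255)
print(encode_like_terminal(repr(s), "utf-8"))   # works: UTF-8 can encode the character
print(encode_like_terminal(ascii(s), "ascii"))  # works: ascii() output is pure ASCII
try:
    encode_like_terminal(repr(s), "ascii")
except UnicodeEncodeError as exc:
    print("ascii stream rejected repr() output:", exc.reason)
```

The same repr() result prints fine on a UTF-8 stream and fails on an ASCII one, which matches the difference between the two sessions above.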
Reply
#7
(May-01-2017, 04:37 PM)snippsat Wrote: Your shell still uses ASCII as its encoding.
You have posted a different error about this before.
then i guess i didn't recognize them as the same issue

i did find that the ascii() builtin function does what i wanted, but ascii() does not exist in python2.  in python2 repr() does what i want; in python3 repr() does not, while ascii() does.  so i assume this is related, somehow, to the changes in the string concept between python2 and python3.  i guess i will need code that tests the python version (this is a module intended to work in both python2 and python3), maybe like this, which also removes the quotes:

foo = (repr(bar) if sys.version_info.major < 3 else ascii(bar))[1:-1]
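A quick Python 3 check of that claim, as a sketch: ascii() output is pure ASCII and parses back to the original string. The sample strings and the helper name round_trips are made up for illustration. Note that the quote character chosen by ascii() (and repr()) depends on the content, which matters once the quotes are sliced off with [1:-1].

```python
import ast

samples = [chr(255), "tab\there", "it's", "\u20ac", "back\\slash", 'both \' and "']


def round_trips(text):
    """True if ascii(text) is pure ASCII and evaluates back to text."""
    literal = ascii(text)
    is_pure_ascii = all(ord(ch) < 128 for ch in literal)
    return is_pure_ascii and ast.literal_eval(literal) == text


assert all(round_trips(s) for s in samples)

print(ascii(chr(255)))  # '\xff'
print(ascii("it's"))    # "it's"  (double quotes, because the text contains a single quote)
```

Because of the second case, the body left after [1:-1] is only guaranteed to be valid inside the same kind of quotes that ascii() picked.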
Reply
#8
What I mean is that you should try to fix your terminal/shell encoding.
print(repr(chr(255))) should not give an error in a Python 3 shell.
Go to repl.it and just paste in the print call above;
you will see that it does not raise UnicodeEncodeError: 'ascii' as it does for you.

Try a better REPL, e.g. ptpython, bpython or IPython;
I only use the default one for testing like this.
Reply
#9
(May-02-2017, 02:32 PM)snippsat Wrote: What I mean is that you should try to fix your terminal/shell encoding.
print(repr(chr(255))) should not give an error in a Python 3 shell.
Go to repl.it and just paste in the print call above;
you will see that it does not raise UnicodeEncodeError: 'ascii' as it does for you.

Try a better REPL, e.g. ptpython, bpython or IPython;
I only use the default one for testing like this.

it appears i can accomplish my goal by testing the version and using ascii() in version 3 or higher.  perhaps i can do:

import sys
if sys.version_info.major < 3:
    # python2 has no ascii(); its repr() already produces the escapes
    ascii = repr
then just use ascii() in all cases.

my goal is to convert any character value in a string (str, and also unicode in Python2) to the escape sequence, made only of ascii characters, that represents that character value.  this is to be the escape sequence that works in Python source code string literals.  i believe i am now accomplishing that.  is there any reason to believe i am not?

i am revising my print_object() function.  this needs to work right in all terminal/shell settings.
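One detail to watch when the quotes are removed with [1:-1]: in Python 2, repr() of a unicode string starts with a u prefix (repr(u'\xff') gives u'\xff'), so slicing from index 1 leaves the opening quote in place. A hedged sketch of a prefix-aware version follows; the helper name escaped_body is hypothetical, and only the Python 3 path was reasoned through here.

```python
import sys

if sys.version_info[0] < 3:
    ascii = repr  # Python 2: repr() already escapes non-ASCII characters


def escaped_body(value):
    """Return the ASCII escape text of value without prefix or quotes."""
    literal = ascii(value)
    # Skip any literal prefix (u'' in Python 2, b'' for bytes in Python 3)
    # by locating the opening quote instead of assuming it is at index 0.
    start = 0
    while literal[start] not in "'\"":
        start += 1
    return literal[start + 1:-1]


print(escaped_body(chr(255) if sys.version_info[0] >= 3 else unichr(255)))  # \xff
```

The returned body still depends on which quote character the literal used, so it is safest to embed it between the same kind of quotes or to keep the full literal.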
Reply

