Python Forum
encoding control characters
#1
has anyone seen code to encode control characters? here is my code to decode control characters. encode is Carriage Return -> '\r', decode is '\r' -> Carriage Return.
from __future__ import division, generators, print_function, with_statement
#-------#-------#-------#-------#-------#-------#-------#-------#-------#-------#
# -*- coding: utf-8 -*-
# file          decodecontrol.pyf
# purpose       function to decode control characters
# email         10054452614123394844460370234029112340408691
# function      decode_control
#
# purpose       decode character sequences that encode other characters like
#               control characters that are not readily entered in a plain
#               character string.
#
# 1 argument    string with encoded control characters - ascii
#
# returns       tuple (unprocessed count,decoded result string,end char)
#
# codes               ^a or \001 or \x01 = {1} Ctrl-A
#                     ^b or \002 or \x02 = {2} Ctrl-B
#               \c or ^c or \003 or \x03 = {3} Ctrl-C
#                     ^d or \004 or \x04 = {4} Ctrl-D
#                     ^e or \005 or \x05 = {5} Ctrl-E
#                     ^f or \006 or \x06 = {6} Ctrl-F
#               \a or ^g or \007 or \x07 = {7} Alarm
#               \b or ^h or \010 or \x08 = {8} Backspace
#               \t or ^i or \011 or \x09 = {9} Tab
#               \n or ^j or \012 or \x0a = {10} Newline
#               \v or ^k or \013 or \x0b = {11} Vertical Tab
#               \f or ^l or \014 or \x0c = {12} Formfeed
#               \r or ^m or \015 or \x0d = {13} Carriage Return
#               \z or ^z or \032 or \x1a = {26} Ctrl-Z
#               \e or ^[ or \033 or \x1b = {27} Escape
#               \\       or \134 or \x5c = {92} literal backslash
#               \^       or \136 or \x5e = {94} literal caret
#                     ^@ or \0   or \x00 = {0} raw binary zero
#               \[0-3][0-7][0-7]         converts octal to raw
#               \x[0-9a-fA-F][0-9a-fA-F] converts hex to raw
#               \o[0-7][0-7][0-7]        converts octal to raw
#               \d[0-2][0-9][0-9]        converts decimal to raw
#
# note          only 8 bits is used for bytes encoded with the \d sequence
#               for any value given in the 3 digits it interprets.
#-------#-------#-------#-------#-------#-------#-------#-------#-------#-------#
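def decode_control(istr):
    # single-letter escapes mapped to character codes (\z is Ctrl-Z)
    code = {'a':7,'b':8,'c':3,'t':9,'n':10,'v':11,'f':12,'r':13,'z':26,'e':27,'\\':92,'^':94}
    # prefixed numeric escapes mapped to (digit count, radix)
    base = {'d':(3,10),'o':(3,8),'x':(2,16)}
    ostr = ''
    while istr:
        c,istr = istr[0],istr[1:]
        if c == '^':
            # ^x means Ctrl-x: keep the low 5 bits of the next character
            if not istr:
                return 0,ostr,c
            c,istr = chr(ord(istr[0])%32),istr[1:]
        elif c == '\\':
            if not istr:
                return 0,ostr,c
            c,istr = istr[0].lower(),istr[1:]
            if c in code:
                c = chr(code[c])
            elif c in '0123':
                # bare octal: this digit plus up to two more octal digits
                n = 0
                while n < 2 and n < len(istr) and istr[n] in '01234567':
                    n += 1
                c,istr = chr(int(c+istr[:n],8)),istr[n:]
            elif c in base:
                d,b = base[c]
                digits = istr[:d].lower()
                # need exactly d digits that are valid in radix b
                if len(digits) < d or any(x not in '0123456789abcdef'[:b] for x in digits):
                    return len(istr),ostr,c
                # only the low 8 bits are kept, as the header note says
                c,istr = chr(int(digits,b)&255),istr[d:]
            else:
                # unknown escape letter
                return len(istr),ostr,c
        ostr += c
    return 0,ostr,None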
# EOF
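For comparison, Python already ships one fixed encode/decode pair: the 'unicode_escape' codec. The sketch below assumes Python 3 and only shows the round trip; the codec knows the backslash forms only (no ^x, \d or \e) and gives the caller no choice of spelling.

```python
# Round trip through the built-in 'unicode_escape' codec (Python 3 assumed).
raw = 'tab\there\r\n'

# encode: control characters -> backslash sequences (the codec returns bytes)
encoded = raw.encode('unicode_escape').decode('ascii')
print(encoded)

# decode: backslash sequences -> control characters
decoded = encoded.encode('ascii').decode('unicode_escape')
assert decoded == raw

# the codec picks one spelling per character; the bell comes out in hex form
bell = '\a'.encode('unicode_escape')
print(bell)
```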
Tradition is peer pressure from dead people

What do you call someone who speaks three languages? Trilingual. Two languages? Bilingual. One language? American.
#2
Since the first 128 characters of ASCII and Unicode are the same, could you not simply use the Unicode escape, e.g. print('\u0007') (for Alert, Ctrl-G)?
If it ain't broke, I just haven't gotten to it yet.
OS: Windows 10, openSuse 42.3, freeBSD 11, Raspian "Stretch"
Python 3.6.5, IDE: PyCharm 2018 Community Edition
#3
(Jan-25-2018, 02:30 PM)sparkz_alot Wrote: Since the first 128 characters of ASCII and Unicode are the same, could you not simply use the Unicode escape, e.g. print('\u0007') (for Alert, Ctrl-G)?

or even print('\U00000007')
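A quick check, assuming Python 3, that these spellings are only different ways of writing the same one-character string in source code:

```python
# Six literal spellings of the bell character (code point 7).
bell_forms = ['\a', '\x07', '\007', '\u0007', '\U00000007', chr(7)]

# Every form compares equal to every other form.
assert all(form == '\x07' for form in bell_forms)
distinct = len(set(bell_forms))
print(distinct)
```

That is why decoding is the easy direction: all of these collapse to one value, and nothing in the resulting string remembers which spelling was used.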

i guess i should add all these unicode sequences to both my decode and encode functions. what i was looking for is an existing implementation, to see how it handles the case where many different sequences decode to the same control character. the decode function is a non-issue since it can simply support every possible sequence, such as '\r' or '^m' or '\015' or '\d013' or '\o015' or '\x0d' or '\X0D' for carriage return. but an encoder produces the sequence, so which of the many possibilities should it produce? i want to get an idea of how people expect this to be handled. i've already started writing code. for now my encode_control() function has a named option, method=, to select a method if the default is not desired.
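One way such a method= option could look is sketched below. This is not the poster's actual encode_control(); the method names 'letter', 'octal' and 'hex' and the fallback rule are assumptions made up for illustration. The letter table follows the decoder's table from post #1, including its non-standard \e.

```python
# Hypothetical sketch: the method names and the hex fallback are assumptions.
_LETTERS = {7: 'a', 8: 'b', 9: 't', 10: 'n', 11: 'v', 12: 'f', 13: 'r', 27: 'e'}

def encode_control(text, method='letter'):
    """Replace control characters, backslash and caret with escape sequences."""
    out = []
    for ch in text:
        n = ord(ch)
        if ch in '\\^':
            out.append('\\' + ch)       # escape both so a ^-aware decoder reads them literally
        elif n >= 32 and n != 127:
            out.append(ch)              # not a control character: copy through
        elif method == 'letter' and n in _LETTERS:
            out.append('\\' + _LETTERS[n])
        elif method == 'octal':
            out.append('\\%03o' % n)
        else:
            out.append('\\x%02x' % n)   # 'hex', and the fallback when no letter exists
    return ''.join(out)

print(encode_control('a\r\tb'))
print(encode_control('a\r\tb', method='octal'))
print(encode_control('a\r\tb', method='hex'))
```

The point of the design is that the caller states a preference and the function falls back to a form that always exists (hex here) when the preferred form has no spelling for a given character.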

Python's repr() and ascii() functions are limited implementations that give this kind of output. for carriage return they always return '\r' and have no means to choose an alternative. they are "limited" in that they never produce '\a', '\b', '\v' or '\f' (those come out as '\x07', '\x08', '\x0b' and '\x0c') even though string literal parsing accepts all four.

try repr('\a\b\t\n\v\f\r') and see that.
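Spelled out as a script (Python 3 assumed, since ascii() is a Python 3 built-in):

```python
sample = '\a\b\t\n\v\f\r'

# repr() keeps \t, \n and \r but writes the other four in \xNN form.
shown = repr(sample)
print(shown)           # '\x07\x08\t\n\x0b\x0c\r'

# ascii() gives the same result here because nothing is outside ASCII.
print(ascii(sample))
```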
#4
I guess I'm not understanding what it is you're trying to do. But then I haven't finished my first cup of coffee.
#5
i'm always trying to do big complicated things that seem to push any coding language to the edge.
#6
(Jan-27-2018, 04:03 AM)Skaperen Wrote: i'm always trying to do big complicated things that seem to push any coding language to the edge.

That's fine, but what I'm saying is I'm not sure what you're trying to do in this case.
#7
i am looking for code that ENcodes control characters into forms that represent them, such as backslash ( \ ) followed by the letter r to represent carriage return, or backslash ( \ ) followed by the letter t to represent a tab. what i want to see is how such code might give the caller a choice of other ways to represent control characters. since a function that ENcodes control characters is the one producing the representation, either it makes the choice itself or it provides a means for the caller to make that choice. an example of another way to represent the characters above is caret ( ^ ) followed by the letter m to represent carriage return, or caret ( ^ ) followed by the letter i to represent a tab.
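The caret form is the simplest one to generate, because the letter is just the control code plus 64. A minimal sketch follows; caret_encode() is a made-up name, and the lowercase output is an arbitrary choice (a decoder that keeps only the low 5 bits of the character after the caret accepts either case).

```python
def caret_encode(text):
    """Hypothetical helper: write control characters 0-31 in caret notation."""
    out = []
    for ch in text:
        n = ord(ch)
        if n < 32:
            out.append('^' + chr(n + 64).lower())   # 13 -> '^m', 27 -> '^['
        elif ch in '^\\':
            out.append('\\' + ch)                   # literal caret or backslash
        else:
            out.append(ch)                          # everything else, DEL included
    return ''.join(out)

print(caret_encode('a\r\tb'))      # a^m^ib
print(caret_encode('\x1b[0m'))     # ^[[0m
```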

my code that DEcodes the representation back to the control character is the listing in post #1 (it may still have bugs). note that it handles both of the forms in the examples above. the header comment of that listing has a table of supported codes.