Posts: 4,646
Threads: 1,493
Joined: Sep 2016
Sep-18-2020, 02:24 AM
(This post was last modified: Sep-18-2020, 02:25 AM by Skaperen.)
Given arbitrary bytes in the range 0 to 255, such as from a file containing machine-executable code, I would like to convert this byte sequence to type str without applying UTF-8, or any other, interpretation to it. The following code can do this:
# convert bytes in b to str in s without any encoding
s = ''.join(chr(x) for x in b)
What codec would do the same thing in decode, or the exact reverse in encode (when every str character has an ord() value of 255 or lower)?
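As a minimal sketch of one candidate: Python's built-in `latin-1` (ISO-8859-1) codec maps each byte value 0 to 255 onto the code point with the same number, so it should behave like the `chr()` loop above. The variable names here are only illustrative.

```python
# Every possible byte value, 0 through 255.
b = bytes(range(256))

# The loop-based conversion from the post.
s_loop = ''.join(chr(x) for x in b)

# The same conversion through the latin-1 codec.
s_codec = b.decode('latin-1')

# Both approaches give the same str, and encode() is the exact reverse.
assert s_codec == s_loop
assert s_codec.encode('latin-1') == b

# Characters with ord() above 255 cannot be encoded this way.
try:
    '\u0100'.encode('latin-1')
    raised = False
except UnicodeEncodeError:
    raised = True
assert raised
```

Encoding fails with `UnicodeEncodeError` for any character above 255, which matches the condition stated in the question.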
Tradition is peer pressure from dead people
What do you call someone who speaks three languages? Trilingual. Two languages? Bilingual. One language? American.
I have functions to convert both ways; the reverse uses ord(). But now I want to do this with a codec.
Sep-18-2020, 10:35 PM
(This post was last modified: Sep-18-2020, 10:41 PM by Skaperen.)
What I want to have in bytes is initially in strings. I want it in bytes because the file I get for a pipe is open in binary mode. The content in the string is already encoded in UTF-8 (it is not normal Unicode text) and needs to remain unchanged. I'm also doing some structure formatting in binary, such as putting a byte-count length in front. The content needs to already be in UTF-8 to get the length right, so converting to Unicode to work in str makes no sense.
I already have coding patterns of my own to do this conversion. As part of some code cleanup, I am trying to make some things more like what other coders expect, provided there are no implementation penalties. I suspect that converting to a list of ints and then to another sequence type carries more of a penalty anyway.
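A rough sketch of the length-prefix step described above, assuming the str holds UTF-8 byte values as code points 0 to 255. The function name `frame`, the 4-byte width, and the big-endian byte order are all assumptions made for illustration; the post does not say what the real format is.

```python
import struct

def frame(transparent: str) -> bytes:
    """Prefix the payload with its byte count (hypothetical 4-byte big-endian header)."""
    # latin-1 turns each code point 0-255 back into the identical byte,
    # so the UTF-8 content is not re-encoded.
    payload = transparent.encode('latin-1')
    return struct.pack('>I', len(payload)) + payload

# "café" is 4 characters but 5 bytes in UTF-8.
raw = 'café'.encode('utf-8')
transparent = raw.decode('latin-1')
framed = frame(transparent)

# The prefix counts UTF-8 bytes, and the payload is unchanged.
assert len(transparent) == len(raw) == 5
assert framed == b'\x00\x00\x00\x05' + raw
```

Because each character in the transparent str stands for exactly one byte, `len()` on the str already equals the byte count that goes in the prefix.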
Sep-25-2020, 02:20 AM
(This post was last modified: Sep-25-2020, 02:21 AM by Skaperen.)
I'm working with content that is already UTF-8 encoded, which I received as bytes. I'm changing the type to str in a transparent way so that the UTF-8 encoding remains unchanged, and then I work with it as type str. Then I can change it back without any further encoding if I ultimately need it as binary for output (such as to a file open in binary mode). If I am the only one writing the code that works with the content, I'll leave it as type bytes. But often I need to call some other code that expects str.
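The round trip described above might look like the following sketch, assuming the `latin-1` codec is used as the transparent conversion. The helper names `to_transparent_str` and `from_transparent_str` are hypothetical, and the `split`/`join` step only stands in for whatever str-expecting code gets called.

```python
def to_transparent_str(data: bytes) -> str:
    """Change bytes to str, one code point per byte, with no UTF-8 decoding."""
    return data.decode('latin-1')

def from_transparent_str(text: str) -> bytes:
    """Change a transparent str back to the identical bytes."""
    return text.encode('latin-1')

# Content that arrived as UTF-8 encoded bytes.
raw = 'naïve café'.encode('utf-8')

# Work with it as str; splitting on an ASCII space is safe because
# UTF-8 multi-byte sequences never contain ASCII byte values.
text = to_transparent_str(raw)
words = text.split(' ')
rebuilt = from_transparent_str(' '.join(words))

# The bytes come back untouched and still decode as valid UTF-8.
assert rebuilt == raw
assert rebuilt.decode('utf-8') == 'naïve café'
```

One caution with this approach: case-changing methods such as `upper()` treat the high code points as Latin-1 letters and would alter the underlying bytes, so it seems safest to limit str operations to ones that only touch ASCII characters.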