Python Forum
documentation for raw bytes
#1
i have a few programs to write that i could have written in C and would have had no problems.  but i started writing them in Python3 and am running into the code conversion issues because a few of the bytes are above 127 in value.  and they are not all valid UTF-8.  they do not represent Unicode characters.  they are raw bytes.  they are what they are.  but Python3 gets upset over them.  i have been trying to read the documentation on this, but it is confusing.  i need to use bytes.  but many parts of Python just don't work with raw bytes as-is.  is there a document somewhere that explains bytes and how to use them much like you would in C so i don't have to switch back to C?

for example when i have:

for line in sys.stdin:
    do_something(line)
how can i have the value of line be bytes, even with some bytes > 127?
Tradition is peer pressure from dead people

What do you call someone who speaks three languages? Trilingual. Two languages? Bilingual. One language? American.
#2
Where do you get the error, specifically? You should be able to just .encode() the input to get a bytes object.
for line in sys.stdin:
    do_something(line.encode())
#3
i tried that.  it expects everything to be valid utf-8 or whatever encoding is specified.  but my data is not in any encoding.  much of it is numeric values formatted as a series of bytes.  anything goes at these points, with the exception that newlines inside the numbers are escaped and real newlines mark the real end of a line.  i need to tell python to leave every byte as-is and only split lines at the real newlines.
#4
Have you tried the struct module?
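As a rough sketch of what struct offers here: it converts between Python numbers and fixed-size runs of raw bytes. The record layout below (a little-endian unsigned 16-bit integer followed by a signed 32-bit integer) and the names RECORD, count and offset are invented for illustration; they are not the format from the original question.

```python
import struct

# Hypothetical layout: "<" little-endian with no padding,
# "H" unsigned 16-bit integer, "i" signed 32-bit integer.
RECORD = struct.Struct("<Hi")

# Numbers -> raw bytes (values above 127 are fine in a bytes object).
packed = RECORD.pack(513, -2)

# Raw bytes -> numbers.
count, offset = RECORD.unpack(packed)

print(packed, count, offset)
```

Whether this helps depends on the numbers having a fixed binary layout; if they are variable-length or escaped, the bytes would need to be unescaped first.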
Test everything in a Python shell (iPython, Azure Notebook, etc.)
  • Someone gave you an advice you liked? Test it - maybe the advice was actually bad.
  • Someone gave you an advice you think is bad? Test it before arguing - maybe it was good.
  • You posted a claim that something you did not test works? Be prepared to eat your hat.
#5
If a file is opened in binary mode, you get a bytes object. Otherwise you get a str object.
Assuming a "bytes.dat" file that contains Déjà vu!\n (encoded in UTF-8), the following code:
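# Binary mode: read() returns bytes, one item per byte in the file.
with open('bytes.dat', 'rb') as f:
    dat = f.read()
    print(type(dat), len(dat))

# Text mode: read() returns str, one item per decoded character.
# The encoding is given explicitly because the default depends on the platform.
with open('bytes.dat', 'r', encoding='utf-8') as f:
    dat = f.read()
    print(type(dat), len(dat))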
yields:
Output:
<class 'bytes'> 11
<class 'str'> 9
Of course, sys.stdin is already opened in text mode, so it reads str objects and not bytes. But you can read from the binary buffer object that sys.stdin is based on, which is available as sys.stdin.buffer. See the note here.
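Applied to the loop from the original question, that approach might look like the sketch below. It assumes the binary buffer is reached through sys.stdin.buffer and that lines end in b"\n". process_raw_lines is a made-up helper name, and an io.BytesIO object stands in for stdin so the snippet runs without piped input.

```python
import io
import sys


def process_raw_lines(stream, handler):
    """Call handler once per line of a binary stream; each line is bytes."""
    # Iterating a binary stream splits on b"\n" only and does no decoding,
    # so bytes above 127 reach the handler unchanged.
    for line in stream:
        handler(line)


# Stand-in for sys.stdin.buffer, containing bytes that are not valid UTF-8.
sample = io.BytesIO(b"\xff\xfe\x80 first\nsecond \xc3\x28\n")
collected = []
process_raw_lines(sample, collected.append)
print(collected)

# In a real program (do_something is the function from the question):
#     process_raw_lines(sys.stdin.buffer, do_something)
```

Each line keeps its trailing b"\n", just as text-mode iteration keeps "\n", so strip it with line.rstrip(b"\n") if it is not wanted.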
Unless noted otherwise, code in my posts should be understood as "coding suggestions", and its use may require more neurones than the two necessary for Ctrl-C/Ctrl-V.
Your one-stop place for all your GIMP needs: gimp-forum.net