Python Forum

Full Version: documentation for raw bytes
i have a few programs to write that i could have written in C and would have had no problems.  but i started writing them in Python3 and am running into the code conversion issues because a few of the bytes are above 127 in value.  and they are not all valid UTF-8.  they do not represent Unicode characters.  they are raw bytes.  they are what they are.  but Python3 gets upset over them.  i have been trying to read the documentation on this, but it is confusing.  i need to use bytes.  but many parts of Python just don't work with raw bytes as-is.  is there a document somewhere that explains bytes and how to use them much like you would in C so i don't have to switch back to C?

for example when i have:

for line in sys.stdin:
    do_something(line)
how can i have the value of line be bytes, even with some bytes > 127?
Where do you get the error, specifically? You should be able to just .encode() the input to get a bytes object.
for line in sys.stdin:
    do_something(line.encode())
i tried that.  it expects everything to be valid utf-8 or whatever encoding is specified.  but my data is not of any encoding.  much of it is numeric values formatted as a series of bytes.  anything goes at these points with the exception that newlines in the numbers are escaped and real newlines exist for the real end of a line.  i need to tell python to just leave all bytes as-is, except that line separation should still work at normal newlines.
Have you tried the struct package?
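As a rough illustration of what struct does with raw bytes, here is a minimal sketch. The layout (one little-endian unsigned 32-bit integer, format "<I") is only an assumption for the example; the actual layout of the numeric fields is not given in this thread.

```python
import struct

# Assumed layout for illustration only: one little-endian unsigned 32-bit int.
raw = b"\x01\x00\x00\x80"          # contains a byte above 127; never decoded as text

(value,) = struct.unpack("<I", raw)  # unpack always returns a tuple
packed = struct.pack("<I", value)    # packing the number gives the same bytes back

print(value, packed == raw)
```

The bytes are never treated as text here, so values above 127 need no special handling.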
If a file is opened in binary mode, you get a bytes object. Otherwise you get a str object.
Assuming a "bytes.dat" file that contains Déjà vu!\n (encoded in UTF-8), the following code:
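# Binary mode: read() returns bytes, so len() counts bytes.
with open('bytes.dat', 'rb') as f:
    dat = f.read()
    print(type(dat), len(dat))

# Text mode: read() returns str, so len() counts decoded characters.
# The encoding is given explicitly so the result does not depend on the locale.
with open('bytes.dat', 'r', encoding='utf-8') as f:
    dat = f.read()
    print(type(dat), len(dat))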
yields:
Output:
<class 'bytes'> 11
<class 'str'> 9
Of course, sys.stdin is already opened in text mode, so it reads str objects and not bytes. But you can read the binary buffer object sys.stdin is based on, sys.stdin.buffer. See the note under sys.stdin in the sys module documentation.
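Applied to the loop in the original question, reading from the binary buffer could look roughly like the sketch below. The helper name raw_lines is made up for this example, and an in-memory io.BytesIO stream stands in for sys.stdin.buffer so the snippet can be run without piping anything in.

```python
import io


def raw_lines(stream):
    # Iterating a binary stream yields bytes objects split on b"\n" only;
    # nothing is decoded, so bytes above 127 pass through untouched.
    for line in stream:
        yield line.rstrip(b"\n")  # drop the line terminator, keep everything else


# Stand-in for sys.stdin.buffer: input that is not valid UTF-8.
sample = io.BytesIO(b"abc\xff\xfe\n\x80\x81 42\n")
lines = list(raw_lines(sample))
print(lines)
```

With real input the loop would become: for line in raw_lines(sys.stdin.buffer): do_something(line), after an import sys.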