documentation for raw bytes
Python Forum (https://python-forum.io), Forum: News and Discussions (https://python-forum.io/forum-31.html), Thread: documentation for raw bytes (/thread-3039.html)
documentation for raw bytes - Skaperen - Apr-26-2017

I have a few programs to write that I could have written in C with no problems. But I started writing them in Python 3 and am running into code conversion issues, because a few of the bytes are above 127 in value and they are not all valid UTF-8. They do not represent Unicode characters. They are raw bytes; they are what they are. But Python 3 gets upset over them. I have been trying to read the documentation on this, but it is confusing. I need to use bytes, but many parts of Python just don't work with raw bytes as-is. Is there a document somewhere that explains bytes and how to use them much like you would in C, so I don't have to switch back to C? For example, when I have:

    for line in sys.stdin:
        do_something(line)

how can I have the value of line be bytes, even with some bytes > 127?

RE: documentation for raw bytes - nilamo - Apr-26-2017

Where do you get the error, specifically? You should be able to just .encode() the input to get a bytes object.

    for line in sys.stdin:
        do_something(line.encode())

RE: documentation for raw bytes - Skaperen - Apr-26-2017

I tried that. It expects everything to be valid UTF-8 or whatever encoding is specified. But my data is not in any encoding. Much of it is numeric values formatted as a series of bytes. Anything goes at these points, with the exception that newlines inside the numbers are escaped and real newlines mark the real end of a line. I need to tell Python to leave all bytes as-is, except that line separation should still work at normal newlines.

RE: documentation for raw bytes - volcano63 - Apr-26-2017

Have you tried the struct package?

RE: documentation for raw bytes - Ofnuts - Apr-26-2017

If a file is opened in binary mode, you get a bytes object.
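Otherwise you get a str object. Assuming a "bytes.dat" file that contains Déjà vu!\n (encoded in UTF-8), the following code:

    with open('bytes.dat', 'rb') as f:
        dat = f.read()
    print(type(dat), len(dat))

    with open('bytes.dat', 'r') as f:
        dat = f.read()
    print(type(dat), len(dat))

yields a bytes object for the first read and a str object for the second, with different lengths: é and à each take two bytes in UTF-8, so the file holds 11 bytes but, when decoded as UTF-8, only 9 characters.

Of course, sys.stdin is already opened in text mode, so it reads str objects and not bytes. But you can read the binary buffer object that sys.stdin is based on (sys.stdin.buffer). See the note here.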
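
To make the last point concrete, here is a minimal sketch of iterating over input line by line as raw bytes. The helper name raw_lines and the sample data are made up for illustration; an io.BytesIO object stands in for piped input so the snippet runs on its own, and in a real script the default sys.stdin.buffer would be used instead.

```python
import io
import sys


def raw_lines(stream=None):
    """Yield lines from a binary stream as bytes, without any decoding.

    With no argument, this falls back to sys.stdin.buffer, the binary
    stream underneath the text-mode sys.stdin.
    """
    if stream is None:
        stream = sys.stdin.buffer
    for line in stream:  # binary streams split lines on b"\n" only
        yield line


# Stand-in for piped input: bytes above 127 that are not valid UTF-8.
sample = io.BytesIO(b"\xff\xfe\x80 first\nsecond \xc3\x28\n")
lines = list(raw_lines(sample))

for line in lines:
    # Indexing a bytes object gives integers, much like unsigned char in C.
    print(type(line).__name__, len(line), line[0])
```

Each item is a bytes object with its trailing newline still attached, and no encoding is ever applied, so values above 127 pass through unchanged.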
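
On the struct suggestion earlier in the thread, a rough sketch of how numeric values can be packed into and unpacked from raw bytes. The record layout here (a little-endian unsigned 16-bit id followed by a signed 32-bit value) is purely an assumption for illustration; the thread does not say how the actual data is laid out.

```python
import struct

# Hypothetical layout, not taken from the thread:
# "<" little-endian, "H" unsigned 16-bit id, "i" signed 32-bit value.
RECORD = struct.Struct("<Hi")


def pack_record(ident, value):
    """Turn two integers into a fixed-size run of raw bytes."""
    return RECORD.pack(ident, value)


def unpack_record(raw):
    """Turn a run of raw bytes back into an (ident, value) tuple."""
    return RECORD.unpack(raw)


raw = pack_record(513, -2)
print(raw, RECORD.size)
print(unpack_record(raw))
```

Several of the packed bytes are above 127, which is fine because the result is a bytes object and never passes through a text encoding.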