Python Forum
how to read a text file as bytes - Printable Version




how to read a text file as bytes - Skaperen - May-29-2018

i want to read a file as a list of bytes objects, one for each line, where '\n' or '\r' or '\r\n' or even the unlikely '\n\r' marks the end of a line. the file might be so large that two copies of it in memory would exhaust memory, so i need a way that is not "read in the whole file and do a big split". some of the files contain UTF-8 byte sequences with non-ASCII values. i am using Python3. what is a good Pythonic way to do this? can the os module be avoided?


RE: how to read a text file as bytes - killerrex - May-29-2018

Iterating over a file opened in binary mode yields one line at a time, using only b'\n' as the separator, so you can do something like:
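with open('input.bin', 'rb') as fd:
    for line in fd:
        # Each line still ends with its b'\n'; drop it so that b'\r\n' does not
        # leave a stray b'\n' piece after the split.
        for sub_line in line.rstrip(b'\n').split(b'\r'):
            # Take into account any lone b'\r'
            if not sub_line:
                # Empty pieces are skipped, so blank lines are lost; this must be
                # improved if zero-length lines matter.
                continue
            # Do something with sub_line
            pass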
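If the input format guarantees that splitting on '\n' is safe, this is a good way. It is not safe for a file that uses only '\r' as line ending, because the whole file would then arrive as a single line.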
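When that guarantee does not hold, reading fixed-size chunks and splitting them yourself is one way around it. The following is only a sketch: the helper name iter_byte_lines, the chunk_size default and the _EOL pattern are made up for illustration and are not part of any library. It treats '\r\n' and '\n\r' as one terminator each, and it keeps at most one unfinished line plus one chunk in memory.

```python
import re

# Hypothetical helper for illustration only.
# Two-byte terminators come first so they win over the single-byte ones.
_EOL = re.compile(rb'\r\n|\n\r|\n|\r')


def iter_byte_lines(path, chunk_size=65536):
    """Yield each line of the file as bytes, without its terminator."""
    buf = b''
    with open(path, 'rb') as fd:
        while True:
            chunk = fd.read(chunk_size)
            if not chunk:
                break
            buf += chunk
            start = 0
            for m in _EOL.finditer(buf):
                if m.end() == len(buf):
                    # A terminator touching the end of the buffer might be the
                    # first half of a two-byte pair, so wait for more data.
                    break
                yield buf[start:m.start()]
                start = m.end()
            # Keep only the unfinished tail for the next round.
            buf = buf[start:]
    if buf:
        # End of file: whatever is left is the last line, possibly followed
        # by the one terminator that was held back.
        parts = _EOL.split(buf)
        if parts[-1] == b'':
            parts.pop()
        yield from parts
```

Used as "for line in iter_byte_lines('input.bin'):", each line is a bytes object and blank lines come through as b''. A single very long line is rescanned every time a chunk is appended, so this favours simplicity over speed.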
Another option is to use a memory map.
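A rough sketch of the memory map idea, using the standard mmap and re modules. The function name iter_mmap_lines and the _EOL pattern are hypothetical names chosen for this example. The operating system pages the file in on demand, so the file is not copied into a Python object as a whole; only each yielded line is. On a 32-bit Python the mapping may still fail for very large files because of address space limits.

```python
import mmap
import re

# Hypothetical names for illustration only.
_EOL = re.compile(rb'\r\n|\n\r|\n|\r')


def iter_mmap_lines(path):
    """Yield each line of the file as bytes, without its terminator."""
    with open(path, 'rb') as fd:
        # An empty file cannot be memory-mapped, so handle it up front.
        fd.seek(0, 2)
        if fd.tell() == 0:
            return
        with mmap.mmap(fd.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            start = 0
            # re accepts bytes-like objects, so it can scan the map directly.
            for m in _EOL.finditer(mm):
                yield mm[start:m.start()]  # slicing a map returns bytes
                start = m.end()
            if start < len(mm):
                # Last line without a trailing terminator.
                yield mm[start:]
```

This does not need the os module, and it copes with files that contain no '\n' at all.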