Python Forum

Convert a list of string+bytes into a list of strings (Python 3)
Hello everyone!

I would like to read the strings stored at the start of a binary file and collect them in a list.
from math import ceil
from struct import unpack

def binaryMethod(files, lenI):
    outList = []
    for _ in range(lenI):
        # read the length of the next string (the first 4 bytes, big-endian unsigned int)
        blen = int(ceil(float(unpack('>I', files.read(4))[0]) / 4) * 4)
        # store the string into outList
        outList.append(str(unpack('%ds' % blen, files.read(blen))[0]).replace("\x00", ""))
    return outList
where lenI is the number of strings to read (an integer) and files is the binary file object.
This method works in Python 2 but not in Python 3.

In Python 3, when I run
print(outList)

Output:
["b'RT\\x00\\x00'", "b'RT\\x00\\x00'", "b'ts\\x00\\x00'", "b'MI\\x00\\x00'", ...]
whereas I expect something like: ["RT1", "RT2", "ts1", "MI1", ...]
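A minimal sketch of what seems to be happening, using the b'RT\x00\x00' value from the output above as sample data: in Python 3, str() applied to a bytes object returns its repr, so the result contains a literal backslash followed by "x00" rather than null characters, and replace("\x00", "") finds nothing to remove. Decoding the bytes first gives real text.

```python
raw = b"RT\x00\x00"  # sample value taken from the output above

# str() on bytes returns the repr, not decoded text
as_repr = str(raw)
print(as_repr)  # b'RT\x00\x00'

# The repr holds a backslash, "x", "0", "0" instead of NUL characters,
# so this replace() changes nothing
unchanged = as_repr.replace("\x00", "") == as_repr
print(unchanged)  # True

# Decode the bytes first, then strip the null padding
text = raw.decode("utf-8").replace("\x00", "")
print(text)  # RT
```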

Thank you in advance for your help!
I forgot to mention: I also tried to decode an element of the list, but it did not work:
test = outList[0].decode('utf-8')
Output:
AttributeError: 'str' object has no attribute 'decode'
This doesn't make any sense.
First, a bytes literal looks like zz = b'value',
and to decode it you use z = zz.decode('utf-8').
I have no idea what type of data "b'RT\\x00\\x00'" would be, but it is certainly not bytes.
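To check the types involved, here is a short sketch (the variable names are illustrative): each element of outList is a str that merely looks like a bytes literal, and str has no decode() method, which is why the AttributeError appears.

```python
# A genuine bytes literal, decoded to str
zz = b"value"
z = zz.decode("utf-8")
print(type(zz).__name__, type(z).__name__)  # bytes str

# What each outList element actually is: the repr of a bytes object, stored as a str
element = str(b"RT\x00\x00")
print(type(element).__name__)  # str

# str has no decode() method, so this raises AttributeError
message = None
try:
    element.decode("utf-8")
except AttributeError as exc:
    message = str(exc)
print(message)
```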
Yeah, the \x00 parts are hexadecimal escape sequences (each one stands for a null byte).
Python makes a clear distinction between bytes and strings. Bytes objects contain raw data (a sequence of octets), whereas strings are sequences of Unicode characters. Conversion between the two types is explicit: you encode a string to get bytes, specifying an encoding (which defaults to UTF-8), and you decode bytes to get a string. Code that performs these conversions should be aware that they may fail, and should decide how failures are handled.
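As a small illustration of the paragraph above, the sketch below encodes and decodes a string, then handles a failed decode. The sample values "café" and b"\xff\xfe" are arbitrary choices, and errors="replace" is just one of the error handlers Python offers for decoding.

```python
text = "café"  # arbitrary sample containing a non-ASCII character

# str -> bytes: encode() defaults to UTF-8
data = text.encode()
print(data)  # b'caf\xc3\xa9'

# bytes -> str: decode with the same encoding to get the original text back
print(data.decode("utf-8") == text)  # True

# Not every byte sequence is valid UTF-8, so decoding can fail
bad = b"\xff\xfe"
reason = None
try:
    bad.decode("utf-8")
except UnicodeDecodeError as exc:
    reason = exc.reason
print(reason)  # invalid start byte

# One way to handle failures: substitute U+FFFD for each undecodable byte
replaced = bad.decode("utf-8", errors="replace")
print(ascii(replaced))  # '\ufffd\ufffd'
```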

You can convert bytes to a string with the decode() method of the bytes class: decoding a bytes object produces a string. In Python 3 the default encoding is "utf-8", so you can simply write:

b"python byte to string".decode("utf-8")
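Putting this together, below is one possible Python 3 adaptation of the question's binaryMethod. It is a sketch, not a verified fix for the poster's file format: it assumes the record layout implied by the original code (a 4-byte big-endian length, then the text padded with null bytes to a multiple of 4) and UTF-8 text. pack_string is a hypothetical helper used only to build sample data in memory with io.BytesIO. The key change is calling .decode() on the bytes returned by files.read() instead of str().

```python
import io
import struct


def binaryMethod(files, lenI):
    """Sketch: read lenI length-prefixed strings from a binary file object.

    Assumed layout (inferred from the original code): a 4-byte big-endian
    unsigned length, then the text bytes, null-padded to a multiple of 4.
    """
    outList = []
    for _ in range(lenI):
        # First 4 bytes: big-endian unsigned int giving the string length
        (length,) = struct.unpack(">I", files.read(4))
        # Round up to the next multiple of 4 (same as ceil(length / 4) * 4)
        blen = (length + 3) // 4 * 4
        # files.read() already returns bytes: strip the null padding, then
        # decode to a real str instead of calling str() on the bytes
        outList.append(files.read(blen).replace(b"\x00", b"").decode("utf-8"))
    return outList


def pack_string(s):
    """Hypothetical helper: encode one record in the assumed layout."""
    data = s.encode("utf-8")
    return struct.pack(">I", len(data)) + data + b"\x00" * (-len(data) % 4)


sample = io.BytesIO(b"".join(pack_string(s) for s in ["RT1", "RT2", "ts1", "MI1"]))
print(binaryMethod(sample, 4))  # ['RT1', 'RT2', 'ts1', 'MI1']
```

Because files.read(blen) already returns bytes, the struct.unpack('%ds' % blen, ...) step from the original is not needed here; the camelCase names are kept only to match the question.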