I can make a bytes string:
bs = b'foobar'
I can make an f-string:
foo = 'hello'
fs = f'{foo}bar'
Now I want to make a bytes f-string. All of the characters have code points below 256. Do I have to do it like this?
foo = 'hello'
fs = f'{foo}bar'
bfs = bytes(ord(x) for x in fs)
It doesn't seem to work as well if foo is bytes:
foo = b'hello'
fs = f'{foo}bar'
bfs = bytes(ord(x) for x in fs)
The braces expand to the str representation of the object, and the str representation of foo is b'hello' (prefix and quotes included), so that is what gets inserted.
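To see this concretely, here is a minimal sketch that compares the interpolated result with str(foo). The variable names simply follow the question.

```python
foo = b'hello'

# An f-string formats each value with format(), which for bytes falls back
# to str(); str() of a bytes object is its repr, prefix and quotes included.
fs = f'{foo}bar'

print(fs)        # b'hello'bar
print(str(foo))  # b'hello'
```

So converting fs character by character afterwards produces the bytes of the text "b'hello'bar", not b'hellobar'.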
You could try to convert your bytestring before inserting, or you could assemble your bytes together directly.
foo = b'hello'
fs = f'{foo.decode()}bar'
bfs = fs.encode()
print(bfs)
foo = b'hello'
fs = foo + 'bar'.encode() # Not using f-strings
print(fs)
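Encoding with UTF-8 won't work for characters in the range 128 to 255, because UTF-8 encodes each of them as two bytes. You can use the latin-1 encoding instead, which maps every code point below 256 to the single byte with the same value: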
>>> foo = 'hello'
>>> f'{foo}bar'.encode('latin-1')
b'hellobar'
Proof that it works:
>>> def asbytes(i):
...     return chr(i).encode('latin-1')
...
>>> for i in range(256):
...     s = asbytes(i)
...     assert len(s) == 1
...     assert s[0] == i
...
>>>
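The difference between the two encodings shows up as soon as a character falls between 128 and 255. A small sketch, using 'é' (code point 233) as an arbitrary example:

```python
ch = 'é'  # code point 233: below 256, but outside ASCII

utf8 = ch.encode('utf-8')      # two bytes: 0xC3 0xA9
latin1 = ch.encode('latin-1')  # one byte: 0xE9, equal to ord(ch)

print(len(utf8), len(latin1))  # 2 1
print(latin1[0] == ord(ch))    # True

# Arbitrary binary data also survives a decode/encode round trip via latin-1.
data = bytes(range(256))
roundtrip = data.decode('latin-1').encode('latin-1')
print(roundtrip == data)       # True
```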
Still, it makes the code look ugly because I can't do something like:
fs = bf'{foo}bar'
I miss this too. I guess the developers wanted to prevent accidents.
But if it were implemented, it should take bytes rather than a str as input and raise a TypeError if the object has no __bytes__ method. A normal f-string falls back to the repr of an object if there is no __str__ method. I think that in a bytes f-string, falling back to the repr when there is no __bytes__ method would lead to unwanted errors.
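For comparison, printf-style formatting on bytes objects already behaves along these lines: the %b conversion accepts bytes-like objects or objects that define __bytes__, and raises a TypeError for a str. A short sketch (the Tag class is a hypothetical example, not part of any library):

```python
class Tag:
    """Hypothetical class that knows how to render itself as bytes."""

    def __init__(self, name):
        self.name = name

    def __bytes__(self):
        return self.name.encode('latin-1')


foo = b'hello'
joined = b'%bbar' % (foo,)         # bytes are inserted as-is
tagged = b'<%b>' % (Tag('spam'),)  # __bytes__ is used when defined
print(joined)  # b'hellobar'
print(tagged)  # b'<spam>'

error = None
try:
    b'%bbar' % ('hello',)          # a str is rejected, no repr fallback
except TypeError as exc:
    error = exc
print(type(error).__name__)  # TypeError
```

It is not as readable as an f-string, but it avoids the silent repr fallback described above.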
I have a less ugly solution using two functions, ub() and bu(), which stand for 'unicode bytes' and 'bytes unicode' respectively. It gives code like
>>> foo = b'spam'
>>> s = bu(f'{ub(foo)}bar')
>>> s
b'spambar'
Converting a bytes or a str with the ub() function returns a unicode string that contains only characters whose ord is lower than 256. For a str, it raises an exception if that is not possible; for example, ub('€') fails. Converting a bytes or a str with bu() returns bytes, but it fails for a str that contains characters whose ord is 256 or above. The first letter, u or b, is a mnemonic for the return type: ub() returns unicode and bu() returns bytes.
Here is the code defining these functions
from functools import singledispatch

__version__ = '2021.06.11'


class _Ub(str):
    """A subtype of str that can contain only chars with ord < 256
    """
    __slots__ = ()

    def __new__(cls, s):
        instance = str.__new__(cls, s)
        instance.encode('latin-1')  # fail if there is a char beyond 256
        return instance

    def __bytes__(self):
        return self.encode('latin-1')


@singledispatch
def ub(s):
    """Convert argument to 'unicode bytes' a subclass of str

    Returns an instance of a subclass of str that contains only
    unicode characters with ord < 256.

    'ub' stands for 'unicode bytes'
    """
    return _Ub(s)


@ub.register(bytes)
@ub.register(bytearray)
def _(s):
    return _Ub(s.decode('latin-1'))


@ub.register(_Ub)
def _(s):
    return s


@singledispatch
def bu(s):
    """Convert to bytes an object which str() has only characters < 256.

    'bu' stands for 'bytes unicode'
    """
    return bytes(ub(s))


@bu.register(bytes)
def _(s):
    return s


@bu.register(bytearray)
@bu.register(_Ub)
def _(s):
    return bytes(s)


def main():
    x = 'hello'
    print(ub(x))
    y = b'world'
    print(ub(y))
    print(bytes(y))
    z = bytearray(b'nice')
    print(ub(z))
    print(str(z))
    print(bytes(z))
    foo = b'spam'
    s = bu(f'{ub(foo)}bar')
    print(s)


if __name__ == '__main__':
    main()