Python Forum

Full Version: bytes f-string ?
I can make a bytes string:
bs = b'foobar'
I can make an f-string:
foo = 'hello'
fs = f'{foo}bar'
Now I want to make a bytes f-string. All of the characters have code points below 256. Do I have to do it like this?
foo = 'hello'
fs = f'{foo}bar'
bfs = bytes(ord(x) for x in fs)
It doesn't work as well if foo is bytes:
foo = b'hello'
fs = f'{foo}bar'
bfs = bytes(ord(x) for x in fs)
The braces expand to the str() of the object, and for a bytes object str(foo) is the text b'hello', prefix and quotes included. So that's what gets inserted.
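
To make that concrete, here is a small sketch of what the f-string actually produces when foo is bytes. Nothing here is new API, it only prints the intermediate value from the question:

```python
foo = b'hello'

# An f-string field with no format spec inserts str(foo).
# For bytes, str() gives the same text as repr(): prefix and quotes included.
fs = f'{foo}bar'
print(fs)  # b'hello'bar

# The result is an ordinary str that is 3 characters longer than expected
# (the b and the two quotes), not 'hellobar'.
print(len(fs), len('hellobar'))  # 11 8
```

So the later bytes(ord(x) for x in fs) step faithfully converts the wrong text.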

You could try to convert your bytestring before inserting, or you could assemble your bytes together directly.

foo = b'hello'
fs = f'{foo.decode()}bar'
bfs = fs.encode()
print(bfs)

foo = b'hello'
fs = foo + 'bar'.encode() # Not using f-strings
print(fs)
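It won't work with UTF-8 for arbitrary values: UTF-8 encodes every code point from 128 to 255 as two bytes, and not every byte sequence decodes as valid UTF-8. You can use the latin-1 encoding instead, which maps each code point below 256 to the single byte with the same value.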
>>> foo = 'hello'
>>> f'{foo}bar'.encode('latin-1')
b'hellobar'
Proof that it works:
>>> def asbytes(i):
...     return chr(i).encode('latin-1')
... 
>>> for i in range(256):
...     s = asbytes(i)
...     assert len(s) == 1
...     assert s[0] == i
... 
>>>
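For comparison, a short sketch of where the two encodings diverge. The sample string is only an illustration, any character with a code point from 128 to 255 would do:

```python
text = 'caf\xe9'  # 'café'; the last character is code point 233

latin = text.encode('latin-1')
utf8 = text.encode('utf-8')

print(latin)  # b'caf\xe9'      one byte per character
print(utf8)   # b'caf\xc3\xa9'  code point 233 becomes two bytes

# latin-1 gives the same result as the ord() loop from the first post.
same_as_ord_loop = latin == bytes(ord(c) for c in text)

# It also round-trips every possible byte value.
all_bytes = bytes(range(256))
round_trip_ok = all_bytes.decode('latin-1').encode('latin-1') == all_bytes
print(same_as_ord_loop, round_trip_ok)  # True True
```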
Still, it makes code look ugly because I can't do something like:
fs = bf'{foo}bar'
I miss this too. I guess the developers wanted to prevent accidents.
But if it were implemented, it should take bytes rather than str as input and raise a TypeError if the object has no __bytes__ method. A normal f-string falls back to the repr of an object if the object does not define __str__. With a bytes f-string, I think falling back to the repr when there is no __bytes__ method would lead to unwanted errors.
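For what it's worth, %-formatting on bytes (PEP 461, Python 3.5 and later) already behaves roughly the way described here: %s accepts bytes-like objects and objects with a __bytes__ method, and rejects str with a TypeError. A sketch, where the Tag class is a made-up example and not something from this thread:

```python
foo = b'hello'

# %-interpolation straight into a bytes template.
bfs = b'%sbar' % foo
print(bfs)  # b'hellobar'


class Tag:
    """Hypothetical class, used only to show the __bytes__ hook."""

    def __bytes__(self):
        return b'spam'


tagged = b'<%s>' % Tag()
print(tagged)  # b'<spam>'

# A str operand is refused instead of being silently repr()'d.
try:
    b'%sbar' % 'hello'
except TypeError as exc:
    str_rejected = True
    print('TypeError:', exc)
else:
    str_rejected = False
```

It is not as readable as a bf'' literal would be, but it avoids the str round trip.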
I have a less ugly solution using two functions, ub() and bu(), which stand for 'unicode bytes' and 'bytes unicode' respectively. It gives code like:
>>> foo = b'spam'
>>> s = bu(f'{ub(foo)}bar')
>>> s
b'spambar'
Converting a bytes or a str with ub() returns a unicode string that contains only characters whose ord is below 256. For a str, it raises an exception if that is not possible; for example, ub('€') fails. Converting a bytes or a str with bu() returns bytes, but it fails for a str that contains characters whose ord is 256 or above. As a mnemonic, the first letter, u or b, indicates whether the function returns unicode or bytes: ub() returns unicode and bu() returns bytes.

Here is the code defining these functions:
from functools import singledispatch

__version__ = '2021.06.11'


class _Ub(str):
    """A subtype of str that can contain only chars with ord < 256
    """
    __slots__ = ()
    
    def __new__(cls, s):
        instance = str.__new__(cls, s)
        instance.encode('latin-1')  # fail if any char has ord >= 256
        return instance

    def __bytes__(self):
        return self.encode('latin-1')

@singledispatch
def ub(s):
    """Convert argument to 'unicode bytes', a subclass of str
    
    Returns an instance of a subclass of str that contains only
    unicode characters with ord < 256.

    'ub' stands for 'unicode bytes'
    """
    return _Ub(s)

@ub.register(bytes)
@ub.register(bytearray)
def _(s):
    return _Ub(s.decode('latin-1'))

@ub.register(_Ub)
def _(s):
    return s

@singledispatch
def bu(s):
    """Convert to bytes an object whose str() has only characters with ord < 256.
    
    'bu' stands for 'bytes unicode'
    """
    return bytes(ub(s))

@bu.register(bytes)
def _(s):
    return s

@bu.register(bytearray)
@bu.register(_Ub)
def _(s):
    return bytes(s)


def main():
    x = 'hello'
    print(ub(x))
    y = b'world'
    print(ub(y))
    print(bytes(y))
    z = bytearray(b'nice')
    print(ub(z))
    print(str(z))
    print(bytes(z))
    
    foo = b'spam'
    s = bu(f'{ub(foo)}bar')
    print(s)
    
if __name__ == '__main__':
    main()
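
The main() above only exercises ASCII input. Below is a condensed sketch of the same latin-1 round trip that also shows a byte above 127 and the failure case. The helpers to_text() and to_bytes() are simplified stand-ins invented for this illustration, not the ub() and bu() defined above (no singledispatch, no str subclass):

```python
def to_text(obj):
    """Return a str whose characters all have ord < 256 (stand-in for ub)."""
    if isinstance(obj, (bytes, bytearray)):
        return obj.decode('latin-1')
    text = str(obj)
    text.encode('latin-1')  # raises UnicodeEncodeError if any ord >= 256
    return text


def to_bytes(obj):
    """Return bytes, one byte per character (stand-in for bu)."""
    if isinstance(obj, (bytes, bytearray)):
        return bytes(obj)
    return str(obj).encode('latin-1')


foo = b'spam\xff'  # includes a byte above 127
result = to_bytes(f'{to_text(foo)}bar')
print(result)  # b'spam\xffbar'

# The euro sign is U+20AC, which latin-1 cannot represent.
try:
    to_text('€')
except UnicodeEncodeError:
    euro_rejected = True
else:
    euro_rejected = False
print(euro_rejected)  # True
```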