Python Forum
byte types
#1
i find many times i write some code or a function or a method that does something with strings and i want it to also handle byte strings since i use those a lot. so initially i coded a test like:
if isinstance(variable,(bytes,bytearray)):
    ...
which works OK in Python3 but fails in Python2 because bytes is the same as str there, so every plain string passes the test too. so, i came up with this way to do the test so that it only tests for bytearray in Python2, where bytes is the same as str:
if isinstance(variable,(bytes,bytearray)[bytes==str:]):
    ...
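For anyone who wants to try the check above, here is a minimal sketch that wraps it in a helper. The names BYTE_TYPES and is_byte_string are made up for illustration and do not come from any library.

```python
# The slice start is a bool: False (0) on Python 3 keeps both types,
# True (1) on Python 2 drops bytes, which is just another name for str there.
BYTE_TYPES = (bytes, bytearray)[bytes == str:]


def is_byte_string(value):
    """Return True for byte strings, but not for ordinary text strings."""
    return isinstance(value, BYTE_TYPES)


print(is_byte_string(bytearray(b"abc")))  # True on both Python 2 and 3
print(is_byte_string(u"abc"))             # False on both Python 2 and 3
```

Building the tuple once at module level keeps the slicing trick out of every individual isinstance() call.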
Tradition is peer pressure from dead people

What do you call someone who speaks three languages? Trilingual. Two languages? Bilingual. One language? American.
#2
Just testing for Python's version?
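A rough sketch of what an explicit version test could look like, assuming the goal is the same tuple of byte types as in the first post. BYTE_TYPES and is_byte_string are illustrative names only.

```python
import sys

# Decide once, at import time, which types count as byte strings.
if sys.version_info[0] >= 3:
    BYTE_TYPES = (bytes, bytearray)
else:
    # On Python 2, bytes is an alias of str, so only bytearray is distinct.
    BYTE_TYPES = (bytearray,)


def is_byte_string(value):
    """Return True for byte strings, but not for ordinary text strings."""
    return isinstance(value, BYTE_TYPES)
```

This is longer than the slicing trick but says directly why the two cases differ.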
"As they say in Mexico 'dosvidaniya'. That makes two vidaniyas."
https://freedns.afraid.org
#3
I think the best way is to avoid type checking. You have at least two options
  1. Write functions with different names for str and bytes, for example spam(x) for unicode and bspam(x) for bytes.
  2. Use generic functions (singledispatch) to select according to the first argument's type.
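As a sketch of option 1, the pair of functions might look like this. What spam() actually does is not stated in the thread, so the body here (appending a suffix) is purely a placeholder.

```python
def spam(text):
    """Text version: expects str (unicode on Python 2)."""
    return text + u" and spam"


def bspam(data):
    """Bytes version: expects bytes or bytearray."""
    return data + b" and spam"


print(spam(u"eggs"))
print(bspam(b"eggs"))
```

Neither function checks types; each simply assumes the caller picked the right name.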
#4
1. in many cases, common code with no type checking can be done. separate names would mean an inconsistent naming scheme based on how it works internally. i prefer to name functions based on API design before i code to implement, which may or may not need a type check for small differences. in some cases, there may be quite a few types that can be handled and i don't want to have so many different functions coded, which also means more maintenance work. so i cannot adopt the "spam/bspam" scheme.

2. i don't understand this suggestion.

1. also... in many cases, i do type checking to validate, perhaps raising a more meaningful exception if the wrong type is provided.
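A small sketch of that kind of validating check, assuming Python 3. The function shout() is hypothetical; it only illustrates one shared code path guarded by a single up-front type test.

```python
def shout(value):
    """Upper-case a text or byte string; hypothetical example function."""
    if not isinstance(value, (str, bytes, bytearray)):
        # Fail early with a message that names the offending type.
        raise TypeError(
            "shout() expects str, bytes or bytearray, not %s"
            % type(value).__name__
        )
    # From here on no type check is needed: all three types have upper().
    return value.upper()
```

Without the check, shout(42) would raise AttributeError about a missing upper() method, which tells the caller much less about what went wrong.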
#5
(Nov-01-2018, 06:22 PM)Skaperen Wrote: i don't understand this suggestion.
Here is an example with a generic function ddouble() defined for three types in Python 2 and 3:
from itertools import chain
import sys

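if sys.version_info < (3,):
    # Python 2: needs the third-party "singledispatch" backport from PyPI
    from singledispatch import singledispatch
else:
    from functools import singledispatch
    unicode = str  # Python 3 has no separate unicode type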

@singledispatch
def ddouble(item):
    raise NotImplementedError

@ddouble.register(unicode)
def _(s):
    return ''.join(x+x for x in s)

@ddouble.register(bytearray)
def _(s):
    return bytearray(chain.from_iterable(zip(*(s, s))))

@ddouble.register(bytes)
def _(s):
    return bytes(ddouble(bytearray(s)))

if __name__ == "__main__":
    data = b'Hello world!'
    udata = data.decode()
    print(repr(ddouble(data)))
    print(repr(ddouble(udata)))
Output:
b'HHeelllloo  wwoorrlldd!!'
'HHeelllloo  wwoorrlldd!!'
(Nov-01-2018, 06:22 PM)Skaperen Wrote: i prefer to name functions based on API design before i code
The types of a function's parameters are an important part of the API. What you actually want is function overloading, and generic functions are definitely one way to get it.
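To make the overloading idea concrete, here is a minimal Python 3 sketch with one public name and several implementations. The function describe() and its messages are invented for illustration; only singledispatch and its register() method come from functools.

```python
from functools import singledispatch


@singledispatch
def describe(item):
    # Fallback for every type without a registered implementation.
    raise NotImplementedError("unsupported type: %s" % type(item).__name__)


@describe.register(str)
def _(item):
    return "text, %d characters" % len(item)


@describe.register(bytes)
@describe.register(bytearray)
def _(item):
    # One implementation shared by both byte types.
    return "raw data, %d bytes" % len(item)


print(describe("abc"))              # text, 3 characters
print(describe(bytearray(b"abc")))  # raw data, 3 bytes
```

The caller only ever sees describe(); which body runs is decided by the type of the first argument.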
#6
whatever you want to call it, accepting a variety of types for an argument and then doing what makes sense with the mix of the semantics of that argument and the semantics of the type that is given is something i have done for years, have found to be well understood and to have benefitted the projects, and expect to continue. why would Python programmers be any less capable of understanding it and working with it?

an example of a function i have been thinking about is one that i would define as "counting the number of character encodings in a sequence containing byte codes". the len() function (notice, just one name for all types) used on such a sequence would not give the number of characters encoded, but rather, the number of codes present to make up that coding. but, in most cases involving text processing, the number of characters is what is useful. i might call this function clen(). i would code it to accept as many sequence types as i can. this would at least be bytes, bytearray, str, list, and tuple (for Python3). ints could be the codes for a list or tuple.

the user (of my function, the coder of the app calling my function) would not need to check the type they have to figure out which of at least 5 different function names to use; they just call clen() with whatever they have and get an answer. the check for an unaccepted type would then be except TypeError: (or a value configured to be returned in such a case by that calling program).

yes, i have read a bunch of opinions (posts, blogs) by a bunch of people saying not to use dynamic typing. but they tend to be specifically opposed to dynamic typing and dynamic data, so i generally dismiss them. they seem to be as numerous in Python as in Pike even though dynamic typing is not the default in Pike (coders have to specify mixed as the type of the variable, so dynamic typing is easy to avoid). they tend to be less numerous for C since in C you don't have dynamic typing at all without adding it on.

i am no longer open minded about dynamic typing. i have made my decision and i embrace dynamic typing. that decision, made years before i started using Python, is a major reason i got into coding in Python.
#7
(Nov-02-2018, 12:05 AM)Skaperen Wrote: the len() function (notice, just one name for all types)
That is true, but for user-defined types the len() function invokes the __len__() method; that is to say, it uses normal method lookup to select the function to execute. Generic functions similarly provide a way to select the function by the type of the first argument. They handle both builtin types and user-defined types.
(Nov-02-2018, 12:05 AM)Skaperen Wrote: the check for an unaccepted type would then be except TypeError
Generic functions can do this very well:
from functools import singledispatch

@singledispatch
def clen(x):
    raise TypeError

@clen.register(str)
def _(x):
    ... # code for string instance

@clen.register(tuple)
def _(x):
    ... # code for tuple instance

@clen.register(list)
def _(x):
    ... # code for list instance
# etc
(Nov-02-2018, 12:05 AM)Skaperen Wrote: by a bunch of people saying not to use dynamic typing.
This is not what I'm saying. I'm saying that there is a normal way to dispatch calls according to the argument type, and this is the method lookup algorithm in classes, with the mro and tutti quanti. Generic functions extend this feature by allowing builtin classes to be incorporated. They also give a way to code new methods separately for existing class hierarchies. I'm saying that one should avoid, as much as possible, writing if statements that check the types of the arguments.
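A short Python 3 sketch of how this dispatch follows the class hierarchy, and of adding behaviour to existing classes from the outside. The function kind() and its return strings are made up for this example.

```python
from collections import OrderedDict
from functools import singledispatch


@singledispatch
def kind(obj):
    # Default implementation, used when nothing more specific is registered.
    return "something else"


@kind.register(dict)
def _(obj):
    return "a mapping"


@kind.register(int)
def _(obj):
    return "an integer"


print(kind(OrderedDict()))  # a mapping (OrderedDict subclasses dict)
print(kind(True))           # an integer (bool subclasses int)
print(kind(2.5))            # something else
```

No if statement inspects a type here, and neither dict nor int had to be modified to take part.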
(Nov-02-2018, 12:05 AM)Skaperen Wrote: "counting the number of character encodings in a sequence containing byte codes"
Can you elaborate on this? I don't understand what clen() should return. Can you give examples with bytes, str, tuple?
#8
you're talking more about how things like len() are implemented, which is not something that matters to the programmer (nor should it) while i am talking about how the API design affects the programmer and how she goes about using that design to make use of the facility it describes. if Python did not have a len() function and it had to be user implemented, would you want there to be different function call names depending on the datatype the caller wants to get the length of? i would not. i would describe the API before any implementation is done (although there might be design changes after a test implementation is done and it is then discovered that the API design had flaws that needed to be changed or that additions would be helpful).

consider the clen() function i described in my previous post. what if Python was going to include it in the language? its implementation might well be like how len() is done, calling the __clen__() method in each specific type. but what if, before this, it had been decided to have different names for different types like lclen() for lists, tclen() for tuples, and baclen() for bytearrays? would you expect to see the name changed because of the way it is implemented? i would not. and this is why i would just call it clen() to begin with.

i do have a couple of functions in my function collection that have type varying names. but their names are based on what type is returned ... bachr() to return one bytearray character from its code number ... and bchr() for bytes type. both accept int as the one argument.
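The thread does not show these two functions, so the following is only a guess at what they might look like, assuming Python 3 and a code number in range(256).

```python
def bchr(code):
    """Return a bytes object of length 1 for a code number in range(256)."""
    return bytes([code])


def bachr(code):
    """Return a bytearray of length 1 for a code number in range(256)."""
    return bytearray([code])


print(bchr(65))   # b'A'
print(bachr(65))  # bytearray(b'A')
```

On Python 2, bytes([65]) would give the string '[65]' instead, so this sketch is Python 3 only.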

dynamic typing as a language feature seems to be rather lost in Python due to a lot of people effectively saying not to use it. for me, it is one of the great features of Python.
#9
A custom class with a __len__ method will be good enough. It can return whatever you want and calling len(instance) could return the number of words, bytes, bits, rows, tables, stars, threes, people, cats, dogs, etc.
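A minimal sketch of that idea, using a made-up WordText class whose length is defined as its number of words.

```python
class WordText(object):
    """Hypothetical wrapper whose length is its number of words."""

    def __init__(self, text):
        self.text = text

    def __len__(self):
        # len(instance) ends up here.
        return len(self.text.split())


print(len(WordText("one two three")))  # 3
```

The data has to be wrapped in the class first; len() on the plain string would still return the number of characters.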
"As they say in Mexico 'dosvidaniya'. That makes two vidaniyas."
https://freedns.afraid.org
#10
(Nov-02-2018, 06:06 PM)wavic Wrote: A custom class with a __len__ method will be good enough. It can return whatever you want and calling len(instance) could return the number of words, bytes, bits, rows, tables, stars, threes, people, cats, dogs, etc.

but, what if you have read in a line from a file using os.read() or by some other means ended up with data in a bytes or bytearray type, or have a list or tuple of the byte code numbers as ints, and want to know the number of utf-8 encoded characters, or want to know the uncompressed size of the original data that was compressed into that file? __len__() for existing types/classes does not give that (so len() does not, either).
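For the utf-8 case, one possible sketch of clen() in Python 3 is below. It assumes the input is valid utf-8 and counts every byte that is not a continuation byte; this is one reading of the description in the thread, not a finished design.

```python
def clen(data):
    """Count utf-8 encoded characters in str, bytes, bytearray, list or tuple."""
    if isinstance(data, str):
        # Already decoded text: one item per character.
        return len(data)
    if isinstance(data, (bytes, bytearray, list, tuple)):
        # bytes() also checks that list/tuple items are ints in range(256).
        raw = bytes(data)
        # Each character has exactly one byte that is not of the
        # continuation form 0b10xxxxxx, so count those bytes.
        return sum((b & 0xC0) != 0x80 for b in raw)
    raise TypeError("clen() does not support %s" % type(data).__name__)


encoded = "h\u00e9llo".encode("utf-8")
print(len(encoded))         # 6
print(clen(encoded))        # 5
print(clen(list(encoded)))  # 5
```

The caller passes whatever sequence they have, and an unsupported type surfaces as TypeError, as described earlier in the thread.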