Python Forum

Full Version: A small bug ? (Perhaps an ant?)
Hi all! Regards!

I'm new to the Python ecosystem, and a few days ago I ran into a problem.

I'm using Python 3.12 on Windows 10.

'My' bug is shown by the following code:

# The problem is in the use of   ''.split()   vs  ''.split( ' ' )

exp   =   ['a', 'b']                # expected result list

def check( a, x ) :
    # Compare the split result x with the expected list and report the outcome.
    res = 'ok' if x == exp else 'error'
    print( "a='%s' -> %s  ( expected : %s .. %s )\n" % (a,x,exp,res) )

print( '-'*24, " Using ''.split() ", '-'*24, '\n' )
a   =   'a b'           # one space separating tokens
x   =   a.split()
check( a, x )

a   =   'a  b'          # two spaces separating tokens
x   =   a.split()
check( a, x )

a   =   'a   b'         # three spaces separating tokens
x   =   a.split()
check( a, x )

print( '-'*22, " Using ''.split(' ') ", '-'*23, '\n' )
a   =   'a b'           # one space separating tokens
x   =   a.split(' ')
check( a, x )

a   =   'a  b'          # two spaces separating tokens
x   =   a.split(' ')
check( a, x )

a   =   'a   b'       # three spaces separating tokens
x   =   a.split(' ')
check( a, x )
print( '-'*68, '\n' )
#### MY RESULTS ####
Output:
------------------------  Using ''.split()  ------------------------
a='a b' -> ['a', 'b']  ( expected : ['a', 'b'] .. ok )
a='a  b' -> ['a', 'b']  ( expected : ['a', 'b'] .. ok )
a='a   b' -> ['a', 'b']  ( expected : ['a', 'b'] .. ok )
----------------------  Using ''.split(' ')  -----------------------
a='a b' -> ['a', 'b']  ( expected : ['a', 'b'] .. ok )
a='a  b' -> ['a', '', 'b']  ( expected : ['a', 'b'] .. error )
a='a   b' -> ['a', '', '', 'b']  ( expected : ['a', 'b'] .. error )
--------------------------------------------------------------------
# Is this really a bug, or did I misunderstand the .split() specification?
This is user error. Python is so user-friendly that I think many users just try things without reading the documentation. Not many would declare that they had found a bug in Python without first reading and fully understanding the documentation. You are bold.

From the built-in types documentation: https://docs.python.org/3/library/stdtyp...#str.split
Quote: str.rsplit(sep=None, maxsplit=-1)
Return a list of the words in the string, using sep as the delimiter string. If maxsplit is given, at most maxsplit splits are done, the rightmost ones. If sep is not specified or None, any whitespace string is a separator. Except for splitting from the right, rsplit() behaves like split() which is described in detail below.
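When sep is not specified, any whitespace string is a separator, so a run of consecutive whitespace characters counts as one separator. When you specify sep=" ", the separator ceases to be "any whitespace string" and becomes exactly the string between those quotes. Each single space is then a separator, and consecutive spaces delimit empty strings, which is why 'a  b'.split(' ') returns ['a', '', 'b'].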
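To make the difference concrete, here is a minimal sketch that runs both forms of split() over the same inputs. The sample strings and the variable names (samples, default_results, explicit_results) are only illustrative and are not taken from the original post.

```python
# Illustrative inputs: one space, two spaces, surrounding spaces, empty string.
samples = ["a b", "a  b", " a b ", ""]

# sep omitted: a run of whitespace is one separator, and leading or
# trailing whitespace does not produce empty strings.
default_results = [s.split() for s in samples]

# sep=" ": every single space is a separator, so adjacent spaces and
# spaces at either end produce empty strings.
explicit_results = [s.split(" ") for s in samples]

for s, d, e in zip(samples, default_results, explicit_results):
    print(f"{s!r:9} split() -> {d!r:12} split(' ') -> {e!r}")
```

Note the last sample: with no separator the empty string splits into an empty list, while with an explicit separator it splits into a list holding one empty string.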

Whitespace characters are (according to string.whitespace) the space, horizontal and vertical tabs, line feed, form feed and carriage return.
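As a quick check of that list, the sketch below prints string.whitespace and then splits a string that uses each of those characters once as the only thing between two letters. The letters and the variable names are made up for the illustration.

```python
import string

# Show the whitespace characters as an escaped string literal.
print(repr(string.whitespace))

# Build a string where each pair of neighbouring letters is separated by a
# different character from string.whitespace.
letters = "abcdefg"
text = "".join(
    letter + ws for letter, ws in zip(letters, string.whitespace)
) + letters[-1]

# With no arguments, split() treats every one of them as a separator.
tokens = text.split()
print(tokens)
```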
def to_literal(chars):
    # Show newline, carriage return and tab as escape sequences.
    whitespace = {"\n": r"\n", "\r": r"\r", "\t": r"\t"}
    return "".join(whitespace.get(c, c) for c in chars)

def test(src, sep=None, exp="['a', 'b']"):
    result = str(src.split(sep))
    # Print the call as it was made: split() when no separator is passed.
    call = "split()" if sep is None else f"split('{to_literal(sep)}')"
    src = to_literal(src)
    if result == exp:
        print(f"'{src}'.{call} == {exp}")
    else:
        print(f"'{src}'.{call} == {result}, not {exp}")

# One separator character between the tokens.
test("a b")
test("a\nb")
test("a\tb")
test("a\rb")
test("a b", " ")
test("a\nb", "\n")
test("a\tb", "\t")
test("a\rb", "\r")

# Two or more separator characters between the tokens.
test("a  b")
test("a\n\nb")
test("a\t\tb")
test("a\r\rb")
test("a \n\r\tb")
test("a  b", " ")
test("a\n\nb", "\n")
test("a\t\tb", "\t")
test("a\r\rb", "\r")
test("a \n\r\tb", " \n\r\t")
Output:
'a b'.split() == ['a', 'b']
'a\nb'.split() == ['a', 'b']
'a\tb'.split() == ['a', 'b']
'a\rb'.split() == ['a', 'b']
'a b'.split(' ') == ['a', 'b']
'a\nb'.split('\n') == ['a', 'b']
'a\tb'.split('\t') == ['a', 'b']
'a\rb'.split('\r') == ['a', 'b']
'a  b'.split() == ['a', 'b']
'a\n\nb'.split() == ['a', 'b']
'a\t\tb'.split() == ['a', 'b']
'a\r\rb'.split() == ['a', 'b']
'a \n\r\tb'.split() == ['a', 'b']
'a  b'.split(' ') == ['a', '', 'b'], not ['a', 'b']
'a\n\nb'.split('\n') == ['a', '', 'b'], not ['a', 'b']
'a\t\tb'.split('\t') == ['a', '', 'b'], not ['a', 'b']
'a\r\rb'.split('\r') == ['a', '', 'b'], not ['a', 'b']
'a \n\r\tb'.split(' \n\r\t') == ['a', 'b']
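If you really do need an explicit separator but still want ['a', 'b'], two common workarounds are sketched below. Neither comes from the original post, and the variable names are only for illustration; which one fits depends on whether empty tokens carry meaning in your data.

```python
import re

text = "a   b"  # three spaces between the tokens

# Option 1: split on the single space, then drop the empty strings.
filtered = [token for token in text.split(" ") if token]

# Option 2: treat one or more spaces as a single separator.
by_regex = re.split(r" +", text)

print(filtered)
print(by_regex)
```

The two options differ at the edges: with leading or trailing spaces, the re.split() version still returns an empty string at that end, while the filtered version drops it.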