Python Forum
A small bug ? (Perhaps an ant?)
#1
Hi all! Regards!

I'm new to the Python ecosystem, and a few days ago I ran into a problem.

I'm using Python 3.12 on Windows 10.

'My' bug is shown by the following code:

# The problem is in the use of ''.split() vs ''.split(' ')

exp   =   ['a', 'b']                # expected result list

def check( a, x ) :
    global  exp
    if  x == exp :  res = 'ok'
    else         :  res = 'error'
    print( "a='%s' -> %s  ( expected : %s .. %s )\n" % (a,x,exp,res) )

print( '-'*24, " Using ''.split() ", '-'*24, '\n' )
a   =   'a b'           # one space separating tokens
x   =   a.split()
check( a, x )

a   =   'a  b'          # two spaces separating tokens
x   =   a.split()
check( a, x )

a   =   'a   b'         # three spaces separating tokens
x   =   a.split()
check( a, x )

print( '-'*22, " Using ''.split(' ') ", '-'*23, '\n' )
a   =   'a b'           # one space separating tokens
x   =   a.split(' ')
check( a, x )

a   =   'a  b'          # two spaces separating tokens
x   =   a.split(' ')
check( a, x )

a   =   'a   b'       # three spaces separating tokens
x   =   a.split(' ')
check( a, x )
print( '-'*68, '\n' )
#### MY RESULTS ####
Output:
------------------------  Using ''.split()  ------------------------

a='a b' -> ['a', 'b']  ( expected : ['a', 'b'] .. ok )

a='a  b' -> ['a', 'b']  ( expected : ['a', 'b'] .. ok )

a='a   b' -> ['a', 'b']  ( expected : ['a', 'b'] .. ok )

----------------------  Using ''.split(' ')  -----------------------

a='a b' -> ['a', 'b']  ( expected : ['a', 'b'] .. ok )

a='a  b' -> ['a', '', 'b']  ( expected : ['a', 'b'] .. error )

a='a   b' -> ['a', '', '', 'b']  ( expected : ['a', 'b'] .. error )

--------------------------------------------------------------------
# Is this really a bug, or did I misunderstand the .split() specification?
deanhystad wrote Jul-20-2023, 04:57 PM:
Please post all code, output and errors (in their entirety) between their respective tags. Refer to the BBCode help topic on how to post. Use the "Preview Post" button to make sure the code is presented as you expect before hitting the "Post Reply/Thread" button.
#2
This is user error. Python is so user-friendly that I think many users just try stuff without reading the documentation. Few would declare they had found a bug in Python without first reading and fully understanding the documentation. You are bold.

From the built-in types documentation: https://docs.python.org/3/library/stdtyp...#str.split
Quote:
str.rsplit(sep=None, maxsplit=-1)
Return a list of the words in the string, using sep as the delimiter string. If maxsplit is given, at most maxsplit splits are done, the rightmost ones. If sep is not specified or None, any whitespace string is a separator. Except for splitting from the right, rsplit() behaves like split() which is described in detail below.
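When sep is not specified, any whitespace string is a separator: a run of consecutive whitespace counts as a single separator, and leading or trailing whitespace produces no empty strings. When you specify sep=" ", the separator ceases to be "any whitespace string" and becomes exactly the string between those quotes. Consecutive separators are then not grouped together, so two spaces in a row delimit an empty string. That is why 'a  b'.split(' ') returns ['a', '', 'b'].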
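To make the two modes concrete, here is a small sketch contrasting the default split() with split(' '). The sample strings are made up for illustration; the last one adds leading and trailing spaces, which the original test did not cover.

```python
samples = ["a b", "a  b", "a   b", "  a b  "]

for s in samples:
    # Default (sep=None): runs of whitespace act as one separator and
    # leading/trailing whitespace is dropped, so no empty strings appear
    print(repr(s), "split()    ->", s.split())
    # Explicit sep=" ": each single space is a separator, so extra spaces
    # and leading/trailing spaces show up as empty strings
    print(repr(s), "split(' ') ->", s.split(" "))
```

For '  a b  ', split() returns ['a', 'b'] while split(' ') returns ['', '', 'a', 'b', '', ''].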

Whitespace characters are (according to string.whitespace) the space, horizontal and vertical tabs, line feed, form feed and carriage return.
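One quick way to check this is to join two tokens with a run of each character from string.whitespace and split with the default separator; every case should give two tokens. The variable names are illustrative. As an extra note beyond the list above: str.split() decides what counts as whitespace via str.isspace(), which also accepts some characters outside string.whitespace, such as the no-break space U+00A0.

```python
import string

# string.whitespace is ' \t\n\r\x0b\x0c':
# space, tab, line feed, carriage return, vertical tab, form feed
for ch in string.whitespace:
    text = "a" + ch * 3 + "b"  # three copies of the same whitespace char
    print(repr(text), "->", text.split())

# str.split() uses str.isspace(), so a no-break space also separates tokens
print(repr("a\u00a0b"), "->", "a\u00a0b".split())
```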
def to_literal(chars):
    # Replace whitespace control characters with their escape sequences
    # so the printed strings stay readable
    whitespace = {"\n": r"\n", "\r": r"\r", "\t": r"\t"}
    return "".join(whitespace.get(c, c) for c in chars)

def test(src, sep=None, exp="['a', 'b']"):
    result = str(src.split(sep))
    # Show .split() for the default sep=None, .split('...') otherwise
    call = "split()" if sep is None else f"split('{to_literal(sep)}')"
    src = to_literal(src)
    if result == exp:
        print(f"'{src}'.{call} == {exp}")
    else:
        print(f"'{src}'.{call} == {result}, not {exp}")

test("a b")
test("a\nb")
test("a\tb")
test("a\rb")
test("a b", " ")
test("a\nb", "\n")
test("a\tb", "\t")
test("a\rb", "\r")

test("a  b")
test("a\n\nb")
test("a\t\tb")
test("a\r\rb")
test("a \n\r\tb")
test("a  b", " ")
test("a\n\nb", "\n")
test("a\t\tb", "\t")
test("a\r\rb", "\r")
test("a \n\r\tb", " \n\r\t")
Output:
'a b'.split() == ['a', 'b']
'a\nb'.split() == ['a', 'b']
'a\tb'.split() == ['a', 'b']
'a\rb'.split() == ['a', 'b']
'a b'.split(' ') == ['a', 'b']
'a\nb'.split('\n') == ['a', 'b']
'a\tb'.split('\t') == ['a', 'b']
'a\rb'.split('\r') == ['a', 'b']
'a  b'.split() == ['a', 'b']
'a\n\nb'.split() == ['a', 'b']
'a\t\tb'.split() == ['a', 'b']
'a\r\rb'.split() == ['a', 'b']
'a \n\r\tb'.split() == ['a', 'b']
'a  b'.split(' ') == ['a', '', 'b'], not ['a', 'b']
'a\n\nb'.split('\n') == ['a', '', 'b'], not ['a', 'b']
'a\t\tb'.split('\t') == ['a', '', 'b'], not ['a', 'b']
'a\r\rb'.split('\r') == ['a', '', 'b'], not ['a', 'b']
'a \n\r\tb'.split(' \n\r\t') == ['a', 'b']
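If you really do need a specific separator (say, spaces only, not tabs) but want a run of it treated as one, two common options are to drop the empty strings from split(' '), or to split on a regular expression matching one or more separators. A minimal sketch, assuming the separator is a single space; the sample line is made up:

```python
import re

line = "a   b  c"

# Option 1: split on single spaces, then discard the empty strings
tokens = [t for t in line.split(" ") if t]
print(tokens)

# Option 2: treat a run of one or more spaces as a single separator
tokens_re = re.split(r" +", line)
print(tokens_re)
```

Both give ['a', 'b', 'c'] here. Unlike option 1, re.split() still returns an empty string at the start or end when the line begins or ends with spaces, and both options split only on spaces, so "a\tb c".split(" ") keeps the tab inside the first token.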