Mekire: Migrating a thread from the old forum which I think provided good discussion so as to avoid linking to the dead forum.
I've done a little research and this is a summary of what I've found so far. It may not be 100% accurate.
In CPython 2.4, += was optimized to try and resize a string object in-place. Previous versions of CPython would always create a new string object and copy the data. Beginning with 2.4, CPython will try (when it can) to just resize the memory block without copying the data. It does this by calling the realloc() function provided by the operating system. It appears that the realloc() function on Windows copies the data to a new memory block while the realloc() on Linux just extends the existing block without copying the data.
realloc() doesn't guarantee either type of behavior. It just guarantees that it will return a pointer to block of memory with the new size and any data from the old block will be copied if needed. Different strategies can be taken to optimize different goals: speed vs. minimize memory fragmentation vs. minimize total footprint. Your test happened to highlight one of the differences.
When concatenating strings, join() is guaranteed to run in linear time complexity. The change in CPython 2.4 improves the performance in a common environment. Because the improved += behavior is not part of the language specification, it shouldn't be relied on. join() is the recommended way to concatenate a large number of strings.
Side note: when running benchmarks, you usually should disable CPython's garbage collection. The garbage collector automatically triggers when a large number of objects are created and deleted. The purpose of the garbage collector is to look for, and try to delete, isolated reference cycles: object A refers to B and B refers back to A, but no other references to A or B exist. Since your benchmark isn't creating cycles, it isn't needed. GC is useful for more complex programs.
I've done a little research and this is a summary of what I've found so far. It may not be 100% accurate.
In CPython 2.4, += was optimized to try and resize a string object in-place. Previous versions of CPython would always create a new string object and copy the data. Beginning with 2.4, CPython will try (when it can) to just resize the memory block without copying the data. It does this by calling the realloc() function provided by the operating system. It appears that the realloc() function on Windows copies the data to a new memory block while the realloc() on Linux just extends the existing block without copying the data.
realloc() doesn't guarantee either type of behavior. It just guarantees that it will return a pointer to block of memory with the new size and any data from the old block will be copied if needed. Different strategies can be taken to optimize different goals: speed vs. minimize memory fragmentation vs. minimize total footprint. Your test happened to highlight one of the differences.
When concatenating strings, join() is guaranteed to run in linear time complexity. The change in CPython 2.4 improves the performance in a common environment. Because the improved += behavior is not part of the language specification, it shouldn't be relied on. join() is the recommended way to concatenate a large number of strings.
Side note: when running benchmarks, you usually should disable CPython's garbage collection. The garbage collector automatically triggers when a large number of objects are created and deleted. The purpose of the garbage collector is to look for, and try to delete, isolated reference cycles: object A refers to B and B refers back to A, but no other references to A or B exist. Since your benchmark isn't creating cycles, it isn't needed. GC is useful for more complex programs.