Python Forum
Time complexity different between operating systems. Why?
#1
Mekire: Migrating a thread from the old forum which I think provided good discussion so as to avoid linking to the dead forum.

Ok, so first off I have a dual boot setup with Windows 7 and Linux Mint 13.  I generally work in Windows because I'm honestly not so comfortable in Linux but I like to be able to test compatibility and such when necessary.

Brought on by a recent thread, I realized that certain operations were not performing the same on the two OSes.

Specifically, string concatenation versus the str.join method. My original assertion was that repeated string concatenation should be avoided entirely because it goes quadratic.

Take the following code (still using time instead of timeit; still lazy):
import time

def make_string_concat(num):
    # Build the string by repeated +=; the cost depends on how the
    # platform allocator handles in-place resizing.
    result = ''
    for i in xrange(num):  # Python 2 (use range on Python 3)
        result += '{}\n'.format(i)
    return result

def make_string_join(num):
    # Build all the pieces lazily and join once.
    return "\n".join(str(i) for i in xrange(num)) + "\n"

###
if __name__ == "__main__":
    NUMBERS = [1000000,2000000,3000000]
    for number in NUMBERS:
        print("n = {}".format(number))
        
        start = time.time()
        DATA1 = make_string_join(number)
        print("Time to generate data (str.join): {}".format(time.time()-start))

        start = time.time()
        DATA2 = make_string_concat(number)
        print("Time to generate data (concatenation): {}\n".format(time.time()-start))
So on Windows the result I get is:
>>> 
n = 1000000
Time to generate data (str.join): 0.389999866486
Time to generate data (concatenation): 2.88599991798

n = 2000000
Time to generate data (str.join): 0.655000209808
Time to generate data (concatenation): 12.2779998779

n = 3000000
Time to generate data (str.join): 0.99799990654
Time to generate data (concatenation): 28.2520000935

>>> 
It would appear that str.join stays linear while concatenation blows up quadratically.
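For a less lazy measurement than the time-based loop above, the same functions can be timed with timeit (a sketch in Python 3 syntax, so range instead of xrange; timeit also disables the garbage collector during measurement by default):

```python
import timeit

def make_string_join(num):
    return "\n".join(str(i) for i in range(num)) + "\n"

# Time 5 runs of the join version; smaller n keeps the run short.
elapsed = timeit.timeit(lambda: make_string_join(100000), number=5)
print("str.join, 5 runs: {:.3f}s".format(elapsed))
```

Repeating the measurement with timeit.repeat and taking the minimum gives a more stable number than a single time.time() delta.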

Then Metulburr showed me his results for the same code, which completely contradicted mine.  I rebooted to Linux and tested the same code.

Result with Linux on the same computer:
>>> 
n = 1000000
Time to generate data (str.join): 0.224174022675
Time to generate data (concatenation): 0.316344022751

n = 2000000
Time to generate data (str.join): 0.449282884598
Time to generate data (concatenation): 0.658910989761

n = 3000000
Time to generate data (str.join): 0.677663803101
Time to generate data (concatenation): 0.943947792053

>>> 
Here, while the join method is still superior, they both appear to have linear time complexity.

Aside from the standard "lol, Windows" line of reasoning: what would cause the time complexity of concatenation to be O(n^2) on one operating system and O(n) on the other?

-Mek
#2

I sometimes run anything from dual boot to triple or quad boot, but for compatibility testing I normally use VirtualBox with Windows 7 set up in it, which is a lot easier and quicker than rebooting into another system. Before this, I hadn't really looked at the timing differences between Windows and Linux. Not sure why concatenation on Windows is so slow; it would be interesting to know.

I guess I should make sure to use str.join in any code of mine that's going to run on Windows, huh? lol

No wonder someone said str.join is faster: if they tested on Windows and got those results while I tested on Linux, I'd be thinking "you're complaining about 0.2 seconds?" when in reality they were talking about tens of seconds.
#3

I've done a little research and this is a summary of what I've found so far. It may not be 100% accurate.

In CPython 2.4, += was optimized to try to resize a string object in place. Earlier versions of CPython would always create a new string object and copy the data. Beginning with 2.4, CPython will try (when it can) to resize the memory block without copying the data, by calling the realloc() function provided by the platform's C library. It appears that realloc() on Windows copies the data to a new memory block, while realloc() on Linux often just extends the existing block without copying.
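The in-place path can only trigger when nothing else references the string. A small sketch, assuming CPython (Python 3 syntax), that makes the effect visible by comparing object identities; whether the first case stays in place is implementation- and platform-dependent, so no particular outcome is guaranteed there:

```python
# With a single reference, CPython's += fast path may resize the
# buffer in place; if realloc() moved the block (or the fast path
# didn't apply), the id changes. Platform-dependent either way.
s = "x" * (10 ** 6)
before = id(s)
s += "y"
after = id(s)
print("resized in place:", before == after)

# A second reference disables the optimization entirely: the old
# string must survive unchanged, so a copy is always made.
t = "x" * (10 ** 6)
alias = t
t += "y"
print("copied:", alias is not t)
```

The second case is guaranteed: since alias must still see the old value, t has to end up as a new object.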

realloc() doesn't guarantee either behavior. It only guarantees that it will return a pointer to a block of memory of the new size, with the old block's data copied over if needed. Different allocators optimize for different goals: speed, minimizing memory fragmentation, or minimizing total footprint. Your test happened to highlight one of those differences.

When concatenating strings, join() is guaranteed to run in linear time. The change in CPython 2.4 improves performance in a common case, but because the improved += behavior is not part of the language specification, it shouldn't be relied on. join() is the recommended way to concatenate a large number of strings.
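For anyone who wants to build a string incrementally without relying on the += optimization, the portable idiom is to accumulate pieces in a list and join once at the end (a sketch in Python 3 syntax; the Python 2 version just swaps range for xrange):

```python
def build_lines(num):
    # List appends are amortized O(1), and the single join at the
    # end is O(n), so this is linear on any platform, independent
    # of how realloc() behaves.
    parts = []
    for i in range(num):
        parts.append("{}\n".format(i))
    return "".join(parts)
```

This produces the same output as the concatenation version from the first post, with join()'s guaranteed linear behavior.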

Side note: when running benchmarks, you should usually disable CPython's garbage collector. It triggers automatically when a large number of objects are created and deleted, and its job is to find and delete isolated reference cycles: object A refers to B and B refers back to A, but no other references to A or B exist. Since your benchmark doesn't create cycles, the collector does nothing useful here; it only adds noise to the timings. GC matters for more complex programs.
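For a time-based benchmark like the one in the first post, that looks something like this (a sketch; timeit does the same thing automatically, which is another reason to prefer it):

```python
import gc
import time

gc.disable()  # keep collector pauses out of the measurement
try:
    start = time.time()
    data = "\n".join(str(i) for i in range(10 ** 6)) + "\n"
    elapsed = time.time() - start
finally:
    gc.enable()  # always restore normal collection afterwards

print("join with GC disabled: {:.3f}s".format(elapsed))
```

The try/finally matters: if the timed code raises, the collector still gets re-enabled.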

