Posts: 8,154
Threads: 160
Joined: Sep 2016
Oct-09-2017, 08:44 AM
(This post was last modified: Oct-09-2017, 09:59 AM by buran.)
Seeing this tweet by Raymond Hettinger, I wanted to compare the performance of the different methods, i.e. check how much is gained by following his tip to use itertools.chain.
So I ran the following code:
#https://twitter.com/raymondh/status/916721150436057089
import timeit
from itertools import chain

def using_concatenation():
    one = range(1, 1000000)
    two = range(1, 1000000)
    three = one + two + one + two + one + two

def using_extend():
    one = range(1, 1000000)
    two = range(1, 1000000)
    three = range(1, 1000000)
    four = range(1, 1000000)
    one.extend(two)
    one.extend(three)
    one.extend(four)
    one.extend(two)
    one.extend(three)
    one.extend(four)

def using_chain():
    one = range(1, 1000000)
    two = range(1, 1000000)
    three = chain(one, two, one, two, one, two)

if __name__ == '__main__':
    repeat = 1000000
    print('repeat {}'.format(repeat))
    print('using concatenation --> {}'.format(timeit.timeit("using_concatenation", number=repeat, setup="from __main__ import using_concatenation")))
    print('using extend --> {}'.format(timeit.timeit("using_extend", number=repeat, setup="from __main__ import using_extend")))
    print('using chain --> {}'.format(timeit.timeit("using_chain", number=repeat, setup="from __main__ import using_chain")))

What I get in 2.7 is:
Output:
repeat 1000000
using concatenation --> 0.0159089565277
using extend --> 0.0156660079956
using chain --> 0.0156710147858
and in 3.5:
Output:
repeat 1000000
using concatenation --> 0.013655265793204308
using extend --> 0.013913063099607825
using chain --> 0.013966921018436551
Even though the numbers vary between runs, the results consistently show no significant performance difference between the three routines.
Given who Raymond Hettinger is, I would guess the problem is in my benchmarking script rather than in his tip. What do you think?
Posts: 2,953
Threads: 48
Joined: Sep 2016
Oct-09-2017, 09:24 AM
(This post was last modified: Oct-09-2017, 09:24 AM by wavic.)
As far as I know, R. Hettinger is a Python core developer, so he knows the internal C code of each of those methods. Perhaps his tweet is based on that.
On what hardware and OS do you run this code?
Typo on line 34: print('using concatenation --> {}'.format(timeit.timeit("using_concat ination", number=repeat, setup="from __main__ import using_concatenation")))
Here are my results on Arch Linux:
repeat 1000000
using concatenation --> 0.010602696005662438
using extend --> 0.010252392006805167
using chain --> 0.009878639997623395
But this was the first try. Most of the time I get something like this:
repeat 1000000
using concatenation --> 0.010324251998099498
using extend --> 0.01023155700386269
using chain --> 0.010108978000062052
Posts: 8,154
Threads: 160
Joined: Sep 2016
Thanks for pointing out the misspelling; I had just noticed it in the editor and fixed it elsewhere. My results were from PythonAnywhere, so Linux. On Windows I get similar results.
I know that Raymond is a Python core developer; that is why I doubt my results, not his tip.
Posts: 2,122
Threads: 10
Joined: May 2017
Oct-09-2017, 11:47 AM
(This post was last modified: Oct-09-2017, 11:47 AM by DeaD_EyE.)
In Python 3 the range function returns a range object, which is lazily evaluated and supports neither + nor extend().
So when you run this code on Python 3, it should raise an exception, but that doesn't happen.
This is a sign that your functions are never called by timeit, and it explains why all the results are nearly the same.
You pass timeit a string such as "using_concatenation", and that statement only looks up the name of the function; it does not call it.
So the functions are never executed.
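A minimal sketch of the difference. Assumptions: the counting function work is hypothetical, and it uses timeit's globals argument (Python 3.5+) instead of the setup import so it runs standalone. The statement "work" only evaluates the name, while "work()" or passing the function object itself really runs it.

```python
import timeit

calls = 0

def work():
    # Hypothetical stand-in for using_concatenation() and friends;
    # it only counts how many times its body actually runs.
    global calls
    calls += 1

# Statement "work": timeit evaluates the bare name 1000 times; nothing is called.
timeit.timeit("work", globals={"work": work}, number=1000)
after_name = calls       # still 0

# Statement "work()": the function is called on every iteration.
timeit.timeit("work()", globals={"work": work}, number=1000)
after_call = calls       # 1000

# Passing the function object itself (no quotes) also calls it on every iteration.
timeit.timeit(work, number=1000)
after_callable = calls   # 2000

print(after_name, after_call, after_callable)  # 0 1000 2000
```

So the near-identical timings above are just the cost of a name lookup, repeated a million times.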
This will run on both versions:
#https://twitter.com/raymondh/status/916721150436057089
import timeit
from itertools import chain

def get_range():
    return list(range(1, 1000000))

def using_concatenation():
    one = get_range()
    two = get_range()
    three = get_range()
    four = get_range()
    one + two + three + four

def using_extend():
    one = get_range()
    two = get_range()
    three = get_range()
    four = get_range()
    one.extend(two)
    one.extend(three)
    one.extend(four)

def using_chain():
    one = get_range()
    two = get_range()
    three = get_range()
    four = get_range()
    list(chain(one, two, three, four))
    # chain is lazy evaluated
    # using list to consume the iterable

if __name__ == '__main__':
    repeat = 10
    print('repeat {}'.format(repeat))
    print('using concatenation --> {}'.format(timeit.timeit(using_concatenation, number=repeat, setup="from __main__ import using_concatenation")))
    print('using extend --> {}'.format(timeit.timeit(using_extend, number=repeat, setup="from __main__ import using_extend")))
    print('using chain --> {}'.format(timeit.timeit(using_chain, number=repeat, setup="from __main__ import using_chain")))

Output:
andre@andre-GP70-2PE:~$ python3 concat_vs_chain.py
repeat 10
using concatenation --> 1.8592366009997932
using extend --> 1.3227427910005645
using chain --> 1.5001900990009744
andre@andre-GP70-2PE:~$ python concat_vs_chain.py
repeat 10
using concatenation --> 1.72903013229
using extend --> 1.14736509323
using chain --> 1.36802816391
Posts: 8,154
Threads: 160
Joined: Sep 2016
OK, thanks for pointing out my mistake. It can be a string, but I missed the parentheses:
#https://twitter.com/raymondh/status/916721150436057089
import timeit
from itertools import chain

def using_concatenation():
    one = range(1, 1000000)
    two = range(1, 1000000)
    three = one + two + one + two + one + two

def using_extend():
    one = range(1, 1000000)
    two = range(1, 1000000)
    three = range(1, 1000000)
    four = range(1, 1000000)
    one.extend(two)
    one.extend(three)
    one.extend(four)
    one.extend(two)
    one.extend(three)
    one.extend(four)

def using_chain():
    one = range(1, 1000000)
    two = range(1, 1000000)
    three = chain(one, two, one, two, one, two)

if __name__ == '__main__':
    repeat = 10
    print('repeat {}'.format(repeat))
    print('using concatenation --> {}'.format(timeit.timeit("using_concatenation()", number=repeat, setup="from __main__ import using_concatenation")))
    print('using extend --> {}'.format(timeit.timeit("using_extend()", number=repeat, setup="from __main__ import using_extend")))
    print('using chain --> {}'.format(timeit.timeit("using_chain()", number=repeat, setup="from __main__ import using_chain")))

Win7, Python 2:
Output:
repeat 10
using concatenation --> 2.4562610593
using extend --> 1.63672606549
using chain --> 0.272854924754
I will also check Python 3 at home.
Posts: 4,220
Threads: 97
Joined: Sep 2016
That won't work in Python 3.x, because you can't add or extend ranges. It only works in 2.x because one and two are lists. I think that's also a large part of the performance gain: chain isn't creating a list.
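A small sketch of why using_chain looked so cheap in the Python 2 run. The logging generator numbers and the list pulled are hypothetical helpers for illustration. Creating the chain object does no work at all; elements are only produced when something consumes it, e.g. list().

```python
from itertools import chain

pulled = []  # records every element that has actually been produced

def numbers(tag, n):
    # Hypothetical generator that logs each value at the moment it is pulled.
    for i in range(n):
        pulled.append((tag, i))
        yield i

lazy = chain(numbers("one", 3), numbers("two", 3))
print(len(pulled))     # 0: building the chain object consumes nothing

combined = list(lazy)  # only now are the elements produced and copied into a list
print(len(pulled))     # 6
print(combined)        # [0, 1, 2, 0, 1, 2]
```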
Posts: 4,220
Threads: 97
Joined: Sep 2016
Replacing using_chain with:
def using_chain():
    one = range(1, 1000000)
    two = range(1, 1000000)
    three = list(chain(one, two, one, two, one, two))

gets you:
Output:
repeat 10
using concatenation --> 1.96962308884
using extend --> 1.10849690437
using chain --> 1.07177019119
So chain gives a big win if you only need something to iterate over, but much less of one if you need an actual list you can manipulate.
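A sketch of the two cases, using smaller lists than in the thread (an assumption, just to keep it quick). When the combined sequence is only iterated once, chain skips building a temporary list. But the chain object is a one-shot iterator, not a list: it has no len() and is empty after the first pass.

```python
from itertools import chain

one = list(range(1, 1000))
two = list(range(1, 1000))

# Only iterating: chain feeds sum() directly, no combined list is built.
total_chain = sum(chain(one, two, one, two))

# Concatenation first builds a temporary list, then sum() iterates over it.
total_concat = sum(one + two + one + two)
print(total_chain == total_concat)         # True

# Needing a real list: a chain object has no len() and can be consumed only once.
lazy = chain(one, two)
print(hasattr(lazy, "__len__"))            # False
first_pass = list(lazy)
second_pass = list(lazy)
print(len(first_pass), len(second_pass))   # 1998 0
```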
Posts: 8,154
Threads: 160
Joined: Sep 2016
(Oct-09-2017, 03:21 PM) ichabod801 wrote: That won't work in Python 3.x, because you can't add or extend ranges. It only works in 2.x because one and two are lists.
Yes, DeaD_EyE made that point too, and he wrote a helper function that returns a list.
Here it is working with lists, on Python 3.5:
#https://twitter.com/raymondh/status/916721150436057089
import timeit
from itertools import chain

def using_concatenation():
    one = list(range(1, 1000000))
    two = list(range(1, 1000000))
    three = one + two + one + two + one + two

def using_extend():
    one = list(range(1, 1000000))
    two = list(range(1, 1000000))
    three = list(range(1, 1000000))
    four = list(range(1, 1000000))
    one.extend(two)
    one.extend(three)
    one.extend(four)
    one.extend(two)
    one.extend(three)
    one.extend(four)

def using_chain():
    one = list(range(1, 1000000))
    two = list(range(1, 1000000))
    three = list(chain(one, two, one, two, one, two))

if __name__ == '__main__':
    repeat = 10
    print('repeat {}'.format(repeat))
    print('using concatenation --> {}'.format(timeit.timeit("using_concatenation()", number=repeat, setup="from __main__ import using_concatenation")))
    print('using extend --> {}'.format(timeit.timeit("using_extend()", number=repeat, setup="from __main__ import using_extend")))
    print('using chain --> {}'.format(timeit.timeit("using_chain()", number=repeat, setup="from __main__ import using_chain")))

gives us:
Output:
repeat 10
using concatenation --> 3.375771024999267
using extend --> 2.0252870000003895
using chain --> 1.347193502000664
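A sketch of a somewhat fairer comparison, under a few assumptions of mine rather than anything from the thread. The input lists are built once outside the timed code, so only the combining step is measured. Every version returns its list and is checked against the others before timing. timeit.repeat is used, taking the minimum as the least noisy figure. The benchmark helper and its n/number/repeat parameters are hypothetical, and this using_extend copies the first list so the shared inputs are not mutated between runs. No timings are claimed here; they will depend on the machine and Python version.

```python
import timeit
from itertools import chain

def using_concatenation(one, two):
    return one + two + one + two + one + two

def using_extend(one, two):
    result = list(one)  # copy, so the shared input list is not mutated between runs
    for part in (two, one, two, one, two):
        result.extend(part)
    return result

def using_chain(one, two):
    return list(chain(one, two, one, two, one, two))

def benchmark(n=1000000, number=10, repeat=3):
    # Hypothetical helper: build the inputs once, outside the timed statement.
    one = list(range(1, n))
    two = list(range(1, n))
    funcs = (using_concatenation, using_extend, using_chain)
    # Sanity check: all three must build the same list before comparing speed.
    expected = using_concatenation(one, two)
    assert all(f(one, two) == expected for f in funcs)
    results = {}
    for f in funcs:
        # timeit.repeat returns one total per run; keep the fastest run.
        times = timeit.repeat(lambda: f(one, two), number=number, repeat=repeat)
        results[f.__name__] = min(times)
    return results

if __name__ == "__main__":
    for name, best in benchmark().items():
        print("{:<22} {:.4f} s".format(name, best))
```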
Posts: 3,458
Threads: 101
Joined: Sep 2016
It's a shame that the compiler doesn't notice that the functions don't affect global state and always return None, and thus could be rewritten as essentially lambda: None, giving O(1) performance :p