Python Forum
Lists: concatenate vs. extend vs. chain
#1
After seeing this tweet by Raymond Hettinger, I wanted to compare the performance of the different methods, i.e. check the gain from following his tip to use itertools.chain.

So I ran the following code:

#https://twitter.com/raymondh/status/916721150436057089
import timeit
from itertools import chain


def using_concatenation():
    one = range(1, 1000000)
    two = range(1, 1000000)
    three = one + two + one + two + one + two


def using_extend():
    one = range(1, 1000000)
    two = range(1, 1000000)
    three = range(1, 1000000)
    four = range(1, 1000000)
    one.extend(two)
    one.extend(three)
    one.extend(four)
    one.extend(two)
    one.extend(three)
    one.extend(four)


def using_chain():
    one = range(1, 1000000)
    two = range(1, 1000000)
    three = chain(one, two, one, two, one, two)


if __name__ == '__main__':
    repeat = 1000000
    print('repeat {}'.format(repeat))
    print('using concatenation --> {}'.format(timeit.timeit("using_concatenation", number=repeat, setup="from __main__ import using_concatenation")))
    print('using extend --> {}'.format(timeit.timeit("using_extend", number=repeat, setup="from __main__ import using_extend")))
    print('using chain --> {}'.format(timeit.timeit("using_chain", number=repeat, setup="from __main__ import using_chain")))
What I get in 2.7 is:
Output:
repeat 1000000
using concatenation --> 0.0159089565277
using extend --> 0.0156660079956
using chain --> 0.0156710147858
and in 3.5:
Output:
repeat 1000000
using concatenation --> 0.013655265793204308
using extend --> 0.013913063099607825
using chain --> 0.013966921018436551
Even though the numbers vary between runs, the results consistently show no significant performance difference between the three routines.
Given who Raymond Hettinger is, I would guess that the problem is in my measurement script rather than that he is wrong. What do you think?
#2
As far as I know, R. Hettinger is a Python core developer, so he knows the internal C code of each of those methods. Perhaps his tweet is based on that.
On what hardware and OS did you run this code?

Typo on line 34: print('using concatenation --> {}'.format(timeit.timeit("using_concatination", number=repeat, setup="from __main__ import using_concatenation")))

Here is my result on Arch Linux:
repeat 1000000
using concatenation --> 0.010602696005662438
using extend --> 0.010252392006805167
using chain --> 0.009878639997623395

But this was the first try. Most of the time I get something like this:
repeat 1000000
using concatenation --> 0.010324251998099498
using extend --> 0.01023155700386269
using chain --> 0.010108978000062052
#3
Thanks for pointing out the misspelling. I had just noticed it in the editor and fixed it elsewhere. My results were from PythonAnywhere, so Linux. On Windows I get similar results.
I know that Raymond is a Python core developer; that is why I doubt my results, not his tip.
#4
In Python 3 the range function returns a range object, which is lazily evaluated.
Range objects support neither + nor extend(), so when you run this code it should raise an exception, but this doesn't happen.
That is a sign that your functions are not being called by timeit.

This explains why all results are nearly the same.
You pass timeit a string that contains only the function name, without parentheses. The name is looked up, but the functions are never executed.
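
To make the difference visible, here is a minimal sketch. The Counter class and the work name are made up for illustration, and the globals argument of timeit.timeit assumes Python 3.5 or later. It counts how often the callable really runs for each statement string:

```python
import timeit


class Counter:
    """Hypothetical workload that counts how often it is actually called."""

    def __init__(self):
        self.calls = 0

    def __call__(self):
        self.calls += 1
        return sum(range(1000))


work = Counter()

# The statement "work" only looks up the name; the callable never runs.
timeit.timeit("work", number=100, globals={"work": work})
calls_without_parens = work.calls

# The statement "work()" calls it once per iteration.
timeit.timeit("work()", number=100, globals={"work": work})
calls_with_parens = work.calls - calls_without_parens

print(calls_without_parens, calls_with_parens)  # 0 100
```

With no call taking place, timeit measures little more than a name lookup, which would explain nearly identical timings for all three functions.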

This will run on both versions:

#https://twitter.com/raymondh/status/916721150436057089
import timeit
from itertools import chain
 
def get_range():
    return list(range(1, 1000000))
 
def using_concatenation():
    one = get_range()
    two = get_range()
    three = get_range()
    four = get_range()
    one + two + three + four
 
 
def using_extend():
    one = get_range()
    two = get_range()
    three = get_range()
    four = get_range()
    one.extend(two)
    one.extend(three)
    one.extend(four)
 
 
def using_chain():
    one = get_range()
    two = get_range()
    three = get_range()
    four = get_range()
    list(chain(one, two, three, four))
    # chain is lazy evaluated
    # using list to consume the iterable
 
if __name__ == '__main__':
    repeat = 10
    print('repeat {}'.format(repeat))
    print('using concatenation --> {}'.format(timeit.timeit(using_concatenation, number=repeat, setup="from __main__ import using_concatenation")))
    print('using extend --> {}'.format(timeit.timeit(using_extend, number=repeat, setup="from __main__ import using_extend")))
    print('using chain --> {}'.format(timeit.timeit(using_chain, number=repeat, setup="from __main__ import using_chain")))
Output:
andre@andre-GP70-2PE:~$ python3 concat_vs_chain.py
repeat 10
using concatenation --> 1.8592366009997932
using extend --> 1.3227427910005645
using chain --> 1.5001900990009744
andre@andre-GP70-2PE:~$ python concat_vs_chain.py
repeat 10
using concatenation --> 1.72903013229
using extend --> 1.14736509323
using chain --> 1.36802816391
#5
OK, thanks for pointing out my mistake. The statement can be a string, but I missed the parentheses.
 
#https://twitter.com/raymondh/status/916721150436057089
import timeit
from itertools import chain

 
def using_concatenation():
    one = range(1, 1000000)
    two = range(1, 1000000)
    three = one + two + one + two + one + two
    
    
def using_extend():
    one = range(1, 1000000)
    two = range(1, 1000000)
    three = range(1, 1000000)
    four = range(1, 1000000)
    one.extend(two)
    one.extend(three)
    one.extend(four)
    one.extend(two)
    one.extend(three)
    one.extend(four)

    
def using_chain():
    one = range(1, 1000000)
    two = range(1, 1000000)
    three = chain(one, two, one, two, one, two)
    
 
if __name__ == '__main__':
   repeat = 10
   print('repeat {}'.format(repeat))
   print('using concatenation --> {}'.format(timeit.timeit("using_concatenation()", number=repeat, setup="from __main__ import using_concatenation")))
   print('using extend --> {}'.format(timeit.timeit("using_extend()", number=repeat, setup="from __main__ import using_extend")))
   print('using chain --> {}'.format(timeit.timeit("using_chain()", number=repeat, setup="from __main__ import using_chain")))
Windows 7, Python 2:
Output:
repeat 10
using concatenation --> 2.4562610593
using extend --> 1.63672606549
using chain --> 0.272854924754
I will also check Python 3 at home.
#6
That won't work in Python 3.x, because you can't add or extend ranges. It only works in 2.x because there range() returns a list, so one and two are lists. I think that's also a large part of the performance gain: chain isn't creating a list.
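
A short sketch of that behaviour, assuming Python 3 (the variable names are for illustration only): adding two ranges fails, ranges have no extend(), and chain() hands back items only as they are requested.

```python
from itertools import chain, islice

one = range(1, 5)
two = range(1, 5)

# In Python 3, range objects do not support "+", so this raises TypeError.
try:
    one + two
    concat_error = None
except TypeError as exc:
    concat_error = exc

# range objects have no extend() method either.
has_extend = hasattr(one, "extend")

# chain() accepts any iterables and builds no list; items are produced on demand.
chained = chain(one, two)
first_three = list(islice(chained, 3))

print(type(concat_error).__name__, has_extend, first_three)
```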
Craig "Ichabod" O'Brien - xenomind.com
#7
Replacing using_chain with:

def using_chain():
    one = range(1, 1000000)
    two = range(1, 1000000)
    three = list(chain(one, two, one, two, one, two))
gets you:

Output:
repeat 10
using concatenation --> 1.96962308884
using extend --> 1.10849690437
using chain --> 1.07177019119
So it works if you are trying to get something to iterate over, but not if you want something you can manipulate as a list.
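
As a rough illustration of that trade-off (small lists and made-up names, not the benchmark itself): a chain object can be iterated once, but has no length and no indexing, while wrapping it in list() gives a real list at the cost of building it.

```python
from itertools import chain

one = [1, 2, 3]
two = [4, 5, 6]

lazy = chain(one, two)

# A chain object is an iterator: it has no length and cannot be indexed.
supports_len = hasattr(lazy, "__len__")
supports_index = hasattr(lazy, "__getitem__")

# Iterating consumes it; a second pass yields nothing.
first_pass = list(lazy)
second_pass = list(lazy)

# Wrapping a fresh chain in list() gives a list that can be modified.
as_list = list(chain(one, two))
as_list.append(7)

print(supports_len, supports_index, first_pass, second_pass, as_list)
```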
Craig "Ichabod" O'Brien - xenomind.com
#8
(Oct-09-2017, 03:21 PM)ichabod801 Wrote: That won't work in Python 3.x, because you can't add or extend ranges. It only works in 2.x because one and two are lists.

Yes, that is a point DeaD_EyE also made, and he wrote a helper function to return a list.

Working with lists on Python 3.5:
 
#https://twitter.com/raymondh/status/916721150436057089
import timeit
from itertools import chain
 
  
def using_concatenation():
    one = list(range(1, 1000000))
    two = list(range(1, 1000000))
    three = one + two + one + two + one + two
     
     
def using_extend():
    one = list(range(1, 1000000))
    two = list(range(1, 1000000))
    three = list(range(1, 1000000))
    four = list(range(1, 1000000))
    one.extend(two)
    one.extend(three)
    one.extend(four)
    one.extend(two)
    one.extend(three)
    one.extend(four)
 
     
def using_chain():
    one = list(range(1, 1000000))
    two = list(range(1, 1000000))
    three = list(chain(one, two, one, two, one, two))
     
  
if __name__ == '__main__':
   repeat = 10
   print('repeat {}'.format(repeat))
   print('using concatenation --> {}'.format(timeit.timeit("using_concatenation()", number=repeat, setup="from __main__ import using_concatenation")))
   print('using extend --> {}'.format(timeit.timeit("using_extend()", number=repeat, setup="from __main__ import using_extend")))
   print('using chain --> {}'.format(timeit.timeit("using_chain()", number=repeat, setup="from __main__ import using_chain")))
gives us:
Output:
repeat 10
using concatenation --> 3.375771024999267
using extend --> 2.0252870000003895
using chain --> 1.347193502000664
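
One caveat about these numbers: each function also builds its input lists inside the timed call, and using_extend builds four lists where the others build two, so list construction is part of every figure. Below is a sketch of one possible way to time only the joining step. It assumes smaller lists to keep the run short, moves list creation into timeit's setup argument, and extends a fresh empty list so the inputs are not mutated between iterations. SETUP, STATEMENTS and timings are illustrative names.

```python
import timeit

# Assumption: smaller lists than in the thread, built once in setup, not timed.
SETUP = """
from itertools import chain
one = list(range(1, 100000))
two = list(range(1, 100000))
"""

STATEMENTS = {
    "concatenation": "one + two + one + two + one + two",
    # Extend a fresh list so that 'one' does not grow from one iteration to the next.
    "extend": (
        "result = []\n"
        "for part in (one, two, one, two, one, two):\n"
        "    result.extend(part)"
    ),
    "chain": "list(chain(one, two, one, two, one, two))",
}

timings = {}
for name, stmt in STATEMENTS.items():
    # repeat() returns one total per round; the minimum is usually the least noisy.
    rounds = timeit.repeat(stmt, setup=SETUP, repeat=3, number=10)
    timings[name] = min(rounds)

for name, seconds in timings.items():
    print("using {} --> {:.4f}".format(name, seconds))
```

The figures will differ by machine and Python version, so none are quoted here.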
#9
It's a shame that the compiler doesn't notice that the functions don't affect global state and always return None, and thus could be rewritten as essentially lambda: None, giving O(1) performance :p