May-06-2017, 10:21 PM
Ever since these wacky things called coroutines popped up in python, I've been curious why they'd be useful, as well as how you'd actually go about using them. So let's investigate!
Alright, so we've got a version of python that supports the syntax. Now let's try it out and see what we get:
Calling it like a function returns an object that isn't executed right away. Weird, but alright. What can we do with that object?
Not... super helpful. Let's just try it and see what happens.
Again, not super helpful. Let's try... passing it something? We never defined any input for the coroutine, so what it's expecting is beyond me.
So we have to pass one argument, and it HAS to be None.
Ok, so more weirdness. We see our print function was called, so the coroutine was finally actually executed. But what we tried to return was instead raised as an error.
At the end, you can see StopIteration, which is what we got with the non-yielding coroutine. So let's try return instead of yield...
So we can't get values, we can't return values, and we can't use return if it's an async generator. I honestly don't know what else to try with this.
So let's move on. Instead of yield, let's go back to just return, since that was working earlier.
Rad. Let's make testing a little easier for ourselves, and write a simple function that can consume an async function:
Interesting. So coroutines are single-use. Like a stream. ...or a generator.
Oh, but our consumer doesn't actually return the return value... it returns a wrapper class. That's odd. Let's fix that!
Boom, now we're getting somewhere. We've defined our own coroutine, and can actually call it. But... why would we use it? So far, the only legitimate use I can see would be as an interface to other coroutines that DO actually do things (since nothing happens until you start it).
BUT FEAR NOT! Like most things in python, the function is just nice syntactic sugar over an object. So let's try THAT out...
Ok, not quite...
So yielding from an inner await... returns the yielded value? But only to the caller, not to the coroutine that actually awaited it? That feels really wrong, so let's keep digging.
BOOM! Now we're getting somewhere. But calling send over and over if there's awaiting going on in our coroutine is... insane. So let's rewrite our consume function to be a little more useful.
Ok, so we can use async functions, we can use the value of async functions from within async functions, and we know how to use classes as async objects. Just to make sure we understand, this is also something we can do:
So now, not only can we use async functions, but we finally can run some setup code before using await (remember from above, normally a coroutine will do absolutely nothing until you call await on it).
The docs talk about using them with respect to something called an "event loop", but that's nonsense. They're just python objects, and we'll use them as such, instead of pretending they're special things that must be used in certain ways, or like magical faeries which only work if you provide the right incantations.
So before we can get started, let's lay down some ground rules. I am ONLY interested in exploring how the async / await keywords are used, which were introduced in 3.5. Before that, the same functionality was simulated using generators.

Furthermore, this is written in an almost stream-of-consciousness sort of way. If nothing else, a secondary aim of this is to help show someone how to figure things out for yourself, when you have no idea what you're doing :p
D:\Projects\playground>python
Python 3.6.0 (v3.6.0:41df79263a11, Dec 23 2016, 07:18:10) [MSC v.1900 32 bit (Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>>
>>> async def test1():
...     print("inside test1")
...     return 5
...
>>> test1()
<coroutine object test1 at 0x01957CF0>
>>>
>>> x = test1()
>>> dir(x)
['__await__', '__class__', '__del__', '__delattr__', '__dir__', '__doc__', '__eq__', '__format__', '__ge__', '__getattribute__', '__gt__', '__hash__', '__init__', '__init_subclass__', '__le__', '__lt__', '__name__', '__ne__', '__new__', '__qualname__', '__reduce__', '__reduce_ex__', '__repr__', '__setattr__', '__sizeof__', '__str__', '__subclasshook__', 'close', 'cr_await', 'cr_code', 'cr_frame', 'cr_running', 'send', 'throw']
send? That's the same method that'd be used for continuing a generator. Coroutines are supposed to share a lot of the syntax and usage of generators, so that's probably a good place to dig into.
>>> help(x.send)
Help on built-in function send:

send(...) method of builtins.coroutine instance
    send(arg) -> send 'arg' into coroutine,
    return next iterated value or raise StopIteration.

>>> x.send()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: send() takes exactly one argument (0 given)
So it takes exactly one argument, but what it's expecting is beyond me. Let's take a pythonic stab in the dark:
>>> x.send("spam")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: can't send non-None value to a just-started coroutine
It HAS to be None. That's weird, and I have no idea why. I'll guess it's a quirk of using the same interface as generators, and just try it:
>>> x.send(None)
inside test1
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
StopIteration: 5
This is due to the generator interface again. Despite trying harder than I thought I would, I still have no idea how to use the yield keyword inside a coroutine. Let's just say, for argument's sake, that you start with this:
>>> async def test2(val):
...     print("inside test2")
...     other = yield
...     print("after first yield")
...     yield val * other
...     print("end of test2")
...
x = test2(5)
=> x is an <async_generator object test2 at 0x0198D230>

async_generators don't have send methods, but they do have asend methods. ...which return <async_generator_asend object at 0x036DAB70>, i.e. async_generator_asend objects. Which DO have send methods. ...which don't do anything meaningful?
>>> x = test2(3)
>>> y = x.asend(4)
>>> y.send()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: send() takes exactly one argument (0 given)
>>> y.send(5)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: can't send non-None value to a just-started async generator
>>> y.send(None)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
StopIteration
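For what it's worth, the asend dance can be made to do something, as long as the first asend gets None and each asend awaitable is driven until its StopIteration. Here's my own sketch of that (the names agen and drive are mine, not from the session above), assuming Python 3.6's async generator protocol:

```python
# Sketch: driving an async generator by hand (my own helper names).
# Each asend(...) returns an awaitable; stepping it with send(None) raises
# StopIteration whose value is whatever the generator yielded.

async def agen():
    x = yield 1        # first yield: hand out 1, then wait to be sent a value
    yield x * 2        # second yield: hand out double whatever was sent in

def drive(awaitable):
    # step an awaitable to completion, return the StopIteration payload
    try:
        while True:
            awaitable.send(None)
    except StopIteration as stop:
        return stop.value

g = agen()
first = drive(g.asend(None))   # start it; get the first yielded value
second = drive(g.asend(10))    # send 10 in; get the next yielded value
print(first, second)           # 1 20
```

So the yielded values do come out, they just arrive as StopIteration payloads on the asend awaitables, which is about as un-obvious as it gets.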
So let's try return instead of yield...
>>> async def test3():
...     print("inside test3")
...     val = yield
...     yield val ** 2
...     return 4  # random, chosen by fair dice roll
...
  File "<stdin>", line 5
SyntaxError: 'return' with value in async generator
We can't use return with a value if it's an async generator. I honestly don't know what else to try with this. So let's move on: instead of yield, let's go back to just return, since that was working earlier. As a refresher:
>>> async def test4(val):
...     print("inside test4")
...     return val ** 2
...
>>> x = test4(5)
>>> try:
...     x.send(None)
... except StopIteration as ret_val:
...     pow = ret_val
...     print(pow)
...
inside test4
25
>>> def consume(coroutine):
...     try:
...         coroutine.send(None)
...     except StopIteration as ret:
...         return ret
...
>>> x = test4(9)
>>> y = consume(x)
inside test4
>>> z = consume(x)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 3, in consume
RuntimeError: cannot reuse already awaited coroutine
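So a coroutine object can only be driven once. A quick sketch of the consequence (my own example, mirroring the consume pattern above): if you want to run the same code again, you build a fresh coroutine object.

```python
# Sketch: coroutines are single-use; build a new one per run.
async def answer():
    return 42

def consume(coroutine):
    try:
        coroutine.send(None)
    except StopIteration as ret:
        return ret

c = answer()
consume(c)                      # first drive works
try:
    c.send(None)                # a second drive blows up
except RuntimeError as e:
    print(e)                    # cannot reuse already awaited coroutine

print(consume(answer()).value)  # a fresh coroutine works fine: 42
```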
>>> y
StopIteration(81,)
>>> print(y)
81
>>> type(y)
<class 'StopIteration'>
>>> dir(y)
['__cause__', '__class__', '__context__', '__delattr__', '__dict__', '__dir__', '__doc__', '__eq__', '__format__', '__ge__', '__getattribute__', '__gt__', '__hash__', '__init__', '__init_subclass__', '__le__', '__lt__', '__ne__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__setattr__', '__setstate__', '__sizeof__', '__str__', '__subclasshook__', '__suppress_context__', '__traceback__', 'args', 'value', 'with_traceback']
>>> y.value
81
>>> def consume(coroutine):
...     try:
...         coroutine.send(None)
...     except StopIteration as ret:
...         return ret.value
...
>>> x = test4(3)
>>> y = consume(x)
inside test4
>>> y
9
>>> type(y)
<class 'int'>
BUT FEAR NOT! Like most things in python, the function is just nice syntactic sugar over an object. So let's try THAT out...
>>> class Remote:
...     def __await__(self):
...         return 'spam'
...
>>> async def test():
...     fut = Remote()
...     val = await fut
...     return val[::-1]
...
>>> x = test()
>>> consume(x)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 3, in consume
  File "<stdin>", line 3, in test
TypeError: __await__() returned non-iterator of type 'str'
>>> class Remote:
...     def __await__(self):
...         yield 'spam'
...
>>> x = test()
>>> consume(x)
>>> x = test()
>>> x.send(None)
'spam'
>>> x.send(None)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 4, in test
TypeError: 'NoneType' object is not subscriptable
>>> class Remote:
...     def __await__(self):
...         yield  # shh, don't worry about it
...         return 5
...
>>> async def test():
...     future = Remote()
...     val = await future
...     return val ** 2
...
>>> x = test()
>>> x.send(None)
>>> x.send(None)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
StopIteration: 25
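That bare yield looks like dark magic, but it's the suspension point: it hands control back to whoever called send, so something outside can produce a result before the coroutine resumes. Here's a sketch with a toy Future class (my own invention, not asyncio's) whose value is set externally between sends:

```python
# Toy future: the yield suspends the awaiting coroutine until the outside
# world calls set_result(); the return then feeds the value to `await`.

class Future:
    def __init__(self):
        self.value = None
        self.done = False

    def set_result(self, value):
        self.value = value
        self.done = True

    def __await__(self):
        while not self.done:
            yield          # suspend; nobody uses the yielded value
        return self.value  # becomes the result of `await fut`

async def waiter(fut):
    val = await fut
    return val * 2

fut = Future()
coro = waiter(fut)
coro.send(None)        # runs until the yield inside __await__
fut.set_result(21)     # the "outside world" delivers a value
try:
    coro.send(None)    # resume; __await__ returns, waiter finishes
except StopIteration as stop:
    result = stop.value
print(result)          # 42
```

So the unused yield isn't pointless after all; it's the hook that lets a result show up later.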
So let's rewrite our consume function to be a little more useful:
>>> def consume(coroutine):
...     try:
...         while True:
...             coroutine.send(None)
...     except StopIteration as val:
...         return val.value
...
>>> x = test()
>>> consume(x)
25
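The while True matters because an awaitable is allowed to yield more than once before it finishes. A quick sketch (my own Ticks class, not from the post) to prove the loop really does keep pumping:

```python
# Sketch: an awaitable that suspends several times before producing a value.
class Ticks:
    def __init__(self, n):
        self.n = n

    def __await__(self):
        for _ in range(self.n):
            yield      # suspend n times
        return self.n  # then produce a value

async def waits_a_lot():
    return await Ticks(3) + await Ticks(2)

def consume(coroutine):
    try:
        while True:
            coroutine.send(None)
    except StopIteration as val:
        return val.value

print(consume(waits_a_lot()))  # 5
```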
>>> def long_running_task():
...     print("starting the task!")
...     async def get_result():
...         print("i'll get there eventually")
...         return 7
...     return get_result()
...
>>> long_running_task()
starting the task!
<coroutine object long_running_task.<locals>.get_result at 0x037E1F90>
>>> # excellent
...
>>> async def test():
...     print("in test")
...     fut = long_running_task()
...     print("before await")
...     value = await fut
...     print("received: {0}".format(value))
...     return value / 2
...
>>> x = test()
>>> consume(x)
in test
starting the task!
before await
i'll get there eventually
received: 7
3.5
I feel like the next step, to make sure we actually understand what we're doing, is to wrap something up in a coroutine, thread it, and make sure we get the same results as without threads/coroutines. And then time it.
I hate my router, so I'll wrap up requests for this. :)

I also... don't want to use the interactive prompt anymore. Sorry but not sorry.
Furthermore, this is a stupid test because...
- requests might be doing some caching behind the scenes
- this is completely dependent upon network traffic, so if one of the sites is temporarily slow, it makes one of the methods look slow
- the size of a website's contents change between requests, so comparing the sizes is only useful as evidence that content was actually received, not that it was the "same" content
That said, I did at least try to mitigate the "warm-up" bias by making the async part run first :)
import requests
import threading

# NOTE: the original post doesn't show the page list; these URLs are
# placeholders -- any list of pages works
pages = [
    "https://www.python.org",
    "https://docs.python.org",
    "https://pypi.org",
]

# "dumb" method of doing it.
# the baseline we're comparing against
def get_total_size():
    size = 0
    for page in pages:
        data = requests.get(page)
        size += len(data.text)
    return size

class Request:
    def __init__(self, url):
        self.url = url
        self.content = None
        self.lock = threading.Lock()
        self.thread = threading.Thread(target=self._callback)
        self.thread.start()

    def _callback(self):
        page = requests.get(self.url)
        with self.lock:
            self.content = page.text

    def __await__(self):
        yield
        self.thread.join()
        with self.lock:
            return self.content

async def get_total_size_async():
    # start all the requests, so they all can eat network at the same time
    futures = [Request(page) for page in pages]
    size = 0
    for fut in futures:
        size += len(await fut)
    return size

def runner(coro):
    try:
        while True:
            coro.send(None)
    except StopIteration as val:
        return val.value

def get_total_size_base():
    return runner(get_total_size_async())

if __name__ == "__main__":
    import timeit

    print("Sync return value: {0}".format(get_total_size()))
    print("Async return value: {0}".format(get_total_size_base()))

    runs = 15
    print("Starting timing run...")
    async_time = timeit.Timer(stmt="get_total_size_base()", globals=globals()).timeit(number=runs)
    print("Async fetching finished in {0}".format(async_time))
    sync_time = timeit.Timer(stmt="get_total_size()", globals=globals()).timeit(number=runs)
    print("Sync fetching finished in {0}".format(sync_time))
Output:

D:\Projects\playground>python coro_test.py
Sync return value: 95209
Async return value: 95250
Starting timing run...
Async fetching finished in 17.625186074221094
Sync fetching finished in 23.469298543643294
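Since those numbers depend on my network and my router's mood, here's a reproducible version of the same idea with the network swapped out for time.sleep (all names are mine; the structure mirrors the Request class above):

```python
import threading
import time

def fake_fetch(url):
    time.sleep(0.2)      # stand-in for network latency
    return "x" * 100     # stand-in for page content

class Request:
    # same shape as the Request class above, minus the real HTTP
    def __init__(self, url):
        self.content = None
        self.thread = threading.Thread(target=self._callback, args=(url,))
        self.thread.start()

    def _callback(self, url):
        self.content = fake_fetch(url)

    def __await__(self):
        yield                   # let other work happen first
        self.thread.join()      # join makes reading content safe

        return self.content

def runner(coro):
    try:
        while True:
            coro.send(None)
    except StopIteration as val:
        return val.value

async def total_size(urls):
    futures = [Request(u) for u in urls]  # all threads start immediately
    size = 0
    for fut in futures:
        size += len(await fut)
    return size

urls = ["a", "b", "c"]

start = time.time()
sequential = sum(len(fake_fetch(u)) for u in urls)
sync_elapsed = time.time() - start

start = time.time()
overlapped = runner(total_size(urls))
async_elapsed = time.time() - start

print(sequential, overlapped)        # both 300: same "content" either way
print(async_elapsed < sync_elapsed)  # True: ~0.2s overlapped vs ~0.6s serial
```

Because all three fake fetches sleep at the same time, the awaited version finishes in roughly one sleep instead of three, which is the same effect the real test above was measuring.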
And now let's finish by talking about what I assume the actual usefulness of the asyncio event loops is. Remember, we were running our coroutines via a while True loop. If, instead, we had more than one coroutine, we could have run the others. That way, if one of them was awaiting something that'd take a while, like io/networking/database query/etc, something unrelated could be running. That way most of your code is single threaded, with little worker threads running in the background that you never even need to know about, since it's hidden away from you with the await syntax. In that case, the "real" event loop would be very similar to our consume function, except that you'd "register" several coroutines, and then start running them (probably until all of them were done). Which is cool, and suspiciously similar to how Twisted has worked for a very, very long time :p

Now for some final thoughts. I think this syntax is nice. It's clean, and it makes it obvious what's going on (await basically reads as "something else can take over the main thread, I'm not going to do anything until that over there is done"). The downside is that it is definitely not obvious how to actually get started using these things... but that's probably because the interface has changed so much in a very short amount of time, so the docs haven't quite caught up with it. Another downside (imo) is how very... uncomfortable... using a class is for this. I still have no idea why you *have* to have a yield statement whose yielded value isn't used for anything, and it makes the class look like there's dark magic happening. Which is weird, because the __await__ method is brand new as of 3.5, so there shouldn't be a whole lot of baggage around it.

Anyway, that was a fun/frustrating couple of hours. Hopefully someone else can divine usefulness out of it. Personally, I'm much happier with my understanding of coroutines now... even if I have no idea how they work under the surface.
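To close the loop on that event-loop guess, here's a tiny round-robin runner (my own toy, definitely not how asyncio actually does it) that "registers" several coroutines and pumps them until they're all done. The late-finishing "slow" worker doesn't block the "fast" one, which is the whole point:

```python
# Toy round-robin "event loop": step every registered coroutine in turn.
class Sleep:
    def __init__(self, ticks):
        self.ticks = ticks

    def __await__(self):
        for _ in range(self.ticks):
            yield  # give the other coroutines a turn

async def worker(name, ticks):
    await Sleep(ticks)
    return name

def run_all(coroutines):
    finished = []               # return values, in completion order
    pending = list(coroutines)
    while pending:
        still_pending = []
        for coro in pending:
            try:
                coro.send(None)
                still_pending.append(coro)
            except StopIteration as stop:
                finished.append(stop.value)
        pending = still_pending
    return finished

print(run_all([worker("slow", 5), worker("fast", 1)]))  # ['fast', 'slow']
```

Even though "slow" was registered first, "fast" completes first, because every yield is a chance for the loop to run somebody else. Scale that idea up, replace the busy-yields with real io readiness checks, and you've more or less got an event loop.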