Python Forum
Exploring async/await, without knowing how they work ahead of time
#1
Ever since these wacky things called coroutines popped up in python, I've been curious why they'd be useful, as well as how you'd actually go about using them.  So let's investigate!

The docs talk about using them with respect to something called an "event loop", but that's nonsense.  They're just python objects, and we'll use them as such, instead of pretending they're special things that must be used in certain ways, or like magical faeries which only work if you provide the right incantations.

So before we can get started, let's lay down some ground rules.  I am ONLY interested in exploring how the async / await keywords are used, which were introduced in 3.5.  Before that, the same functionality was simulated using generators.
Furthermore, this is written in an almost stream-of-consciousness sort of way.  If nothing else, a secondary aim of this is to help show someone how to figure things out for themselves, when they have no idea what they're doing :p

D:\Projects\playground>python
Python 3.6.0 (v3.6.0:41df79263a11, Dec 23 2016, 07:18:10) [MSC v.1900 32 bit (Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>>
Alright, so we've got a version of python that supports the syntax.  Now let's try it out and see what we get:
>>> async def test1():
...   print("inside test1")
...   return 5
...
>>> test1()
<coroutine object test1 at 0x01957CF0>
>>>
Calling it like a function returns an object that isn't executed right away.  Weird, but alright.  What can we do with that object?
>>> x = test1()
>>> dir(x)
['__await__', '__class__', '__del__', '__delattr__', '__dir__', '__doc__', '__eq__', '__format__', '__ge__', '__getattribute__', '__gt__', '__hash__', '__init__', '__init_subclass__', '__le__', '__lt__', '__name__', '__ne__', '__new__', '__qualname__', '__reduce__', '__reduce_ex__', '__repr__', '__setattr__', '__sizeof__', '__str__', '__subclasshook__', 'close', 'cr_await', 'cr_code', 'cr_frame', 'cr_running', 'send', 'throw']
send?  That's the same method that'd be used for continuing a generator.  Coroutines are supposed to share a lot of the syntax and usage of generators, so that's probably a good place to dig into.
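As a quick refresher (a sketch of my own, not part of the session above), a plain generator's send has the same interface we're about to poke at, including the prime-with-None requirement:

```python
def doubler():
    # a plain generator with the same send-based interface as a coroutine
    received = yield "ready"   # pauses here until someone sends a value in
    yield received * 2

g = doubler()
print(g.send(None))  # a just-started generator must be primed with None -> "ready"
print(g.send(21))    # resumes at the first yield, so received == 21 -> 42
```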

>>> help(x.send)
Help on built-in function send:

send(...) method of builtins.coroutine instance
    send(arg) -> send 'arg' into coroutine,
    return next iterated value or raise StopIteration.
Not... super helpful.  Let's just try it and see what happens.
>>> x.send()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: send() takes exactly one argument (0 given)
Again, not super helpful.  Let's try... passing it something?  We never defined any input for the coroutine, so I have no idea what that one argument should be.  Let's take a pythonic stab in the dark:
>>> x.send("spam")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: can't send non-None value to a just-started coroutine
So we have to pass one argument, and it HAS to be None.  That's weird, and I have no idea why.  I'll guess it's a quirk of using the same interface as generators, and just try it:
>>> x.send(None)
inside test1
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
StopIteration: 5
Ok, so more weirdness.  We see our print function was called, so the coroutine was finally actually executed.  But what we tried to return was instead raised as an error.
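A returning generator behaves exactly the same way, for what it's worth.  A tiny sketch of my own to confirm the parallel:

```python
def gen():
    return 5
    yield  # unreachable, but its mere presence makes this a generator

g = gen()
try:
    g.send(None)
except StopIteration as stop:
    print(stop.value)  # the return value rides along on the exception -> 5
```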
This is due to the generator interface again.  Despite trying harder than I thought I would, I still have no idea how to use the yield keyword inside a coroutine.  Let's just say, for argument's sake, that you start with this:
>>> async def test2(val):
...   print("inside test2")
...   other = yield
...   print("after first yield")
...   yield val * other
...   print("end of test2")
...
x = test2(5) => x is an <async_generator object test2 at 0x0198D230>.
async_generators don't have send methods, but they do have asend methods.  ...which return async_generator_asend objects (e.g. <async_generator_asend object at 0x036DAB70>).  Which DO have send methods.  ...which don't do anything meaningful?
>>> x = test2(3)
>>> y = x.asend(4)
>>> y.send()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: send() takes exactly one argument (0 given)
>>> y.send(5)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: can't send non-None value to a just-started async generator
>>> y.send(None)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
StopIteration
At the end, you can see StopIteration, which is what we got with the non-yielding coroutine.  So let's try return instead of yield...
>>> async def test3():
...   print("inside test3")
...   val = yield
...   yield val ** 2
...   return 4 # random, chosen by fair dice roll
...
  File "<stdin>", line 5
SyntaxError: 'return' with value in async generator
So we can't get values out of the yields, we can't return values, and we can't even use return with a value if it's an async generator.  I honestly don't know what else to try with this.
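For what it's worth, those asend objects can be driven the same way we've been driving coroutines: send(None) until StopIteration, whose value turns out to be whatever the async generator yielded.  A sketch of my own (all names made up):

```python
async def agen():
    other = yield 1       # first yield hands out 1, then waits for a value
    yield other * 2

g = agen()

def drive(awaitable):
    # run one step of an async generator by exhausting its asend awaitable
    try:
        awaitable.send(None)
    except StopIteration as stop:
        return stop.value

print(drive(g.asend(None)))  # prime a just-started async generator with None -> 1
print(drive(g.asend(10)))    # 10 lands in `other` at the first yield -> 20
```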

So let's move on.  Instead of yield, let's go back to just return since that was working earlier.  As a refresher:
>>> async def test4(val):
...   print("inside test4")
...   return val ** 2
...
>>> x = test4(5)
>>> try:
...   x.send(None)
... except StopIteration as ret_val:
...   pow = ret_val
...   print(pow)
...
inside test4
25
Rad.  Let's make testing a little easier for ourselves, and write a simple function that can consume an async function:
>>> def consume(coroutine):
...   try:
...     coroutine.send(None)
...   except StopIteration as ret:
...     return ret
...
>>> x = test4(9)
>>> y = consume(x)
inside test4
>>> z = consume(x)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 3, in consume
RuntimeError: cannot reuse already awaited coroutine
Interesting.  So coroutines are single-use.  Like a stream.  ...or a generator.
>>> y
StopIteration(81,)
>>> print(y)
81
>>> type(y)
<class 'StopIteration'>
Oh, but our consumer doesn't actually return the return value... it returns the StopIteration exception instance itself.  That's odd.  Let's fix that!
>>> type(y)
<class 'StopIteration'>
>>> dir(y)
['__cause__', '__class__', '__context__', '__delattr__', '__dict__', '__dir__', '__doc__', '__eq__', '__format__', '__ge__', '__getattribute__', '__gt__', '__hash__', '__init__', '__init_subclass__', '__le__', '__lt__', '__ne__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__setattr__', '__setstate__', '__sizeof__', '__str__', '__subclasshook__', '__suppress_context__', '__traceback__', 'args', 'value', 'with_traceback']
>>> y.value
81
>>> def consume(coroutine):
...   try:
...     coroutine.send(None)
...   except StopIteration as ret:
...     return ret.value
...
>>> x = test4(3)
>>> y = consume(x)
inside test4
>>> y
9
>>> type(y)
<class 'int'>
Boom, now we're getting somewhere.  We've defined our own coroutine, and can actually call it.  But... why would we use it?  So far, the only legitimate use I can see would be as an interface to other coroutines that DO actually do things (since nothing happens until you start it).
BUT FEAR NOT!  Like most things in python, the function is just nice syntactic sugar over an object.  So let's try THAT out...
>>> class Remote:
...   def __await__(self):
...     return 'spam'
...
>>> async def test():
...   fut = Remote()
...   val = await fut
...   return val[::-1]
...
>>> x = test()
>>> consume(x)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 3, in consume
  File "<stdin>", line 3, in test
TypeError: __await__() returned non-iterator of type 'str'
Ok, not quite...
>>> class Remote:
...   def __await__(self):
...     yield 'spam'
...
>>> x = test()
>>> consume(x)
>>> x = test()
>>> x.send(None)
'spam'
>>> x.send(None)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 4, in test
TypeError: 'NoneType' object is not subscriptable
So yielding from an inner await... returns the yielded value?  But only to the caller, not to the coroutine that actually awaited it?  That feels really wrong, so let's keep digging.
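Before that, one observation: the channel is actually two-way.  Whatever the outer driver sends back resumes the suspended yield inside __await__, so the awaiting coroutine can see it after all.  A sketch of my own, with made-up names:

```python
class Ask:
    def __await__(self):
        # whatever the outer driver sends comes back as this yield's value
        answer = yield "question"
        return answer

async def main():
    return await Ask()

c = main()
print(c.send(None))        # the yielded "question" surfaces to the driver
try:
    c.send("forty-two")    # resumes the yield inside __await__
except StopIteration as stop:
    print(stop.value)      # main() returned what we sent in -> forty-two
```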
>>> class Remote:
...   def __await__(self):
...     yield # shh, don't worry about it
...     return 5
...
>>> async def test():
...   future = Remote()
...   val = await future
...   return val ** 2
...
>>> x = test()
>>> x.send(None)
>>> x.send(None)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
StopIteration: 25
BOOM!  Now we're getting somewhere.  But calling send over and over if there's awaiting going on in our coroutine is... insane.  So let's rewrite our consume function to be a little more useful:
>>> def consume(coroutine):
...   try:
...     while True:
...       coroutine.send(None)
...   except StopIteration as val:
...     return val.value
...
>>> x = test()
>>> consume(x)
25
Ok, so we can use async functions, we can use the value of async functions from within async functions, and we know how to use classes as async objects.  Just to make sure we understand, this is also something we can do:
>>> def long_running_task():
...   print("starting the task!")
...   async def get_result():
...     print("i'll get there eventually")
...     return 7
...   return get_result()
...
>>> long_running_task()
starting the task!
<coroutine object long_running_task.<locals>.get_result at 0x037E1F90>
>>> # excellent
...
>>> async def test():
...   print("in test")
...   fut = long_running_task()
...   print("before await")
...   value = await fut
...   print("received: {0}".format(value))
...   return value / 2
...
>>> x = test()
>>> consume(x)
in test
starting the task!
before await
i'll get there eventually
received: 7
3.5
So now, not only can we use async functions, but we can finally run some setup code before using await (remember from above, normally a coroutine does absolutely nothing until you start driving it with send, or await it).
I feel like the next step, to make sure we actually understand what we're doing, is to wrap something up in a coroutine, thread it, and make sure we get the same results as without threads/coroutines.  And then time it.

I hate my router, so I'll wrap up requests for this.  :)
I also... don't want to use the interactive prompt anymore.  Sorry but not sorry.
Furthermore, this is a stupid test because...
 - requests might be doing some caching behind the scenes
 - this is completely dependent upon network traffic, so if one of the sites is temporarily slow, it makes one of the methods look slow
 - the size of a website's contents change between requests, so comparing the sizes is only useful as evidence that content was actually received, not that it was the "same" content
That said, I did at least try to mitigate the "warm-up" bias by making the async part run first :)

import requests
import threading

pages = ["http://google.com", "http://python.org", "http://amazon.com", "http://python-forum.io"]

# "dumb" method of doing it.
# the baseline we're comparing against
def get_total_size():
    size = 0
    for page in pages:
        data = requests.get(page)
        size += len(data.text)
    return size


class Request:
    def __init__(self, url):
        self.url = url
        self.content = None
        self.lock = threading.Lock()
        self.thread = threading.Thread(target=self._callback)
        self.thread.start()

    def _callback(self):
        page = requests.get(self.url)
        with self.lock:
            self.content = page.text

    def __await__(self):
        yield
        self.thread.join()
        with self.lock:
            return self.content

async def get_total_size_async():
    # start all the requests, so they all can eat network at the same time
    futures = [Request(page) for page in pages]
    size = 0
    for fut in futures:
        size += len(await fut)
    return size

def runner(coro):
    try:
        while True:
            coro.send(None)
    except StopIteration as val:
        return val.value

def get_total_size_base():
    return runner(get_total_size_async())

if __name__ == "__main__":
    import timeit

    print("Sync return value: {0}".format(get_total_size()))
    print("Async return value: {0}".format(get_total_size_base()))

    runs = 15
    print("Starting timing run...")
    async_time = timeit.Timer(stmt="get_total_size_base()", globals=globals()).timeit(number=runs)
    print("Async fetching finished in {0}".format(async_time))
    sync_time = timeit.Timer(stmt="get_total_size()", globals=globals()).timeit(number=runs)
    print("Sync fetching finished in {0}".format(sync_time))
Output:
D:\Projects\playground>python coro_test.py
Sync return value: 95209
Async return value: 95250
Starting timing run...
Async fetching finished in 17.625186074221094
Sync fetching finished in 23.469298543643294
And now let's finish by talking about what I assume the actual usefulness of asyncio's event loop is.  Remember, we were running our coroutines via a while True loop.  If, instead, we had more than one coroutine, we could have run the others whenever one was suspended.  That way, if one of them was awaiting something slow, like io/networking/a database query/etc, something unrelated could be running in the meantime.  Most of your code stays single threaded, with little worker threads running in the background that you never even need to know about, since they're hidden away from you behind the await syntax.  In that case, the "real" event loop would be very similar to our consume function, except that you'd "register" several coroutines, and then start running them (probably until all of them were done).  Which is cool, and suspiciously similar to how Twisted has worked for a very, very long time :p
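To make that concrete, here's a minimal round-robin sketch of such a loop (my own toy, not how asyncio actually implements it):

```python
from collections import deque

class Tick:
    # awaiting this just hands control back to the loop once
    def __await__(self):
        yield

async def worker(name, steps):
    for _ in range(steps):
        await Tick()
    return name

def run_all(*coros):
    # drive every registered coroutine in turn until all have finished,
    # collecting each one's return value from its StopIteration
    results = {}
    ready = deque(enumerate(coros))
    while ready:
        i, coro = ready.popleft()
        try:
            coro.send(None)
        except StopIteration as stop:
            results[i] = stop.value
        else:
            ready.append((i, coro))
    return [results[i] for i in range(len(coros))]

print(run_all(worker("spam", 3), worker("eggs", 1)))  # ['spam', 'eggs']
```

While "spam" is suspended on a Tick, the loop runs "eggs", which is the whole trick: one thread, interleaved coroutines.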

Now for some final thoughts.  I think this syntax is nice.  It's clean, and it makes it obvious what's going on (await basically reads as "something else can take over the main thread; I'm not going to do anything until that over there is done").  The downside is that it's definitely not obvious how to actually get started using these things... but that's probably because the interface has changed so much in a very short amount of time that the docs haven't quite caught up with it.  Another downside (imo) is how very... uncomfortable... using a class is for this.  I still have no idea why you *have* to have a yield statement whose yielded value isn't used for anything, and it makes the class look like there's dark magic happening.  Which is weird, because the __await__ method is brand new as of 3.5, so there shouldn't be a whole lot of baggage around it.
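One consolation I found while poking at this: I don't think the yield itself is the hard requirement.  PEP 492 only says __await__ has to return an iterator, and an already-exhausted iterator makes the await complete immediately without ever suspending.  A sketch of my own (behavior checked on CPython only):

```python
class Instant:
    def __await__(self):
        # no yield needed: any iterator will do, and an exhausted one
        # means the await finishes without ever suspending the coroutine
        return iter(())

async def main():
    await Instant()   # completes immediately; the await's value is None
    return "done"

c = main()
try:
    c.send(None)
except StopIteration as stop:
    print(stop.value)  # -> done
```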


Anyway, that was a fun/frustrating couple of hours.  Hopefully someone else can divine usefulness out of it.  Personally, I'm much happier with my understanding of coroutines now... even if I have no idea how they work under the surface.
#2
FYI:

David Beazley has a couple of videos on coroutines:
The first at PyCon 2009: http://www.dabeaz.com/coroutines/index.html
Definitely worth watching

The second at PyOhio last year: https://www.youtube.com/watch?v=E-1Y4kSsAFc
I haven't watched this one yet
#3
I've tried. I view video as one of the worst mediums for disseminating information.
#4
A lot of them aren't worthy of the time spent watching.
I always like watching Beazley; he reminds me of the guys I used to work with,
and he's quite entertaining as well (in a nerdy sort of way).
He usually has a slide set that accompanies his videos on his website http://www.dabeaz.com/