Async / Await usage with asyncio to retrieve urls in parallel
#1
I'm looking to transform the following code into parallel execution:

from requests import get   # blocking HTTP GET from the requests library

urls = ['http://example1.org', 'http://example2.org', '...']

def getResults(urls):
    results = {}
    for url in urls:
        results[url] = getResult(url)
    return results

def getResult(url):
    return get(url).json()
Here is what I've tried:

urls = ['http://example1.org', 'http://example2.org', '...']

def getResults(urls):
  return asyncio.gather((getResult(url)) for url in urls)

async def getResult(url):
    return await get(url).json()
Am I on the right track? What's the correct way to use async/await with Python 3?

Thanks
#2
1.) The getResults function needs to be an async function as well, because you need to await asyncio.gather()
2.) asyncio.gather accepts individual coroutines as positional arguments, not an iterable (specifically a generator in your case), so you need to unpack and pass them separately:
async def getResults(urls):
  return await asyncio.gather(*(getResult(url) for url in urls))
Also, what is the get function you use in getResult? You need to make sure that you use asynchronous requests; for that you can use a library like aiohttp -> https://docs.aiohttp.org/en/stable/clien...-a-request or httpx -> https://www.python-httpx.org/async/, otherwise it will all run synchronously.

And one last thing: you need to actually run the code, e.g. by adding this at the end:
if __name__ == '__main__':
    asyncio.run(getResults(urls))
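To illustrate all three points together, a minimal complete sketch could look like this (asyncio.sleep just stands in for a real asynchronous request, since as said above a blocking get won't do):

import asyncio

urls = ['http://example1.org', 'http://example2.org']

async def getResult(url):
    await asyncio.sleep(1)   # placeholder for a real async request (aiohttp/httpx)
    return url, 'result'

async def getResults(urls):
    # unpack the generator so every coroutine is a separate argument to gather
    return await asyncio.gather(*(getResult(url) for url in urls))

if __name__ == '__main__':
    print(asyncio.run(getResults(urls)))

Both placeholder "requests" run concurrently, so the whole thing takes about one second instead of two.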
#3
Thanks for your reply and sorry for the late response.
I have some questions:
- is it possible to make it so that all the asynchronous stuff happens inside the getResults() function, so that I don't need to call it with asyncio.run (in other words, to 'put' asyncio.run inside getResults())?
- I wanted to return a dictionary of url -> result (see my synchronous version); how do I achieve that?
- I'm using get from requests. How bad would it be not to use aiohttp? Doesn't the 'thread split' occur before get is called? In other words, I thought I was making a sync get but in a sort of separate thread, so I assumed the difference wouldn't be significant. If I'm wrong, can you please give me the version with aiohttp?

Thank you very much

It doesn't seem I can edit, so let me add another post to clarify the first question of my previous post:
- I want to keep the same 'API' as before, which is just getResults(), without having to change the rest of the code.
- However, this time I want getResults() to run the HTTP requests in parallel, wait for all the requests to finish, and return them as a dictionary of results.
#4
1.) Yes, it's possible: you can just call asyncio.run inside getResults, but you will have to move the asyncio.gather call into a new coroutine

2.) Let's say your new getResult function will look something like this:
async def get_url_result(url):
    response = await asynchronous_get_request(url)  # placeholder, not actual working code
    return response
then response will be some kind of Response object with a url attribute and the actual response content, which you can then use to update your dictionary

3.) requests is a blocking library, which would make the whole point of using asyncio in your case pointless. There is actually no "thread split": asyncio runs in only one thread, concurrently rather than in parallel. What happens when you use a non-blocking call is that when a request is made, the coroutine gives control back to the event loop while it waits for the response, so other coroutines scheduled on the loop can run. There is a nice explanation on Real Python -> https://realpython.com/async-io-python/
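To see that hand-off in action, here is a small self-contained demo (asyncio.sleep plays the role of waiting on the network):

import asyncio
import time

async def fake_request(name):
    # awaiting asyncio.sleep hands control back to the event loop,
    # the same way awaiting a real response would
    await asyncio.sleep(1)
    return name

async def main():
    start = time.perf_counter()
    results = await asyncio.gather(fake_request('a'), fake_request('b'), fake_request('c'))
    # ~1 second total, not 3: all three "requests" waited concurrently
    print(results, f'{time.perf_counter() - start:.1f}s')

asyncio.run(main())

Replace asyncio.sleep with time.sleep and the total jumps to ~3 seconds, because a blocking call never yields to the loop; that is exactly what would happen with requests.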

Regarding an example for aiohttp, there is a really simple one in the link I sent you:
async with aiohttp.ClientSession() as session:
    async with session.get('http://httpbin.org/get') as resp:
        print(resp.status)
        print(await resp.text())
It would be good if you try to do it on your own first; post your experiments and I'll help
#5
I sure would love to post my experiments, but I'm really struggling to put it all together; I don't even know where to start. What I want seems fairly simple (judging by the sync version), but I'm new to Python, so I'm having a hard time making all of this work. In the end I just want an async getResults() function that can be called from elsewhere just as if it were a sync getResults().

From what I understand of your answers, I have no choice but to use aiohttp, so the other pieces of code are irrelevant.

In the last example you gave, it just prints the result, whereas I want to return it as a dictionary of url -> response, and it doesn't loop over multiple urls.

Here is what I've tried, but I don't know how wrong it is:

async def getResultsAsync(urls):
    async def getResultAsync(url):
        async with aiohttp.ClientSession() as session:
            resp = await session.get(url)
            res = url, await resp.json()
        return res
    return dict(await asyncio.gather(*[getResultAsync(url) for url in urls]))

def getResults(urls):
    return asyncio.run(getResultsAsync(urls))
#6
Just a few small changes. I would personally keep the collection of all the results in the get_results function:
def get_results(urls):

    async def gather_coro():
        return await asyncio.gather(*(get_url_result(url) for url in urls))

    url_results = asyncio.run(gather_coro())
    return dict(url_results)
And then move acquiring the single result into the get_url_result coroutine:

async def get_url_result(url):
    async with aiohttp.ClientSession() as session:
        async with session.get(url) as resp:
            return resp.url, await resp.text()
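With import asyncio and import aiohttp at the top, the calling code then stays the same as in your synchronous version:

results = get_results(['http://example1.org', 'http://example2.org'])
print(results)   # {response_url: body_text, ...}

One thing to keep in mind: resp.url is aiohttp's URL object for the final (possibly redirected) address, so the keys may not be the exact strings you passed in.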
#7
Okay thanks, it seems I'm getting somewhere!

Two more questions:
1. The session is not reused between requests with that code, right? Would it make the code much more complicated to change that?
2. My urls are actually composed of two parts: one that is static and another that is a search term, which needs to be url-escaped. And I want the search term to be the key of the result dictionary, not the whole url. So in other words, I can't just return resp.url, await resp.text(); I would instead like to just return resp and create the dictionary elsewhere. Is that possible? If so, how?
Thx
#8
1.) Yes, good point: you definitely don't want to create a session per request. In this case you can just move the context manager into gather_coro and pass the session as an argument to the get_url_result function:
def get_results(urls):

    async def gather_coro():
        async with aiohttp.ClientSession() as session:   # moved here from get_url_result
            return await asyncio.gather(*(get_url_result(url, session) for url in urls))

    url_results = asyncio.run(gather_coro())
    return dict(url_results)
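get_url_result then just uses the session it is given instead of opening its own, e.g.:

async def get_url_result(url, session):
    # reuse the shared session passed in from gather_coro
    async with session.get(url) as resp:
        return resp.url, await resp.text()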
2.) Yes, get_url_result can return just resp; creating the dictionary really shouldn't be a problem! But remember that in order to access response content like resp.text() you need to await it, which means you won't be able to access it from a normal function, only from a coroutine. From the docs:
Quote: aiohttp loads only the headers when .get() is executed, letting you decide to pay the cost of loading the body afterward, in a second asynchronous operation. Hence the await response.text().
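Given that, for your search-term case it's probably easiest to await the body inside the coroutine and return the term itself as the key, rather than the bare resp. A sketch (base_url and term are placeholders based on your description):

from urllib.parse import quote

async def get_term_result(base_url, term, session):
    # quote() url-escapes the search term before appending it to the static part
    async with session.get(base_url + quote(term)) as resp:
        # the body has to be awaited here, inside the coroutine
        return term, await resp.text()

# gathered the same way as before, then turned into a dict of term -> body:
# dict(await asyncio.gather(*(get_term_result(base_url, t, session) for t in terms)))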

