I'm looking to transform the following code into parallel execution:
urls = ['http://example1.org', 'http://example2.org', '...']

def getResults(urls):
    results = {}
    for url in urls:
        results[url] = getResult(url)
    return results

def getResult(url):
    return get(url).json()
Here is what I've tried:
urls = ['http://example1.org', 'http://example2.org', '...']

def getResults(urls):
    return asyncio.gather((getResult(url)) for url in urls)

async def getResult(url):
    return await get(url).json()
Am I on the right track? What's the correct way to use async/await with Python 3?
Thanks
1.) `getResults` needs to be an async function as well, because you need to await `asyncio.gather()`

2.) `asyncio.gather` accepts individual coroutines as arguments, not an iterable (specifically a generator in your case), so you need to unpack and pass them separately:
async def getResults(urls):
    return await asyncio.gather(*(getResult(url) for url in urls))
Also, what is the `get` function you use in your `getResult`? You need to make sure that you use asynchronous requests - for that you can use a library like aiohttp -> https://docs.aiohttp.org/en/stable/clien...-a-request or httpx -> https://www.python-httpx.org/async/, otherwise it will all run synchronously.
And one last thing: you need to actually run the code, e.g. by adding this at the end:
if __name__ == '__main__':
    asyncio.run(getResults(urls))
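Putting the three points together, here's a minimal, self-contained sketch. The HTTP call is simulated with `asyncio.sleep`, since the real fetch depends on which async HTTP library you pick (aiohttp, httpx, ...):

```python
import asyncio

urls = ['http://example1.org', 'http://example2.org']

async def getResult(url):
    # Stand-in for an asynchronous HTTP call; real code would await aiohttp/httpx here
    await asyncio.sleep(0.1)
    return {'url': url}

async def getResults(urls):
    # gather() takes coroutines as separate positional arguments, hence the * unpacking
    return await asyncio.gather(*(getResult(url) for url in urls))

if __name__ == '__main__':
    print(asyncio.run(getResults(urls)))
```

`gather` returns the results in the same order as the coroutines you pass in, which will matter later when you build a dictionary out of them.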
Thanks for your reply and sorry for the late response.
I have some questions:
- Is it possible to make it so that all the asynchronous stuff happens inside getResults(), so that I don't need to call it with asyncio.run (in other words, 'put' asyncio.run inside getResults())?
- I wanted to return a dictionary of url -> result (see my synchronous version); how do I achieve that?
- I'm using get from requests. How bad would it be not to use aiohttp? Doesn't the 'thread split' occur before get is called? In other words, I thought I was making a sync get but in a sort of separate thread, so I thought it wouldn't be significant. If I'm wrong, can you please give me the version with aiohttp?
Thank you very much
It doesn't seem I can edit, so let me add another post to clarify the first question of my previous post:
- I want to have the same 'API' as before, which is just getResults() without having to change the rest of the code.
- However this time I want getResults() to run the http requests in parallel, and then wait for all the requests to be finished and return them as a dictionary of results.
1.) Yes, it's possible: you can just call `asyncio.run` in `getResults`, but you will have to move the `asyncio.gather` call into a new coroutine.

2.) Let's say your new `getResult` function will look something like this:
async def get_url_result(url):
    response = await asynchronous_get_request(url)  # not actual working code
    return response
then `response` will be some kind of `Response` object which will have a `url` attribute and the actual response content, which you can then use to update your dictionary.
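For instance, a sketch with a simulated request (just to show the shape of the data): each coroutine returns a (url, result) pair, and calling `dict()` on the gathered pairs gives you the url -> result mapping from your synchronous version:

```python
import asyncio

async def get_url_result(url):
    # Simulated request; a real version would await an async HTTP library here
    await asyncio.sleep(0.05)
    return url, 'payload from ' + url

async def get_results(urls):
    pairs = await asyncio.gather(*(get_url_result(u) for u in urls))
    return dict(pairs)  # url -> result, like the original synchronous version

results = asyncio.run(get_results(['http://example1.org', 'http://example2.org']))
print(results['http://example1.org'])
```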
3.) `requests` is a blocking library, which would make the whole point of using asyncio pointless in your case. There is actually no "thread split": asyncio runs in a single thread, and it runs concurrently, not in parallel. What happens with a non-blocking call is that when a request is made, while it waits for the response, it gives control back to the event loop, so other coroutines scheduled on it can run. There is a nice explanation on Real Python -> https://realpython.com/async-io-python/
Regarding an example for aiohttp, there is a really simple one in the link I sent you:
async with aiohttp.ClientSession() as session:
    async with session.get('http://httpbin.org/get') as resp:
        print(resp.status)
        print(await resp.text())
It would be good if you try to do it on your own first - post your experiments and I'll help.
I sure would love to post my experiments, but I'm really struggling to put it all together - I don't even know where to start. What I want seems fairly simple (just by looking at the sync version), and I'm new to Python, so I'm having a hard time making all of this work. In the end I just want an async getResults() function that can be called from elsewhere just as if it were a sync getResults().
From what I understand of your answers, I have no choice but to use aiohttp, so the other pieces of code are irrelevant.
In the last example you gave, it just prints the result, whereas I want to return it as a dictionary of url -> response, and it doesn't handle a loop over several urls.
Here is what I've tried, but I don't know how wrong I am:
async def getResultsAsync(urls):
    async def getResultAsync(url):
        async with aiohttp.ClientSession() as session:
            resp = await session.get(url)
            res = url, await resp.json()
            return res
    return dict(await asyncio.gather(*[getResultAsync(url) for url in urls]))

def getResults(urls):
    return asyncio.run(getResultsAsync(urls))
Just a few small changes. I would personally leave the collection of all the results in the `get_results` function:
def get_results(urls):
    async def gather_coro():
        return await asyncio.gather(*(get_url_result(url) for url in urls))
    url_results = asyncio.run(gather_coro())
    return dict(url_results)
and move acquiring the single result into the `get_url_result` coroutine:
async def get_url_result(url):
    async with aiohttp.ClientSession() as session:
        async with session.get(url) as resp:
            return resp.url, await resp.text()
Okay thanks, it seems I'm getting somewhere!
Two more questions:
1. The session is not reused between requests with that code, right? Would it make the code much more complicated to do that?
2. My urls are actually composed of two parts: one that is static, and another that is the search term, which needs to be url-escaped. And I want the search term to be the key of the resulting dictionary, not the whole url. So in other words, I can't just return resp.url, await resp.text(). I would instead like to just return resp and create the dictionary elsewhere. Is that possible? If so, how?
Thx
1.) Yes, good point - you definitely don't want to create a session per request. In this case you can just move the context manager into `gather_coro` and pass the session as an argument to the `get_url_result` function:
def get_results(urls):
    async def gather_coro():
        async with aiohttp.ClientSession() as session:  # moved here from get_url_result
            return await asyncio.gather(*(get_url_result(url, session) for url in urls))
    url_results = asyncio.run(gather_coro())
    return dict(url_results)
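The matching `get_url_result` then receives the shared session instead of opening its own. A sketch of the idea - the `FakeSession`/`FakeResponse` stubs below are not part of aiohttp, they just mimic the two interfaces used (`session.get` as an async context manager, `resp.url`, `await resp.text()`) so the flow can be shown self-contained:

```python
import asyncio

async def get_url_result(url, session):
    # No per-request ClientSession here any more: the shared session is passed in
    async with session.get(url) as resp:
        return resp.url, await resp.text()

# --- minimal stubs standing in for aiohttp, for illustration only ---
class FakeResponse:
    def __init__(self, url):
        self.url = url
    async def text(self):
        return 'body of ' + self.url
    async def __aenter__(self):
        return self
    async def __aexit__(self, *exc):
        return False

class FakeSession:
    def get(self, url):
        return FakeResponse(url)

pair = asyncio.run(get_url_result('http://example1.org', FakeSession()))
print(pair)  # ('http://example1.org', 'body of http://example1.org')
```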
2.) Yes, `get_url_result` can return just `resp`; creating the dictionary really shouldn't be a problem! But remember that in order to access response content like `resp.text()` you need to await it, which means you won't be able to access it from a normal function, only from a coroutine. From the docs:
Quote: aiohttp loads only the headers when .get() is executed, letting you decide to pay the cost of loading the body afterward, in a second asynchronous operation. Hence the await response.text().
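Since the body has to be awaited inside a coroutine anyway, one option is to await it right there and key the pair by the search term instead of the full URL. A sketch under assumptions about your setup - `BASE_URL`, the term list, and the simulated `fetch_text` are all placeholders, with `urllib.parse.quote` doing the url-escaping:

```python
import asyncio
from urllib.parse import quote

BASE_URL = 'http://example1.org/search?q='  # hypothetical static part of the url

async def fetch_text(url):
    # Stand-in for `async with session.get(url) as resp: await resp.text()`
    await asyncio.sleep(0.05)
    return 'body of ' + url

async def get_term_result(term):
    # Key the result by the search term, not by the escaped url
    return term, await fetch_text(BASE_URL + quote(term))

def get_results(terms):
    async def gather_coro():
        return dict(await asyncio.gather(*(get_term_result(t) for t in terms)))
    return asyncio.run(gather_coro())

results = get_results(['first term', 'second term'])
print(results['first term'])  # body of http://example1.org/search?q=first%20term
```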