I'm looking to transform the following code into parallel execution:
urls = ['http://example1.org', 'http://example2.org', '...']

def getResults(urls):
    results = {}
    for url in urls:
        results[url] = getResult(url)
    return results

def getResult(url):
    return get(url).json()
Here is what I've tried:
urls = ['http://example1.org', 'http://example2.org', '...']

def getResults(urls):
    return asyncio.gather((getResult(url)) for url in urls)

async def getResult(url):
    return await get(url).json()
Am I on the right track? What's the correct way to use async/await with Python 3?
Thanks
1.) `getResults` needs to be an async function as well, because you need to await `asyncio.gather()`

2.) `asyncio.gather` accepts individual coroutines as arguments, not an iterable (specifically a generator in your case), so you need to unpack and pass them separately:
async def getResults(urls):
    return await asyncio.gather(*(getResult(url) for url in urls))
Also, what is the `get` function you use in your `getResult`? You need to make sure that you use asynchronous requests - for that you can use a library like aiohttp -> https://docs.aiohttp.org/en/stable/clien...-a-request or httpx -> https://www.python-httpx.org/async/, otherwise it will all run synchronously.
And one last thing: you need to actually run the code, e.g. by adding this at the end:
if __name__ == '__main__':
    asyncio.run(getResults(urls))
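Putting the three points together, here's a minimal, self-contained sketch. The HTTP call is simulated with `asyncio.sleep`, since the real fetch depends on which async HTTP library you pick (aiohttp, httpx, ...):

```python
import asyncio

urls = ['http://example1.org', 'http://example2.org']

async def getResult(url):
    # Stand-in for an asynchronous HTTP call; real code would await aiohttp/httpx here
    await asyncio.sleep(0.1)
    return {'url': url}

async def getResults(urls):
    # gather() takes coroutines as separate positional arguments, hence the * unpacking
    return await asyncio.gather(*(getResult(url) for url in urls))

if __name__ == '__main__':
    print(asyncio.run(getResults(urls)))
```

`gather` returns the results in the same order as the coroutines you pass in, which will matter later when you build a dictionary out of them.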
Thanks for your reply and sorry for the late response.
I have some questions:
- Is it possible to make it so that all the asynchronous stuff happens inside getResults(), so that I don't need to call it with asyncio.run (in other words, 'put' asyncio.run inside getResults())?
- I wanted to return a dictionary of url -> result (see my synchronous version); how do I achieve that?
- I'm using get from requests. How bad would it be not to use aiohttp? Doesn't the 'thread split' occur before get is called? In other words, I thought I was making a sync get but in a sort of separate thread, so I thought it wouldn't be significant. If I'm wrong, can you please give me the version with aiohttp?
Thank you very much
It doesn't seem I can edit, so let me add another post to clarify the first question of my previous post:
- I want to have the same 'API' as before, which is just getResults() without having to change the rest of the code.
- However this time I want getResults() to run the http requests in parallel, and then wait for all the requests to be finished and return them as a dictionary of results.
1.) Yes, it's possible: you can just call `asyncio.run` in `getResults`, but you will have to move the `asyncio.gather` call into a new coroutine.

2.) Let's say your new `getResult` function will look something like this:
async def get_url_result(url):
    response = await asynchronous_get_request(url)  # not actual working code
    return response
then `response` will be some kind of `Response` object which will have a `url` attribute and the actual response content, which you can then use to update your dictionary.
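For instance, a sketch with a simulated request (just to show the shape of the data): each coroutine returns a (url, result) pair, and calling `dict()` on the gathered pairs gives you the url -> result mapping from your synchronous version:

```python
import asyncio

async def get_url_result(url):
    # Simulated request; a real version would await an async HTTP library here
    await asyncio.sleep(0.05)
    return url, 'payload from ' + url

async def get_results(urls):
    pairs = await asyncio.gather(*(get_url_result(u) for u in urls))
    return dict(pairs)  # url -> result, like the original synchronous version

results = asyncio.run(get_results(['http://example1.org', 'http://example2.org']))
print(results['http://example1.org'])
```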
3.) `requests` is a blocking library, which would make the whole point of using asyncio pointless in your case. There is actually no "thread split": asyncio runs in a single thread, and it runs concurrently, not in parallel. What happens with a non-blocking call is that when a request is made, while it waits for the response, it gives control back to the event loop, so other coroutines scheduled on it can run. There is a nice explanation on Real Python -> https://realpython.com/async-io-python/
Regarding an example for aiohttp, there is a really simple one in the link I sent you:
async with aiohttp.ClientSession() as session:
    async with session.get('http://httpbin.org/get') as resp:
        print(resp.status)
        print(await resp.text())
It would be good if you try to do it on your own first - post your experiments and I'll help.
I sure would love to post my experiments, but I'm really struggling to put it all together - I don't even know where to start. What I want seems fairly simple (just by looking at the sync version), and I'm new to Python, so I'm having a hard time making all of this work. In the end I just want an async getResults() function that can be called from elsewhere just as if it were a sync getResults().
From what I understand of your answers, I have no choice but to use aiohttp, so the other pieces of code are irrelevant.
In the last example you gave, it just prints the result, whereas I want to return it as a dictionary of url -> response, and it doesn't handle a loop over several urls.
Here is what I've tried, but I don't know how wrong I am:
async def getResultsAsync(urls):
    async def getResultAsync(url):
        async with aiohttp.ClientSession() as session:
            resp = await session.get(url)
            res = url, await resp.json()
            return res
    return dict(await asyncio.gather(*[getResultAsync(url) for url in urls]))

def getResults(urls):
    return asyncio.run(getResultsAsync(urls))
Just a few small changes. I would personally leave the collection of all the results in the `get_results` function:
def get_results(urls):
    async def gather_coro():
        return await asyncio.gather(*(get_url_result(url) for url in urls))
    url_results = asyncio.run(gather_coro())
    return dict(url_results)
and move acquiring the single result into the `get_url_result` coroutine:
async def get_url_result(url):
    async with aiohttp.ClientSession() as session:
        async with session.get(url) as resp:
            return resp.url, await resp.text()
Okay thanks, it seems I'm getting somewhere!
Two more questions:
1. The session is not reused between requests with that code, right? Would it make the code much more complicated to do that?
2. My urls are actually composed of two parts: one that is static, and another that is the search term, which needs to be url-escaped. And I want the search term to be the key of the resulting dictionary, not the whole url. So in other words, I can't just return resp.url, await resp.text(). I would instead like to just return resp and create the dictionary elsewhere. Is that possible? If so, how?
Thx
1.) Yes, good point - you definitely don't want to create a session per request. In this case you can just move the context manager into `gather_coro` and pass the session as an argument to the `get_url_result` function:
def get_results(urls):
    async def gather_coro():
        async with aiohttp.ClientSession() as session:  # moved here from get_url_result
            return await asyncio.gather(*(get_url_result(url, session) for url in urls))
    url_results = asyncio.run(gather_coro())
    return dict(url_results)
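The matching `get_url_result` then receives the shared session instead of opening its own. A sketch of the idea - the `FakeSession`/`FakeResponse` stubs below are not part of aiohttp, they just mimic the two interfaces used (`session.get` as an async context manager, `resp.url`, `await resp.text()`) so the flow can be shown self-contained:

```python
import asyncio

async def get_url_result(url, session):
    # No per-request ClientSession here any more: the shared session is passed in
    async with session.get(url) as resp:
        return resp.url, await resp.text()

# --- minimal stubs standing in for aiohttp, for illustration only ---
class FakeResponse:
    def __init__(self, url):
        self.url = url
    async def text(self):
        return 'body of ' + self.url
    async def __aenter__(self):
        return self
    async def __aexit__(self, *exc):
        return False

class FakeSession:
    def get(self, url):
        return FakeResponse(url)

pair = asyncio.run(get_url_result('http://example1.org', FakeSession()))
print(pair)  # ('http://example1.org', 'body of http://example1.org')
```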
2.) Yes, `get_url_result` can return just `resp`; creating the dictionary really shouldn't be a problem! But remember that in order to access response content like `resp.text()` you need to await it, which means you won't be able to access it from a normal function, only from a coroutine. From the docs:
Quote: aiohttp loads only the headers when .get() is executed, letting you decide to pay the cost of loading the body afterward, in a second asynchronous operation. Hence the await response.text().
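Since the body has to be awaited inside a coroutine anyway, one option is to await it right there and key the pair by the search term instead of the full URL. A sketch under assumptions about your setup - `BASE_URL`, the term list, and the simulated `fetch_text` are all placeholders, with `urllib.parse.quote` doing the url-escaping:

```python
import asyncio
from urllib.parse import quote

BASE_URL = 'http://example1.org/search?q='  # hypothetical static part of the url

async def fetch_text(url):
    # Stand-in for `async with session.get(url) as resp: await resp.text()`
    await asyncio.sleep(0.05)
    return 'body of ' + url

async def get_term_result(term):
    # Key the result by the search term, not by the escaped url
    return term, await fetch_text(BASE_URL + quote(term))

def get_results(terms):
    async def gather_coro():
        return dict(await asyncio.gather(*(get_term_result(t) for t in terms)))
    return asyncio.run(gather_coro())

results = get_results(['first term', 'second term'])
print(results['first term'])  # body of http://example1.org/search?q=first%20term
```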