async for - Printable Version

+- Python Forum (https://python-forum.io)
+-- Forum: Python Coding (https://python-forum.io/forum-7.html)
+--- Forum: Web Scraping & Web Development (https://python-forum.io/forum-13.html)
+--- Thread: async for (/thread-3406.html)
async for - wavic - May-21-2017

Well, I have a list of URLs and I want to iterate over it to get the pages as fast as possible. I know that there is an async for loop, but I can't work out how it is used. Basically this is what I want:

import aiohttp
from bs4 import BeautifulSoup

# urls
async for link in urls:
    name, email = await get_email(link)
    print('{}, {}'.format(name, email))  # this is simplified. I am doing something else

# get_email
async def get_email(link):
    page = await fetch(link)
    soup = BeautifulSoup(page, 'lxml')
    name = soup.find('div', class_='MProwD').text.strip().lower().title()
    try:
        email = soup.find('div', class_='MPinfo').find_all('a')[-1]['href'].split(':')[1].strip()
    except (AttributeError, IndexError):
        email = 'Unknown'
    return name, email

# fetch
async def fetch(url):
    async with aiohttp.ClientSession() as session:
        async with session.get(url) as response:
            return await response.read()

Until there was an error, I did not notice a performance difference from the regular program. I have changed the code so many times, and now it gives me an error and I don't even know what caused it. I can't get this async stuff very well yet. I've tried to subclass list, as I saw on some web pages, to get an object with __aiter__. It didn't work. I've also tried yielding each list element:

def list_gen(l):
    for element in l:
        yield element

RE: async for - Larz60+ - May-21-2017

nilamo had a good post on async: https://python-forum.io/Thread-Exploring-async-await-without-knowing-how-they-work-ahead-of-time?highlight=async

RE: async for - wavic - May-21-2017

I've done it, but not using asyncio.
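For reference, the asyncio version the thread starts from can be written without async for at all: a plain list is not an asynchronous iterable, but the coroutines can be scheduled together with asyncio.gather(). A minimal sketch, with a stand-in coroutine in place of the real aiohttp-based get_email() (the URLs and the sleep are illustrative only):

```python
import asyncio

# Stand-in for the aiohttp/BeautifulSoup get_email() above; the
# asyncio.sleep() simulates network latency for illustration.
async def get_email(link):
    await asyncio.sleep(0.01)
    return link, 'user@{}'.format(link)

async def main(urls):
    # gather() schedules all the coroutines at once and collects
    # their results in input order; no async for is required.
    return await asyncio.gather(*(get_email(u) for u in urls))

results = asyncio.run(main(['a.example', 'b.example']))  # asyncio.run() needs Python 3.7+
print(results)
```

The speedup comes from gather() overlapping the waits, which a plain for loop that awaits each call one by one cannot do.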
I got rid of all of this and used concurrent.futures instead:

from concurrent import futures

with futures.ThreadPoolExecutor(200) as executor:
    results = executor.map(get_email, links)  # the result is a generator, so you have to list() it

The execution time dropped more than two times. Three lines of code, including the import statement, and such a difference!

RE: async for - snippsat - May-21-2017

Quote: The execution time dropped more than two times

It can/should drop a lot more; it does of course depend on the task, e.g. whether you are downloading larger pieces or just getting text. Try using ProcessPoolExecutor, e.g.:

with futures.ProcessPoolExecutor(max_workers=20) as executor:
    results = executor.map(get_email, links)  # submit() would schedule one call; map() fans the links out

Say downloading 100 images from a site takes about 2 minutes; that dropped to 15-20 seconds with ProcessPoolExecutor in my tests. Launching ProcessPoolExecutor (multiprocessing) is of course heavyweight for this, but the speed is really great.

RE: async for - wavic - May-23-2017

So, I have tried ProcessPoolExecutor, and I managed to reduce the running time to 5.661 sec as the best result from a few trials. Since this is networking it is relative, but it is still faster than ThreadPoolExecutor. I have tried a few values for max_workers, and 32 looked like the optimum. Small changes to get the results:

results = []
with futures.ProcessPoolExecutor(max_workers=32) as executor:
    for result in executor.map(get_email, links):
        results.append(result)

RE: async for - nilamo - Jun-06-2017

(May-21-2017, 05:05 PM)Larz60+ Wrote: nilamo had a good post on async: https://python-forum.io/Thread-Exploring-async-await-without-knowing-how-they-work-ahead-of-time?highlight=async

Just for functions. I'm not sure how async for loops or async with blocks are supposed to work. Does each element of the iterable run asynchronously? Or is it just syntactic sugar to let you use await in the body of the block? I... have no idea.
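For what it's worth, PEP 492 does define the semantics: async for calls the iterable's __aiter__() and awaits __anext__() for each element, so each step may suspend at an await, while the loop body itself still runs sequentially. A minimal sketch using an async generator (Python 3.6+); the names here are made up for illustration:

```python
import asyncio

# A minimal asynchronous generator: each item is produced after an
# await, which is what makes it usable with async for.
async def ticker(n):
    for i in range(n):
        await asyncio.sleep(0)  # hand control back to the event loop
        yield i

async def main():
    seen = []
    async for i in ticker(3):  # awaits __anext__() behind the scenes
        seen.append(i)
    return seen

print(asyncio.run(main()))  # [0, 1, 2]
```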
I'm also a little surprised a bare async for loop would work; I thought that was a syntax error, and that it had to be contained within an async callable. Quick test:

>>> async for _ in range(50):
  File "<stdin>", line 1
    async for _ in range(50):
            ^
SyntaxError: invalid syntax
>>> async def spam():
...     async for _ in range(50):
...         pass
...
>>>

OK, so a bare for loop can't be async; it must be inside an async callable. I can't actually offer help, since I don't know what it's supposed to do, though :/

RE: async for - nilamo - Jun-06-2017

(May-23-2017, 08:03 AM)wavic Wrote: So, I have tried ProcessPoolExecutor, and I managed to reduce the running time to 5.661 sec as the best result from a few trials.

It's also worth noting that starting a new thread/process is not "free"; they take time to spin up. That's one of the main benefits of async: there is no setup downtime, you just get to do things while waiting for something else to finish (like network traffic).

Quote:
results = []
with futures.ProcessPoolExecutor(max_workers=32) as executor:
    for result in executor.map(get_email, links):
        results.append(result)

I'm not sure what that does, but... are you having all 32 workers process the same list of links? Wouldn't you want to break those apart, so each worker processes a different list?

RE: async for - wavic - Jun-07-2017

As far as I know, the iterable for an async for expression has to be an asynchronous iterator/generator. I had obstacles achieving this; maybe a lack of experience, I don't know. This special kind of iterable has to be constructed before the loop:

async for number in AsyncIterClass(iterable):
    # action

I had read on SO about async list comprehensions, and I will try it:
list_ = [await function(element) async for element in AsyncIterClass(iterable)]

Perhaps this asynchronous iterable class should be something like this:

class Aiter:
    def __init__(self, iterable):
        self.iter_ = iter(iterable)

    def __aiter__(self):
        return self

    async def __anext__(self):
        try:
            element = next(self.iter_)  # next() on a plain iterator is not awaitable
        except StopIteration:
            raise StopAsyncIteration
        return element

After that, both async for and the async comprehension should work. I didn't try it. Yet.

Ref: https://www.python.org/dev/peps/pep-0492/#asynchronous-iterators-and-async-for

But near the bottom they don't recommend using it this way, and I don't know why:

Quote: While this is not a very useful thing to do, .....
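Putting this last post together into a runnable sketch (the wrapper follows the PEP 492 example; the PEP's caveat quoted above is that nothing inside __anext__ actually awaits, so the wrapper demonstrates the protocol rather than adding any concurrency):

```python
import asyncio

# PEP 492-style adapter: wraps a regular iterable in the asynchronous
# iteration protocol (__aiter__/__anext__).
class Aiter:
    def __init__(self, iterable):
        self.iter_ = iter(iterable)

    def __aiter__(self):
        return self

    async def __anext__(self):
        try:
            return next(self.iter_)  # a plain next(); nothing is awaited here
        except StopIteration:
            raise StopAsyncIteration

async def main():
    # Both forms work once __aiter__/__anext__ exist (Python 3.6+):
    looped = []
    async for number in Aiter([1, 2, 3]):
        looped.append(number)
    comprehended = [x * 2 async for x in Aiter([1, 2, 3])]
    return looped, comprehended

print(asyncio.run(main()))  # ([1, 2, 3], [2, 4, 6])
```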