Different Output of findall and search in re module - Python Forum (https://python-forum.io), thread /thread-8904.html
Different Output of findall and search in re module - shiva - Mar-12-2018

    link = '<a href="http://www.google.com">Google</a>'
    re.search('<a[^>]+href=["\'](.*?)["\']', link, re.IGNORECASE).group()

This code gives the output '<a href="http://www.google.com"'

    re.findall('<a[^>]+href=["\'](.*?)["\']', link, re.IGNORECASE)

But this code gives the output ['http://www.google.com']

Why are the two outputs different? findall() should work like search(), except that findall() gives a list of matches and search() gives only a single match.

RE: Different Output of findall and search in re module - snippsat - Mar-12-2018

re.findall returns all captured groups in a list, in this case what's inside group(1) --> (.*?).
re.search returns only the first match, and calling .group() on it with no argument gives the whole match (group 0). Use .group(1) to get what's inside (.*?).

    import re

    link = '''\
    <a href="http://www.google.com">Google</a>
    <a href="https://www.microsoft.com">Microsoft</a>'''

    # First match only, captured group 1 (the href value)
    print(re.search('<a[^>]+href=["\'](.*?)["\']', link, re.IGNORECASE).group(1))
    print('--------------')
    # Every match, as a list of group 1 captures
    print(re.findall('<a[^>]+href=["\'](.*?)["\']', link, re.IGNORECASE))

Both solutions can be considered the wrong way, because regex should not be used to parse HTML. Read "You can't parse [X]HTML with regex".

    from bs4 import BeautifulSoup

    link = '''\
    <a href="http://www.google.com">Google</a>
    <a href="https://www.microsoft.com">Microsoft</a>'''

    soup = BeautifulSoup(link, 'lxml')
    # Use a separate loop variable so the source string is not shadowed
    for a in soup.find_all('a'):
        print(a.get('href'))
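To make the difference from the original question concrete, here is a minimal sketch that runs the same pattern through search(), findall() and finditer() on the single-link string from the first post. The variable names (pattern, whole, captured, found, whole_matches) are only illustrative and not from the thread. finditer() is shown as one possible way to get the whole match for every hit, since it yields match objects rather than captured strings.

```python
import re

link = '<a href="http://www.google.com">Google</a>'
pattern = re.compile('<a[^>]+href=["\'](.*?)["\']', re.IGNORECASE)

m = pattern.search(link)
whole = m.group()       # no argument means group 0: the entire matched text
captured = m.group(1)   # only the text captured by (.*?)

# With exactly one capturing group in the pattern, findall() returns
# a list of that group's text, not the whole matches.
found = pattern.findall(link)

# finditer() yields match objects, so group(0) is available for every hit.
whole_matches = [mo.group(0) for mo in pattern.finditer(link)]

print(whole)          # <a href="http://www.google.com"
print(captured)       # http://www.google.com
print(found)          # ['http://www.google.com']
print(whole_matches)  # ['<a href="http://www.google.com"']
```

So the two calls in the question find the same match. They differ only in which part of it they hand back: .group() returns group 0, while findall() returns the captured group.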