re.findall returns every match in a list. Because this pattern has exactly one capturing group, each item in the list is what group(1), (.*?), captured.
re.search returns only the first match (as a match object, or None if nothing matches); .group(1) then gives what (.*?) captured.
import re
link = '''\
<a href="http://www.google.com">Google</a>
<a href="https://www.microsoft.com">Microsoft</a>'''
print(re.search('<a[^>]+href=["\'](.*?)["\']',link,re.IGNORECASE).group(1))
print('--------------')
print(re.findall('<a[^>]+href=["\'](.*?)["\']',link,re.IGNORECASE))
Output:
http://www.google.com
--------------
['http://www.google.com', 'https://www.microsoft.com']
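To make the difference concrete, here is a small sketch of how the number of capturing groups changes what re.findall returns, and what re.search gives back when nothing matches. The simplified patterns (href="([^"]*)" and so on) are illustrative only and assume double-quoted href values.

```python
import re

html = '<a href="http://www.google.com">Google</a> <a href="https://www.microsoft.com">Microsoft</a>'

# No capturing group: findall returns the whole matched text of each match.
no_group = re.findall(r'href="[^"]*"', html)

# One capturing group: findall returns just that group's text.
one_group = re.findall(r'href="([^"]*)"', html)

# Two capturing groups: findall returns a tuple per match.
two_groups = re.findall(r'href="([^"]*)">([^<]*)<', html)

# search stops at the first match and returns a Match object, or None.
first = re.search(r'href="([^"]*)"', html)
missing = re.search(r'href="([^"]*)"', '<p>no links here</p>')

print(no_group)    # ['href="http://www.google.com"', 'href="https://www.microsoft.com"']
print(one_group)   # ['http://www.google.com', 'https://www.microsoft.com']
print(two_groups)  # [('http://www.google.com', 'Google'), ('https://www.microsoft.com', 'Microsoft')]
print(first.group(1) if first else None)  # http://www.google.com
print(missing)     # None
```

Checking for None before calling .group(1) avoids the AttributeError you would get on a page with no links.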
Both solutions can be considered the wrong approach, because regular expressions are not suited to parsing HTML (read "You can't parse [X]HTML with regex").
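A quick sketch of why the regex above is fragile, using the same pattern on two made-up (but valid) anchor tags: a ">" inside a quoted attribute value stops [^>]+ before it reaches href, and an apostrophe inside a double-quoted URL ends the ["\'] match early.

```python
import re

pattern = re.compile(r'<a[^>]+href=["\'](.*?)["\']', re.IGNORECASE)

# '>' is allowed inside a quoted attribute value, but [^>]+ cannot step past it,
# so the regex never reaches href and finds nothing.
tricky = '<a title="a>b" href="http://www.google.com">Google</a>'

# An apostrophe inside a double-quoted URL: ["\'] accepts either quote,
# so the capture stops at the apostrophe and the URL is truncated.
quoted = '<a href="http://example.com/?q=it\'s">x</a>'

print(pattern.findall(tricky))  # []
print(pattern.search(tricky))   # None
print(pattern.findall(quoted))  # ['http://example.com/?q=it']
```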
Using an HTML parser such as BeautifulSoup instead:
from bs4 import BeautifulSoup
link = '''\
<a href="http://www.google.com">Google</a>
<a href="https://www.microsoft.com">Microsoft</a>'''
soup = BeautifulSoup(link, 'lxml')  # needs lxml installed; 'html.parser' is the built-in alternative
for a in soup.find_all('a'):
    print(a.get('href'))
Output:
http://www.google.com
https://www.microsoft.com
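If installing bs4 or lxml is not an option, a rough equivalent can be sketched with the standard library's html.parser.HTMLParser. The class name HrefCollector is hypothetical, and the second link has been given a title containing ">" to show that a real parser copes with the case that trips up the regex.

```python
from html.parser import HTMLParser


class HrefCollector(HTMLParser):
    """Collects the href attribute of every <a> tag, in document order."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        # Tag names arrive lowercased; attrs is a list of (name, value) pairs.
        if tag == 'a':
            for name, value in attrs:
                # value is None for attributes written without a value.
                if name == 'href' and value is not None:
                    self.hrefs.append(value)


link = '''\
<a href="http://www.google.com">Google</a>
<a title="a>b" href="https://www.microsoft.com">Microsoft</a>'''

parser = HrefCollector()
parser.feed(link)
parser.close()
for href in parser.hrefs:
    print(href)
```

This prints the same two URLs as the BeautifulSoup version; BeautifulSoup remains more convenient once you need more than one attribute from one tag type.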