Different Output of findall and search in re module

***snippsat*** · (This post was last modified: Mar-12-2018, 08:39 PM by snippsat.)

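re.findall returns every non-overlapping match in a list. When the pattern has exactly one capturing group, each item in that list is what the group captured, in this case what's inside group(1) --> (.*?).
re.search returns only the first match, as a match object, so .group(1) gives what (.*?) captured in that first match.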

import re

link = '''\
 <a href="http://www.google.com">Google</a>
 <a href="https://www.microsoft.com">Microsoft</a>'''

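pattern = '<a[^>]+href=["\'](.*?)["\']'
# search: first match only; .group(1) is what (.*?) captured
print(re.search(pattern, link, re.IGNORECASE).group(1))
print('--------------')
# findall: all matches; with one group it returns a list of the group(1) strings
print(re.findall(pattern, link, re.IGNORECASE))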

Output:
http://www.google.com
--------------
['http://www.google.com', 'https://www.microsoft.com']
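What findall returns depends on how many capturing groups the pattern has. A small sketch (the one-line `html` string and the simpler `href="..."` patterns are just for illustration) showing zero, one and two groups, plus re.finditer, which gives you a match object per match so you can call .group() on each one the same way as with search:

```python
import re

html = '<a href="http://www.google.com">Google</a> <a href="https://www.microsoft.com">Microsoft</a>'

# No capturing group: findall returns the whole matched text of each match
no_group = re.findall(r'href="[^"]*"', html)

# One capturing group: findall returns only what group 1 captured
one_group = re.findall(r'href="([^"]*)"', html)

# Two capturing groups: findall returns one tuple of groups per match
two_groups = re.findall(r'href="([^"]*)">([^<]*)<', html)

# finditer yields match objects, so each one supports .group() like search does
urls = [m.group(1) for m in re.finditer(r'href="([^"]*)"', html)]

print(no_group)
print(one_group)
print(two_groups)
print(urls)
```

Also remember that re.search returns None when nothing matches, so calling .group(1) directly on it will raise AttributeError in that case.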

Both solutions can be seen as the wrong approach, because regex should not be used to parse HTML. Read the famous Stack Overflow answer "You can't parse [X]HTML with regex". Use an HTML parser instead:

from bs4 import BeautifulSoup

link = '''\
 <a href="http://www.google.com">Google</a>
 <a href="https://www.microsoft.com">Microsoft</a>'''

# 'lxml' requires the lxml package; 'html.parser' is the built-in alternative
soup = BeautifulSoup(link, 'lxml')
for a_tag in soup.find_all('a'):
    print(a_tag.get('href'))

Output:
http://www.google.com
https://www.microsoft.com
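If installing BeautifulSoup/lxml is not an option, the standard library's html.parser module can do the same job. This is a minimal sketch, not what was used above; the class name HrefCollector is made up for the example. HTMLParser calls handle_starttag for every opening tag, with the tag name lowercased and the attributes as a list of (name, value) pairs:

```python
from html.parser import HTMLParser


class HrefCollector(HTMLParser):
    """Collect the href value of every <a> tag fed to the parser."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        # tag is lowercased by HTMLParser; attrs is a list of (name, value) pairs
        if tag == 'a':
            for name, value in attrs:
                if name == 'href':
                    self.hrefs.append(value)


link = '''\
 <a href="http://www.google.com">Google</a>
 <a href="https://www.microsoft.com">Microsoft</a>'''

parser = HrefCollector()
parser.feed(link)
parser.close()
for href in parser.hrefs:
    print(href)
```

It prints the same two URLs as the BeautifulSoup version. BeautifulSoup is still nicer once you need searching by class, nested tags and so on.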
