Python Forum

How to print all the <a> links under each h2 section using BeautifulSoup
Hi all,

I am new to Python, so forgive me if my question sounds silly. I am trying to print the text of the <a> link under each h2 section.
The HTML structure looks like this:

<div>
<h2>
<a href='xxxx'>
The content I want to print out 1
</a>
</h2>
</div>

<div>
<h2>
<a href='xxxx'>
The content I want to print out 2
</a>
</h2>
</div>



The code I am using is:

import requests
from bs4 import BeautifulSoup

r=requests.get("http://xxxxxxxxx/")
source_code = r.text
soup=BeautifulSoup(source_code,"html.parser").find_all("h2")
for link in soup:
    print(link)


But how can I print the results like this:

The content I want to print out 1
The content I want to print out 2

Thanks a lot for the help!

BR,
Henry
You've got it most of the way. find_all("h2") returns the whole h2 tags, so for each h2 you need to get the a tag inside it, using h2.a or h2.find('a'). Then you need to get its text and strip the whitespace from the outer edges.

from bs4 import BeautifulSoup

html = '''
<div>
<h2>
<a href='xxxx'>
The content I want to print out 1
</a>
</h2>
</div>

<div>
<h2>
<a href='xxxx'>
The content I want to print out 2
</a>
</h2>
</div>
'''

soup = BeautifulSoup(html,"html.parser")
h2s = soup.find_all("h2")
for h2 in h2s:
    print(h2.a.text.strip())
Output:
The content I want to print out 1
The content I want to print out 2
If you want the actual link (the href attribute) instead, replace the print line in the loop with:
    print(h2.a['href'])
Output:
xxxx
xxxx
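One caveat with h2.a: if some h2 on the real page has no link inside it, h2.a is None and h2.a.text raises an AttributeError. Here is a rough sketch of one way around that, using the CSS selector "h2 a" so only links that actually sit inside an h2 are visited. The sample HTML, the /first and /second paths, and the h2_links helper name are made up for illustration; they are not from the original page.

```python
from bs4 import BeautifulSoup

# Made-up sample: the middle h2 deliberately has no <a> inside it.
html = """
<div><h2><a href="/first">  First title  </a></h2></div>
<div><h2>Heading without a link</h2></div>
<div><h2><a href="/second">Second title</a></h2></div>
"""

soup = BeautifulSoup(html, "html.parser")


def h2_links(soup):
    """Return (text, href) pairs for every <a> nested inside an <h2>."""
    results = []
    # "h2 a" matches only <a> tags that are descendants of an <h2>,
    # so headings without a link are skipped instead of raising an error.
    for a in soup.select("h2 a"):
        text = a.get_text(strip=True)  # link text with outer whitespace removed
        href = a.get("href")           # None if the attribute is missing
        results.append((text, href))
    return results


for text, href in h2_links(soup):
    print(text, href)
```

Using a.get("href") rather than a["href"] means a link without an href gives None instead of a KeyError, which may or may not be what you want on your page.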
metulburr wrote (Feb-02-2018, 02:51 AM): print(h2.a.text.strip())
Thank you so much for your kind help. It works!

Best Regards,
Henry