Python Forum

Hi all,

I am new to Python, so forgive me if my question sounds silly. I am trying to print out the text of the <a> link under each h2 section.
The HTML structure looks like this:

<div>
<h2>
<a href='xxxx'>
The content I want to print out 1
</a>
</h2>
</div>

<div>
<h2>
<a href='xxxx'>
The content I want to print out 2
</a>
</h2>
</div>

And the code I am using is:

import requests
from bs4 import BeautifulSoup

r = requests.get("http://xxxxxxxxx/")
source_code = r.text
soup = BeautifulSoup(source_code, "html.parser").find_all("h2")
for link in soup:
    print(link)

This prints the whole h2 tags. How can I print just the link text, like this:

The content I want to print out 1
The content I want to print out 2

Thanks a lot for the help!

BR,
Henry

You've got most of the way there. Your find_all("h2") returns the h2 tags themselves, so for each h2 you need to get the a tag inside it (h2.a or h2.find('a')), then take its text and strip the whitespace from both ends.

from bs4 import BeautifulSoup

html = '''
<div>
<h2>
<a href='xxxx'>
The content I want to print out 1
</a>
</h2>
</div>

<div>
<h2>
<a href='xxxx'>
The content I want to print out 2
</a>
</h2>
</div>
'''

soup = BeautifulSoup(html,"html.parser")
h2s = soup.find_all("h2")
for h2 in h2s:
    print(h2.a.text.strip())

Output:
The content I want to print out 1
The content I want to print out 2
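If some h2 on the real page has no link inside it, h2.a returns None and h2.a.text raises AttributeError. The sketch below, which assumes an extra link-less h2 purely for illustration, shows two ways to guard against that: checking the result of find('a'), or using a CSS selector ("h2 > a") that only matches a tags directly inside an h2. It also uses get_text(strip=True), which strips each text piece for you.

```python
from bs4 import BeautifulSoup

html = '''
<div>
<h2>
<a href='xxxx'>
The content I want to print out 1
</a>
</h2>
</div>

<div>
<h2>
<a href='xxxx'>
The content I want to print out 2
</a>
</h2>
</div>

<div>
<h2>A heading with no link (hypothetical, added for illustration)</h2>
</div>
'''

soup = BeautifulSoup(html, "html.parser")

# Option 1: loop over the h2 tags and skip any h2 without an <a> child.
texts = []
for h2 in soup.find_all("h2"):
    a = h2.find("a")  # None when the h2 contains no <a>
    if a is not None:
        texts.append(a.get_text(strip=True))

# Option 2: a CSS selector that only matches <a> tags directly inside an <h2>.
selected = [a.get_text(strip=True) for a in soup.select("h2 > a")]

for text in texts:
    print(text)
```

Note that get_text(strip=True) strips every text fragment and joins them with no separator, so if your links contain nested tags you may want get_text(" ", strip=True) instead.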

If you want the actual link (the href attribute) instead, index the a tag like a dictionary:

for h2 in h2s:
    print(h2.a['href'])

Output:
xxxx
xxxx
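On a real page the href values are often relative (for example "page1.html" or "/about"). The following is a minimal sketch that collects (text, absolute URL) pairs, assuming a hypothetical base URL and sample hrefs; in your script the base would be the URL you passed to requests.get. It uses a.get("href"), which returns None instead of raising KeyError when an a tag has no href, and urllib.parse.urljoin to resolve relative links.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup

# Hypothetical base URL and hrefs, only to show how relative links resolve.
base_url = "http://example.com/articles/"
html = '''
<div><h2><a href="page1.html">First article</a></h2></div>
<div><h2><a href="/about">About</a></h2></div>
<div><h2><a>Heading link without href</a></h2></div>
'''

soup = BeautifulSoup(html, "html.parser")

links = []
for a in soup.select("h2 > a"):
    href = a.get("href")  # None if the attribute is missing
    if href is None:
        continue
    # Resolve relative hrefs against the page they were found on.
    links.append((a.get_text(strip=True), urljoin(base_url, href)))

for text, url in links:
    print(text, url)
```

With this input, "page1.html" resolves to http://example.com/articles/page1.html and "/about" resolves to http://example.com/about, while the a tag without an href is skipped.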

(Feb-02-2018, 02:51 AM) metulburr wrote: print(h2.a.text.strip())

Thank you so much for your kind help. It works!

Best Regards,
Henry
