[split] Using beautiful soup to get html attribute value

moski · (This post was last modified: Jun-03-2019, 11:14 AM by buran.)

# I want to get the <h2> heading, 'Official website' and 'Address' of each attraction on this website into a DataFrame using BeautifulSoup.
# I can see all 14 items in the HTML source code. Please help.
page_link = 'https://www.planetware.com/tourist-attractions-/oslo-n-osl-oslo.htm'
# this is the url that we've already determined is safe and legal to scrape from.
page_response = requests.get(page_link)
# here, we fetch the content from the url, using the requests library
page_content = BeautifulSoup(url_get.content, 'html.parser')
#we use the html parser to parse the url content and store it in a variable.

heiner55 · (This post was last modified: Jun-03-2019, 01:58 PM by heiner55.)

Your code does not work (the imports are missing and url_get is never defined), so I made some tiny changes:

#!/usr/bin/python3
import requests
from bs4 import BeautifulSoup

page_link     = 'https://www.planetware.com/tourist-attractions-/oslo-n-osl-oslo.htm'
page_response = requests.get(page_link)
page_content  = BeautifulSoup(page_response.content, 'html.parser')

print(page_content.prettify())

moski · Jun-03-2019, 03:07 PM

Thanks a lot, heiner55. Good job.
Is there a way I can extract the headers (the 14 Top-Rated Tourist Attractions), their given 'Addresses' and their listed 'Official websites'? These are all on the page. My eventual goal is to build a pandas DataFrame and later append their locations, ratings, etc.

heiner55 · (This post was last modified: Jun-03-2019, 03:32 PM by heiner55.)

Sure, but I thought you would try it yourself first.
The documentation has samples:
https://www.crummy.com/software/BeautifulSoup/bs4/doc/

moski · Jun-03-2019, 03:36 PM

Yes, I am trying, but I feel I am doing it mostly by hand. I am working on it right now, but of course I would appreciate any help. I really appreciate the .prettify() line you added. Cheers.

***snippsat*** · Jun-03-2019, 03:41 PM

Here is some basic stuff using the first tourist attraction.

>>> article = page_content.find('div', class_="article_block site")
>>> article.find('h2')
<h2 class="sitename" id="N-OSL-VIG"><b>1</b>  Vigeland Sculpture Park</h2>
>>> article.find('h2').text
'1  Vigeland Sculpture Park'
>>> 
>>> official_site = article.find('div', class_="web").find('a')
>>> official_site.get('href')
'http://www.vigeland.museum.no/en/vigeland-park'

As mentioned by @heiner55, try it yourself.
We have a tutorial here: Web-Scraping part-1
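
To go from the first attraction to all of them, the same calls could be wrapped in a loop over find_all(). The following is only a rough sketch, not tested against the live page. It assumes every attraction sits in its own div with class "article_block site", as the first one does. The "address" class, the second sample block and the helper name extract_attractions are made up for illustration, so check the real markup with prettify() first. A small inline HTML string stands in for the page so the sketch runs without a network call.

```python
from bs4 import BeautifulSoup

# Stand-in for the real page. The "article_block site", "sitename" and "web"
# classes come from the session above. The "address" class and the second
# block are hypothetical and only here to exercise the code.
SAMPLE_HTML = """
<div class="article_block site">
  <h2 class="sitename" id="N-OSL-VIG"><b>1</b>  Vigeland Sculpture Park</h2>
  <div class="address">Address: (placeholder text)</div>
  <div class="web"><a href="http://www.vigeland.museum.no/en/vigeland-park">Official site</a></div>
</div>
<div class="article_block site">
  <h2 class="sitename"><b>2</b>  Example Attraction</h2>
</div>
"""


def extract_attractions(soup):
    """Return one dict per attraction block found in the parsed page."""
    rows = []
    for article in soup.find_all('div', class_="article_block site"):
        heading = article.find('h2')
        if heading is None:
            continue  # skip blocks that have no heading

        # The official site link lives in <div class="web"><a href=...>.
        web = article.find('div', class_="web")
        link = web.find('a') if web else None

        # ASSUMPTION: the address is in <div class="address">; adjust this
        # selector to whatever the real page uses.
        address = article.find('div', class_="address")

        rows.append({
            'name': heading.get_text(" ", strip=True),
            'address': address.get_text(strip=True) if address else None,
            'website': link.get('href') if link else None,
        })
    return rows


rows = extract_attractions(BeautifulSoup(SAMPLE_HTML, 'html.parser'))
for row in rows:
    print(row)
```

With the real page, pass page_content from the earlier post instead of the sample soup. Since the result is a list of dicts, pandas.DataFrame(rows) should turn it into a DataFrame with one column per key. Missing fields come back as None rather than raising, which makes it easier to see which blocks do not match the assumed markup.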

moski · Jun-03-2019, 04:24 PM

Great tutorial. Thanks for the kind of patience you have for newbies.
