 [split] Using beautiful soup to get html attribute value
#1
# I want to get the <h2> headings, 'Official website' and 'Address' from this website as a DataFrame using BeautifulSoup.
# I can see all 14 items in the HTML source code. Please help.
page_link = 'https://www.planetware.com/tourist-attractions-/oslo-n-osl-oslo.htm'
# this is the url that we've already determined is safe and legal to scrape from.
page_response = requests.get(page_link)
# here, we fetch the content from the url, using the requests library
page_content = BeautifulSoup(url_get.content, 'html.parser')
# we use the html parser to parse the url content and store it in a variable.
buran wrote Jun-03-2019, 11:14 AM:
Please use proper tags when posting code, tracebacks, output, etc. This time I have added tags for you.
See BBcode help for more info.

Also, don't hijack threads.
#2
Your code does not work, so I made some tiny changes:

#!/usr/bin/python3
import requests
from bs4 import BeautifulSoup

page_link     = 'https://www.planetware.com/tourist-attractions-/oslo-n-osl-oslo.htm'
page_response = requests.get(page_link)
page_content  = BeautifulSoup(page_response.content, 'html.parser')

print(page_content.prettify())
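To try the parsing step offline, without requesting the page every time, you can hand BeautifulSoup bytes or a string directly. A minimal sketch; the markup below is a made-up stand-in for `page_response.content`, not the real page:

```python
from bs4 import BeautifulSoup

# Made-up stand-in for page_response.content (bytes), shaped like one heading on the page
sample_content = b'<html><body><h2 class="sitename"><b>1</b>  Demo Park</h2></body></html>'

# BeautifulSoup accepts bytes (like response.content) and decodes them itself,
# or an already-decoded str (like response.text)
soup_from_bytes = BeautifulSoup(sample_content, 'html.parser')
soup_from_text = BeautifulSoup(sample_content.decode('utf-8'), 'html.parser')

# prettify() returns the parse tree as a re-indented string, handy for reading the structure
print(soup_from_bytes.prettify())
```

Once the selectors work on a saved sample, switching back to the live `page_response.content` should need no other changes.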
#3
Thanks a lot, heiner55. Good job.
Is there a way I can extract the headers (the 14 top-rated tourist attractions), their listed 'Address' and their 'Official website'? These are all on the page. My eventual goal is to get a pandas DataFrame and later append their locations, ratings, etc.
#4
Sure. But I thought you would try it yourself.
Here is a sample:
https://www.crummy.com/software/BeautifulSoup/bs4/doc/
#5
Yes, I am trying, but I feel I am mostly doing it manually. I am working on it right now, but of course I would appreciate any help. I really appreciate the ".prettify()" line you added. Cheers.
#6
Here is some basic stuff using the first tourist attraction:
>>> article = page_content.find('div', class_="article_block site")
>>> article.find('h2')
<h2 class="sitename" id="N-OSL-VIG"><b>1</b>  Vigeland Sculpture Park</h2>
>>> article.find('h2').text
'1  Vigeland Sculpture Park'
>>> 
>>> official_site = article.find('div', class_="web").find('a')
>>> official_site.get('href')
'http://www.vigeland.museum.no/en/vigeland-park'
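To go from the first attraction to all of them, the same lookups can be run in a loop over every matching block. This is a sketch, not a tested scraper for the live site: it assumes every attraction uses the `article_block site` / `h2` / `web` markup shown in the session above. The `addr` class used for the address is a hypothetical placeholder, because the thread never shows that markup (check it with `prettify()` first), and `extract_attractions` is an illustrative name.

```python
from bs4 import BeautifulSoup, NavigableString


def extract_attractions(page_content, address_class="addr"):
    """Return one dict per attraction block found in a parsed page.

    Assumes the markup seen for the first attraction: <div class="article_block site">
    holding an <h2> with the rank in <b>, plus an optional <div class="web"><a href=...>.
    address_class is a HYPOTHETICAL placeholder; the real class name is unverified.
    """
    rows = []
    for article in page_content.find_all('div', class_="article_block site"):
        heading = article.find('h2')
        if heading is None:
            continue  # not an attraction entry

        # The rank lives in the <b> tag, e.g. <b>1</b>
        rank_tag = heading.find('b')
        rank = rank_tag.get_text(strip=True) if rank_tag else None

        # The name is the heading's own text, i.e. the strings outside <b>
        name = ' '.join(
            child.strip()
            for child in heading.children
            if isinstance(child, NavigableString) and child.strip()
        )

        # Official website link, if the block has one
        web_div = article.find('div', class_="web")
        link = web_div.find('a') if web_div else None
        website = link.get('href') if link else None

        # Address: placeholder selector, adjust after inspecting the HTML
        addr_div = article.find('div', class_=address_class)
        address = addr_div.get_text(' ', strip=True) if addr_div else None

        rows.append({'rank': rank, 'name': name, 'website': website, 'address': address})
    return rows


# Small offline sample shaped like the markup in the session above (second entry is made up)
sample_html = """
<div class="article_block site">
  <h2 class="sitename" id="N-OSL-VIG"><b>1</b>  Vigeland Sculpture Park</h2>
  <div class="web"><a href="http://www.vigeland.museum.no/en/vigeland-park">Official site</a></div>
</div>
<div class="article_block site">
  <h2 class="sitename"><b>2</b>  Another Attraction</h2>
</div>
"""
rows = extract_attractions(BeautifulSoup(sample_html, 'html.parser'))
for row in rows:
    print(row)
```

Each row is a plain dict, so `pandas.DataFrame(extract_attractions(page_content))` should give one row per attraction with `rank`, `name`, `website` and `address` columns; fields a block lacks come out as `None`. Keeping the extraction separate from the DataFrame step also makes it easy to add more fields later.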
As mentioned by @heiner55, try it yourself.
We have a tutorial here: Web-Scraping part-1
#7
Great tutorial. Thanks for the patience you have with newbies.