[split] Using Beautiful Soup to get an HTML attribute value
#1
# I want to get the <h2> headings, 'Official website' and 'Address' from this website as a DataFrame, using BeautifulSoup.
# I can see all 14 of them in the HTML source code. Please help.
page_link = 'https://www.planetware.com/tourist-attractions-/oslo-n-osl-oslo.htm'
# This is the URL that we've already determined is safe and legal to scrape.
page_response = requests.get(page_link)
# Here, we fetch the content from the URL using the requests library.
page_content = BeautifulSoup(url_get.content, 'html.parser')
# We use the HTML parser to parse the page content and store it in a variable.
buran wrote Jun-03-2019, 11:14 AM:
Please use proper tags when posting code, tracebacks, output, etc. This time I have added the tags for you.
See the BBCode help for more info.

Also, don't hijack threads.
#2
Your code does not work, so I made some tiny changes:

#!/usr/bin/python3
import requests
from bs4 import BeautifulSoup

page_link     = 'https://www.planetware.com/tourist-attractions-/oslo-n-osl-oslo.htm'
page_response = requests.get(page_link)                              # download the page
page_content  = BeautifulSoup(page_response.content, 'html.parser')  # parse the raw bytes

print(page_content.prettify())  # print the parsed HTML with indentation
#3
Thanks a lot, heiner55. Good job.
Is there a way I can extract the headers (the 14 Top-Rated Tourist Attractions), their 'Address' entries and their listed 'Official website' links? These are all on the page. My eventual goal is to get a pandas DataFrame and, later, append their locations, ratings, etc.
#4
Sure, but I thought you would try it yourself.
Here is the documentation, with samples:
https://www.crummy.com/software/BeautifulSoup/bs4/doc/
#5
Yes, I am trying, but I feel like I am doing it mostly by hand. I am working on it right now, but of course I would appreciate any help. I really appreciate the ".prettify()" line you added. Cheers.
#6
Here is some basic stuff using the first tourist attraction:
>>> article = page_content.find('div', class_="article_block site")
>>> article.find('h2')
<h2 class="sitename" id="N-OSL-VIG"><b>1</b>  Vigeland Sculpture Park</h2>
>>> article.find('h2').text
'1  Vigeland Sculpture Park'
>>> 
>>> official_site = article.find('div', class_="web").find('a')
>>> official_site.get('href')
'http://www.vigeland.museum.no/en/vigeland-park'
As mentioned by @heiner55, try it yourself.
We have a tutorial here: Web-Scraping part-1
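Building on the session above, here is a minimal sketch of looping over every attraction block and collecting the heading, address and official website into a list of dicts. It assumes that every attraction sits in a div with class "article_block site" and keeps its link in a div with class "web", as the first one does. The "address" class is a hypothetical guess, because the address markup is not shown in this thread; check the real page source and adjust it. The page layout may also have changed since 2019. extract_attractions and SAMPLE_HTML are hypothetical names, and the sample markup is made up for illustration.

```python
from bs4 import BeautifulSoup


def extract_attractions(soup, address_class="address"):
    """Return one dict per attraction block found in the parsed page.

    Assumes each attraction is a <div class="article_block site">, as in
    the session above. address_class is a HYPOTHETICAL class name for the
    address element; check the real page source and adjust it.
    """
    rows = []
    # A class_ string containing a space matches the exact class attribute value.
    for block in soup.find_all("div", class_="article_block site"):
        h2 = block.find("h2")
        if h2 is None:
            continue  # skip blocks without a heading
        # The rank is inside <b>; the name is the text directly inside <h2>.
        rank = h2.b.get_text(strip=True) if h2.b else None
        name = "".join(h2.find_all(string=True, recursive=False)).strip()

        # Official website: first <a> inside <div class="web">, if present.
        web = block.find("div", class_="web")
        link = web.find("a") if web else None
        website = link.get("href") if link else None

        # Address: any element with the (hypothetical) address class.
        addr = block.find(class_=address_class)
        address = addr.get_text(" ", strip=True) if addr else None

        rows.append({"rank": rank, "name": name,
                     "address": address, "website": website})
    return rows


# Made-up sample markup (placeholder address and second entry),
# shaped like the first attraction block shown above.
SAMPLE_HTML = """
<div class="article_block site">
  <h2 class="sitename" id="N-OSL-VIG"><b>1</b>  Vigeland Sculpture Park</h2>
  <div class="address">Example Street 1, Oslo</div>
  <div class="web"><a href="http://www.vigeland.museum.no/en/vigeland-park">Official site</a></div>
</div>
<div class="article_block site">
  <h2 class="sitename"><b>2</b>  Example Attraction</h2>
</div>
"""

if __name__ == "__main__":
    soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
    for row in extract_attractions(soup):
        print(row)
```

Missing fields come back as None instead of raising an AttributeError, so one oddly formatted block does not stop the loop. With pandas installed, pandas.DataFrame(extract_attractions(page_content)) turns the rows into a DataFrame with one column per key, where page_content is the parsed page from the code earlier in the thread.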
#7
Great tutorial. Thanks for your patience with newbies.