[split] Using beautiful soup to get html attribute value - Printable Version

+- Python Forum (https://python-forum.io)
+-- Forum: Python Coding (https://python-forum.io/forum-7.html)
+--- Forum: Web Scraping & Web Development (https://python-forum.io/forum-13.html)
+--- Thread: [split] Using beautiful soup to get html attribute value (/thread-18836.html)
[split] Using beautiful soup to get html attribute value - moski - Jun-03-2019

    # I want to get each <h2>, 'Official website' and 'Address' from this website as a DataFrame using BeautifulSoup.
    # I can see the items, 14 of them, in the HTML source code. Please help.
    page_link = 'https://www.planetware.com/tourist-attractions-/oslo-n-osl-oslo.htm'
    # this is the url that we've already determined is safe and legal to scrape from.
    page_response = requests.get(page_link)
    # here, we fetch the content from the url, using the requests library
    page_content = BeautifulSoup(url_get.content, 'html.parser')
    # we use the html parser to parse the url content and store it in a variable.

RE: [split] Using beautiful soup to get html attribute value - heiner55 - Jun-03-2019

Your code does not work (the imports are missing and `url_get` is never defined), so I made some tiny changes:

    #!/usr/bin/python3
    import requests
    from bs4 import BeautifulSoup

    page_link = 'https://www.planetware.com/tourist-attractions-/oslo-n-osl-oslo.htm'
    page_response = requests.get(page_link)
    page_content = BeautifulSoup(page_response.content, 'html.parser')
    print(page_content.prettify())

RE: [split] Using beautiful soup to get html attribute value - moski - Jun-03-2019

Thanks a lot, heiner55. Good job. Is there a way I can extract the headers (the 14 Top-Rated Tourist Attractions), their given 'Addresses' and their listed 'Official websites'? These are all on the page. My eventual goal is to get a pandas DataFrame and later append their locations, ratings, etc.

RE: [split] Using beautiful soup to get html attribute value - heiner55 - Jun-03-2019

Sure. But I thought you would try it yourself first. Here is a sample:
https://www.crummy.com/software/BeautifulSoup/bs4/doc/

RE: [split] Using beautiful soup to get html attribute value - moski - Jun-03-2019

Yes, I am trying, but I feel I am mostly doing it manually. I am doing so right now, but of course I will appreciate.... I really appreciate the ".prettify())" line you added. Cheers.

RE: [split] Using beautiful soup to get html attribute value - snippsat - Jun-03-2019

Here is some basic stuff using the first tourist attraction.

    >>> article = page_content.find('div', class_="article_block site")
    >>> article.find('h2')
    <h2 class="sitename" id="N-OSL-VIG"><b>1</b> Vigeland Sculpture Park</h2>
    >>> article.find('h2').text
    '1 Vigeland Sculpture Park'
    >>>
    >>> official_site = article.find('div', class_="web").find('a')
    >>> official_site.get('href')
    'http://www.vigeland.museum.no/en/vigeland-park'

As mentioned by @heiner55, try it yourself. We have a tutorial here: Web-Scraping part-1

RE: [split] Using beautiful soup to get html attribute value - moski - Jun-03-2019

Great tutorial. Thanks for the kind of patience you have for newbies.
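
A minimal sketch of how the single-attraction lookups in snippsat's post might be extended to every attraction on the page. It is not from the thread and makes several assumptions: `SAMPLE_HTML` is a hypothetical stand-in for the real page so the example runs offline, the second attraction and the `address` class name are invented for illustration (the thread never shows how addresses are marked up, so inspect the real HTML first), and `extract_attractions` is a made-up helper name. Only the `article_block site`, `h2` and `web` lookups come from the post above.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the downloaded page. Only the <h2> and "web"
# markup of the first block is taken from the thread; everything else,
# including the "address" class, is an assumption for illustration.
SAMPLE_HTML = """
<div class="article_block site">
  <h2 class="sitename" id="N-OSL-VIG"><b>1</b> Vigeland Sculpture Park</h2>
  <div class="address">Address: Example Street 1, Oslo</div>
  <div class="web">Official site:
    <a href="http://www.vigeland.museum.no/en/vigeland-park">link</a>
  </div>
</div>
<div class="article_block site">
  <h2 class="sitename" id="N-OSL-XYZ"><b>2</b> Example Attraction</h2>
</div>
"""


def extract_attractions(html):
    """Return one dict per attraction block found in the HTML."""
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    # find_all returns every matching block, where find returns only the first.
    for article in soup.find_all("div", class_="article_block site"):
        heading = article.find("h2")
        if heading is None:
            continue  # skip blocks that have no title
        # Not every block is guaranteed to have a website or an address,
        # so check each lookup for None before going further.
        web = article.find("div", class_="web")
        link = web.find("a") if web else None
        address = article.find("div", class_="address")  # assumed class name
        rows.append({
            "name": heading.get_text(" ", strip=True),
            "website": link.get("href") if link else None,
            "address": address.get_text(strip=True) if address else None,
        })
    return rows


rows = extract_attractions(SAMPLE_HTML)
for row in rows:
    print(row)
```

With the real page, `page_response.content` from heiner55's script would take the place of `SAMPLE_HTML`. Since `rows` is a list of dicts, passing it to `pandas.DataFrame(rows)` should give the table moski is after, with missing websites or addresses showing up as empty values.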