Python Forum
Logic behind BeautifulSoup data-parsing
(Apr-11-2021, 01:38 PM)jimsxxl Wrote: Hello guys,
I'm messing around a bit with bs4 and trying to parse some data from YouTube as a learning project.
What I find difficult to understand is this: when searching for an element to parse (for example, a video title),
what should I be looking at? What is the key to getting the video title out of the HTML code?

How should I think when I inspect an object in my browser?
Which piece of code am I interested in?

Thank you in advance!
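
One way to approach the question above: whatever the browser's inspector highlights is a tag with a name, attributes (id, class, href) and text, and a BeautifulSoup selector just names those same parts. The sketch below uses a made-up HTML fragment; the markup, the video-title id and the class names are assumptions for illustration, not YouTube's actual page source.

```python
from bs4 import BeautifulSoup

# Hypothetical markup, standing in for what the browser inspector shows.
html = """
<div class="video-item">
  <a id="video-title" class="title-link" href="/watch?v=abc123">
    First video
  </a>
</div>
<div class="video-item">
  <a id="video-title" class="title-link" href="/watch?v=def456">
    Second video
  </a>
</div>
"""

soup = BeautifulSoup(html, "html.parser")

# Tag name + id, as read off the inspector: <a id="video-title">
first = soup.select_one("a#video-title")
print(first.get_text(strip=True))  # the text between the tags
print(first["href"])               # an attribute value

# Tag name + class, collecting every match instead of only the first
titles = [a.get_text(strip=True) for a in soup.select("div.video-item a.title-link")]
print(titles)
```

If a selector like this finds nothing on a live page even though the element is visible in the inspector, the element was probably added by JavaScript after the initial HTML was loaded, so it is not in the source that requests downloads.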

(Apr-12-2021, 10:51 AM)snippsat Wrote: If it works with requests_html then it's okay.
I have only tested requests_html briefly (one problem is that its GitHub repo is not updated regularly). You can also use Selenium and load the browser with the --headless option.

requests_html uses pyppeteer, which is headless by default.
Sometimes it is useful to see the browser before going headless, for example to check that a button gets pushed or text gets entered into a field;
then Selenium can be the better choice.
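from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By

#--| Setup
options = Options()
options.add_argument("--headless")
#options.add_argument("--window-size=1980,1020")
# Selenium 4 takes the driver path through a Service object
service = Service(executable_path=r'C:\cmder\bin\chromedriver.exe')
browser = webdriver.Chrome(service=service, options=options)
#--| Parse or automation
url = "https://www.youtube.com/channel/UCwTrHPEglCkDz54iSg9ss9Q/videos"
browser.get(url)
# find_elements_by_css_selector was removed in Selenium 4; use By.CSS_SELECTOR
title = browser.find_elements(By.CSS_SELECTOR, '#text-container')[0]
print(title.text)
browser.quit()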
Output:
kanalgratisdotse
The fastest way is to use the YouTube API.
import requests

channel_id = 'UCwTrHPEglCkDz54iSg9ss9Q'
api_key = 'xxxxxxxxxxxxxxxxxxx'

url = f'https://www.googleapis.com/youtube/v3/channels?id={channel_id}&part=snippet&key={api_key}'
response = requests.get(url).json()
print(response['items'][0]['snippet']['title'])
Output:
kanalgratisdotse
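
The API call above indexes straight into response['items'][0], which raises an error if the channel id or key is wrong. A minimal sketch of a more defensive lookup follows; the helper name channel_title and the shape of the "not found" payload are assumptions for illustration, and the payloads are canned so no network call or API key is needed.

```python
def channel_title(payload):
    """Return the channel title from a channels response, or None if absent."""
    # "items" may be missing or empty when nothing matched (assumed shape).
    items = payload.get("items") or []
    if not items:
        return None
    return items[0].get("snippet", {}).get("title")


# Canned payloads shaped like the response used above.
found = {"items": [{"snippet": {"title": "kanalgratisdotse"}}]}
not_found = {"pageInfo": {"totalResults": 0}}  # hypothetical empty result

print(channel_title(found))
print(channel_title(not_found))
```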

Hi again snippsat!
Yeah, I've tried the --headless option in Selenium.

So basically requests_html is the same as Selenium with the headless option (as far as getting the HTML code goes)?
I thought requests_html was "lighter" than Selenium for some reason; that's why I chose it.

If I chose to use Selenium this time, would BeautifulSoup be unnecessary then?

I wanted to learn bs4 in this project. Would it be foolish to combine Selenium and bs4?

Thanks a lot for your replies, snippsat!
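
On combining the two: a common pattern is to let Selenium load the page and run its JavaScript, then hand browser.page_source to BeautifulSoup for the parsing. The sketch below is one possible arrangement, not the only one. The function names are hypothetical, the a#video-title selector is an assumption about the page markup, and the sample HTML stands in for page_source so the parsing step can be tried offline.

```python
from bs4 import BeautifulSoup


def fetch_rendered_html(url):
    """Load the page in headless Chrome and return the HTML after JavaScript has run."""
    # Imported here so the parsing part below works without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.chrome.options import Options

    options = Options()
    options.add_argument("--headless")
    # Assumes chromedriver can be located (on PATH or via Selenium Manager).
    browser = webdriver.Chrome(options=options)
    try:
        browser.get(url)
        return browser.page_source
    finally:
        browser.quit()


def extract_titles(html, selector="a#video-title"):
    """Parse HTML with BeautifulSoup and return the text of every match."""
    soup = BeautifulSoup(html, "html.parser")
    return [tag.get_text(strip=True) for tag in soup.select(selector)]


# Stand-in for browser.page_source (made-up markup).
sample_html = '<a id="video-title" href="/watch?v=abc123"> Some video </a>'
print(extract_titles(sample_html))
```

With a working driver, extract_titles(fetch_rendered_html(url)) would chain the two steps. Selenium can also find elements by itself, so bs4 is optional here, but using it for the parsing keeps the learning goal of the project intact.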


Messages In This Thread
RE: Logic behind BeautifulSoup data-parsing - by jimsxxl - Apr-12-2021, 03:06 PM
