Logic behind BeautifulSoup data-parsing

jimsxxl · (This post was last modified: Apr-12-2021, 03:07 PM by jimsxxl.)

(Apr-11-2021, 01:38 PM) jimsxxl Wrote: Hello guys,
I'm messing around a bit with bs4; I'm trying to parse some data from YouTube as a "learning project".
What I find difficult to understand is this: when searching for an element to parse (for example, a video title),
what should I be looking at? What is the key to extracting the video title from the HTML code?

How should I think when I inspect an element in my browser?
Which piece of the code am I interested in?

Thank you in advance!
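One way to think about it, sketched with a made-up snippet (not YouTube's real markup, which changes often): in the inspector, look for an attribute that marks the element you want, such as an id, a class, or a data-* attribute, and turn it into a CSS selector for soup.select(). The title-link class and the video_titles helper below are hypothetical.

```python
from bs4 import BeautifulSoup

# Hypothetical markup, standing in for what the browser's inspector might show.
# It is NOT copied from YouTube.
HTML = """
<div id="items">
  <div class="video">
    <a class="title-link" href="/watch?v=abc" title="First video">First video</a>
  </div>
  <div class="video">
    <a class="title-link" href="/watch?v=def" title="Second video">Second video</a>
  </div>
</div>
"""


def video_titles(html):
    """Return (text, href) pairs for every link marked with the title-link class."""
    soup = BeautifulSoup(html, "html.parser")
    # The "key" is whatever marks the element you want: here tag name + class.
    # An id ("#some-id") or an attribute ("a[title]") works the same way.
    return [(a.get_text(strip=True), a["href"]) for a in soup.select("a.title-link")]


if __name__ == "__main__":
    for text, href in video_titles(HTML):
        print(text, href)
```

The same selector can usually be tried out in the browser console with document.querySelectorAll("a.title-link") before you put it in Python.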

(Apr-12-2021, 10:51 AM) snippsat Wrote: If it works with requests_html, then it's okay.
I have only tested requests_html briefly (one problem is that its GitHub repo is not updated regularly). You can also use Selenium and load the browser with the --headless option.

requests_html uses pyppeteer, which runs headless by default.
Sometimes it's useful to see the browser before going headless, e.g. to check that a button gets pushed or text gets entered into a field;
then Selenium can be the better choice.
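from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By

#--| Setup
options = Options()
options.add_argument("--headless")
#options.add_argument("--window-size=1980,1020")
# Selenium 4 removed executable_path; the driver path now goes through a Service object.
# (Selenium 4.6+ can also locate chromedriver itself, so webdriver.Chrome(options=options) may be enough.)
browser = webdriver.Chrome(service=Service(r'C:\cmder\bin\chromedriver.exe'), options=options)
#--| Parse or automation
url = "https://www.youtube.com/channel/UCwTrHPEglCkDz54iSg9ss9Q/videos"
browser.get(url)
# find_elements_by_css_selector was removed in Selenium 4; find_element(By.CSS_SELECTOR, ...) returns the first match
title = browser.find_element(By.CSS_SELECTOR, '#text-container')
print(title.text)
browser.quit()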
Output:
kanalgratisdotse
The fastest way is to use the YouTube API.
import requests

channel_id = 'UCwTrHPEglCkDz54iSg9ss9Q'
api_key = 'xxxxxxxxxxxxxxxxxxx'

# channels.list endpoint of the YouTube Data API v3; part=snippet includes the channel title
url = 'https://www.googleapis.com/youtube/v3/channels'
params = {'id': channel_id, 'part': 'snippet', 'key': api_key}
response = requests.get(url, params=params).json()
print(response['items'][0]['snippet']['title'])
Output:
kanalgratisdotse
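A slightly sturdier variant of the API call above, as a sketch: urlencode escapes the query values, and the title lookup returns None when the response has no items (assumed here to mean that no channel matched the id). channels_url and channel_title are hypothetical helper names; the endpoint and the items/snippet/title path come from the code above.

```python
from urllib.parse import urlencode

API_BASE = "https://www.googleapis.com/youtube/v3/channels"


def channels_url(channel_id, api_key, part="snippet"):
    """Build the channels request URL, letting urlencode escape each value."""
    query = urlencode({"id": channel_id, "part": part, "key": api_key})
    return f"{API_BASE}?{query}"


def channel_title(response):
    """Return the title from a decoded channels response, or None if it has no items."""
    items = response.get("items") or []
    if not items:
        return None
    return items[0]["snippet"]["title"]


# Usage (needs network access and a real API key):
#   response = requests.get(channels_url(channel_id, api_key)).json()
#   print(channel_title(response))
```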

Hi again, snippsat!
Yeah, I've tried the --headless option in Selenium.

So basically requests_html is the same as Selenium with the headless option (as far as getting the HTML code goes)?
I thought requests_html was "lighter" than Selenium for some reason; that's why I chose it.

If I chose to use Selenium this time, would BeautifulSoup be unnecessary?

I wanted to learn bs4 in this project; would it be foolish to combine Selenium and bs4?

Thanks a lot for your replies, snippsat!
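A common pattern, sketched here rather than taken from this thread, is to let Selenium render the page and hand the rendered HTML (browser.page_source) to BeautifulSoup, so Selenium does the loading and bs4 does the parsing. The first_text helper is a hypothetical name, and the sample string only stands in for real page source so the parsing step runs offline.

```python
from bs4 import BeautifulSoup


def first_text(page_source, selector):
    """Parse rendered HTML with BeautifulSoup and return the first match's text, or None."""
    soup = BeautifulSoup(page_source, "html.parser")
    element = soup.select_one(selector)
    return element.get_text(strip=True) if element is not None else None


# Usage with Selenium (needs Chrome and chromedriver, so it is left as a comment):
#   browser.get(url)
#   print(first_text(browser.page_source, "#text-container"))
#   browser.quit()

if __name__ == "__main__":
    # Stand-in for browser.page_source.
    sample = '<div id="text-container"><span>kanalgratisdotse</span></div>'
    print(first_text(sample, "#text-container"))
```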
