Python Forum
Logic behind BeautifulSoup data-parsing
Thanks a lot, snippsat, for the links... they clarified a couple of things for me!
I'm using requests_html instead of requests because I noticed that requests got stuck on YouTube's "Agree to continue" page.
In my last Betfair project I fixed that with .click, but I wanted to see if it could be done without Selenium and without loading a browser into the program.

Here is the code so far:
from bs4 import BeautifulSoup as bs
from requests_html import HTMLSession

tempfile = "/home/xxx/projects/jims-youtube_scraper/tempvideofile.html"
channels = [
    'https://www.youtube.com/channel/UCwTrHPEglCkDz54iSg9ss9Q/videos',      # KanalGratis
    #'https://www.youtube.com/user/svartzonker/videos'                      # Svartzonker
]

title = []
link = []
count = 0

session = HTMLSession()

for c in channels:
    get_response = session.get(c)
    get_response.html.render(sleep=1)
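    # Save the rendered page, then read it back; "with" closes each file handle
    with open(tempfile, "w", encoding='utf8') as f:
        f.write(get_response.html.html)
    with open(tempfile, 'r', encoding='utf8') as opentemp:
        soup = bs(opentemp, 'html.parser')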

    #name = soup.find('yt', class_='style-scope ytd-channel-name')
    #print(name.get('text'))

    # One pass over the matching anchors collects both the title and the link
    for a in soup.find_all('a', class_='yt-simple-endpoint style-scope ytd-grid-video-renderer'):
        title.append(a.get('title'))
        link.append(a.get('href'))

while count != len(title):
    print("Title:", title[count], "URL: www.youtube.com" + link[count])

    count = count + 1
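
One possible tidier form of the printing loop above, as a rough sketch only: it pairs the two lists with zip() instead of a manual counter and builds the full URL with urljoin() from the standard library. The helper name format_videos, the BASE_URL constant and the sample data are made up for illustration, and it assumes the scraped hrefs are site-relative paths such as "/watch?v=...".

```python
from urllib.parse import urljoin

# Assumption: the scraped hrefs are site-relative, e.g. "/watch?v=..."
BASE_URL = "https://www.youtube.com"


def format_videos(titles, links):
    """Pair each title with its link and return one printable line per video."""
    lines = []
    # zip() stops at the shorter list, so no counter or index check is needed
    for video_title, href in zip(titles, links):
        lines.append(f"Title: {video_title} URL: {urljoin(BASE_URL, href)}")
    return lines


# Made-up sample data standing in for the scraped title and link lists
sample_titles = ["First video", "Second video"]
sample_links = ["/watch?v=abc123", "/watch?v=def456"]

for line in format_videos(sample_titles, sample_links):
    print(line)
```

With the real lists, the call would be format_videos(title, link). Since zip() silently drops unmatched items, a length check beforehand may be worth adding if the two lists can ever differ.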
Please let me know if I could have done it in a better way, or if something looks funny to you.
I would really appreciate some feedback from experienced Python coders!


Messages In This Thread
RE: Logic behind BeautifulSoup data-parsing - by jimsxxl - Apr-11-2021, 05:25 PM
